Contrastive Learning in NLP
In this post, I would like to introduce a NAACL 2022 tutorial titled Contrastive Data and Learning for Natural Language Processing. The tutorial covers recent work in NLP that uses contrastive learning techniques. For more details, I recommend referring to the tutorial's website and its paper list on contrastive learning. In addition, I also recommend reading this survey paper.
What is Contrastive Learning?
Contrastive learning is a technique for learning an embedding space in which similar data samples have close representations while dissimilar samples stay far apart from each other. While it originally drove success in vision tasks, recent years have seen a growing number of publications on contrastive learning for NLP.
The first NLP paper in this direction (Smith and Eisner, 2005) introduced 'contrastive estimation' as an unsupervised training objective for log-linear models, and the most successful early example of contrastive learning in NLP is word2vec (Mikolov et al., 2013) for word embeddings.
Foundations of Contrastive Learning
Basically, contrastive learning has two elements: Contrastive Learning = Contrastive Data Creation + Contrastive Objective Optimization
1. Contrastive Learning Objectives
There are different contrastive learning objectives:
- Contrastive Loss (Chopra et al., 2005)
  - minimizes the embedding distance when the two samples are from the same class and maximizes it when they are from different classes;
- Triplet Loss (Schroff et al., 2015)
  - pushes the distance between the anchor and the positive, plus a margin, to be smaller than the distance between the anchor and the negative;
- Lifted Structured Loss (Oh Song et al., 2016)
  - takes into account all pairwise edges within the batch;
- N-pair Loss (Sohn, 2016)
  - compared with the Triplet Loss, the N-pair Loss extends to N-1 negative examples; it is similar to multi-class classification, and the total loss combines inner-product similarity with a softmax loss;
- Noise Contrastive Estimation (NCE) (Gutmann and Hyvärinen, 2010)
  - uses logistic regression with a cross-entropy loss to differentiate positive samples (i.e., the target distribution) from negative samples (i.e., the noise distribution);
- InfoNCE (van den Oord et al., 2018)
  - uses a softmax loss to differentiate a positive sample from a set of noise examples;
- Soft-Nearest-Neighbors Loss (Salakhutdinov and Hinton, 2007; Frosst et al., 2019)
  - extends to different numbers of positive (M) and negative (N) examples.
In short, these objectives differ mainly in how many positive and negative examples they compare and in whether they use a margin-based or a softmax-based formulation.
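To make the objectives above concrete, here is a minimal sketch of two of them, the Triplet loss and the InfoNCE loss, in PyTorch. The function names, tensor shapes, and random toy inputs are illustrative choices of mine rather than code from the cited papers; the formulas, however, follow the definitions above.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss (Schroff et al., 2015): the distance to the positive,
    plus a margin, should be smaller than the distance to the negative."""
    d_pos = (anchor - positive).pow(2).sum(dim=-1)   # squared distance to positive
    d_neg = (anchor - negative).pow(2).sum(dim=-1)   # squared distance to negative
    return F.relu(d_pos - d_neg + margin).mean()

def info_nce_loss(query, keys, temperature=0.07):
    """InfoNCE (van den Oord et al., 2018): softmax over one positive and
    in-batch negatives. `query` and `keys` are (batch, dim); keys[i] is the
    positive for query[i], and the other rows act as negatives."""
    query = F.normalize(query, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = query @ keys.t() / temperature          # cosine similarities as logits
    labels = torch.arange(query.size(0))             # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random "sentence embeddings" (illustrative only)
torch.manual_seed(0)
anchor, positive, negative = torch.randn(3, 8, 128).unbind(0)
print(triplet_loss(anchor, positive, negative).item())
print(info_nce_loss(anchor, positive).item())
```

This in-batch-negatives pattern, with positives on the diagonal of the similarity matrix, is the same basic form used by models such as SimCSE discussed below.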
2. Data Sampling and Augmentation Strategies
- Self-Supervised Contrastive Learning
  - Data Augmentation
    - Text Space
      - Lexical Editing (token-level)
      - Back-Translation (sentence-level)
    - Embedding Space
      - Dropout (see the SimCSE-style sketch after this list)
      - Cutoff
      - Mixup
  - Sampling Bias
    - Debiased Contrastive Learning
      - assume a prior probability of a sample being positive vs. negative, then approximate the distribution of negative examples to debias the loss;
    - Hard Negative Mining
      - Importance Sampling
        - if a negative sample is close to the anchor sample, up-weight its probability of being selected;
      - Adversarial Examples
        - create adversarial examples that are positive but confuse the model;
  - Large Batch Size
- Supervised Contrastive Learning
  - SimCSE (Gao et al., 2021)
  - CLIP (Radford et al., 2021)
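As a concrete example of embedding-space augmentation, below is a minimal sketch of the dropout-as-augmentation idea popularized by the unsupervised variant of SimCSE (Gao et al., 2021): the same batch of sentences is encoded twice with dropout active, and the two views of each sentence form the positive pair of an InfoNCE-style loss. TinyEncoder is a hypothetical stand-in for a real sentence encoder such as BERT; this illustrates the idea rather than reproducing the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a sentence encoder; the key ingredient is
    dropout, which makes two forward passes over the same input differ."""
    def __init__(self, vocab_size=1000, dim=128, p_drop=0.1):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pooled bag-of-words embedding
        self.dropout = nn.Dropout(p_drop)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):
        return self.proj(self.dropout(self.embed(token_ids)))

def simcse_unsup_loss(encoder, token_ids, temperature=0.05):
    """Unsupervised SimCSE-style loss: encode the same batch twice with
    dropout on; the two views of each sentence form the positive pair."""
    z1 = F.normalize(encoder(token_ids), dim=-1)   # first view (dropout mask 1)
    z2 = F.normalize(encoder(token_ids), dim=-1)   # second view (dropout mask 2)
    logits = z1 @ z2.t() / temperature             # in-batch similarities
    labels = torch.arange(z1.size(0))              # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: a batch of 8 "sentences" of 16 random token ids (illustrative only)
torch.manual_seed(0)
encoder = TinyEncoder()
encoder.train()                                    # keep dropout active
token_ids = torch.randint(0, 1000, (8, 16))
print(simcse_unsup_loss(encoder, token_ids).item())
```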
3. Analysis of Contrastive Learning
- Geometric Interpretation
  - when class labels are used, supervised contrastive learning converges to class collapse: the embeddings of each class collapse to a single point, and these points form the vertices of a regular simplex.
- Connection to Mutual Information (see the bound after this list)
- Theoretical Analysis
- Robustness and Security
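On the connection to mutual information, the key published result (van den Oord et al., 2018) is that minimizing the InfoNCE loss computed over N samples (one positive and N-1 negatives) maximizes a lower bound on the mutual information between the two views:

$$
I(x; c) \ge \log N - \mathcal{L}_{\text{InfoNCE}}
$$

Driving the loss down therefore raises the bound, and the bound itself becomes larger as the number of negatives grows.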
Contrastive Learning for NLP
Contrastive learning has shown success in many NLP tasks.
For papers in the different subfields, please refer to the paper list linked in the original tutorial paper.
BibTeX Reference
@inproceedings{zhang-etal-2022-contrastive-data,
title = "Contrastive Data and Learning for Natural Language Processing",
author = "Zhang, Rui and
Ji, Yangfeng and
Zhang, Yue and
Passonneau, Rebecca J.",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.naacl-tutorials.6",
doi = "10.18653/v1/2022.naacl-tutorials.6",
pages = "39--47",
}