Documentation for SWAG contradicts itself when constructing the first sentence.

### System Info

Not relevant.

### Who can help?

@stevhliu @ArthurZucker 

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

The [docs for multiple choice](https://huggingface.co/docs/transformers/tasks/multiple_choice) use SWAG as an example, which is the task of selecting the next sentence given a context. Somewhat strangely, rather than being given in the format `(sentence1, [sentence2a, sentence2b, sentence2c, sentence2d])`, the dataset is given in the format `(sentence1, sentence2_start, [sentence2_endA, sentence2_endB, sentence2_endC, sentence2_endD])`.
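As a minimal sketch of the difference between the two formats, the snippet below converts one record from the second shape into the first. The record text and the helper name `to_full_candidates` are invented for illustration and are not taken from SWAG or from the docs.

```python
from typing import List, Tuple

# Hypothetical record in the shape
# (sentence1, sentence2_start, [sentence2_endA, ..., sentence2_endD]).
record = (
    "She opens the fridge.",
    "She",
    ["takes out milk.", "flies away.", "reads a map.", "sings loudly."],
)


def to_full_candidates(
    sentence1: str, sentence2_start: str, endings: List[str]
) -> Tuple[str, List[str]]:
    """Return (sentence1, [sentence2a, ..., sentence2d]).

    Each full second sentence is the shared start joined to one ending;
    sentence1 is passed through unchanged.
    """
    return sentence1, [f"{sentence2_start} {ending}" for ending in endings]


context, candidates = to_full_candidates(*record)
print(context)
for candidate in candidates:
    print("  ", candidate)
```

Here `sentence2_start` only ever ends up in the candidates, never in the context.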

The code given in the docs converts the dataset into the first format: **sentence 1 is kept intact** and the start of sentence 2 is concatenated to each ending:
https://github.com/huggingface/transformers/blob/a06a0d12636756352494b99b5b264ac9955bc735/docs/source/en/tasks/multiple_choice.md?plain=1#L96-L100

Yet, the docs say:
https://github.com/huggingface/transformers/blob/a06a0d12636756352494b99b5b264ac9955bc735/docs/source/en/tasks/multiple_choice.md?plain=1#L85-L88

That text describes a different format, `(sentence1 + sentence2_start, [sentence2_start + sentence2_endA, sentence2_start + sentence2_endB, sentence2_start + sentence2_endC, sentence2_start + sentence2_endD])`, in which **the first and the second sentence overlap** (both contain `sentence2_start`).
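To make the discrepancy concrete, here is a small sketch that runs both readings on a toy batch. The sentences are made up, the function names `pairs_as_coded` and `pairs_as_described` are hypothetical, and `ending_names` is assumed to hold the column names `ending0` to `ending3`.

```python
# Toy batch in the dict-of-lists layout that the docs' snippet indexes into.
# The sentences are invented for illustration; they are not from SWAG.
ending_names = ["ending0", "ending1", "ending2", "ending3"]  # assumed column names
examples = {
    "sent1": ["A man sits at a piano."],
    "sent2": ["He"],
    "ending0": ["starts to play."],
    "ending1": ["eats the sheet music."],
    "ending2": ["jumps into a pool."],
    "ending3": ["paints the wall."],
}


def pairs_as_coded(examples):
    """Pairing produced by the docs' code: sent1 is kept intact."""
    first = [[context] * 4 for context in examples["sent1"]]
    second = [
        [f"{header} {examples[end][i]}" for end in ending_names]
        for i, header in enumerate(examples["sent2"])
    ]
    return first, second


def pairs_as_described(examples):
    """Pairing implied by the docs' prose: sent2 is also appended to sent1."""
    first = [
        [f"{s1} {s2_start}"] * 4
        for s1, s2_start in zip(examples["sent1"], examples["sent2"])
    ]
    # The second sentences are built the same way under both readings.
    _, second = pairs_as_coded(examples)
    return first, second


coded_first, coded_second = pairs_as_coded(examples)
described_first, described_second = pairs_as_described(examples)

print(coded_first[0][0], "|", coded_second[0][0])
# A man sits at a piano. | He starts to play.
print(described_first[0][0], "|", described_second[0][0])
# A man sits at a piano. He | He starts to play.
```

Only the first element of each pair differs: under the prose reading, `He` appears on both sides of the pair.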

### Expected behavior

Either the code is wrong or the description is wrong. 

If the description is wrong, it should be:

> The preprocessing function you want to create needs to:
> 1. Make four copies of the sent1 field.
> 2. Combine sent2 with each of the four possible sentence endings.

If the code is wrong, it should be:
```python
    first_sentences = [[f"{s1} {s2_start}"] * 4 for s1, s2_start in zip(examples["sent1"], examples["sent2"])]
    second_sentences = [
        [f"{s2_start} {examples[end][i]}" for end in ending_names] for i, s2_start in enumerate(examples["sent2"])
    ]
```

Code at the first link above (`multiple_choice.md`, lines 96-100):

    first_sentences = [[context] * 4 for context in examples["sent1"]]
    question_headers = examples["sent2"]
    second_sentences = [
        [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
    ]

Documentation for SWAG contradicts itself when constructing the first sentence. #35095


Text at the second link above (`multiple_choice.md`, lines 85-88):

> The preprocessing function you want to create needs to:
>
> 1. Make four copies of the `sent1` field and combine each of them with `sent2` to recreate how a sentence starts.
> 2. Combine `sent2` with each of the four possible sentence endings.
