Description
System Info
Not relevant.
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The docs for multiple choice use SWAG as an example, which is the task of selecting the next sentence given a context. Somewhat strangely, rather than being given in the format (sentence1, [sentence2a, sentence2b, sentence2c, sentence2d]), the dataset is given in the format (sentence1, sentence2_start, [sentence2_endA, sentence2_endB, sentence2_endC, sentence2_endD]).
The code given in the docs basically turns the dataset into the first format, where sentence 1 is kept intact and the start of sentence 2 is concatenated to each ending:
transformers/docs/source/en/tasks/multiple_choice.md
Lines 96 to 100 in a06a0d1
... first_sentences = [[context] * 4 for context in examples["sent1"]]
... question_headers = examples["sent2"]
... second_sentences = [
...     [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
... ]
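To make the behaviour of this snippet concrete, here is a minimal sketch that runs the same logic on a single toy row. The row contents and the function name `pairs_as_in_code` are made up for illustration; `ending_names` is assumed to list the four SWAG ending columns.

```python
# Assumed names of the four SWAG ending columns.
ending_names = ["ending0", "ending1", "ending2", "ending3"]

# Hypothetical one-row batch, shaped like a batched `map` input.
examples = {
    "sent1": ["A man sits at a piano."],
    "sent2": ["He"],
    "ending0": ["plays a song."],
    "ending1": ["eats the keys."],
    "ending2": ["flies away."],
    "ending3": ["closes the lid."],
}


def pairs_as_in_code(examples):
    # sent1 is repeated four times, unchanged.
    first_sentences = [[context] * 4 for context in examples["sent1"]]
    question_headers = examples["sent2"]
    # sent2 (the start of the second sentence) is prepended to each ending.
    second_sentences = [
        [f"{header} {examples[end][i]}" for end in ending_names]
        for i, header in enumerate(question_headers)
    ]
    return first_sentences, second_sentences


first, second = pairs_as_in_code(examples)
for a, b in zip(first[0], second[0]):
    print(repr(a), "|", repr(b))
```

In this sketch the first element of every pair is `sent1` alone, so `sent2` appears only in the second element.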
Yet, the docs say:
transformers/docs/source/en/tasks/multiple_choice.md
Lines 85 to 88 in a06a0d1
The preprocessing function you want to create needs to:
1. Make four copies of the `sent1` field and combine each of them with `sent2` to recreate how a sentence starts.
2. Combine `sent2` with each of the four possible sentence endings.
What is being described is formatting the dataset as (sentence1 + sentence2_start, [sentence2_start + sentence2_endA, sentence2_start + sentence2_endB, sentence2_start + sentence2_endC, sentence2_start + sentence2_endD]), where there is overlap between the first and the second sentence (namely sentence2_start).
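For comparison, a literal reading of the description could be sketched as follows. The toy row and the function name `pairs_as_described` are hypothetical, and `ending_names` is again assumed to hold the four SWAG ending columns.

```python
# Assumed names of the four SWAG ending columns.
ending_names = ["ending0", "ending1", "ending2", "ending3"]

# Hypothetical one-row batch, shaped like a batched `map` input.
examples = {
    "sent1": ["A man sits at a piano."],
    "sent2": ["He"],
    "ending0": ["plays a song."],
    "ending1": ["eats the keys."],
    "ending2": ["flies away."],
    "ending3": ["closes the lid."],
}


def pairs_as_described(examples):
    # Step 1 of the description: four copies of sent1, each combined with sent2.
    first_sentences = [
        [f"{s1} {s2_start}"] * 4
        for s1, s2_start in zip(examples["sent1"], examples["sent2"])
    ]
    # Step 2 of the description: sent2 combined with each of the four endings.
    second_sentences = [
        [f"{s2_start} {examples[end][i]}" for end in ending_names]
        for i, s2_start in enumerate(examples["sent2"])
    ]
    return first_sentences, second_sentences


first, second = pairs_as_described(examples)
for a, b in zip(first[0], second[0]):
    print(repr(a), "|", repr(b))
```

Here `sent2` ends the first element and also starts the second element of every pair, which is the overlap in question.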
Expected behavior
Either the code is wrong or the description is wrong.
If the description is wrong, it should be:
The preprocessing function you want to create needs to:
- Make four copies of the sent1 field.
- Combine sent2 with each of the four possible sentence endings.
If the code is wrong, it should be:
first_sentences = [[f"{s1} {s2_start}"] * 4 for s1, s2_start in zip(examples["sent1"], examples["sent2"])]
second_sentences = [
    [f"{s2_start} {examples[end][i]}" for end in ending_names] for i, s2_start in enumerate(examples["sent2"])
]