This repository contains the annotation framework, dataset and code used for the resource paper "TACO -- Twitter Arguments from COnversations". To use the baseline model, please visit Hugging Face.
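The baseline is a text classifier that can be loaded with the transformers library. The snippet below is a minimal sketch only: the model identifier is a placeholder, and the exact model ID is listed on the Hugging Face model card.

```python
# Minimal sketch: load the TACO baseline from the Hugging Face Hub via transformers.
# "<baseline-model-id>" is a placeholder; use the model ID from the Hugging Face page.
from transformers import pipeline

classifier = pipeline("text-classification", model="<baseline-model-id>")
print(classifier("Brexit will raise prices because new trade barriers add costs."))
```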
Table of Contents:
- data
  - README.md: A data-specific README for TACO and its annotation process.
  - annotation_framework.pdf: The annotation framework for TACO.
  - conversations.csv: The structure of all collected conversations.
  - majority_votes.csv: All majority votes, which serve as the labeled ground truth.
  - worker_decisions.csv: All individual expert decisions.
- notebooks
  - dataset_statistics.ipynb: For comparing the dataset statistics.
  - classifier_cv.ipynb: For training and evaluating the baseline model.
- outputs
  - bertweet_cv_predictions.csv: The ground truth and cross-validation results of the baseline model.
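The released CSV files can be inspected directly with pandas, for example as in the sketch below. File paths follow the repository layout above; the exact column schema is documented in the data-specific README.

```python
# Minimal sketch: load the released CSV files; see data/README.md for the exact schema.
import pandas as pd

conversations = pd.read_csv("data/conversations.csv")        # conversation structure
majority_votes = pd.read_csv("data/majority_votes.csv")      # labeled ground truth
worker_decisions = pd.read_csv("data/worker_decisions.csv")  # individual expert decisions

print(conversations.head())
print(majority_votes.head())
```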
Overview of the collected conversations per topic:

| Topic | Language | Sample | Total | Query-Time | Key-Date |
|---|---|---|---|---|---|
| Abortion | English | 486 (26.8%) | 29,939 (5.0%) | 2021/08/15-10/16 | S.B.8 took effect on 2021/09/01. |
| Brexit | English | 535 (29.5%) | 427,260 (70.9%) | 2020/01/01-03/01 | Brexit took effect on 2020/02/01. |
| GoT | English | 192 (10.6%) | 61,705 (10.2%) | 2019/04/01-05/01 | GoT S8 premiered (HBO-US) on 2019/04/14. |
| LOTRROP | English | 209 (11.5%) | 14,014 (2.3%) | 2022/02/01-03/01 | LOTRROP teaser trailer was released on 2022/02/13. |
| SquidGame | English | 226 (12.5%) | 51,215 (8.5%) | 2021/09/10-10/10 | Squid Game was released (Netflix worldwide) on 2021/09/17. |
| TwitterTakeover | English | 166 (9.1%) | 18,531 (3.1%) | 2022/04/01-05/01 | Elon Musk offered $43 billion to purchase Twitter on 2022/04/14. |
Distribution of the 1,734 annotated tweets over the binary and the four-class scheme:

| Argument | No-Argument |
|---|---|
| 865 (49.88%) | 869 (50.12%) |

| Reason | Statement | Notification | None |
|---|---|---|---|
| 581 (33.50%) | 284 (16.38%) | 500 (28.84%) | 369 (21.28%) |
| | Reason | Statement | Notification | None |
|---|---|---|---|---|
| Reason | 0.51 | 0.12 | 0.31 | 0.06 |
| Statement | 0.38 | 0.21 | 0.33 | 0.08 |
| Notification | 0.26 | 0.08 | 0.57 | 0.09 |
| None | 0.26 | 0.08 | 0.44 | 0.22 |
Cross-validation results of the baseline for the four classes:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Reason | 0.7369 | 0.7522 | 0.7445 | 581 |
| Statement | 0.5437 | 0.5915 | 0.5666 | 284 |
| Notification | 0.7902 | 0.7760 | 0.7830 | 500 |
| None | 0.8387 | 0.7751 | 0.8056 | 369 |
| accuracy | | | 0.7376 | 1734 |
| macro avg | 0.7274 | 0.7237 | 0.7249 | 1734 |
| weighted avg | 0.7423 | 0.7376 | 0.7395 | 1734 |
Cross-validation results of the baseline for the binary Argument/No-Argument task:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| No-Argument | 0.8666 | 0.8297 | 0.8477 | 869 |
| Argument | 0.8359 | 0.8717 | 0.8534 | 865 |
| accuracy | | | 0.8506 | 1734 |
| macro avg | 0.8513 | 0.8507 | 0.8506 | 1734 |
| weighted avg | 0.8513 | 0.8506 | 0.8506 | 1734 |
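Both reports can be recomputed from outputs/bertweet_cv_predictions.csv with scikit-learn, roughly as sketched below. The column names for the gold and predicted labels are assumptions and may differ from the actual CSV header.

```python
# Sketch: recompute the classification reports from the released CV predictions.
# The column names "true" and "predicted" are assumptions; check the CSV header.
import pandas as pd
from sklearn.metrics import classification_report

preds = pd.read_csv("outputs/bertweet_cv_predictions.csv")
print(classification_report(preds["true"], preds["predicted"], digits=4))
```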
Tweet characteristics per class:

| | Reason | Statement | Notification | None |
|---|---|---|---|---|
| Average Length | 213 | 122 | 156 | 63 |
| URLs | 34.6% | 8.1% | 71.6% | 7.6% |
| External URLs | 41.8% | 17.4% | 49.7% | 17.9% |
| Emojis | 11.9% | 14.1% | 16.0% | 35.8% |
| Hashtags | 45.8% | 38.7% | 60.0% | 12.2% |
| Users | 65.9% | 68.0% | 56.4% | 91.3% |
| Discourse Markers | 32.9% | 19.0% | 11.4% | 8.7% |
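Statistics of this kind are presumably produced by notebooks/dataset_statistics.ipynb. As a rough illustration only (not the notebook's actual implementation), per-tweet indicators such as the presence of URLs, hashtags, or user mentions can be derived with simple pattern matching:

```python
# Illustrative only: simple per-tweet indicators; the released notebook contains
# the actual computation behind the table above.
import re

def tweet_indicators(text: str) -> dict:
    return {
        "length": len(text),                                # character length
        "has_url": bool(re.search(r"https?://\S+", text)),  # contains a URL
        "has_hashtag": bool(re.search(r"#\w+", text)),      # contains a hashtag
        "has_mention": bool(re.search(r"@\w+", text)),      # mentions a user
    }

print(tweet_indicators("@user I disagree because #Brexit raises prices https://example.com"))
```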
Confusion matrix of the baseline (rows: ground truth, columns: predictions):

| | Reason | Statement | Notification | None |
|---|---|---|---|---|
| Reason | 437 | 76 | 66 | 2 |
| Statement | 73 | 168 | 13 | 30 |
| Notification | 63 | 26 | 388 | 23 |
| None | 20 | 39 | 24 | 286 |
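The confusion matrix can likewise be rebuilt from the CV predictions with scikit-learn; again, the label column names are assumptions.

```python
# Sketch: rebuild the confusion matrix (rows: ground truth, columns: predictions).
# The column names "true" and "predicted" are assumptions; check the CSV header.
import pandas as pd
from sklearn.metrics import confusion_matrix

preds = pd.read_csv("outputs/bertweet_cv_predictions.csv")
labels = ["Reason", "Statement", "Notification", "None"]
print(confusion_matrix(preds["true"], preds["predicted"], labels=labels))
```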
TACO -- Twitter Arguments from Conversations by Marc Feger is licensed under CC BY-NC-SA 4.0
Please contact [email protected] or [email protected].
We thank Aylin Feger, Tillmann Junk, Andreas Burbach, Talha Caliskan, and Aaron Schneider for their contributions to the annotation process in this paper.
If you use TACO in your work, please cite:

```bibtex
@inproceedings{feger-dietze-2024-taco,
    title = "{TACO} {--} {T}witter Arguments from {CO}nversations",
    author = "Feger, Marc and
      Dietze, Stefan",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1349",
    pages = "15522--15529"
}
```