- two parallel sentence representation learning modules (a minimal sketch follows below):
  a) att-lstm
  b) TF-IDF prior VAE with a fully-connected GAT
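To make the module list above concrete, the following is a minimal, hypothetical PyTorch sketch of the two parallel sentence-representation modules: an attention LSTM, and a TF-IDF-weighted bag-of-words VAE whose sentence nodes are combined through a fully-connected (single-head) GAT layer. All class names, dimensions, and simplifications (e.g., reducing the TF-IDF prior to TF-IDF-weighted inputs) are assumptions for illustration, not the repository's actual implementation.

```python
# Illustrative sketch only; names and dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLSTM(nn.Module):
    """Context module: BiLSTM with additive word attention over a sentence."""
    def __init__(self, emb_dim=100, hid_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hid_dim, 1)

    def forward(self, emb):                                   # emb: (B, T, emb_dim)
        h, _ = self.lstm(emb)                                 # (B, T, 2*hid_dim)
        alpha = torch.softmax(self.att(h).squeeze(-1), -1)    # per-word attention weights
        sent = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)    # weighted sentence vector
        return sent, alpha

class TfidfVAE(nn.Module):
    """Topic module: VAE over a TF-IDF-weighted bag-of-words (prior omitted for brevity)."""
    def __init__(self, vocab_size=5000, topic_dim=50):
        super().__init__()
        self.enc = nn.Linear(vocab_size, 2 * topic_dim)
        self.dec = nn.Linear(topic_dim, vocab_size)

    def forward(self, bow_tfidf):                             # bow_tfidf: (B, vocab_size)
        mu, logvar = self.enc(bow_tfidf).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return z, self.dec(z), mu, logvar

class FullyConnectedGAT(nn.Module):
    """Single-head graph attention over a fully-connected graph of sentence nodes."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.att_src = nn.Linear(dim, 1)
        self.att_dst = nn.Linear(dim, 1)

    def forward(self, nodes):                                 # nodes: (N, dim)
        h = self.proj(nodes)
        scores = self.att_src(h) + self.att_dst(h).T          # (N, N) pairwise scores
        alpha = torch.softmax(F.leaky_relu(scores), dim=-1)   # sentence-level attention
        return alpha @ h, alpha
```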
- datasets: IMDB, YELP, Guardian News
- soft metrics: Completeness and Sufficiency, which measure the prediction changes after removing the important words.
  We create two folds: one containing the text after removing the important words (cset), and another containing the extracted important words (sset). The calculation can be found here (a sketch also appears after this list).
- hard metrics: agreement with human-annotated rationales.
  Input 1: annotated text spans. Input 2: the important words identified by the model. Details can be found here. These metrics include partial span-level match and token-level match (see the sketch after this list).
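A hedged sketch of the soft metrics above, following the common ERASER-style definitions (the exact sign conventions in the repository may differ): Completeness compares the prediction on the full text against the cset fold (important words removed), and Sufficiency compares it against the sset fold (important words only). `model_prob` is a hypothetical callable returning the predicted probability of the original label.

```python
# Hypothetical sketch of the soft metrics; model_prob(text, label) -> probability.
def soft_metrics(model_prob, full_text, cset_text, sset_text, label):
    """Completeness / Sufficiency as the prediction change after removing
    (cset) or keeping only (sset) the important words."""
    p_full = model_prob(full_text, label)
    completeness = p_full - model_prob(cset_text, label)  # big drop -> words were needed
    sufficiency = p_full - model_prob(sset_text, label)   # small drop -> words suffice
    return completeness, sufficiency
```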
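For the hard metrics, the sketch below shows one plausible reading of token-level match (set-based F1 between annotated and predicted rationale tokens) and partial span-level match (a predicted span counts if its token overlap with a gold span reaches an IoU threshold). The threshold and function names are assumptions, not the repository's exact definitions.

```python
# Hypothetical sketch of the hard metrics against human-annotated rationales.
def token_f1(gold_tokens, pred_tokens):
    """Token-level match: F1 between annotated and model-identified word sets."""
    gold, pred = set(gold_tokens), set(pred_tokens)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def partial_span_match(gold_spans, pred_spans, iou_threshold=0.5):
    """Partial span-level match: fraction of gold spans overlapping some
    predicted span with token-level IoU >= threshold. Spans are (start, end)."""
    def iou(a, b):
        a_set, b_set = set(range(*a)), set(range(*b))
        union = a_set | b_set
        return len(a_set & b_set) / len(union) if union else 0.0
    matched = sum(any(iou(g, p) >= iou_threshold for p in pred_spans)
                  for g in gold_spans)
    return matched / len(gold_spans) if gold_spans else 0.0
```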
We also conduct a human evaluation of the model-generated interpretations; they are displayed in a separate PDF (generated with LaTeX) for each document. Each PDF contains explanations at the word level, where the label-dependent words extracted by Context Representation Learning are highlighted in yellow and the topic-associated words identified by Topic Representation Learning are highlighted in blue, as well as explanations at the sentence level. The Python code for generating the attention-highlighted LaTeX text can therefore serve as a good template; the code is vis_sine.py (a minimal illustrative sketch follows).
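As a companion to vis_sine.py, here is a minimal, hypothetical sketch of how such word-level highlights can be emitted as LaTeX (Context-module words in yellow, Topic-module words in blue). It is a template-style illustration, not the repository's actual script, and it assumes the generated document loads the xcolor package.

```python
# Hypothetical sketch of attention-highlighted LaTeX generation (not vis_sine.py itself).
def escape_latex(token):
    """Escape the most common LaTeX special characters in a token."""
    specials = {'&': r'\&', '%': r'\%', '$': r'\$', '#': r'\#',
                '_': r'\_', '{': r'\{', '}': r'\}'}
    return ''.join(specials.get(ch, ch) for ch in token)

def highlight_document(tokens, context_words, topic_words):
    """Return a LaTeX fragment: label-dependent (Context) words boxed in yellow,
    topic-associated (Topic) words boxed in blue; requires \\usepackage{xcolor}."""
    pieces = []
    for tok in tokens:
        esc = escape_latex(tok)
        if tok in context_words:
            pieces.append(r'\colorbox{yellow}{%s}' % esc)
        elif tok in topic_words:
            pieces.append(r'\colorbox{cyan!30}{%s}' % esc)
        else:
            pieces.append(esc)
    return ' '.join(pieces)
```

For example, `highlight_document(["great", "movie", "plot"], {"great"}, {"plot"})` would return `\colorbox{yellow}{great} movie \colorbox{cyan!30}{plot}`, which can be dropped into the per-document LaTeX file before compiling it to PDF.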