Approaches
1. Lexicon-based (L)
Use polarity lexicons to classify each tweet as positive or negative
2. Binary Classification: Bag-of-words
Build a classifier from labeled data, using simple bag-of-words features
3. Binary Classification: Bag-of-words + Ngrams
Same as above, with the addition of bigram features
4. MultiClass Classification (Pos,Neg,Neut): Bag-of-words + Ngrams
Same as above (bag-of-words plus bigram features), but also model a Neutral class alongside Positive and Negative
5. Deep Learning (RAE) Classification (Pos,Neg,Neut): Bag-of-words
Use deep learning (recursive autoencoder) techniques to train sentiment classifiers
6. Semi-supervised Learning based Classification (Pos,Neg,Neut): Bag-of-words + Ngrams
Can semi-supervised learning be used to expand either the lexicon or the training data for the classifiers above?
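As a rough illustration of approaches 1 to 3, here is a minimal Python sketch of lexicon-based scoring and of bag-of-words plus bigram feature extraction. The lexicon entries, the function names (tokenize, lexicon_classify, bow_bigram_features) and the choice to return "neut" on a tie are all assumptions made for the example, not part of the notes above; a real system would load a published polarity lexicon and feed the features to a trained classifier.

```python
import re
from collections import Counter

# Toy polarity lexicon: hypothetical entries, for illustration only.
POSITIVE = {"good", "great", "love", "sweet", "awesome"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "slow"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_classify(text):
    """Approach 1: count lexicon hits and compare positive vs. negative."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neut"  # assumption: ties and no-hit tweets are treated as neutral


def bow_bigram_features(text):
    """Approaches 2 and 3: unigram counts plus counts of adjacent word pairs."""
    tokens = tokenize(text)
    features = Counter(tokens)
    features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features
```

The feature dictionaries returned by bow_bigram_features could be passed to any standard classifier that accepts sparse count features; which classifier to use is left open here.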
Other ideas
- Look at Modeling Neutral class in a better way
- Spelling correction (“swweeeettt” -> “sweet”)
- New Features
- Handle negation
- Stemming the words for better match with lexicon
- Emoticons and their distance from the keyword “google”
- What were the social network dynamics of the tweet (who, how many times)?
- Phrasal Lexicons vs. Unigram BOW models (RAE does a bit of that)
- Detect marketing campaign tweets
- Semi-supervised learning to get more labels
- Identify subjectivity in tweets first, then classify the subjective ones as negative vs. positive
- Target-dependent Twitter sentiment (http://www.aclweb.org/anthology-new/P/P11/P11-1016.pdf) – Is the keyword “google” the central focus of the tweet?
- Joint Topic detection (Aspect) + Sentiment
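Two of the ideas above, spelling correction of elongated words and negation handling, can be sketched in a few lines of Python. This is only one possible approach under stated assumptions: the vocabulary set, the NEGATORS list, the NOT_ prefix, the three-token negation scope and the function names are all hypothetical choices for the example.

```python
from itertools import groupby, product

# Assumed list of negation cues; extend as needed.
NEGATORS = {"not", "no", "never", "cannot"}


def normalize_elongated(token, vocabulary):
    """Map an elongated token to a vocabulary word by shrinking letter runs.

    Each run of a repeated letter is tried at length 1 and length 2, and the
    first combination found in the vocabulary is returned. The token is
    returned unchanged if nothing matches.
    """
    if token in vocabulary:
        return token
    runs = [(ch, len(list(group))) for ch, group in groupby(token)]
    options = [[ch] if n == 1 else [ch, ch * 2] for ch, n in runs]
    for combo in product(*options):
        candidate = "".join(combo)
        if candidate in vocabulary:
            return candidate
    return token


def mark_negation(tokens, scope=3):
    """Prefix up to `scope` tokens after a negation cue with NOT_."""
    marked = []
    remaining = 0
    for tok in tokens:
        if tok in NEGATORS:
            marked.append(tok)
            remaining = scope
        elif remaining > 0:
            marked.append("NOT_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked
```

The candidate search grows exponentially with the number of repeated-letter runs, which is acceptable for short tweet tokens but would need a cap for arbitrary input. The NOT_ tokens can then be used as ordinary bag-of-words features so that "like" and "NOT_like" are counted separately.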
Readings