This repository covers three subprojects on Reddit communities, their similarity, and their interactions. It is part of a project for a Web Information Retrieval class, and the main inspiration for this work is this paper by Leskovec et al. from Stanford University.
This project is composed of three different tasks:
- Attackers and Defenders PageRank
- Subreddit Recommendation System
- Cross-linking posts sentiment analysis
We reproduced some results of the paper, answering the question of how users taking part in conflicts interact with each other. The researchers from Stanford proposed the Attackers and Defenders PageRank scores.
These scores are based on the graph of users' replies to comments by members of their own community or of the opposing community.
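
As a rough illustration of ranking users on a reply graph, the sketch below runs plain PageRank by power iteration over directed (replier, author) edges. It is not the exact Attackers and Defenders formulation from the paper: the edge direction, the damping factor of 0.85, the function name `pagerank`, and the toy user names are all assumptions made for this example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration over a directed reply graph."""
    nodes = sorted({user for edge in edges for user in edge})
    out_links = defaultdict(list)
    for replier, author in edges:
        out_links[replier].append(author)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing replies is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy data: (replier, author of the comment being replied to).
reply_edges = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "carol"),
]
scores = pagerank(reply_edges)
top_user = max(scores, key=scores.get)
```

In this toy graph the user who receives the most replies ends up with the highest score. A variant closer to the paper would presumably build separate graphs for replies within a community and replies to the opposing community.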
We expanded the original authors' work on similarities between communities to build a system that suggests to users subreddits related to the ones they have been most active in recently. We carried out our experiments using one month of user data, and we considered the top 5000 subreddits.
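
A minimal sketch of one way such a recommender could work is shown below. It assumes that each subreddit is represented by a vector of per-user activity counts and that similarity is measured with cosine similarity. Neither choice is confirmed by this README, and the names `cosine`, `recommend`, and the toy activity data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(activity, user, top_k=2):
    """Suggest unseen subreddits similar to those the user is active in."""
    # Invert user -> {subreddit: count} into subreddit -> {user: count}.
    vectors = defaultdict(dict)
    for name, subreddits in activity.items():
        for subreddit, count in subreddits.items():
            vectors[subreddit][name] = count

    seen = activity[user]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in seen:
            continue
        # Weight each similarity by how active the user is in the seen subreddit.
        scores[candidate] = sum(
            weight * cosine(vector, vectors[subreddit])
            for subreddit, weight in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical one-month activity counts per user and subreddit.
activity = {
    "u1": {"python": 5, "learnprogramming": 3},
    "u2": {"python": 4, "learnprogramming": 2, "datascience": 3},
    "u3": {"cooking": 6, "baking": 4},
    "u4": {"python": 2, "datascience": 5},
}
suggestions = recommend(activity, "u1")
```

With this toy data, the first suggestion for `u1` is the subreddit that shares active users with the ones `u1` already posts in.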
A post with a link to some content on a different subreddit may contain words that instigate a conflict on that target subreddit. We analysed the sentiment of the text of such posts and of their top comments, trying to distinguish posts with bad intent from neutral posts that simply share content across communities.
On this task we obtained an F1 score of 90%.
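
To make the evaluation metric concrete, the sketch below pairs a toy word-list classifier with an F1 computation. The classifier is only a stand-in: this README does not specify the model or features actually used, and the lexicon, the function names, and the labelled posts are hypothetical.

```python
# Hypothetical lexicon of words hinting at hostile intent.
NEGATIVE_WORDS = {"idiots", "pathetic", "trash"}


def predict_conflict(text, threshold=1):
    """Flag a post when it contains at least `threshold` lexicon words."""
    tokens = (token.strip(".,!?") for token in text.lower().split())
    hits = sum(token in NEGATIVE_WORDS for token in tokens)
    return hits >= threshold


def f1_score(y_true, y_pred):
    """F1 is the harmonic mean of precision and recall on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labelled cross-linking posts: (text, instigates_conflict).
posts = [
    ("Look at these idiots, go tell them what you think", True),
    ("Interesting discussion on this topic over there", False),
    ("This thread is pathetic trash", True),
    ("Go downvote this post now", True),
    ("Great recipe shared in the other community", False),
]
labels = [label for _, label in posts]
predictions = [predict_conflict(text) for text, _ in posts]
score = f1_score(labels, predictions)
```

The fourth post shows a limit of a pure word-list approach: it calls for action against the target community without using any word in the lexicon, so the toy classifier misses it and recall drops.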
The code for this part of the project can be found in this repository.
Davide Spallaccini | Beatrice Bevilacqua | Anxhelo Xhebraj |
---|---|---|