Reddit is an American social news aggregation and discussion website. Registered members submit content to the site such as links, text posts, and images, which are then voted up or down by other members. Posts are organized by subject into boards called "subreddits", which cover a variety of topics including news, science, movies, video games, music, books, fitness, food, and image-sharing. Registering an account with Reddit is free and does not require an email address. Since Reddit is an open platform and anyone is free to post anything, there is little or no censorship. As a result, we have come across many offensive posts filled with negative comments.
Racism and hate speech can cause a lot of damage to both individuals and communities. A study of 800 Australian secondary school students found that racism had huge mental health impacts on the students who experienced it. We have built a plugin to battle this hate speech on Reddit.
Data was collected using the Reddit API and then cleaned. It came from subreddits such as r/ImGoingToHellForThis and r/Incels.
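The cleaning step is not described in detail, so the following is only a minimal sketch of what it could look like. The function name `clean_comment` and the specific rules (dropping deleted comments, decoding HTML entities, removing links, lowercasing) are assumptions for illustration, not the exact steps we used.

```python
import html
import re

# Hypothetical cleaning rules; the original post does not list the exact ones.
URL_RE = re.compile(r"https?://\S+")
MARKDOWN_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")
WHITESPACE_RE = re.compile(r"\s+")
REMOVED_MARKERS = {"[deleted]", "[removed]"}


def clean_comment(text):
    """Return a normalized comment, or None if no usable text remains."""
    if text.strip() in REMOVED_MARKERS:
        return None
    text = html.unescape(text)                # decode entities such as &amp;
    text = MARKDOWN_LINK_RE.sub(r"\1", text)  # keep the link text, drop the target
    text = URL_RE.sub(" ", text)              # drop bare URLs
    text = WHITESPACE_RE.sub(" ", text).strip().lower()
    return text or None
```

Comments for which the function returns `None` would simply be skipped before annotation.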
The text of the posts and their comments was extracted to form the training data for our machine learning model. Each comment was annotated by 3 members. A comment considered offensive was marked as 1, otherwise as 0.
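With three annotators per comment, the three marks have to be combined into a single label. How that was done is not stated above; the sketch below assumes a simple majority vote, and the name `majority_label` is hypothetical.

```python
def majority_label(votes):
    """Combine binary annotations (0 or 1) into one label by majority vote.

    Assumes an odd number of votes so that a tie cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of annotators is required")
    # The label is 1 when more than half of the annotators marked the comment as 1.
    return int(sum(votes) * 2 > len(votes))
```

For example, a comment marked `[1, 0, 1]` would be labelled 1 and a comment marked `[0, 0, 1]` would be labelled 0.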
A machine learning model based on SVM was trained on the collected dataset (TfidfVectorizer was used instead of a hand-crafted feature set).
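A rough sketch of such a model with scikit-learn is shown below. The choice of `LinearSVC` (a linear-kernel SVM), the default vectorizer settings, and the toy training sentences are all assumptions made for illustration; the real model was trained on the annotated Reddit comments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_classifier(texts, labels):
    """Fit a TF-IDF + linear SVM pipeline on labelled comments."""
    # The vectorizer turns raw text into TF-IDF features,
    # so no separate feature set has to be built by hand.
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(texts, labels)
    return model


# Toy data for illustration only (1 = offensive, 0 = not offensive).
toy_texts = [
    "you are awful and stupid",
    "i hate you so much",
    "what a lovely picture",
    "thanks for sharing this",
]
toy_labels = [1, 1, 0, 0]

model = train_classifier(toy_texts, toy_labels)
predictions = model.predict(["you are awful", "what a lovely day"])
```

Because the vectorizer and the classifier live in one pipeline, the same object can later be called on raw comment text without a separate preprocessing step.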
The model was tested on posts from a few subreddits, such as /r/gaming, /r/ImGoingToHellForThis, /r/aww, and /r/MadeMeSmile.
The top 200 comments of a post are scanned, and if the number of offensive comments crosses a certain threshold, the post is labelled as offensive.
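The labelling rule can be sketched as follows. The threshold value of 20 is a placeholder, since the actual value is not given here, and `label_post` and `is_offensive` are hypothetical names. The comments are assumed to be already sorted so that the top ones come first.

```python
def label_post(comments, is_offensive, max_comments=200, threshold=20):
    """Return True if a post should be labelled as offensive.

    comments     -- comment texts, best-ranked first
    is_offensive -- callable returning True for an offensive comment
    threshold    -- placeholder value, not the one used in the project
    """
    top_comments = comments[:max_comments]  # only the top comments are scanned
    offensive_count = sum(1 for c in top_comments if is_offensive(c))
    return offensive_count > threshold
```

In practice `is_offensive` would wrap the trained classifier, for instance by checking whether its prediction for the comment is 1.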
Final Plugin and poster presentation:
[Figure: How is Reddit structured?]
Methodology:
- We used a Django server to communicate with the plugin and the Python machine learning model.
- Whenever a user opens a post, an HTTP POST request is sent to the server. The score is calculated and returned as a response (a nudge for the user).
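The request handling on the server can be sketched in plain Python as below. The JSON payload shape (a `"comments"` list), the response fields, and the definition of the score as the fraction of offensive comments are all assumptions, and `build_nudge_response` is a hypothetical helper. In a Django view, the raw bytes would come from `request.body`.

```python
import json


def build_nudge_response(request_body, classify, max_comments=200):
    """Turn the body of a POST request into a JSON response string.

    request_body -- raw JSON bytes or str, assumed shape: {"comments": [...]}
    classify     -- callable returning 1 for an offensive comment, else 0
    """
    payload = json.loads(request_body)
    comments = payload.get("comments", [])[:max_comments]
    offensive_count = sum(1 for c in comments if classify(c) == 1)
    # Assumed score: share of scanned comments that were flagged as offensive.
    score = offensive_count / len(comments) if comments else 0.0
    return json.dumps({
        "score": score,
        "offensive_count": offensive_count,
        "scanned": len(comments),
    })
```

The plugin would then read the returned score and decide which nudge to show to the user.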
[Figure: response to a non-offensive post]
[Figure: response to an offensive post]
Some pics from the poster presentation
Group Members:
Ashutosh Batabyal (Group Leader)
Shreya Sharma
Abhishek Chauhan
Shivani Raina
Aarushi Arya
Sarthak Jindal
References:
https://en.wikipedia.org/wiki/Reddit
https://www.reddit.com/