Datasets used in the paper ``Analyzing the Traits and Anomalies of Political Discussions on Reddit''.

This data comprises features from a sample of reply paths, collected from the Politics and WorldNews subreddits. Post text and other details (e.g. author) can be retrieved from Reddit, but are not needed to replicate our analyses.

Each line consists of a path of direct replies, formatted as a JSON object, with the following fields:

submission: submission id (can be used to access the original submission, if still available, through reddit.com/r//comments/)
label: harmony, discrepancy, disruption, dispute, or other, according to our definitions
path_length: number of posts in the path
new_sentiment: compound sentiment value of the text in the news article referred to by the submission
path: an iterable list of all posts in the path, where each post contains the fields:
    timestamp: time and date when the comment was originally made
    post_id: comment id (can be used to access the original comment, if still available, through reddit.com/r//comments//_/)
    type: x_post or normal, according to our definitions
    score: upvotes minus downvotes, at time of crawl
    controversiality: 0 or 1, at time of crawl
    num_replies: number of replies the comment has received
    sentiment: compound sentiment value of the comment text, given by VADER
    news_sim: similarity between the textual content of the comment and the textual content of the news article it referenced
    post_sim: highest similarity between the textual content of the comment and previous comments in the same path

If you are using this data, please cite us as:

@inproceedings{Guimaraes_ICWSM2019,
    TITLE = {Analyzing the Traits and Anomalies of Political Discussions on {R}eddit},
    AUTHOR = {Guimar{\~a}es, Anna and Balalau, Oana and Terolli, Erisa and Weikum, Gerhard},
    BOOKTITLE = {Proceedings of the Thirteenth International Conference on Web and Social Media {ICWSM}, June 11-14, 2019},
    PUBLISHER = {{AAAI} Press},
    ADDRESS = {Munich, Germany},
    YEAR = {2019}
}
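
As a reading aid for the format described above (one JSON object per line, each holding a list of posts under path), here is a minimal Python sketch that parses such lines, counts paths per label, and checks that path_length matches the number of posts. It is not part of the original release. The function names iter_paths and summarize are hypothetical, the sample record is made up for illustration, and the timestamp format and value types shown are assumptions, since they are not specified here.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_paths(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one reply path (a dict) per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(paths: Iterable[dict]) -> dict:
    """Count paths per label and count records whose path_length
    differs from the actual number of posts in the path."""
    label_counts = Counter()
    length_mismatches = 0
    for record in paths:
        label_counts[record["label"]] += 1
        if record["path_length"] != len(record["path"]):
            length_mismatches += 1
    return {
        "label_counts": dict(label_counts),
        "length_mismatches": length_mismatches,
    }


# Made-up record for illustration only: the ids, numbers, and the
# timestamp format are assumptions, NOT values taken from the dataset.
SAMPLE_LINE = json.dumps({
    "submission": "abc123",
    "label": "harmony",
    "path_length": 2,
    "new_sentiment": 0.1,
    "path": [
        {"timestamp": "2018-01-01 00:00:00", "post_id": "c1",
         "type": "normal", "score": 5, "controversiality": 0,
         "num_replies": 1, "sentiment": 0.3,
         "news_sim": 0.2, "post_sim": 0.0},
        {"timestamp": "2018-01-01 00:05:00", "post_id": "c2",
         "type": "normal", "score": 2, "controversiality": 0,
         "num_replies": 0, "sentiment": 0.4,
         "news_sim": 0.1, "post_sim": 0.5},
    ],
})

if __name__ == "__main__":
    print(summarize(iter_paths([SAMPLE_LINE])))
```

To run the same summary on a data file, pass an open file handle instead of the sample list, for example summarize(iter_paths(open("paths.json"))), where paths.json is a placeholder name for whichever file you downloaded. Reading line by line keeps memory use low, since only one path is parsed at a time.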