George Panagopoulos

ICWSM in the summer of COVID-19

Every conference I know of took place virtually this summer, and ICWSM was no exception. Although I was nervous about presenting at and attending a virtual conference, the organizers made a tremendous effort and adopted a very efficient format. Anyone interested in a paper could watch the presentation online, become familiar with it, and ask questions during Zoom meetings. This gave me the time and freedom to understand the papers as well as, if not better than, I would have in person. I presented our work on influence maximization using representations learned from diffusion cascades. The presentation can be found online, as can those of the rest of the papers.

Quick disclaimer: I will focus more on technical papers here, but the conference included numerous exciting talks addressing online racism, fairness, and how computational social science can provide solutions, a subject as timely as ever. One example is the inspiring keynote of Charlton McIlwain, who discussed the role of technology in oppressing minorities throughout history and its ominous potential.

An Experimental Study of Structural Diversity in Social Networks

The best paper award went to a novel framework for running experimental studies on the structure of the network. The task is rather hard because exact interventions and random assignment to treatment and control groups are not possible. The authors propose changing the social recommendations shown to users in order to shape the diversity of the groups. This allows us to examine interesting questions, such as how the diversity of a user’s network is correlated with her total engagement. Paper Presentation

Detecting Troll Behavior via Inverse Reinforcement Learning: A Case Study of Russian Trolls in the 2016 US Election

The paper addresses the problem of identifying trolls in online campaigns. The method uses inverse reinforcement learning to infer rewards from the actions (retweet, post, mention, etc.) of Twitter accounts. There is one reward for each possible state-action combination, and a classifier takes these rewards as input to label a user as a troll or not. Results on the annotated dataset of Twitter activity for the 2016 US Election showcase the effectiveness of the method. Paper Presentation

Unsupervised User Stance Detection on Twitter

Stance detection is related to sentiment analysis, but this approach relies heavily on the reshare patterns between users, and on the content they reshare, to extract the different stances on a subject. The method creates user embeddings based on who and what the users retweet, then clusters the embeddings to find the different stances. It is therefore totally unsupervised, and it provides high-precision clusters on multiple manually labeled datasets. Paper Presentation
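To make the idea concrete, here is a minimal sketch of "embed users by what they retweet, then cluster". It is not the authors' pipeline: the user names, the binary set representation, the cosine threshold, and the connected-components grouping are all my own assumptions, swapped in for whatever embedding and clustering methods the paper actually uses.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def cluster_users(retweets, threshold=0.5):
    """Group users whose retweet sets are similar (connected components)."""
    users = sorted(retweets)
    parent = {u: u for u in users}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Link every pair of users whose similarity reaches the threshold.
    for u, v in combinations(users, 2):
        if cosine(retweets[u], retweets[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


# Hypothetical toy data: each user maps to the accounts they retweeted.
retweets = {
    "alice": {"acct_A", "acct_B"},
    "bob": {"acct_A", "acct_B", "acct_C"},
    "carol": {"acct_X", "acct_Y"},
    "dave": {"acct_Y", "acct_X"},
}
clusters = cluster_users(retweets)
```

No labels are used at any point, which is what makes this kind of approach unsupervised; labels are only needed afterwards to measure how pure the clusters are.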

Characterizing the Social Media News Sphere through User Co-Sharing Practices

In this analysis, the authors cluster news based on how many people in common reshare them on Twitter, and then label the news by subject and by the impartiality of the sources. Given the fact-checking of the sources, this analysis can reveal whether certain clusters are more prone to reshare and cultivate fake opinions. Paper Presentation
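The co-sharing signal itself is easy to illustrate. The sketch below counts, for every pair of news items, how many users shared both; these counts could then serve as edge weights for a clustering step. The user and item identifiers are hypothetical, and the paper's actual weighting and clustering may well differ.

```python
from collections import Counter
from itertools import combinations


def co_sharing_weights(shares):
    """Count, for every pair of items, how many users shared both."""
    weights = Counter()
    for items in shares.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(items), 2):
            weights[pair] += 1
    return weights


# Hypothetical toy data: each user maps to the news items they reshared.
shares = {
    "u1": {"item_a", "item_b"},
    "u2": {"item_a", "item_b", "item_c"},
    "u3": {"item_c"},
}
weights = co_sharing_weights(shares)
```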

Generating Realistic Interest-Driven Information Cascades

Including the content of a cascade is a challenging problem in many social network applications that rely on spreading. The authors use the topic-aware linear threshold epidemic model to simulate the spreading of items in a network and distinguish cases where a node adopts an item due to influence, pure interest, or an exogenous factor. The model is first run on synthetic graphs while controlling for homophily, and the authors find that low cascade size is correlated with high homophily and virality. Finally, comparing with the Digg dataset, the authors find that high homophily and virality give a more realistic cascade size/depth ratio. Paper Presentation
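For readers unfamiliar with threshold models, here is a minimal sketch of the influence part only. It assumes that each edge carries one weight per topic and that an item's topic distribution mixes those weights, which is my reading of "topic-aware" and not necessarily the paper's formulation. The interest and exogenous adoption paths are left out, and all node names and numbers are made up.

```python
def topic_weight(edge_weights, item_topics):
    """Mix per-topic edge weights by the item's topic distribution."""
    return sum(g * w for g, w in zip(item_topics, edge_weights))


def linear_threshold(in_edges, thresholds, seeds, item_topics):
    """Run a deterministic linear threshold cascade until no node changes.

    in_edges[v] maps each in-neighbor u of v to a list of per-topic weights.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, neighbors in in_edges.items():
            if v in active:
                continue
            # Total influence currently exerted on v by its active in-neighbors.
            pressure = sum(
                topic_weight(w, item_topics)
                for u, w in neighbors.items()
                if u in active
            )
            if pressure >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Hypothetical two-topic network: "a" strongly influences "b" on topic 0 only.
in_edges = {
    "b": {"a": [0.9, 0.1]},
    "c": {"a": [0.2, 0.2], "b": [0.4, 0.1]},
}
thresholds = {"b": 0.5, "c": 0.5}

topic0_cascade = linear_threshold(in_edges, thresholds, {"a"}, [1.0, 0.0])
topic1_cascade = linear_threshold(in_edges, thresholds, {"a"}, [0.0, 1.0])
```

With the same seed and the same graph, the item about topic 0 reaches every node while the item about topic 1 stays with the seed, which is the kind of content dependence the paper is after.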

Empirical Analysis of Multi-Task Learning for Reducing Identity Bias in Toxic Comment Detection

Toxic comment classifiers are biased toward labeling comments that include identity-based words (such as black or gay) as toxic, because of the contexts in which some people use these words. The proposed model is a multitask neural network that regresses the toxicity score of the comment and outputs probabilities for the identities. The dataset comes from the recent Kaggle Jigsaw competition. Paper Presentation
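One way to picture the multitask setup is through its training objective. The sketch below combines a squared error on the toxicity score with a binary cross-entropy over the identity probabilities. The exact loss terms and the `id_weight` trade-off parameter are assumptions on my part; the paper may combine the tasks differently.

```python
from math import log


def multitask_loss(tox_pred, tox_true, id_probs, id_true, id_weight=0.5):
    """Squared error on toxicity plus weighted cross-entropy on identities."""
    tox_loss = (tox_pred - tox_true) ** 2
    eps = 1e-12  # avoids log(0) for saturated probabilities
    bce = -sum(
        y * log(p + eps) + (1 - y) * log(1 - p + eps)
        for p, y in zip(id_probs, id_true)
    ) / len(id_true)
    return tox_loss + id_weight * bce


# Hypothetical comment mentioning the first of two identities, low toxicity.
good = multitask_loss(0.1, 0.1, [0.99, 0.01], [1, 0])
bad = multitask_loss(0.9, 0.1, [0.5, 0.5], [1, 0])
```

The intuition is that a shared network trained on both terms has to represent which identity is mentioned separately from how toxic the comment is.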

Measuring Edge Sparsity on Large Social Networks

An interesting method based on edge sparsity is used to evaluate the coherency of friendship networks. The results show that interaction networks tend to express close relationships more effectively than regular friendship networks. Paper Presentation

Hierarchical Propagation Networks for Fake News Detection: Investigation and Exploitation

In this paper, the authors construct the propagation network of diffusion cascades and the micro-level propagation network formed by the replies to the initial post. Subsequently, they extract several features (temporal, structural, linguistic) from each type of network for real and fake news and identify which of them differ with statistical significance between the two. They also experiment with classifiers to predict which tweets are false, underlining that temporal features are the most effective. The dataset is openly available on GitHub. Paper Presentation

This large-scale study addresses the globalization of music using the economic formula that measures the trade flow between countries. A trade is formed when a song from one country is streamed by a user in another country. The model also includes covariates such as language similarity and geographical distance. Among other findings, the results indicate that these covariates are indeed of great importance in shaping the flow of music between countries. Paper Presentation
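I am assuming here that the formula in question is the standard gravity model of trade; the exact specification in the paper may differ. In its textbook form, the flow from country i to country j grows with the "mass" of each country and shrinks with the distance between them. Covariates such as a shared language are typically added to the log-linear version, as in the second line, where the symbol names are my own.

```latex
F_{ij} = G \, \frac{M_i \, M_j}{D_{ij}}

\ln F_{ij} = \beta_0 + \beta_1 \ln M_i + \beta_2 \ln M_j
             - \beta_3 \ln D_{ij} + \beta_4 \, \mathrm{Lang}_{ij} + \varepsilon_{ij}
```

In the music setting, F would be the number of streams in country j of songs from country i, and the fitted coefficients indicate how much distance and language matter.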

Health misinformation is an important problem that has been highlighted recently during the COVID-19 pandemic. The authors gathered tweets regarding cancer and analyzed them using a machine learning pipeline, which included a two-headed deep neural network to classify the relevance of the tweets to cancer and an attention Bi-LSTM to identify the most important keywords. The misinformation was detected by finding cures that are not medically substantiated. The results show that fake tweets use more attractive keywords, while real news is more “sad”. Paper Presentation

Gossip and Attend: Context-Sensitive Graph Representation Learning

This paper proposes a method for representation learning on graphs in which a node’s representation can include the contexts it resides in, apart from its positional information. To define the context, the method first computes node representations based on the neighborhood’s representations, and then computes an alignment between these representations for each pair of nodes. The experiments show that it outperforms the baselines on well-known datasets. Paper Presentation

A quantitative approach to understanding online antisemitism

The authors collected a large-scale dataset of social media posts and built a weighted graph of words to extract knowledge regarding antisemitic practices. The results indicated that certain communities on 4chan represent mostly ethnic groups. Another analysis, on image posts, detected instances of antisemitic memes among mainstream memes, and hence on more mainstream social media. Paper Presentation

Generalized Euclidean Measure to Estimate Network Distances

The paper proposes a new distance measure between networks using a metric similar to the heat kernel. The distance is intuitively similar to the squared number of edges between nodes, and it is computed by including the Laplacian matrix in the Euclidean distance. Compared to the earth mover’s distance and GFT, the proposed metric seems more robust, and it is highly correlated with the activation threshold of the linear threshold epidemic model. Paper Presentation
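As I understand it, "including the Laplacian in the Euclidean distance" means replacing the plain dot product with a quadratic form through the pseudoinverse of the Laplacian, that is, sqrt((p - q)^T L^+ (p - q)) for two vectors p and q of node values. Treat that formula as my assumption and not as the paper's exact definition. The sketch below evaluates it in pure Python for a small connected graph, assuming p and q have the same total mass.

```python
from math import sqrt


def laplacian(n, edges):
    """Build the unweighted graph Laplacian L = D - A as a list of rows."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ge_distance(n, edges, p, q):
    """sqrt((p - q)^T L^+ (p - q)) for a connected graph, sum(p) == sum(q)."""
    d = [pi - qi for pi, qi in zip(p, q)]
    L = laplacian(n, edges)
    # For a connected graph, (L + J/n)^-1 = L^+ + J/n (J is the all-ones
    # matrix), and J d = 0 when d sums to zero, so solving
    # (L + J/n) x = d yields x = L^+ d without forming the pseudoinverse.
    shifted = [[L[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    x = solve(shifted, d)
    return sqrt(sum(di * xi for di, xi in zip(d, x)))


# Hypothetical path graph 0 - 1 - 2 with all mass on a single node.
edges = [(0, 1), (1, 2)]
near = ge_distance(3, edges, [1, 0, 0], [0, 1, 0])
far = ge_distance(3, edges, [1, 0, 0], [0, 0, 1])
```

Moving the mass two hops away costs more than moving it one hop, which a plain Euclidean distance between the same vectors would not capture, since it ignores the graph entirely.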

Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository

ICWSM includes lots of interesting dataset publications with great potential. The one that stood out to me addresses fake news on health issues. The dataset is quite big and diverse. Most importantly, it includes tweets, retweets, and follow relationships, which makes it possible to examine the diffusion properties of the fake news. Paper Presentation

Conclusion:

Overall, I believe ICWSM 2020 was a very fruitful conference, despite being virtual. ICWSM generally has many interesting contributions due to its diversity. The organization was excellent, and the site was based on the ICLR template, which is very user friendly. Having said that, I think physical conferences are equally important, and I hope to meet the authors in person at the next ICWSM!