George Panagopoulos

The first virtual KDD

KDD taking place virtually gave me the chance to attend a conference I had been longing to attend for years. The conference was very well organized, with three online platforms (Whova, vFairs and CrossMinds) for attending the talks, chatting with participants and viewing the videos. Moreover, I was happy to receive a registration award and to have a couple of workshop papers accepted: one on “Graph Neural Networks with Extreme Nodes Discrimination” (paper, video) at DLG, and one on “Performance in the Courtroom” (paper) at NLLP. As in the ICWSM 2020 post, there were some amazing talks on diversity and inclusion in data mining, from keynotes on the inclusion of women to papers on fairness more broadly, which I will not address here but which you can find in the conference proceedings on CrossMinds.

Tutorials

Learning with Small Data

Lots of amazing tutorials were going on, but since I had to choose one, I attended the one closest to my recent research interests, Learning with Small Data, which covered meta-, transfer and multi-task learning. The presentation started with an introduction to examples of auxiliary tasks that can assist few-shot learning of tasks with limited training data.

Starting with transfer learning, the presenters explained how to use the labels of a source classifier to refine the initial representations of a target classifier. In an unsupervised setting, one can use MMD (maximum mean discrepancy) to minimize the difference between the representations derived from the source and target input data, thereby improving the latter. Subsequently, they distinguished the types of multi-task learning by how parameters are shared. Hard parameter sharing means sharing the same initial layers of a neural network and leaving the final layers task-specific; soft sharing means using different initial layers but with an additional constraint that keeps them similar.

The biggest part of the tutorial was devoted to meta-learning, especially gradient-based meta-learning such as MAML. In MAML (Model-Agnostic Meta-Learning), a joint parameter is learnt from the source datasets through sequential one-epoch updates and is used as the initialization for the train and test sets of the target dataset. In contrast, in gradient preconditioning, you learn a transformation of the gradient itself from the source domain, in order to take gradient steps in that transformed space on the target task. In metric-based meta-learning, we compare the samples in the target set with the ones in the source set to map them to the right class. For example, in ProtoNet, we compute the mean of the representations of each class in the source set, and map the closest representations from the target set to that class.

Subsequently, the presenters underlined several challenges in meta-learning, such as heterogeneity in the tasks’ distributions and overfitting in task generalization. The rest of the talk focused on several real-world applications and on how to incorporate prior domain knowledge into the task at hand. The presentation can be found here. It should be noted that it overlaps with another meta-learning lecture I had the luck to attend earlier this summer, at MLSS.
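To make the MMD step concrete, here is a minimal sketch of the squared maximum mean discrepancy with an RBF kernel, which could be minimized as an extra loss term to pull the source and target representations together. This is my own illustration rather than the tutorial's code, and the bandwidth sigma is an arbitrary choice:

```python
import torch

def mmd_rbf(source_feats, target_feats, sigma=1.0):
    """Squared MMD between two batches of representations under an RBF kernel.

    source_feats: (n, d) tensor, target_feats: (m, d) tensor.
    """
    def kernel(a, b):
        # Pairwise squared Euclidean distances mapped through a Gaussian kernel.
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

    return (kernel(source_feats, source_feats).mean()
            + kernel(target_feats, target_feats).mean()
            - 2 * kernel(source_feats, target_feats).mean())
```

Driving this term toward zero encourages the two feature distributions to match, which is exactly the unsupervised alignment described above.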
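The ProtoNet routine described above is equally compact: average the support (source) embeddings per class to form prototypes, then assign each query (target) embedding to the nearest one. Again a sketch of mine, assuming the embeddings come from some pre-trained encoder:

```python
import torch

def protonet_predict(support_emb, support_labels, query_emb):
    """Nearest-prototype classification in the style of ProtoNet.

    support_emb: (n_support, d), support_labels: (n_support,),
    query_emb: (n_query, d).
    """
    classes = support_labels.unique()
    # Prototype = mean embedding of each class in the support set.
    prototypes = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in classes])
    # Map each query embedding to the class of its closest prototype.
    distances = torch.cdist(query_emb, prototypes)  # (n_query, n_classes)
    return classes[distances.argmin(dim=1)]
```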

Industry tutorials

  • A lecture-style tutorial with live coding on “Deep Learning for Search and Recommender Systems” from LinkedIn. It covered a retrieval system based on document and user embeddings, along with personalization, recommendation and query completion; a minimal embedding-retrieval sketch follows this list.
  • An overview of “Advances in Recommender Systems” from Spotify.
  • The “Deep Graph Learning” tutorial, which included an informative talk by Tyler Derr on self-supervision for Graph Neural Networks.
  • A presentation of Amazon’s Deep Graph Library for scalable graph neural networks.
  • A tutorial on “Building recommender systems with PyTorch” from Facebook.
  • A presentation on online user engagement.
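To give a flavor of the embedding-based retrieval mentioned in the LinkedIn tutorial above, here is a minimal brute-force sketch under my own assumptions: users and documents live in a shared embedding space and are scored by cosine similarity. Production systems would replace the exhaustive scan with an approximate nearest-neighbor index.

```python
import numpy as np

def retrieve_top_k(user_vec, doc_matrix, k=5):
    """Return the indices of the k documents most similar to the user
    embedding, by cosine similarity (exhaustive scan)."""
    user = user_vec / np.linalg.norm(user_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ user                     # cosine similarity per document
    return np.argsort(-scores)[:k]

# Hypothetical toy data: 1,000 documents in a 64-dimensional embedding space.
rng = np.random.default_rng(0)
doc_matrix = rng.normal(size=(1000, 64))
user_vec = rng.normal(size=64)
print(retrieve_top_k(user_vec, doc_matrix))
```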

Workshops

I was quite lucky KDD was virtual this year, because I could switch back and forth between the DLG x MLG and epiDAMIK workshops. epiDAMIK has been running for three years and covers research on data mining for epidemiology.
I attended the keynote of Prof. Milind Tambe, which covered influence maximization in an unknown network for HIV prevention, algorithms for robust influence maximization, and a strategy for scheduling interventions with scarce data using restless bandits. Other very interesting talks included “Machine-Learned Epidemiology” by Adam Sadilek and a talk by Google on Graph Neural Networks for predicting COVID-19 cases, which is very close to our recent work.
The epiDAMIK posters can be found online.

DLG and MLG were merged this year, and the joint workshop included numerous interesting talks:

  • “Graph Structure of Neural Networks: Good Neural Networks Are Alike” by Jure Leskovec.
  • “The Power of Summarization in Network Analysis” by Danai Koutra (video).
  • “Self-supervised Learning on Graphs: Deep Insights and New Directions” by Tyler Derr.
  • “Learning Attribute-Structure Co-Evolutions in Dynamic Graphs” (paper).
  • “Understanding and Evaluating Structural Node Embeddings” (video).
  • “Heterogeneous Threshold Estimation for Linear Threshold Modeling” (video).
  • Karate Club and Little Ball of Fur.
  • “Mining Persistent Activity in Continually Evolving Networks” (video).

The MLG posters can be found online.

Apart from these two, there were lots of other cool workshops, such as “Humanitarian Mapping”, “Knowledge Graphs and E-Commerce”, “Machine Learning in Finance” and “AdKDD”.

Special Presentations

Presentations that stood out to me from the Deep Learning Day, Graph Mining and Reinforcement Learning tracks were Yizhou Sun’s, Will Hamilton’s and Csaba Szepesvári’s, respectively.

Regarding applications to COVID-19, I found the Health Day best-paper presentation, “Data-driven Simulation and Optimization for COVID-19 Exit Strategies”, really cool. They used an LSTM to predict the reproduction number R from demographics and Google mobility data; R then drives a simulation based on a compartmental model. I also attended the KDD research panel “Fighting a Pandemic: Convergence of Expertise, Data Science” as well as the keynote by Alessandro Vespignani on “Computational Epidemiology at the Time of COVID-19”. Among other things, both underlined the difficulty of gathering the right data to detect or predict COVID-19 from social media posts, Google search queries, etc. In addition, Prof. Vespignani highlighted the pitfalls of relying on the reported numbers of cases and deaths, as the volume of COVID-19 testing and the treatment of severe cases change over the course of the pandemic.
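To illustrate the second stage of that pipeline, here is a toy discrete-time SIR step driven by a reproduction number. It is my own simplified sketch, not the paper's simulator: the predicted R is converted to a transmission rate via beta = R * gamma, and the r_hat values below are made up, standing in for the LSTM's output.

```python
def sir_step(s, i, r, r_t, gamma=0.1):
    """One discrete-time step of a basic SIR compartmental model, where the
    (predicted) reproduction number r_t sets the transmission rate."""
    n = s + i + r
    beta = r_t * gamma                      # simplification: beta = R * gamma
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

# Hypothetical R trajectory; in the paper this would come from the LSTM.
s, i, r = 990_000.0, 10_000.0, 0.0
for r_hat in [1.4, 1.2, 1.0, 0.8, 0.7]:
    s, i, r = sir_step(s, i, r, r_hat)
    print(f"active infections: {i:,.0f}")
```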

Papers

From Facebook research:

  • Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction paper

  • TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook paper

  • Embedding-based Retrieval in Facebook Search paper

From Amazon research:

  • Temporal-Contextual Recommendation in Real-Time paper

  • MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals paper

From Baidu research:

  • ConSTGAT: Contextual Spatial-Temporal Graph Attention Network for Travel Time Estimation at Baidu Maps

  • Geodemographic Influence Maximization

Individual papers:

  • Neural Subgraph Isomorphism Counting

  • Dynamic Knowledge Graph based Multi-Event Forecasting

  • Connecting the Dots: Multivariate Time Series Forecasting with GNNs

  • GPT-GNN: Generative Pre-Training of Graph Neural Networks

  • Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation

  • Task-Adaptive Graph Meta-learning

  • Combinatorial Black-Box Optimization with Expert Advice

  • From Online to Non-i.i.d. Batch Learning

  • Mining Persistent Activity in Continually Evolving Networks

  • Interactive Path Reasoning on Graph for Conversational Recommendation

  • Neural Dynamics on Complex Networks

  • TinyGNN: Learning Efficient Graph Neural Networks

  • How to count triangles, without seeing the whole graph

  • In Search for a Cure: Recommendation With Knowledge Graph on CORD-19

  • Cascade-LSTM: A Tree-Structured Neural Classifier for Detecting Misinformation Cascades

  • Scaling Graph Neural Networks with Approximate PageRank