Deep Learning on Graphs

Deep learning on graphs has become one of the hottest topics in machine learning. The book consists of four parts to best accommodate readers with diverse backgrounds and purposes of reading. Part I introduces basic concepts of graphs and deep learning; Part II discusses the most established methods, from basic to advanced settings; Part III presents the most typical applications, including natural language processing, computer vision, data mining, biochemistry, and health care; and Part IV describes advances of methods and applications that tend to be important and promising for future research. The book is self-contained, making it accessible to a broad range of readers, including (1) senior undergraduate and graduate students; (2) practitioners and project managers who want to adopt graph neural networks into their products and platforms; and (3) researchers without a computer science background who want to use graph neural networks to advance their disciplines.

Yao Ma is a PhD student in the Department of Computer Science and Engineering at Michigan State University (MSU). He is the recipient of the Outstanding Graduate Student Award and the FAST Fellowship at MSU. He has published papers in top conferences such as WSDM, ICDM, SDM, WWW, IJCAI, SIGIR, and KDD, which have been cited hundreds of times. He is the leading organizer and presenter of tutorials on GNNs at AAAI'20, KDD'20, and AAAI'21, which received huge attention and wide acclaim. He has served as a program committee member or reviewer for many well-known conferences and journals, including AAAI, BigData, IJCAI, TWEB, TKDD, and TPAMI.

Jiliang Tang is an Assistant Professor in the Department of Computer Science and Engineering at Michigan State University. Previously, he was a research scientist at Yahoo Research. He received the 2020 SIGKDD Rising Star Award, the 2020 Distinguished Withrow Research Award, the 2019 NSF CAREER Award, the 2019 IJCAI Early Career Invited Talk, and seven best paper (or runner-up) awards. He has organized top data science conferences including KDD, WSDM, and SDM, and is an associate editor of the TKDD journal. His research has been published in highly ranked journals and top conferences and has received more than 12,000 citations, with an h-index of 55 and extensive media coverage.
Deep Learning on Graphs

YAO MA
Michigan State University

JILIANG TANG
Michigan State University
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108831741
DOI: 10.1017/9781108924184

© Yao Ma and Jiliang Tang 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021
Printed in the United Kingdom by TJ Books Ltd, Padstow, Cornwall

A catalogue record for this publication is available from the British Library.

ISBN 978-1-108-83174-1 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface
Acknowledgments

1 Deep Learning on Graphs: An Introduction
  1.1 Introduction
  1.2 Why Deep Learning on Graphs?
  1.3 What Content Is Covered?
  1.4 Who Should Read This Book?
  1.5 Feature Learning on Graphs: A Brief History
    1.5.1 Feature Selection on Graphs
    1.5.2 Representation Learning on Graphs
  1.6 Conclusion
  1.7 Further Reading

Part I Foundations

2 Foundations of Graphs
  2.1 Introduction
  2.2 Graph Representations
  2.3 Properties and Measures
    2.3.1 Degree
    2.3.2 Connectivity
    2.3.3 Centrality
  2.4 Spectral Graph Theory
    2.4.1 Laplacian Matrix
    2.4.2 The Eigenvalues and Eigenvectors of the Laplacian Matrix
  2.5 Graph Signal Processing
    2.5.1 Graph Fourier Transform
  2.6 Complex Graphs
    2.6.1 Heterogeneous Graphs
    2.6.2 Bipartite Graphs
    2.6.3 Multidimensional Graphs
    2.6.4 Signed Graphs
    2.6.5 Hypergraphs
    2.6.6 Dynamic Graphs
  2.7 Computational Tasks on Graphs
    2.7.1 Node-Focused Tasks
    2.7.2 Graph-Focused Tasks
  2.8 Conclusion
  2.9 Further Reading

3 Foundations of Deep Learning
  3.1 Introduction
  3.2 Deep Feedforward Networks
    3.2.1 The Architecture
    3.2.2 Activation Functions
    3.2.3 Output Layer and Loss Function
  3.3 Convolutional Neural Networks
    3.3.1 The Convolution Operation and Convolutional Layer
    3.3.2 Convolutional Layers in Practice
    3.3.3 Nonlinear Activation Layer
    3.3.4 Pooling Layer
    3.3.5 An Overall CNN Framework
  3.4 Recurrent Neural Networks
    3.4.1 The Architecture of Traditional RNNs
    3.4.2 Long Short-Term Memory
    3.4.3 Gated Recurrent Unit
  3.5 Autoencoders
    3.5.1 Undercomplete Autoencoders
    3.5.2 Regularized Autoencoders
  3.6 Training Deep Neural Networks
    3.6.1 Training with Gradient Descent
    3.6.2 Backpropagation
    3.6.3 Preventing Overfitting
  3.7 Conclusion
  3.8 Further Reading

Part II Methods

4 Graph Embedding
  4.1 Introduction
  4.2 Graph Embedding for Simple Graphs
    4.2.1 Preserving Node Co-occurrence
    4.2.2 Preserving Structural Role
    4.2.3 Preserving Node Status
    4.2.4 Preserving Community Structure
  4.3 Graph Embedding on Complex Graphs
    4.3.1 Heterogeneous Graph Embedding
    4.3.2 Bipartite Graph Embedding
    4.3.3 Multidimensional Graph Embedding
    4.3.4 Signed Graph Embedding
    4.3.5 Hypergraph Embedding
    4.3.6 Dynamic Graph Embedding
  4.4 Conclusion
  4.5 Further Reading

5 Graph Neural Networks
  5.1 Introduction
  5.2 The General GNN Frameworks
    5.2.1 A General Framework for Node-Focused Tasks
    5.2.2 A General Framework for Graph-Focused Tasks
  5.3 Graph Filters
    5.3.1 Spectral-Based Graph Filters
    5.3.2 Spatial-Based Graph Filters
  5.4 Graph Pooling
    5.4.1 Flat Graph Pooling
    5.4.2 Hierarchical Graph Pooling
  5.5 Parameter Learning for Graph Neural Networks
    5.5.1 Parameter Learning for Node Classification
    5.5.2 Parameter Learning for Graph Classification
  5.6 Conclusion
  5.7 Further Reading

6 Robust Graph Neural Networks
  6.1 Introduction
  6.2 Graph Adversarial Attacks
    6.2.1 Taxonomy of Graph Adversarial Attacks
    6.2.2 White-Box Attack
    6.2.3 Gray-Box Attack
    6.2.4 Black-Box Attack
  6.3 Graph Adversarial Defenses
    6.3.1 Graph Adversarial Training
    6.3.2 Graph Purification
    6.3.3 Graph Attention
    6.3.4 Graph Structure Learning
  6.4 Conclusion
  6.5 Further Reading

7 Scalable Graph Neural Networks
  7.1 Introduction
  7.2 Node-wise Sampling Methods
  7.3 Layer-wise Sampling Methods
  7.4 Subgraph-wise Sampling Methods
  7.5 Conclusion
  7.6 Further Reading

8 Graph Neural Networks for Complex Graphs
  8.1 Introduction
  8.2 Heterogeneous Graph Neural Networks
  8.3 Bipartite Graph Neural Networks
  8.4 Multidimensional Graph Neural Networks
  8.5 Signed Graph Neural Networks
  8.6 Hypergraph Neural Networks
  8.7 Dynamic Graph Neural Networks
  8.8 Conclusion
  8.9 Further Reading

9 Beyond GNNs: More Deep Models on Graphs
  9.1 Introduction
  9.2 Autoencoders on Graphs
  9.3 Recurrent Neural Networks on Graphs
  9.4 Variational Autoencoders on Graphs
    9.4.1 Variational Autoencoders for Node Representation Learning
    9.4.2 Variational Autoencoders for Graph Generation
  9.5 Generative Adversarial Networks on Graphs
    9.5.1 Generative Adversarial Networks for Node Representation Learning
    9.5.2 Generative Adversarial Networks for Graph Generation
  9.6 Conclusion
  9.7 Further Reading

Part III Applications

10 Graph Neural Networks in Natural Language Processing
  10.1 Introduction
  10.2 Semantic Role Labeling
  10.3 Neural Machine Translation
  10.4 Relation Extraction
  10.5 Question Answering
    10.5.1 The Multihop QA Task
    10.5.2 Entity-GCN
  10.6 Graph to Sequence Learning
  10.7 Graph Neural Networks on Knowledge Graphs
    10.7.1 Graph Filters for Knowledge Graphs
    10.7.2 Transforming Knowledge Graphs to Simple Graphs
    10.7.3 Knowledge Graph Completion
  10.8 Conclusion
  10.9 Further Reading

11 Graph Neural Networks in Computer Vision
  11.1 Introduction
  11.2 Visual Question Answering
    11.2.1 Images as Graphs
    11.2.2 Images and Questions as Graphs
  11.3 Skeleton-Based Action Recognition
  11.4 Image Classification
    11.4.1 Zero-Shot Image Classification
    11.4.2 Few-Shot Image Classification
    11.4.3 Multilabel Image Classification
  11.5 Point Cloud Learning
  11.6 Conclusion
  11.7 Further Reading

12 Graph Neural Networks in Data Mining
  12.1 Introduction
  12.2 Web Data Mining
    12.2.1 Social Network Analysis
    12.2.2 Recommender Systems
  12.3 Urban Data Mining
    12.3.1 Traffic Prediction
    12.3.2 Air Quality Forecasting
  12.4 Cybersecurity Data Mining
    12.4.1 Malicious Account Detection
    12.4.2 Fake News Detection
  12.5 Conclusion
  12.6 Further Reading

13 Graph Neural Networks in Biochemistry and Health Care
  13.1 Introduction
  13.2 Drug Development and Discovery
    13.2.1 Molecule Representation Learning
    13.2.2 Protein Interface Prediction
    13.2.3 Drug–Target Binding Affinity Prediction
  13.3 Drug Similarity Integration
  13.4 Polypharmacy Side Effect Prediction
  13.5 Disease Prediction
  13.6 Conclusion
  13.7 Further Reading

Part IV Advances

14 Advanced Topics in Graph Neural Networks
  14.1 Introduction
  14.2 Deeper Graph Neural Networks
    14.2.1 Jumping Knowledge
    14.2.2 DropEdge
    14.2.3 PairNorm
  14.3 Exploring Unlabeled Data via Self-Supervised Learning
    14.3.1 Node-Focused Tasks
    14.3.2 Graph-Focused Tasks
  14.4 Expressiveness of Graph Neural Networks
    14.4.1 Weisfeiler–Lehman Test
    14.4.2 Expressiveness
  14.5 Conclusion
  14.6 Further Reading

15 Advanced Applications in Graph Neural Networks
  15.1 Introduction
  15.2 Combinatorial Optimization on Graphs
  15.3 Learning Program Representations
  15.4 Reasoning Interacting Dynamical Systems in Physics
  15.5 Conclusion
  15.6 Further Reading

Bibliography
Index
Preface

Graphs have been leveraged to denote data from various domains, ranging from social science and linguistics to chemistry, biology, and physics. Meanwhile, numerous real-world applications can be treated as computational tasks on graphs. For example, air quality forecasting can be regarded as a node classification task, friend recommendation in social networks can be solved as a link prediction task, and protein interface prediction can be regarded as a graph classification task. To take better advantage of modern machine learning models for these computational tasks, effectively representing graphs plays a key role. There are two major ways to extract features to represent graphs: feature engineering and representation learning. Feature engineering relies on hand-engineered features, which is time consuming and often not optimal for given downstream tasks. Representation learning learns features automatically, which requires minimal human effort and is adaptive to given downstream tasks. Thus, representation learning on graphs has been extensively studied.

The field of graph representation learning has developed greatly over the past decades and can be roughly divided into three generations: traditional graph embedding, modern graph embedding, and deep learning on graphs. As the first generation of graph representation learning, traditional graph embedding has been investigated in the context of classic dimension reduction techniques on graphs, such as IsoMap, LLE, and Laplacian eigenmaps. Word2vec, which learns word representations from a large corpus of text, has advanced many natural language processing tasks; its successful extensions to the graph domain started the second generation of representation learning on graphs, that is, modern graph embedding. Given the huge success of deep learning techniques in representation learning in the domains of images and text, efforts have been made to generalize them to graphs, which have opened a new chapter of graph representation learning, that is, deep learning on graphs.

More and more evidence has demonstrated that the third generation of graph representation learning, especially graph neural networks (GNNs), has tremendously facilitated computational tasks on graphs, including both node-focused and graph-focused tasks. The revolutionary advances brought by GNNs have also immensely contributed to the depth and breadth of the adoption of graph representation learning in real-world applications. For the classical application domains of graph representation learning, such as recommender systems and social network analysis, GNNs result in state-of-the-art performance and bring them to new frontiers. Meanwhile, new application domains of GNNs have been continuously emerging, such as combinatorial optimization, physics, and health care. These wide applications of GNNs enable diverse contributions and perspectives from disparate disciplines and make this research field truly interdisciplinary.

Graph representation learning is a rapidly growing field. It has attracted significant attention from different domains and consequently accumulated a large body of literature. Thus, it is a propitious time to systematically survey and summarize this field. This book is our diligent attempt to achieve this goal by taking advantage of our many years of teaching and research experience in this field. In particular, we aim to help researchers acquire essential knowledge of graph representation learning and its wide range of applications and understand its advances and new frontiers.

An Overview of the Book. This book provides a comprehensive introduction to graph representation learning with a focus on deep learning on graphs, especially GNNs. It consists of four parts: Foundations, Methods, Applications, and Advances. The Foundations part introduces the necessary background and basic concepts of graphs and deep learning. Topics covered by the Methods part include modern graph embedding, GNNs for both simple and complex graphs, the robustness and scalability issues of GNNs, and deep graph models beyond GNNs. Each of these topics is covered by a chapter with fundamental concepts and technical details on representative algorithms. The Applications part presents GNN applications in the most typical domains, including natural language processing, computer vision, data mining, biochemistry, and health care. One chapter is used to cover the most representative subfields advanced by GNNs for each domain. New emerging methods and application domains are discussed in the Advances part. For each chapter, further reading is included at the end for readers who are interested in more advanced topics and recent trends.
Target Audience. This book is designed to be as self-contained as possible, though a basic background in graph theory, calculus, linear algebra, probability, and statistics will help readers better understand its technical details. It is thus suitable for a wide range of readers with diverse backgrounds and different purposes of reading. This book can serve as both a learning tool and a reference for students at the senior undergraduate or graduate levels who want to obtain a comprehensive understanding of this research field. Researchers who wish to pursue this research field can consider this book as a starting point. Project managers and practitioners can learn from the GNN applications in the book how to adopt GNNs in their products and platforms. Researchers outside of computer science can find in this book an extensive set of examples of how to apply GNNs to different disciplines.
Acknowledgments

We start by acknowledging our families, to whom this book is dedicated. This book would not have been possible without their selfless support and encouragement.

Graph representation learning has grown tremendously in the past decade, from traditional graph embedding to modern graph embedding and graph neural networks. We have had the fortune to witness these three generations. This evolution would have been impossible without the pioneering research performed by numerous researchers. Meanwhile, increasing efforts have been made to take advantage of graph representation learning to advance applications from many domains. Graph representation learning has become a truly interdisciplinary research field. We would like to acknowledge all the people contributing to this field. Their efforts not only enable us to have a book on this field but also make it one of the most popular topics in machine learning.

The idea of writing this book was strongly inspired by Huan Liu (Arizona State University). He has been working on feature selection for decades. We began the journey on graph representation learning with his ingenious suggestion on feature selection on graphs in 2010. We would like to thank Charu Aggarwal (IBM Research) and Shiyu Chang (University of California, Santa Barbara), with whom we started the research on modern graph embedding in 2014.

We have worked in this field since 2010, and we have received immense support from our collaborators. In particular, we would like to express our tremendous gratitude to Salem Alelyani (King Khalid University), Yi Chang (Jilin University), Ken Frank (Michigan State University), Huiji Gao (LinkedIn), Xia Hu (Texas A&M University), Anil Jain (Michigan State University), Shuiwang Ji (Texas A&M University), Jundong Li (University of Virginia), Zitao Liu (TAL Education Group), Sinem Mollaoglu (Michigan State University), Shin-Han Shiu (Michigan State University), Kai Shu (Illinois Institute of Technology), Pang-Ning Tan (Michigan State University), Lei Tang (Lyft), Guanhua Tu (Michigan State University), Suhang Wang (Pennsylvania State University), Lingfei Wu (JD.com), Yuying Xie (Michigan State University), Ming Yan (Michigan State University), Dawei Yin (Baidu, Inc.), Mi Zhang (Michigan State University), and Jiayu Zhou (Michigan State University).

We would like to thank the current and former members of the Data Science and Engineering Laboratory at Michigan State University. They include Meznah Almutairy, Norah Alfadhli, Aaron Brookhouse, Jamell Dacon, Daniel K.O-Dankwa, Tyler Derr, Jiayuan Ding, Wenqi Fan, Haoyu Han, Jiangtao Huang, Hamid Karimi, Wei Jin, Juanhui Li, Yaxin Li, Haochen Liu, Hua Liu, Xiaorui Liu, Jie Ren, Namratha Shah, Harry Shomer, Yuxuan Wan, Wentao Wang, Yiqi Wang, Xochitl Weiss, Hongzhi Wen, Xiaoyang Wang, Xin Wang, Zhiwei Wang, Han Xu, and Xiangyu Zhao. They provided invaluable comments and feedback on the early drafts of this book. We would like to thank Xavier Sumba (Heyday.ai), Shunshi Hu (Hunan Normal University), Kannan Presanna Kumar (SAP), and Chen Chen (Ningxia Normal University) for their suggestions on the preprint of the book.

The Computer Science and Engineering Department at Michigan State University provided us a fantastic atmosphere for writing this book. Our research on graph representation learning has been supported, in part, by the National Science Foundation, the Army Research Office, NEC Labs America, Snap Inc., the Ford Motor Company, JD.com, Criteo Labs, and TAL Education Group. In particular, Hector Munoz-Avila, Zhengchang Chen, Wei Ding, Purush S. Iyer, Balakrishnan Prabhakaran, Neil Shah, Finbarr Sloane, Maria Zemankova, and Aidong Zhang have been supportive of our research in graph representation learning.

It was a pleasure working with Cambridge University Press. We would like to acknowledge Lauren Cowles, a senior editor of Mathematics and Computer Sciences at Cambridge, for her support, as well as the helpful staff at Cambridge, Amy He and Mark Fox, and Harshavardhanan Udhayakumar and his colleagues at SPi Global for their efforts on the production of the book.
1 Deep Learning on Graphs: An Introduction

1.1 Introduction

We start the chapter by answering a few questions about the book. First, we discuss why we should pay attention to deep learning on graphs. In particular, why do we represent real-world data as graphs, why do we want to bridge deep learning with graphs, and what are the challenges for deep learning on graphs? Second, we introduce the content that will be covered by this book; specifically, which topics we will discuss and how we organize these topics. Third, we provide guidance about who should read this book; in particular, who our target audience is and how readers with different backgrounds and purposes should approach it. To help better understand deep learning on graphs, we briefly review its history under the more general context of feature learning on graphs.

1.2 Why Deep Learning on Graphs?

Because data from real-world applications take very diverse forms, from matrices and tensors to sequences and time series, a natural question is why we attempt to represent data as graphs. There are two main motivations. First, graphs provide a universal representation of data, as shown in Figure 1.1. Data from many systems across various areas can be explicitly denoted as graphs, such as social networks, transportation networks, protein–protein interaction networks, knowledge graphs, and brain networks. In addition, as indicated by Figure 1.1, numerous other types of data can be transformed into the form of graphs. Second, a huge number of real-world problems can be addressed as a small set of computational tasks on graphs. For example, inferring node attributes, detecting anomalous nodes (e.g., spammers or terrorists),