
Author: Keita Broadwater, Namid Stillman

Publisher: Manning Publications
Publish Year: 2025
Language: English
Pages: 394
File Format: PDF
File Size: 5.9 MB

[Figure: Mental model of the GNN project. Structural data sources (ch. 8) supply node features and edge features, which become preprocessed data (ch. 8), graph representations (ch. 2), and node embeddings (ch. 2); these feed the training loop (ch. 3-7), with support for scaling training for large data (ch. 7), taking an untrained model to a trained model.]

The steps involved in a GNN project are similar to many conventional machine learning pipelines, but we need graph-specific tools to build them. We start with raw data, which is then transformed into a graph data model that can be stored in a graph database or used in a graph processing system. From the graph processing system (and some graph databases), we can do exploratory data analysis and visualization. Finally, for the graph machine learning, we preprocess the data into a format that can be submitted for training and then train our graph machine learning model. In our examples, these will be GNNs.
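The core step inside that training loop, neighborhood aggregation via message passing, can be sketched in miniature. The toy example below is illustrative only and is not code from the book: it hand-codes one round of mean aggregation on a four-node graph using plain Python lists, whereas the book's examples use PyTorch Geometric tensors. All names (`edges`, `neighbors`, `mean_aggregate`) are hypothetical.

```python
# Toy sketch: one message-passing step on a tiny undirected graph.

# Graph data model: adjacency list built from raw edge data.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_nodes = 4
neighbors = {n: [] for n in range(num_nodes)}
for u, v in edges:          # undirected: add both directions
    neighbors[u].append(v)
    neighbors[v].append(u)

# One 2-dimensional feature vector per node.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

def mean_aggregate(x, neighbors):
    """One round of neighborhood mean aggregation (the core GNN step):
    each node's new features are the mean of its neighbors' features
    and its own (a self-loop)."""
    out = []
    for n in sorted(neighbors):
        msgs = [x[m] for m in neighbors[n]] + [x[n]]
        out.append([sum(col) / len(msgs) for col in zip(*msgs)])
    return out

h = mean_aggregate(x, neighbors)
print(h[0])  # node 0 averages its own features with those of nodes 1 and 2
```

A real GNN layer would follow the aggregation with a learned linear transform and nonlinearity; stacking such layers lets information propagate across multiple hops of the graph.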
Graph Neural Networks in Action
Keita Broadwater
Namid Stillman
Foreword by Matthias Fey
Manning, Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2025 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The authors and publisher have made every effort to ensure that the information in this book was correct at press time. The authors and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Frances Lefkowitz
Technical development editor: Frances Buontempo
Review editors: Radmila Ercegovac and Aleksandar Dragosavljević
Production editor: Keri Hales
Copy editor: Julie McNamee
Proofreader: Jason Everett
Technical proofreader: Kerry Koitzsch
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617299056
Printed in the United States of America
This book is dedicated to my son, Akin.
—Keita Broadwater

This book is dedicated to my wife, for her patience as I worked through the night, and to my dog, who kept me company during those same hours.
—Namid Stillman
brief contents

PART 1 FIRST STEPS 1
1 Discovering graph neural networks 3
2 Graph embeddings 30

PART 2 GRAPH NEURAL NETWORKS 69
3 Graph convolutional networks and GraphSAGE 71
4 Graph attention networks 123
5 Graph autoencoders 159

PART 3 ADVANCED TOPICS 197
6 Dynamic graphs: Spatiotemporal GNNs 199
7 Learning and inference at scale 244
8 Considerations for GNN projects 284

A Discovering graphs 321
B Installing and configuring PyTorch Geometric 350
contents

foreword xi
preface xiii
acknowledgments xiv
about this book xvi
about the authors xix
about the cover illustration xx

PART 1 FIRST STEPS 1

1 Discovering graph neural networks 3
1.1 Goals of this book 5
    Catching up on graph fundamentals 6
1.2 Graph-based learning 6
    What are graphs? 6 ■ Different types of graphs 8 ■ Graph-based learning 13 ■ What is a GNN? 13 ■ Differences between tabular and graph data 15
1.3 GNN applications: Case studies 17
    Recommendation engines 17 ■ Drug discovery and molecular science 18 ■ Mechanical reasoning 20
1.4 When to use a GNN? 20
    Implicit relationships and interdependencies 21 ■ High dimensionality and sparsity 22 ■ Complex, nonlocal interactions 23
1.5 Understanding how GNNs operate 23
    Mental model for training a GNN 24 ■ Unique mechanisms of a GNN model 25 ■ Message passing 26

2 Graph embeddings 30
2.1 Creating embeddings with Node2Vec 32
    Loading data, setting parameters, and creating embeddings 35 ■ Demystifying embeddings 36 ■ Transforming and visualizing the embeddings 38 ■ Beyond visualization: Applications and considerations of N2V embeddings 41
2.2 Creating embeddings with a GNN 43
    Constructing the embeddings 43 ■ GNN vs. N2V embeddings 48
2.3 Using node embeddings 49
    Data preprocessing 50 ■ Random forest classification 52 ■ Embeddings in an end-to-end model 53
2.4 Under the hood 58
    Representations and embeddings 58 ■ Transductive and inductive methods 60 ■ N2V: Random walks across graphs 62 ■ Message passing as deep learning 63

PART 2 GRAPH NEURAL NETWORKS 69

3 Graph convolutional networks and GraphSAGE 71
3.1 Predicting consumer product categories 73
    Loading and processing the data 74 ■ Creating our model classes 75 ■ Model training 78 ■ Model performance analysis 79 ■ Our first product bundle 83
3.2 Aggregation methods 86
    Neighborhood aggregation 86 ■ Advanced aggregation tools 89 ■ Practical considerations in applying aggregation 93
3.3 Further optimizations and refinements 94
    Dropout 95 ■ Model depth 95 ■ Improving the baseline model's performance 98 ■ Revisiting the Marcelina product bundle 101
3.4 Under the hood 102
    Convolution methods 103 ■ Message passing 107 ■ GCN aggregation function 110 ■ GCN in PyTorch Geometric 112 ■ Spectral vs. spatial convolution 115 ■ GraphSAGE aggregation function 115 ■ GraphSAGE in PyTorch Geometric 117
3.5 Amazon Products dataset 118
4 Graph attention networks 123
4.1 Detecting spam and fraudulent reviews 124
4.2 Exploring the review spam dataset 126
    Explaining the node features 127 ■ Exploratory data analysis 129 ■ Exploring the graph structure 130 ■ Exploring the node features 133
4.3 Training baseline models 136
    Non-GNN baselines 136 ■ GCN baseline 141
4.4 Training GAT models 144
    Neighborhood loader and GAT models 145 ■ Addressing class imbalance in model performance 149 ■ Deciding between GAT and XGBoost 152
4.5 Under the hood 153
    Explaining attention and GAT models 153 ■ Over-smoothing 155 ■ Overview of key GAT equations 156

5 Graph autoencoders 159
5.1 Generative models: Learning how to generate 161
    Generative and discriminative models 161 ■ Synthetic data 162
5.2 Graph autoencoders for link prediction 163
    Review of the Amazon Products dataset from chapter 3 164 ■ Defining a graph autoencoder 167 ■ Training a graph autoencoder to perform link prediction 169
5.3 Variational graph autoencoders 173
    Building a variational graph autoencoder 174 ■ When to use a variational graph autoencoder 177
5.4 Generating graphs using GNNs 177
    Molecular graphs 178 ■ Identifying new drug candidates 180 ■ VGAEs for generating graphs 182 ■ Generating molecules using a GNN 187
5.5 Under the hood 190
    Understanding link prediction tasks 190 ■ The inner product decoder 191
PART 3 ADVANCED TOPICS 197

6 Dynamic graphs: Spatiotemporal GNNs 199
6.1 Temporal models: Relations through time 201
6.2 Problem definition: Pose estimation 202
    Setting up the problem 203 ■ Building models with memory 206
6.3 Dynamic graph neural networks 211
    Graph attention network for dynamic graphs 211
6.4 Neural relational inference 218
    Encoding pose data 222 ■ Decoding pose data using a GRU 228 ■ Training the NRI model 233
6.5 Under the hood 237
    Recurrent neural networks 237 ■ Temporal adjacency matrices 240 ■ Combining autoencoders with RNNs 240 ■ Gumbel-Softmax 242

7 Learning and inference at scale 244
7.1 Examples in this chapter 246
    Amazon Products dataset 246 ■ GeoGrid 248
7.2 Framing problems of scale 249
    Root causes 249 ■ Symptoms 250 ■ Crucial metrics 250
7.3 Techniques for tackling problems of scale 254
    Seven techniques 254 ■ General steps 255
7.4 Choice of hardware configuration 256
    Types of hardware choices 256 ■ Choice of processor and memory size 257
7.5 Choice of data representation 260
7.6 Choice of GNN algorithm 262
    Time and space complexity 262
7.7 Batching using a sampling method 265
    Two concepts: Mini-batching and sampling 266 ■ A glance at notable PyG samplers 267
7.8 Parallel and distributed processing 270
    Using distributed data parallel 270 ■ Code example for DDP 271
7.9 Training with remote storage 275
    Example 276
7.10 Graph coarsening 278
    Example 280

8 Considerations for GNN projects 284
8.1 Data preparation and project planning 285
    Project definition 286 ■ Project objectives and scope 286
8.2 Designing graph models 288
    Get familiar with the domain and use case 289 ■ Constructing the graph dataset and schemas 290 ■ Creating instance models 295 ■ Testing and refactoring 297
8.3 Data pipeline example 298
    Raw data 300 ■ The ETL step 302 ■ Data exploration and visualization 306 ■ Preprocessing and loading data into PyG 310
8.4 Where to find graph data 317

appendix A Discovering graphs 321
appendix B Installing and configuring PyTorch Geometric 350
further reading 352
references 355
index 361
foreword

Our world is highly rich in structure, comprising objects, their relations, and hierarchies. Sentences can be represented as sequences of words, maps can be broken down into streets and intersections, the world wide web connects websites via hyperlinks, and chemical compounds can be described by a set of atoms and their interactions. Despite the prevalence of graph structures in our world, both traditional and even modern machine learning methods struggle to properly handle such rich structural information: machine learning conventionally expects fixed-sized vectors as inputs and is thus only applicable to simpler structures such as sequences or grids. Consequently, graph machine learning has long relied on labor-intensive and error-prone handcrafted feature engineering techniques. Graph neural networks (GNNs) finally revolutionize this paradigm by breaking with the regularity restriction of conventional deep learning techniques. They unlock the ability to learn representations from raw graph data with exceptional performance and allow us to view deep learning as a much broader technique that can seamlessly generalize to complex and rich topological structures.

When I began to dive into the field of graph machine learning, deep learning on graphs was still in its early stages. Over time, dozens to hundreds of different methods were developed, contributing incremental insights and refreshing ideas. Tools like our own PyTorch Geometric library have expanded significantly, offering cutting-edge graph-based building blocks, models, examples, and scalability solutions. Reflecting on this growth, it's clear how overwhelming it can be for newcomers to navigate the essentials and best practices that have emerged over time, as valuable information is scattered across theoretical research papers or buried in implementations in GitHub repositories.

Now that the power of GNNs has been widely understood, this timely book provides a well-structured and easy-to-follow overview of the field, providing answers to many pain points of graph machine learning practitioners. The hands-on approach, with practical code examples embedded directly within each chapter, demystifies the complexities, making the concepts tangible and actionable. Despite the success of GNNs across all kinds of domains in research, adoption in real-world applications remains limited to companies that have enough resources to acquire the necessary knowledge for applying GNNs in practice. I'm confident that this book will serve as an invaluable resource to empower practitioners to overcome that gap and unlock the full potential of GNNs.

—MATTHIAS FEY, creator of PyTorch Geometric and founding engineer, Kumo.AI
preface

My journey into the world of graphs began unexpectedly, during an interview at LinkedIn. As the session wrapped up, I was shown a visualization of my network, a mesmerizing structure that told stories without a single word. Organizations I had been part of appeared clustered, like constellations against a dark canvas. What surprised me most was that this structure was not built using metadata LinkedIn held about my connections; rather, it emerged organically from the relationships between nodes and edges.

Years later, driven by curiosity, I recreated that visualization. I marveled once again at how the underlying connections alone could map out an intricate picture of my professional life. This deepened my appreciation for the power inherent in graphs, a fascination that only grew when I joined Cloudera and encountered graph neural networks (GNNs). Their potential for solving complex problems was captivating, but diving into them was like trying to navigate an uncharted forest without a map. There were no comprehensive resources tailored for nonacademics; progress was slow, often cobbled together from fragments and trial and error.

This book is the guide I wish I had during those early days. It aims to provide a clear and accessible path for practitioners, enthusiasts, and anyone looking to understand and apply GNNs without wading through endless academic papers or fragmented online searches. My hope is that it serves as a one-stop resource for you to learn the fundamentals and paves the way for deeper exploration.

Whether you're here out of professional necessity, sheer curiosity, or the same kind of amazement that first drew me in, I invite you to embark on this journey. Together, let's bring the potential of GNNs to life.

—KEITA BROADWATER
acknowledgments

Many people brought this book to life. Thanks to the development and editorial staff at Manning, especially Frances Lefkowitz (development editor) and Frances Buontempo (technical development editor). In addition, thanks to the production staff for all the hard work behind the scenes to shepherd this book into its final format.

Thanks to all the reviewers whose suggestions helped make this a better book: Abe Taha, Adi Shavit, Aditya Visweswaran, Alain Couniot, Allan Makura, Amaresh Rajasekharan, Andrew Mooney, Ariel Gamino, Atilla Ozgur, Atul Saurav, Ayush Bihani, Cosimo Attanasi, Daniel Berecz, Davide Cadamuro, Fernando García Sedano, Gautham K., George Loweree Gaines, Giampiero Granatella, Gourav Sengupta, Igor Vieira, Ioannis Atsonios, John Powell, Karrtik Iyer, Keith Kim, Maciej Szymkiewicz, Maxime Dehaut, Maxim Volgin, Mikael Dautrey, Ninoslav Čerkez, Noah Flynn, Or Golan, Peter Henstock, Richard Tobias, Rodolfo Allendes, Rohit Agarwal, Sadhana Ganapathiraju, Sanjeev Kilarapu, Sergio Govoni, Simona Russo, Simone Sguazza, Sowmya Vajjala, Sri Ram Macharla, Thomas Joseph Heiman, Tymoteusz Wołodźko, Vidhya Vinay, Viton Vitanis, Vojta Tuma, and Wei Luo.

KEITA BROADWATER: I thank my mother and father for instilling within me a love of books and learning. I thank my friends Jaz and Mindy for their encouragement. I thank the team at Cloudera and Fast Forward Labs where the seed of this book began. I thank Jeremy Howard for changing my perspective about deep learning. Many thanks to Frances Lefkowitz who was a steady guide in creating this book. And I thank my co-author, Namid, for sharing this journey with me.
NAMID STILLMAN: I thank my family for fostering my desire to learn about the world and encouraging my inclination to bring others with me as I do. I thank my academic mentors, especially Ollie, Martin, Roberto, and Gilles, who gave me the tools to think technically and the encouragement to go out and use them. And I thank my co-author, Keita, for bringing me on this journey.
about this book

Graph Neural Networks in Action is a book designed for people to jump quickly into this new field and start building applications. At the same time, we try to strike a balance by including just enough critical theory to make this book as standalone as possible. We also fill in implementation details that may not be obvious or are left unexplained in the currently available online tutorials and documents. In particular, information about new and emerging topics is very likely to be fragmented. This fragmentation adds friction when implementing and testing new technologies. With Graph Neural Networks in Action, we offer a book that can reduce that friction by filling in the gaps and answering key questions whose answers are likely scattered over the internet or not covered at all. We've done so in a way that emphasizes approachability rather than high rigor.

Who should read this book

This book is designed for machine learning engineers and data scientists familiar with neural networks but new to graph learning. If you have experience in object-oriented programming, you'll find the concepts particularly accessible and applicable.

How this book is organized: A road map

In part 1 of this book, we provide a motivation for exploring GNNs, as well as cover fundamental concepts of graphs and graph-based machine learning. In chapter 1, we introduce the concepts of graphs and graph machine learning, providing guidelines for their use and applications. Chapter 2 covers graph representations up to and including node embeddings. This will be the first programmatic exposure to graph neural networks (GNNs), which are used to create such embeddings.

In part 2, the core of the book, we introduce the major types of GNNs, including graph convolutional networks (GCNs) and GraphSAGE in chapter 3, graph attention networks (GATs) in chapter 4, and graph autoencoders (GAEs) in chapter 5. These methods are the bread and butter for most GNN applications and also cover a range of other deep learning concepts such as convolution, attention, and autoencoders.

In part 3, we'll look at more advanced topics. We describe GNNs for dynamic graphs (spatiotemporal GNNs) in chapter 6 and give methods to train GNNs at scale in chapter 7. Finally, we end with some considerations for project and system planning for graph learning projects in chapter 8.

About the code

Python is the coding language of choice throughout this book. There are now several GNN libraries in the Python ecosystem, including PyTorch Geometric (PyG), Deep Graph Library (DGL), GraphScope, and Jraph. We focus on PyG, which is one of the most popular and easy-to-use frameworks, written on top of PyTorch. We want this book to be approachable by an audience with a wide set of hardware constraints, so with the exception of some individual sections and chapter 7 on scalability, distributed systems and GPU systems aren't required, although they can be used for some of the coded examples.

The book provides a survey of the most relevant implementations of GNNs, including graph convolutional networks (GCNs), graph autoencoders (GAEs), graph attention networks (GATs), and graph long short-term memory (LSTM). The aim is to cover the GNN tasks mentioned earlier. In addition, we'll touch on different types of graphs, including knowledge graphs. This book contains many examples of source code both in numbered listings and in line with normal text.
In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.

In many cases, the original source code has been reformatted; we've added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/graph-neural-networks-in-action. The complete code for the examples in the book is available for download from the Manning website at www.manning.com/books/graph-neural-networks-in-action and from GitHub at https://github.com/keitabroadwater/gnns_in_action.
liveBook discussion forum

Purchase of Graph Neural Networks in Action includes free access to liveBook, Manning's online reading platform. Using liveBook's exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It's a snap to make notes for yourself, ask and answer technical questions, and receive help from the authors and other users. To access the forum, go to https://livebook.manning.com/book/graph-neural-networks-in-action/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning's commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher's website as long as the book is in print.