Uploader

高宏飞

Shared on 2026-03-22

Author: Victor Lee, Phuc Kien Nguyen, Alexander Thomas

With the rapid rise of graph databases, organizations are now implementing advanced analytics and machine learning solutions to help drive business outcomes. This practical guide shows data scientists, data engineers, architects, and business analysts how to get started with a graph database using TigerGraph, one of the leading graph database models available. You'll explore a three-stage approach to deriving value from connected data: connect, analyze, and learn. Victor Lee, Phuc Kien Nguyen, and Alexander Thomas present real use cases covering several contemporary business needs. By diving into hands-on exercises using TigerGraph Cloud, you'll quickly become proficient at designing and managing advanced analytics and machine learning solutions for your organization.

• Use graph thinking to connect, analyze, and learn from data for advanced analytics and machine learning
• Learn how graph analytics and machine learning can deliver key business insights and outcomes
• Use five core categories of graph algorithms to drive advanced analytics and machine learning
• Deliver a real-time 360-degree view of core business entities, including customer, product, service, supplier, and citizen
• Discover insights from connected data through machine learning and advanced analytics

ISBN: 1098106652
Publisher: O'Reilly Media
Publish Year: 2023
Language: English
Pages: 317
File Format: PDF
File Size: 18.7 MB
Text Preview (First 20 pages)

Graph-Powered Analytics and Machine Learning with TigerGraph
Driving Business Outcomes with Connected Data
Victor Lee, Phuc Kien Nguyen & Alexander Thomas
Victor Lee is vice president of machine learning and AI at TigerGraph. Phuc Kien Nguyen is a data scientist in the field of anti-money laundering and terrorist financing at ABN AMRO Bank. Alexander Thomas is a former TigerGraph technical writer with a background in linguistics and education.

US $65.99 CAN $82.99
ISBN: 978-1-098-10665-2
Victor Lee, Phuc Kien Nguyen, and Alexander Thomas
Graph-Powered Analytics and Machine Learning with TigerGraph
Driving Business Outcomes with Connected Data
Boston Farnham Sebastopol Tokyo Beijing
978-1-098-10665-2 [LSI] Graph-Powered Analytics and Machine Learning with TigerGraph by Victor Lee, Phuc Kien Nguyen, and Alexander Thomas Copyright © 2023 O’Reilly Media. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com. Acquisitions Editor: Nicole Butterfield Development Editor: Gary O’Brien Production Editor: Jonathon Owen Copyeditor: nSight, Inc. Proofreader: Shannon Turlington Indexer: BIM Creatives, LLC Interior Designer: David Futato Cover Designer: Karen Montgomery Illustrator: Kate Dullea July 2023: First Edition Release History for the First Edition 2023-07-21: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781098106652 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Graph-Powered Analytics and Machine Learning with TigerGraph, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. 
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. This work is part of a collaboration between O’Reilly and TigerGraph. See our statement of editorial independence.
Table of Contents

Preface

1. Connections Are Everything
    Connections Change Everything
    What Is a Graph?
    Why Graphs Matter
    Edges Outperform Table Joins
    Graph Analytics and Machine Learning
    Graph-Enhanced Machine Learning
    Chapter Summary

Part I. Connect

2. Connect and Explore Data
    Graph Structure
    Graph Terminology
    Graph Schemas
    Traversing a Graph
    Hops and Distance
    Breadth and Depth
    Graph Modeling
    Schema Options and Trade-Offs
    Transforming Tables in a Graph
    Model Evolution
    Graph Power
    Connecting the Dots
    The 360 View
    Looking Deep for More Insight
    Seeing and Finding Patterns
    Matching and Merging
    Weighing and Predicting
    Chapter Summary

3. See Your Customers and Business Better: 360 Graphs
    Case 1: Tracing and Analyzing Customer Journeys
    Solution: Customer 360 + Journey Graph
    Implementing the C360 + Journey Graph: A GraphStudio Tutorial
    Create a TigerGraph Cloud Account
    Get and Install the Customer 360 Starter Kit
    An Overview of GraphStudio
    Design a Graph Schema
    Data Loading
    Queries and Analytics
    Case 2: Analyzing Drug Adverse Reactions
    Solution: Drug Interaction 360 Graph
    Implementation
    Graph Schema
    Queries and Analytics
    Chapter Summary

4. Studying Startup Investments
    Goal: Find Promising Startups
    Solution: A Startup Investment Graph
    Implementing a Startup Investment Graph and Queries
    The Crunchbase Starter Kit
    Graph Schema
    Queries and Analytics
    Chapter Summary

5. Detecting Fraud and Money Laundering Patterns
    Goal: Detect Financial Crimes
    Solution: Modeling Financial Crimes as Network Patterns
    Implementing Financial Crime Pattern Searches
    The Fraud and Money Laundering Detection Starter Kit
    Graph Schema
    Queries and Analytics
    Chapter Summary

Part II. Analyze

6. Analyzing Connections for Deeper Insight
    Understanding Graph Analytics
    Requirements for Analytics
    Graph Traversal Methods
    Parallel Processing
    Aggregation
    Using Graph Algorithms for Analytics
    Graph Algorithms as Tools
    Graph Algorithm Categories
    Chapter Summary

7. Better Referrals and Recommendations
    Case 1: Improving Healthcare Referrals
    Solution: Form and Analyze a Referral Graph
    Implementing a Referral Network of Healthcare Specialists
    The Healthcare Referral Network Starter Kit
    Graph Schema
    Queries and Analytics
    Case 2: Personalized Recommendations
    Solution: Use Graph for Multirelationship-Based Recommendations
    Implementing a Multirelationship Recommendation Engine
    The Recommendation Engine 2.0 Starter Kit
    Graph Schema
    Queries and Analytics
    Chapter Summary

8. Strengthening Cybersecurity
    The Cost of Cyberattacks
    Problem
    Solution
    Implementing a Cybersecurity Graph
    The Cybersecurity Threat Detection Starter Kit
    Graph Schema
    Queries and Analytics
    Chapter Summary

9. Analyzing Airline Flight Routes
    Goal: Analyzing Airline Flight Routes
    Solution: Graph Algorithms on a Flight Route Network
    Implementing an Airport and Flight Route Analyzer
    The Graph Algorithms Starter Kit
    Graph Schema and Dataset
    Installing Algorithms from the GDS Library
    Queries and Analytics
    Chapter Summary

Part III. Learn

10. Graph-Powered Machine Learning Methods
    Unsupervised Learning with Graph Algorithms
    Learning Through Similarity and Community Structure
    Finding Frequent Patterns
    Extracting Graph Features
    Domain-Independent Features
    Domain-Dependent Features
    Graph Embeddings: A Whole New World
    Graph Neural Networks
    Graph Convolutional Networks
    GraphSAGE
    Comparing Graph Machine Learning Approaches
    Use Cases for Machine Learning Tasks
    Pattern Discovery and Feature Extraction Methods
    Graph Neural Networks: Summary and Uses
    Chapter Summary

11. Entity Resolution Revisited
    Problem: Identify Real-World Users and Their Tastes
    Solution: Graph-Based Entity Resolution
    Learning Which Entities Are the Same
    Resolving Entities
    Implementing Graph-Based Entity Resolution
    The In-Database Entity Resolution Starter Kit
    Graph Schema
    Queries and Analytics
    Method 1: Jaccard Similarity
    Merging
    Method 2: Scoring Exact and Approximate Matches
    Chapter Summary

12. Improving Fraud Detection
    Goal: Improve Fraud Detection
    Solution: Use Relationships to Make a Smarter Model
    Using the TigerGraph Machine Learning Workbench
    Setting Up the ML Workbench
    Working with ML Workbench and Jupyter Notes
    Graph Schema and Dataset
    Graph Feature Engineering
    Training Traditional Models with Graph Features
    Using a Graph Neural Network
    Chapter Summary
    Connecting with You

Index
Preface

Objectives

The goal of this book is to introduce you to the concepts, techniques, and tools for graph data structures, graph analytics, and graph machine learning. When you’ve finished the book, we hope you’ll understand how graph analytics can be used to address a range of real-world problems. We want you to be able to answer questions like the following: Is graph a good fit for this task? What tools and techniques should I use? What are the meaningful relationships in my data, and how do I formulate a task in terms of relationship analysis?

In our experience, we see that many people quickly grasp the general concept and structure of graphs, but it takes more effort and experience to “think graph,” that is, to develop the intuition for how best to model your data as a graph and then to formulate an analytical task as a graph query.

Each chapter begins with a list of its objectives. The objectives fall into three general areas: learning concepts about graph analytics and machine learning; solving particular problems with graph analytics; and understanding how to use the GSQL query language and the TigerGraph graph platform.

Audience and Prerequisites

We designed this book for anyone who has an interest in data analytics and wants to learn about graph analytics. You don’t need to be a serious programmer or a data scientist, but some exposure to databases and programming concepts will definitely help you to follow the presentations. When we go into depth on a few graph algorithms and machine learning techniques, we present some mathematical equations involving sets, summation, and limits. Those equations, however, are a supplement to our explanations with words and figures.

In the use case chapters, we will be running prewritten GSQL code on the TigerGraph Cloud platform. You’ll just need a computer and internet access. If you are familiar
with the SQL database query language and any mainstream programming language, then you will be able to understand much of the GSQL code. If you are not, you can simply follow the instructions and run the prewritten use case examples while following along with the commentary in the book.

Approach and Roadmap

We aim to present the material as motivated by real-world data analytics needs, as opposed to theoretical principles. We always try to explain things in the simplest terms we can, using everyday concepts instead of technical jargon.

The GSQL language is introduced through complete examples. Early in the book, we provide line-by-line descriptions of the purpose and function of each line. We also highlight language structures, syntax, and semantics that are particularly important. For a comprehensive tutorial to GSQL, you can refer to additional resources beyond this book.

This book is structured as three parts: Part I: Connect; Part II: Analyze; and Part III: Learn. Each part has two types of chapters: it opens with a concept chapter, which is followed by two or three use case chapters on TigerGraph Cloud and GSQL.

Chapter | Format | Title
1 | Introduction | Connections Are Everything
Part I: Connect
2 | Concept | Connect and Explore Data
3 | Use Case, Introduction to TigerGraph | See Your Customers and Business Better: 360 Graphs
4 | Use Case | Studying Startup Investments
5 | Use Case | Detecting Fraud and Money Laundering Patterns
Part II: Analyze
6 | Concept | Analyzing Connections for Deeper Insight
7 | Use Case | Better Referrals and Recommendations
8 | Use Case | Strengthening Cybersecurity
9 | Use Case | Analyzing Airline Flight Routes
Part III: Learn
10 | Concept | Graph-Powered Machine Learning Methods
11 | Use Case | Entity Resolution Revisited
12 | Use Case, Introduction to Machine Learning Workbench | Improving Fraud Detection
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Indicates vertex or edge types.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

This book has its own GitHub repository at https://github.com/TigerGraph-DevLabs/Book-graph-powered-analytics. The initial content for this site will be copies of all the use case examples. We will also gather the book’s GSQL tips into a single document as a primer. As we receive feedback from readers (and we hope to hear from you!), we’ll post answers to frequently asked questions. We’ll also add additional or modified GSQL examples or point out how you can take advantage of new capabilities in the TigerGraph platform.
For additional resources on TigerGraph and the GSQL language, the most comprehensive material will be found through TigerGraph’s main website (https://www.tigergraph.com), its documentation site (https://docs.tigergraph.com), or its YouTube channel (https://www.youtube.com/@TigerGraph). You can contact the authors at gpaml.book@gmail.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-829-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/gpaml. For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Follow us on Twitter: https://twitter.com/oreillymedia
Watch us on YouTube: https://youtube.com/oreillymedia
Acknowledgments

This book would not exist without Gaurav Deshpande, TigerGraph’s VP of marketing, who proposed that we should and could write it. He wrote the original proposal and chapter outline; the three-part structure is his idea. Thank you to TigerGraph’s CEO and Founder Dr. Yu Xu, who supported our effort and who granted us the flexibility to work on this project. Dr. Xu also envisioned GraphStudio and its Starter Kits. Mingxi Wu and Alin Deutsch developed the GSQL language with efficient graph analytics in mind.

Besides the official authors, several others contributed to the material in this book. Tom Reeve applied his professional writing skills and knowledge of graph concepts to help us write Chapter 2, when writer’s block and procrastination seemed to be our biggest foe. Emily McAuliffe and Amanda Morris designed several of the figures in the Early Release edition of the book. We needed some data scientists to review our chapters on machine learning. We turned to Parker Erickson and Bill Shi, who not only are experts in graph machine learning but developed the TigerGraph ML Workbench.

We are indebted to Xinyu Chang, TigerGraph’s original GSQL query and solutions expert, for developing or overseeing the development of many of the use case starter kits and graph algorithm implementations in this book. Yiming Pan also wrote or optimized several graph algorithms and queries. Many of the book’s examples are based on designs that they developed for TigerGraph’s customers. The schemas, queries, and output displays in those starter kits are just as much a part of the content of this book as are the English paragraphs. We made several improvements to the starter kits to adapt them for this book. A number of people helped with reviewing and standardizing the starter kits: Jon Herke, head of developer relations; and several TigerGraph interns: Abudula Aisikaer, Shreya Chaudhary, McKenzie Steenson, and Kristine Zheng.
Renchu Song and Duc Le, who lead the design and development of TigerGraph Cloud and GraphStudio, made sure that our revised starter kits were released into the product. A million thanks to our two development editors at O’Reilly. Nicole Taché showed us the ropes and got us to our first early release of two chapters, with insightful comments, advice, and encouragement for this project. Gary O’Brien steered us from there to completion, through thick and thin. Both are wonderful editors, who were a pleasure and an honor to work with. Thank you also to our production editor Jonathon Owen and copyeditor Adam Lawrence. Victor would like to thank his parents George and Sylvia Lee for their tireless support of his academic and nonacademic pursuits. He would like to thank his wife Susan Haddox for always being there for him, for putting up with his writing late into the
night, for watching any and all Star Trek with him, and for being his model for how a person can be wicked smart and kind and funny. Kien would like to thank his mother, My Linh Ly, for being a constant source of inspiration and a driving force for his career. He is also thankful for his wife, Sammy Wai-lok Lee, who has always been there with him, giving color to his life and caring for him and their baby girl Liv Vy Ly Nguyen-Lee, who was born during the writing of this book. Alex would like to thank his parents, Chris and Becky Thomas, and his sister, Ari, for their support and encouragement as discussion partners during the writing process. Special thanks goes to his wife Gloria Zhang for her incredible strength, her vast intelligence, and her limitless capability for inspiration.
CHAPTER 1
Connections Are Everything

In an extreme view, the world can be seen as only connections, nothing else. We think of a dictionary as the repository of meaning, but it defines words only in terms of other words. I liked the idea that a piece of information is really defined only by what it’s related to, and how it’s related. There really is little else to meaning. The structure is everything.
Tim Berners-Lee, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web (1999), p. 14

The 20th century demonstrated how much we could achieve with spreadsheets and relational databases. Tabular data ruled. The 21st century has already shown us that that isn’t enough. Tables flatten our perspective, showing connections in only two dimensions. In the real world, things are related to and connected to a myriad of other things, and those relationships shape what is and what will happen. To gain full understanding, we need to model these connections.

Personal computers were introduced in the 1970s, but they didn’t take off until they found their first killer apps: financial spreadsheets. VisiCalc on the Apple II and then Lotus 1-2-3 on the IBM PC [1] automated the laborious and error-prone calculations that bookkeepers had been doing by hand ever since the invention of writing and arithmetic: adding up rows and columns of figures, and then perhaps performing even more complex statistical calculations.

In 1970, E. F. Codd published his seminal paper on the relational database model. In these early days of databases, a few models were bouncing around, including the network database model. Codd’s relational model was built on something that everyone could identify with and was easy to program: the table.

[1] “Killer Application,” Wikipedia, last updated May 14, 2023, https://en.wikipedia.org/wiki/Killer_application.
Moreover, matrix algebra and many statistical methods are also ready-made to work with tables. Both physicists and business analysts used matrices to define and find the optimal solutions to everything from nuclear reactor design to supply chain management. Tables lend themselves to parallel processing; just partition the workload vertically or horizontally. Spreadsheets, relational databases, and matrix algebra: the tabular approach seemed to be the solution to everything.

Then the World Wide Web happened, and everything changed.

Connections Change Everything

The web is more than the internet. The internet began in the early 1970s as a data connection network between selected US research institutions. The World Wide Web, invented by CERN researcher Tim Berners-Lee in 1989, is a set of technologies running on top of the internet that make it much easier to publish, access, and connect data in a format easy for humans to consume and interact with. Browsers, hyperlinks, and web addresses are also hallmarks of the web. At the same time that the web was being developed, governments were loosening their controls on the internet and allowing private companies to expand it. We now have billions of interconnected web pages, connecting people, multimedia, facts, and opinions at a truly global scale. Having data isn’t enough. How the data is structured matters.

What Is a Graph?

As the word “web” started to take on new connotations, so did the word “graph.” For most people, “graph” was synonymous with a line chart that could show something such as a stock’s price over time. Mathematicians had another meaning for the word, however, and as networks and connections started to matter to the business world, the mathematical meaning started to come to the fore.

A graph is an abstract data structure consisting of vertices (or nodes) and connections between vertices called edges. That’s it. A graph is the idea of a network, constructed from these two types of elements.
This abstraction allows us to study networks (or graphs) in general, to discover properties, and to devise algorithms to solve general tasks. Graph theory and graph analytics provided organizations with the tools they needed to leverage the sudden abundance of connected data.

In Figure 1-1, we can see the network of relationships between the actors and directors of Star Wars (1977) and The Empire Strikes Back (1980). This is easily modeled as a graph with different types of edges connecting the different types of vertices. Actors and movies can have an acted_in edge connecting them, movies and other movies can be connected by an is_sequel_of edge, and movies and directors can be connected by a directed_by edge.
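As a rough illustration of the vertex-and-edge model just described, the following Python sketch stores a small typed graph as plain data. It is not taken from the book and is not GSQL. The edge type names `acted_in`, `is_sequel_of`, and `directed_by` come from the text; the helper names `targets` and `sources`, the vertex type labels, and the particular people listed are assumptions chosen for illustration and may not match Figure 1-1.

```python
# Vertices: name -> vertex type. The people chosen here are illustrative
# and are not claimed to reproduce Figure 1-1.
vertices = {
    "Star Wars (1977)": "Movie",
    "The Empire Strikes Back (1980)": "Movie",
    "Mark Hamill": "Actor",
    "Harrison Ford": "Actor",
    "Carrie Fisher": "Actor",
    "George Lucas": "Director",
    "Irvin Kershner": "Director",
}

# Edges: (source vertex, edge type, target vertex).
edges = [
    ("Mark Hamill", "acted_in", "Star Wars (1977)"),
    ("Mark Hamill", "acted_in", "The Empire Strikes Back (1980)"),
    ("Harrison Ford", "acted_in", "Star Wars (1977)"),
    ("Harrison Ford", "acted_in", "The Empire Strikes Back (1980)"),
    ("Carrie Fisher", "acted_in", "Star Wars (1977)"),
    ("Carrie Fisher", "acted_in", "The Empire Strikes Back (1980)"),
    ("The Empire Strikes Back (1980)", "is_sequel_of", "Star Wars (1977)"),
    ("Star Wars (1977)", "directed_by", "George Lucas"),
    ("The Empire Strikes Back (1980)", "directed_by", "Irvin Kershner"),
]


def targets(vertex, edge_type):
    """Vertices reached from `vertex` by following one edge of `edge_type`."""
    return [t for s, e, t in edges if s == vertex and e == edge_type]


def sources(vertex, edge_type):
    """Vertices that point at `vertex` through one edge of `edge_type`."""
    return [s for s, e, t in edges if t == vertex and e == edge_type]


if __name__ == "__main__":
    print(targets("The Empire Strikes Back (1980)", "directed_by"))
    print(sources("Star Wars (1977)", "acted_in"))
```

Keeping the edge type as part of each edge is what lets a single structure answer different questions (who acted in a film, who directed it, which film it follows) without a separate table per relationship.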
Figure 1-1. A graph showing some key players and connections in early Star Wars films

Why Graphs Matter

The web showed us that sometimes we accomplish more by having varied data that is linked together than by trying to merge it all into a few rigid tables. It also showed us that connections themselves are a form of information. We have a limitless number of types of relationships: parent–child, purchaser–product, friend–friend, and so on. As Berners-Lee observed, we get meaning from connections. When we know someone is a parent, we can infer that they have had certain life experiences and have certain concerns. We can also make informed guesses at how the parent and child will interact relative to each other.

The web, however, only highlighted what has always been true: data relationships matter when representing data and when analyzing data. Graphs can embody the informational content of relationships better than tables. This enriched data format is better at representing complex information, and when it comes to analytics, it produces more insightful results. Business-oriented data analysts appreciate the intuitive aspect of seeing relationships visualized as a graph, and data scientists find that the richer content yields more accurate machine learning models. As a bonus, graph databases often perform faster than relational databases when working on tasks involving searching multiple levels of connections (or multiple hops).

Structure matters

The founders of Google recognized that the web would become too large for anyone to grasp. We would need tools to help us search for and recommend pages. A key component of Google’s early success was PageRank, an algorithm that models the
internet as a set of interconnected pages and decides which are the most influential or authoritative pages, based solely on their pattern of interconnection.

Over the years, search engines have become better and better at inferring from our queries what we would really like to know and would find useful. One of Google’s tools for that is its Knowledge Graph, an interconnected set of categorized and tagged facts and concepts, harvested from the broader web. After analyzing the user’s query to understand not just the surface words but the implied categories and objectives, Google searches its Knowledge Graph to find the best matching facts and then presents them in a well-formatted sidebar. Only a graph has the flexibility and expressiveness to make sense of this universe of facts.

Communities matter

Facebook started as a social networking app for college students; it’s grown to become the world’s largest online social network. It’s self-evident that Facebook cares about networks and graphs. From each user’s perspective, there is oneself and one’s set of friends. Though we act individually, people will naturally tend to gather into communities that evolve and have influence as though they were living entities themselves. Communities are powerful influences on what information we receive and how we form opinions. Businesses leverage community behavior for promoting their products. People also use social networks to promote political agendas. Detecting these communities is essential to understand the social dynamics, but you won’t see the communities in a tabular view.

Patterns of connections matter

The same information can be presented either in tabular form or in graph form, but the graph form shows us things that the table obscures. Think of a family tree. We could list all the parent–child relationships in a table, but the table would miss important patterns that span multiple relationships: family, grandchildren, cousins.
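The family tree point can be made concrete with a small sketch. The Python below starts from the same parent–child rows a table would hold and derives grandchildren, siblings, and cousins by following edges for more than one hop. The family members and the helper names (`hop`, `grandchildren`, `siblings`, `cousins`) are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical family: each row is one (parent, child) relationship,
# exactly what a two-column table would store.
parent_child = [
    ("Ada", "Ben"),
    ("Ada", "Cara"),
    ("Ben", "Dan"),
    ("Ben", "Eve"),
    ("Cara", "Finn"),
]

# Index the rows in both directions so edges can be followed up or down.
children = defaultdict(set)
parents = defaultdict(set)
for parent, child in parent_child:
    children[parent].add(child)
    parents[child].add(parent)


def hop(people, adjacency):
    """Follow one edge from every person in `people`."""
    return {nxt for person in people for nxt in adjacency[person]}


def grandchildren(person):
    # Two hops down: children of children.
    return hop(hop({person}, children), children)


def siblings(person):
    # One hop up, one hop down, excluding the person.
    return hop(hop({person}, parents), children) - {person}


def cousins(person):
    # Two hops up to grandparents, two hops down to everyone in the same
    # generation, then remove the person and their siblings.
    grandparents = hop(hop({person}, parents), parents)
    same_generation = hop(hop(grandparents, children), children)
    return same_generation - siblings(person) - {person}


if __name__ == "__main__":
    print(sorted(grandchildren("Ada")))
    print(sorted(cousins("Dan")))
```

None of these relationships appears as a row in the original list; each one is a pattern spanning two to four parent–child edges, which is why it is easier to see in a graph than in the table.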
A less obvious example is a graph of financial transactions. Financial institutions and vendors look for particular patterns of transactions that suggest possible fraudulent or money laundering activity. One pattern is a large amount of money being transferred from party to party, with a high percentage of the money coming back to the origin: a closed loop. Figure 1-2 shows such loops, extracted from a graph database containing millions of transactions, from our financial fraud example in Chapter 5. Other patterns can be linear or Y-shaped; anything is possible. The pattern depends on the nature of the data and the question of interest.
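The closed-loop pattern described above can be sketched as a bounded depth-first search. This is a simplified Python illustration, not the book's Chapter 5 query and not GSQL. The transfer data, the function name `find_loops`, the `max_hops` and `min_return_ratio` parameters, and the definition of the return ratio (amount arriving back at the origin divided by the amount that first left it) are all assumptions made for this example.

```python
# Hypothetical transfers: (sender, receiver, amount).
transfers = [
    ("A", "B", 10_000),
    ("B", "C", 9_800),
    ("C", "A", 9_500),
    ("A", "D", 500),
    ("D", "E", 450),
]


def build_adjacency(rows):
    """Map each sender to its outgoing (receiver, amount) edges."""
    adjacency = {}
    for sender, receiver, amount in rows:
        adjacency.setdefault(sender, []).append((receiver, amount))
    return adjacency


def find_loops(rows, origin, max_hops=5, min_return_ratio=0.9):
    """Find closed loops that start and end at `origin`.

    Returns a list of (path, ratio) pairs, where ratio is the amount that
    arrives back at the origin divided by the amount that first left it.
    This ratio is an illustrative measure, assumed for the sketch.
    """
    adjacency = build_adjacency(rows)
    loops = []

    def dfs(node, path, first_amount):
        # `path` holds len(path) - 1 edges; stop before exceeding max_hops.
        if len(path) > max_hops:
            return
        for receiver, amount in adjacency.get(node, []):
            start_amount = amount if first_amount is None else first_amount
            if receiver == origin:
                ratio = amount / start_amount
                if ratio >= min_return_ratio:
                    loops.append((path + [origin], ratio))
            elif receiver not in path:
                # Visit each party at most once per path (simple cycles only).
                dfs(receiver, path + [receiver], start_amount)

    dfs(origin, [origin], None)
    return loops


if __name__ == "__main__":
    for path, ratio in find_loops(transfers, "A"):
        print(" -> ".join(path), f"returned {ratio:.0%}")
```

Bounding the search depth keeps the traversal tractable; a production system over millions of transactions would also need time windows and more careful accounting of amounts, which this sketch leaves out.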