Statistics
Views: 308
Downloads: 145
Donations: 0

Shared by 高宏飞 on August 2, 2025

Author: Adam Bellemare

Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures

The exponential growth of data, combined with the need to derive real-time business value, is a critical issue. An event-driven data mesh can power real-time operational and analytical workloads, all from a single set of data product streams. With practical real-world examples, this book provides patterns that show software architects and developers how to successfully design and build an event-driven data mesh.

Author Adam Bellemare demonstrates what events and streams are, where they come from, and how you can use them. You'll also examine design patterns, their implications, and the trade-offs inherent in their use.

This book provides:
• A foundation for how events and event streams relate to the four pillars of data mesh
• Practical tips for building an event-driven data mesh, including incremental integration with your existing systems
• A clear understanding of how events relate to systems and other events, both in the same stream and across streams
• A realistic look at event design options such as fact, delta, and command event types, including how these choices will impact your data products
• Best practices for privacy, handling events at scale, and regulatory compliance
• Advice on asynchronous communication and handling eventual consistency

Tags
No tags yet
ISBN: 1098127609
Publisher: O'Reilly Media
Publication year: 2023
Language: English
Pages: 262
File format: PDF
File size: 6.9 MB
Text Preview (first 20 pages)

Adam Bellemare
Building an Event-Driven Data Mesh
Patterns for Designing & Building Event-Driven Architectures
SOFTWARE ARCHITECTURE

"Adam Bellemare offers a concrete and practical architectural approach to realize the promise of data mesh."
—Chris Ford, Head of Technology, Thoughtworks

Adam Bellemare is a staff technologist in the Office of the CTO at Confluent. He previously served as staff engineer for data platforms at Shopify and Flipp, and has worked extensively with microservices, data pipelines, and distributed computing systems and infrastructure. Adam's expertise includes technical thought leadership, software development, microservices, and data engineering. He's the author of Building Event-Driven Microservices (O'Reilly).

US $65.99 / CAN $82.99
ISBN: 978-1-098-12760-2
Adam Bellemare
Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures
Beijing · Boston · Farnham · Sebastopol · Tokyo
Building an Event-Driven Data Mesh
by Adam Bellemare
Copyright © 2023 Adam Bellemare. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Melissa Duffield
Development Editor: Melissa Potter
Production Editors: Jonathon Owen and Beth Kelly
Copyeditor: Stephanie English
Proofreader: Penelope Perkins
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

April 2023: First Edition
Release History for the First Edition: 2023-04-04, First Release
See https://oreilly.com/catalog/errata.csp?isbn=9781098127602 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Building an Event-Driven Data Mesh, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-098-12760-2
[LSI]
Table of Contents

Preface

1. Event-Driven Data Communication
   What Is Data Mesh?
   An Event-Driven Data Mesh
   Using Data in the Operational Plane
   The Data Monolith
   The Difficulties of Communicating Data for Operational Concerns
   The Analytical Plane: Data Warehouses and Data Lakes
   The Organizational Impact of Schema on Read
   Bad Data: The Costs of Inaction
   Can We Unify Analytical and Operational Workflows?
   Rethinking Data with Data Mesh
   Common Objections to an Event-Driven Data Mesh
   Producers Cannot Model Data for Everyone's Use Cases
   Making Multiple Copies of Data Is Bad
   Eventual Consistency Is Too Difficult to Manage
   Summary

2. Data Mesh
   Principle 1: Domain Ownership
   Domain-Driven Design in Brief
   Selecting the Data to Expose from Your Domain
   Principle 2: Data as a Product
   Data Products Provide Immutable and Time-Stamped Data
   Data Products Are Multimodal
   Accessing a Data Product Via Push or Pull
   The Three Data Product Alignment Types
   Event-Driven Data Products as Inputs for Operational Systems
   Principle 3: Federated Governance
   Specifying Data Product Language, Framework, and API Support
   Establishing Data Product Life Cycle Requirements
   Establishing Data Handling and Infosec Policies
   Identifying and Standardizing Cross-Domain Polysemes
   Formalizing Self-Service Platform Requirements
   Principle 4: Self-Service Platform
   Discovering Data Products and Dependencies
   Data Product Management Controls
   Data Product Access Controls
   Compute and Storage Resources for Building and Using Data Products
   Providing Self-Service Through SaaS
   Summary

3. Event Streams for Data Mesh
   Events, Messages, and Records
   What's an Event Stream? What Is It Not?
   Ephemeral Message-Passing
   Queuing
   Consuming and Using Event-Driven Data Products
   State Events and Event-Carried State Transfer
   Materializing Events
   Aggregating Events
   The Kappa Architecture
   The Lambda Architecture and Why It Doesn't Work for Data Mesh
   Supporting the Requirements for Kappa Architecture
   Selecting an Event Broker
   Summary

4. Federated Governance
   Forming a Federated Governance Team
   Implementing Standards
   Supporting Multimodal Data Product Types
   Supporting Data Product Schemas
   Supporting Programming Languages and Frameworks
   Metadata Standards and Requirements
   Ensuring Cross-Domain Data Product Compatibility and Interoperability
   Defining and Using Common Entities
   Event Stream Keying and Partitioning
   Time and Time Zones
   What Does a Governance Meeting Look Like?
   1. Identifying Existing Problems
   2. Drafting Proposals
   3. Reviewing Proposals
   4. Implementing Proposals
   5. Archiving Proposals
   Data Security and Access Policies
   Disable Data Product Access by Default
   Consider End-to-End Encryption
   Field-Level Encryption
   Data Privacy, the Right to Be Forgotten, and Crypto-Shredding
   Data Product Lineage
   Topology-Based Lineage
   Record-Based Lineage
   Summary

5. Self-Service Data Platform
   The Self-Service Platform Maturity Model
   Level 1: The Minimal Viable Platform
   The Schema Registry
   An Extremely Basic Metadata Catalog
   Connectors
   Level 1 Wrap-Up: How Does It Work?
   Level 2: The Expanded Platform
   Full-Featured Metadata Catalog
   The Data Product Management Service and UI
   Service and User Identities
   Basic Access Controls
   Stream Processing for Building Data Products
   Level 2 Wrap-Up: How Does It Work?
   Level 3: The Mature Platform
   Authentication, Identification, and Access Management
   Integration with Existing Application Delivery Processes
   Programmatic Data Product Management API
   Monitoring and Alerting
   Multiregion and Multicloud Data Products
   Level 3 Wrap-Up: How Does It Work?
   Summary

6. Event Schemas
   A Brief Introduction to Serialization and Deserialization
   What Is a Schema?
   What Are Our Schema Technology Options?
   Google's Protocol Buffers, aka Protobuf
   Apache Avro
   JSON Schema
   Schema Evolution: Changing Your Schemas Through Time
   Negotiating a Breaking Schema Change
   Step 1: Design the New Data Model
   Step 2: Iterate with Your Existing Consumers and the Federated Governance Team
   Step 3: Create a Release Schedule, a Data Migration Plan, and a Deprecation Plan
   Step 4: Execute the Release
   The Role of the Schema Registry
   Best Practices for Managing Schemas in Your Codebase
   Choosing a Schema Technology
   Summary

7. Designing Events
   Introduction to Event Types
   Expanding on State Events and Event-Carried State Transfer
   Current State Events
   Before/After State Events
   Delta Events
   Event Sourcing with Delta Events
   Why Delta Events Don't Work for Event-Driven Data Products
   Measurement Events
   Measurement Events Often Form Aggregate-Aligned Data Products
   Measurement Event Sources May Be Lossy
   Measurement Events May Power Time-Sensitive Applications
   Hybrid Events—State with a Bit of Delta
   Notification Events
   Summary

8. Bootstrapping Data Products
   Getting Started: Bootstrapping with Connectors
   Dual Writes
   Polling the Database to Create Data Products
   Change-Data Capture
   Change-Data Capture Using a Transactional Outbox
   Denormalization and Eventification
   Eventification at the Transactional Outbox
   Eventification in a Dedicated Service
   What Should Go In the Event? And What Should Stay Out?
   Slowly Changing Dimensions
   Bootstrapping Cloud Storage Files to an Event Stream
   Summary

9. Integrating Event-Driven Data into Data at Rest
   Analytics and the Medallion Architecture
   Connecting Event Streams Into Existing Batch-Data Flows
   Through the Lens of Data Mesh: What's Going On?
   Through the Lens of Data Mesh: How Do We Solve It?
   Balancing File Sizes, SLAs, and Latency
   Budget Blues: A Tale of Overspending
   Extending the Self-Service Platform for Nonstreaming Data Products
   Summary

10. Eventual Consistency
   Converging on Consistency, One Event at a Time
   Strategies for Dealing with Eventual Consistency
   Prevent Failures to Avoid Inconsistency
   Use Event-Driven Data Products Instead of Request-Response Server API Calls
   Expose Eventual Consistency in the Server Response
   Plan for New Services and Reprocessing of Data
   Synchronize Data Products on Time Boundaries
   Out-of-Order Events
   Resolving Late-Arriving Events
   Summary

11. Bringing It All Together
   Event Streams for Data Mesh
   Integrating with Existing Systems
   Operations, Analytics, and Everything in Between
   Summary

Index
Preface

Data mesh is a fundamental shift in the way we think about, create, share, and use data. We promote data to a first-class citizen by carefully curating and crafting it into data products, supported with the same level of care and commitment as any other business product. Consumers can discover and select the data products they need for their own use cases, relying upon the commitment of the data product producer to maintain and support them. At its heart, data mesh is as much about technological reorganization as it is about the renegotiation of social contracts, responsibilities, and expectations.

Back when I wrote Building Event-Driven Microservices (O'Reilly), I made reference to (and a bit vaguely defined) a data communication layer, very similar to, yet not nearly so well thought out as, data mesh. The principles of the data communication layer were simple enough: treat data as a first-class citizen, make it reliable and trustworthy, and produce it through event streams so that you can power both operational and analytical applications.

The beauty of data mesh is that it's not a big-bang total revision of everything we know about data. In fact, it's really an affirmation of best practices, both social and technical, based on the collective hard work and experiences of countless people. It provides the framework necessary to discuss how to go about creating, communicating, and using data, acting as a lingua franca for the data world.

Zhamak Dehghani has done a phenomenal job in bringing data mesh to the world. I remember being blown away by her initial article on Martin Fowler's blog from 2019. She very eloquently described the problems that my team was facing at that very moment and identified the principles we would need to adopt for working toward a solution. Her work really influenced my thinking on the need to have a well-defined data communication layer to make sharing and using data reliable and easy. Dehghani's data mesh is precisely the social-technical framework we need to build a better data world.
Events and event streams play a critical role in a data mesh, as your business opportunities can only ever be solved as fast as your slowest data source. Classic analytical use cases, such as computing a monthly sales report, may be satisfied with a data product that updates just once a day. But many of your most important business use cases, such as fulfilling a sale, computing inventory, and ensuring prompt shipment, require real-time data. An event-driven data mesh provides the capabilities to power both operational and analytical use cases, in both real time and batch.

There is real value in adopting a data mesh. It streamlines discovery, consumption, processing, and application of data across your entire organization. But one of the best features of data mesh is that you can start applying it wherever you are today. It is not an all-or-nothing proposition. You can take the pieces, principles, and concepts that work for improving your situation, and leave the rest until you're ready to adopt those next.

I'm quite excited about data mesh. It provides us with a principled social and technological framework for building out our own data meshes, but just as importantly, the language to talk about and solve data problems with all of our colleagues. I hope you'll enjoy reading this book as much as I did writing it.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

This element signifies a tip or suggestion.

This element signifies a general note.
This element indicates a warning or caution.

O'Reilly Online Learning

For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/build-data-mesh.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Follow us on Twitter: https://twitter.com/oreillymedia
Watch us on YouTube: https://youtube.com/oreillymedia
Acknowledgments

There are many people who I would like to thank for supporting, reviewing, and advising me while writing this book. I'd like to thank my development editors, Nicole Tache and Melissa Potter, who both provided a ton of great support and really helped keep me focused and accountable. I've also been fortunate enough to have two stellar production editors, Beth Kelly and Jonathon Owen. They really helped take a kludge of TODOs, mostly completed figures, and run-on sentences and reshape it into something coherent and sensible. Thanks as well to Stephanie English, who provided the copyediting as we moved from draft into production.

My reviewer and former Confluent colleague Hubert Daley provided initial thoughts and feedback that helped shape the rest of the book. Chris Ford, Head of Technology, Thoughtworks, provided critical feedback, helping me identify what worked and what didn't. Pramod Sadalage of Thoughtworks, Data Mesh leader for North America, similarly provided me a wealth of constructive criticisms and support. Thanks to each of you for taking the time to help me improve this book.

Thanks to my Confluent colleagues Ben Stopford, Andrew Sellers, Jack Vanlightly, Ian Robinson, and Travis Hoffman, with whom I had many discussions on the merits, drawbacks, and implementation of data mesh. I greatly value your thoughts, comments, constructive criticisms, and helpful insights.

And finally, thanks to my family and friends who provided me with the emotional support and encouragement to keep on keeping on.
CHAPTER 1
Event-Driven Data Communication

The way that businesses relate to their data is changing rapidly. Gone are the days when all of a business's data would fit neatly into a single relational database. The big data revolution, started more than two decades ago, has since evolved, and it is no longer sufficient to store your massive data sets in a big data lake for batch analysis. Speed and interconnectivity have emerged as the next major competitive business requirements, again transforming the way that businesses create, store, access, and share their important data.

Data is the lifeblood of a business. But many of the ways that businesses create, share, and use data are haphazard and disjointed. Data mesh provides a comprehensive framework for revisiting these often dysfunctional relationships and provides a new way to think about, build, and share data across an organization, so that we can do helpful and useful things: better service for our customers, error-free reporting, actionable insights, and enabling truly data-driven processes.

To get an understanding of what we're trying to fix, we first need an idea of the main data problems facing a modern business.

First, big data systems, underpinning a company's business analytics engine, have exploded in size and complexity. There have been many attempts to address and reduce this complexity, but they all fall short of the mark.

Second, business operations for large companies have long since passed the point of being served by a single monolithic deployment. Multiservice deployments are the norm, including microservice and service-oriented architectures. The boundaries of these modular systems are seldom easily defined, especially when many separate operational and analytical systems rely on read-only access to the same data sets. There is an opposing tension here: on one hand, colocating business functions in a single application provides consistent access to all data produced and stored in that system. On the other hand, these business functions may have absolutely no relation to one another aside from needing common read-only access to important business data.
And third, a problem common to both operational and analytical domains: the inability to access high-quality, well-documented, self-updating, and reliable data.

The sheer volume of data that an organization deals with increases substantially year over year, fueling a need for better ways to sort, store, and use it. This pressure deals the final blow to the ideal of keeping everything in a single database and forces developers to split up monolithic applications into separate deployments with their own databases. Meanwhile, the big data teams struggle to keep up with the fragmentation and refactoring of these operational systems, as they remain solely responsible for obtaining their own data.

Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today's computing environments, leading to ad hoc data pipelines, cobbled-together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service.

Important business data needs to be readily and reliably available as building block primitives for your applications, regardless of the runtime, environment, or codebase of your application. We treat our data as a first-class citizen, complete with dedicated ownership, minimum quality guarantees, service-level agreements (SLAs), and scalable mechanisms for clean and reliable access. Event streams are the ideal mechanism for serving this data, providing a simple yet powerful way of reliably communicating important business data across an organization, enabling each consumer to access and use the data primitives they need.

In this chapter, we'll take a look at the forces that have shaped the operational and analytical tools and systems that we commonly use today and the problems that go along with them. The massive inefficiencies of contemporary data architectures provide us with rich learnings that we will apply to our event-driven solutions. This will set the stage for the next chapter, when we talk about data mesh as a whole.

What Is Data Mesh?

Data mesh was invented by Zhamak Dehghani. It's a social and technological shift in the way that data is created, accessed, and shared across organizations. Data mesh provides a lingua franca for discussing the needs and responsibilities of different teams, domains, and services, and how they can work together to make data a first-class citizen. This chapter explores the principles that form the basis of data mesh.
In my last book, Building Event-Driven Microservices (O'Reilly), I introduced the term data communication layer, touching on many of the same principles as data mesh: treat data as a first-class citizen, formalize the structure for communication between domains, publish data to event streams for general-purpose usage, and make it easy to use for both the producers and consumers of data. And while I am fond of the data communication layer terminology, the reality is that I think the language and formalized principles of data mesh provide everything we need to talk about this problem without introducing another "data something something" paradigm.

Dehghani's book, Data Mesh (O'Reilly), showcases the theory and thought leadership of data mesh in great depth and detail, but remains necessarily agnostic of specific implementations. In this book, we'll look at a practical implementation of data mesh that uses the event stream as the primary data product mode for interdomain data communications. We can be a bit more pragmatic, less intense on the theory, and more concrete and specific on the implementation of an event-driven design. While I think that event streams are fundamentally the best option for interdomain communication, they do come with trade-offs, and I will, of course, cover these too, mentioning nonstreaming possibilities where they are best suited.

Data mesh is based on four main principles: domain ownership, data as a product, federated governance, and self-service platform. Together, these principles help us structure a way to communicate important business data across the entire organization. We'll evaluate these principles in more detail in the next chapter, but before we get there, let's take a look at why data mesh matters today.

An Event-Driven Data Mesh

The modern competitive requirements of big data in motion, combined with modern cloud computing, require a rethink of how businesses create, store, move, and use data. The foundation of this new data architecture is the event, the data quantum that represents real business activities, provided through a multitude of purpose-built event streams. Event streams provide the means for a central nervous system, enabling business units to access and use fundamental, self-updating data building blocks. These data building blocks join the ranks of containerization, infrastructure as a service (IaaS), continuous integration (CI) and continuous deployment (CD) pipelines, and monitoring solutions, the components on which modern cloud applications are built.
Event streams are not new. But many of the technological limitations underpinning previous event-driven architectures, such as limited scale, retention, and performance, have largely been alleviated. Modern multitenant event brokers complete with tiered storage can store an unbounded amount of data, removing the strict capacity restrictions that limited previous architectures. Producers write their important business domain data to an event stream, enabling others to couple on that stream and use the data building blocks for their own applications. Finally, consumer applications can in turn create their own event streams to share their own business facts with others, resulting in a standardized communications mesh for all to use.

Data mesh provides us with very useful concepts and language for building out this interconnected central nervous system. Figure 1-1 shows a basic example of what a data mesh could look like.

Figure 1-1. A very basic Hello Data Mesh implementation

The team that owns operational system Alpha selects some data from their service boundary, remodels it, and writes it to a source-aligned data product, which they also own (we'll cover data product alignments more in "The Three Data Product Alignment Types" in Chapter 2). The team that owns operational system Beta reads data from this data product into its own service boundary, again remodeling it, transforming it, and storing only what they need.

Meanwhile, a third team connects to Alpha team's data product and uses it to compose their own aggregate-aligned data product. This same team then uses its aggregate-aligned data product to both power a streaming analytics use case and to write a batch of files to cloud storage, where data analysts will use it to compose reports and power existing batch-based analytics jobs.
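To make the Alpha-to-Beta flow in Figure 1-1 concrete, here is a minimal sketch assuming Apache Kafka as the event broker and the confluent-kafka Python client. The stream name, event fields, and broker address are hypothetical stand-ins, not details from the book.

```python
import json
from confluent_kafka import Producer, Consumer

BROKER = "localhost:9092"      # assumed local broker address
STREAM = "alpha.inventory.v1"  # hypothetical source-aligned data product stream

# Team Alpha: select data from its service boundary, remodel it, and
# publish it to the source-aligned data product stream that Alpha owns.
producer = Producer({"bootstrap.servers": BROKER})
event = {"item_id": "123", "stock": 42}  # illustrative remodeled domain data
producer.produce(STREAM, key=event["item_id"], value=json.dumps(event))
producer.flush()

# Team Beta: read the data product into its own service boundary,
# transforming it and storing only what it needs.
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "beta-service",
    "auto.offset.reset": "earliest",  # new consumers read from the start
})
consumer.subscribe([STREAM])

materialized = {}  # Beta's local, remodeled copy of the stream
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    record = json.loads(msg.value())
    materialized[record["item_id"]] = record["stock"]  # keep only needed fields
consumer.close()
```

In a real data mesh, the event value would be serialized against a registered schema (see Chapter 6) rather than ad hoc JSON, but the shape of the interaction is the same: the producer owns and publishes the data product; consumers independently materialize what they need.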
This diagram represents just the tip of the data mesh iceberg, and there remain many areas to cover. But the gist of the event-driven data mesh is to make data readily available in real time to any consumers who need it. Many of the problems that data mesh solves have existed for a very long time. We're now going to take a brief history tour to get a better understanding of what it is we're solving and why data mesh is a very relevant and powerful solution.

Using Data in the Operational Plane

Data tends to be created by an operational system doing business things. Eventually, that data tends to be pulled into the analytical plane for analysis and reporting purposes. In this section, we'll focus on the operational plane and the common challenges of sharing business data with other operational (and analytical) services.

The Data Monolith

Online transaction processing (OLTP) databases form the basis of much of today's operational computer services (let's call them "monoliths" for simplicity). Monolithic systems tend to play a big role in the operational plane, as consistent synchronous communication tends to be simpler to reason and develop against than asynchronous communication. Relational databases, such as PostgreSQL and MySQL, feature heavily in monolithic applications, providing atomicity, consistency, isolation, and durability (ACID) transactions and consistent state for the application. Together, the application and database demonstrate the following monolith data principles:

The database is the source of truth
The monolith relies on the underlying database to be the durable store of information for the application. Any new or updated records are first recorded into the database, making it the definitive source of truth for those entities.

Data is strongly consistent
The monolith's data, when stored in a typical relational database, is strongly consistent. This provides the business logic with strong read-after-write consistency, and, thanks to transactions, it will not inadvertently access partially updated records.

Read-only data is readily available
The data stored within the monolith's database can be readily accessed by any part of the monolith. Read-only access permissions ensure that there are no inadvertent alterations to the data.

Note that the database should be directly accessed only by the service that owns it, and not used as an integration point.
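The first two principles are easy to see in code. The sketch below uses SQLite from Python's standard library as a stand-in for a production OLTP database such as PostgreSQL or MySQL; the inventory table and its values are illustrative assumptions.

```python
import sqlite3

# SQLite stands in for an OLTP database like PostgreSQL or MySQL;
# the table and rows here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_id TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('123', 10)")
conn.commit()

# Source of truth + atomicity: the update is recorded in the database
# inside a transaction, so it either fully commits or fully rolls back.
with conn:
    conn.execute("UPDATE inventory SET stock = stock - 1 WHERE item_id = '123'")

# Strong read-after-write consistency: the very next read sees the
# committed value, never a partially updated record.
stock = conn.execute(
    "SELECT stock FROM inventory WHERE item_id = '123'"
).fetchone()[0]
assert stock == 9
```

The moment a second application outside this process needs the same rows, however, this convenient consistency no longer extends to it, which is exactly the predicament described next.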
These three principles form a binding force that makes monolithic architectures powerful. Your application code has read-only access to the entire span of data stored in the monolith's database as a set of authoritative, consistent, and accessible data primitives. This foundation makes it easy to build new application functionality, provided it's in the same application. But what if you need to build a new application?

The Difficulties of Communicating Data for Operational Concerns

A new application cannot rely on the same easy access to data primitives that it would have if it were built as part of the monolith. This would not be a problem if the new application had no need for any of the business data in the monolith. However, this is rarely the case, as businesses are effectively a set of overlapping domains, particularly the common core, with the same data serving multiple business requirements.

For example, an ecommerce retailer may rely on its monolith to handle its orders, sales, and inventory, but requires a new application powered by a document-based database (or other database type) for plain-text search functionality. Figure 1-2 highlights the crux of the issue: how do we get the data from Ol' Reliable into the new document database to power search?

Figure 1-2. The new search service team must figure out how to get the data it needs out of the monolith and keep it up to date

This puts the new search service team in a bit of a predicament. The service needs access to the item, store, and inventory data in the monolith, but it also needs to model it all as a set of documents for the search engine. There are two common ways that teams attempt to resolve this. One is to replicate and transform the data to the search engine, in an attempt to preserve the three monolith data principles. The second is to use APIs to restructure the service boundaries of the source system, such