Event Streams in Action: Real-time event systems with Kafka and Kinesis

Author: Alexander Dean, Valentin Crettaz

Technology

Summary

Event Streams in Action is a foundational book introducing the unified log processing (ULP) paradigm and presenting techniques for using it effectively in data-rich environments. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Many high-profile applications, like LinkedIn and Netflix, deliver nimble, responsive performance by reacting to user and system events as they occur. In large-scale systems, this requires efficiently monitoring, managing, and reacting to multiple event streams. Tools like Kafka, along with innovative patterns like unified log processing, help create a coherent data processing architecture for event-based applications.

About the Book

This book teaches you techniques for aggregating, storing, and processing event streams using the unified log processing pattern. In this hands-on guide, you'll discover important application designs like the lambda architecture, stream aggregation, and event reprocessing. You'll also explore scaling, resiliency, advanced stream patterns, and much more. By the time you're finished, you'll be designing large-scale data-driven applications that are easier to build, deploy, and maintain.

What's inside

- Validating and monitoring event streams
- Event analytics
- Methods for event modeling
- Examples using Apache Kafka and Amazon Kinesis

About the Reader

For readers with experience coding in Java, Scala, or Python.

About the Authors

Alexander Dean developed Snowplow, an open source event processing and analytics platform. Valentin Crettaz is an independent IT consultant with 25 years of experience.

📄 File Format: PDF
💾 File Size: 14.1 MB

📄 Text Preview (First 20 pages)


Event Streams in Action
REAL-TIME EVENT SYSTEMS WITH KAFKA AND KINESIS
ALEXANDER DEAN
VALENTIN CRETTAZ
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact: Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2019 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Acquisitions editors: Mike Stephens and Frank Pohlmann
Development editors: Jennifer Stout and Cynthia Kane
Technical development editor: Kostas Passadis
Review editor: Aleks Dragosavljević
Production editor: Anthony Calcara
Copy editor: Sharon Wilkey
Proofreader: Melody Dolab
Technical proofreader: Michiel Trimpe
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617292347
Printed in the United States of America
brief contents

PART 1 EVENT STREAMS AND UNIFIED LOGS
1 Introducing event streams
2 The unified log
3 Event stream processing with Apache Kafka
4 Event stream processing with Amazon Kinesis
5 Stateful stream processing

PART 2 DATA ENGINEERING WITH STREAMS
6 Schemas
7 Archiving events
8 Railway-oriented processing
9 Commands

PART 3 EVENT ANALYTICS
10 Analytics-on-read
11 Analytics-on-write
contents

preface
acknowledgments
about this book
about the authors
about the cover illustration

PART 1 EVENT STREAMS AND UNIFIED LOGS

1 Introducing event streams
1.1 Defining our terms: Events ■ Continuous event streams
1.2 Exploring familiar event streams: Application-level logging ■ Web analytics ■ Publish/subscribe messaging
1.3 Unifying continuous event streams: The classic era ■ The hybrid era ■ The unified era
1.4 Introducing use cases for the unified log: Customer feedback loops ■ Holistic systems monitoring ■ Hot-swapping data application versions

2 The unified log
2.1 Understanding the anatomy of a unified log: Unified ■ Append-only ■ Distributed ■ Ordered
2.2 Introducing our application: Identifying our key events ■ Unified log, e-commerce style ■ Modeling our first event
2.3 Setting up our unified log: Downloading and installing Apache Kafka ■ Creating our stream ■ Sending and receiving events

3 Event stream processing with Apache Kafka
3.1 Event stream processing 101: Why process event streams? ■ Single-event processing ■ Multiple-event processing
3.2 Designing our first stream-processing app: Using Kafka as our company's glue ■ Locking down our requirements
3.3 Writing a simple Kafka worker: Setting up our development environment ■ Configuring our application ■ Reading from Kafka ■ Writing to Kafka ■ Stitching it all together ■ Testing
3.4 Writing a single-event processor: Writing our event processor ■ Updating our main function ■ Testing, redux

4 Event stream processing with Amazon Kinesis
4.1 Writing events to Kinesis: Systems monitoring and the unified log ■ Terminology differences from Kafka ■ Setting up our stream ■ Modeling our events ■ Writing our agent
4.2 Reading from Kinesis: Kinesis frameworks and SDKs ■ Reading events with the AWS CLI ■ Monitoring our stream with boto

5 Stateful stream processing
5.1 Detecting abandoned shopping carts: What management wants ■ Defining our algorithm ■ Introducing our derived events stream
5.2 Modeling our new events: Shopper adds item to cart ■ Shopper places order ■ Shopper abandons cart
5.3 Stateful stream processing: Introducing state management ■ Stream windowing ■ Stream processing frameworks and their capabilities ■ Stream processing frameworks ■ Choosing a stream processing framework for Nile
5.4 Detecting abandoned carts: Designing our Samza job ■ Preparing our project ■ Configuring our job ■ Writing our job's Java task
5.5 Running our Samza job: Introducing YARN ■ Submitting our job ■ Testing our job ■ Improving our job

PART 2 DATA ENGINEERING WITH STREAMS

6 Schemas
6.1 An introduction to schemas: Introducing Plum ■ Event schemas as contracts ■ Capabilities of schema technologies ■ Some schema technologies ■ Choosing a schema technology for Plum
6.2 Modeling our event in Avro: Setting up a development harness ■ Writing our health check event schema ■ From Avro to Java, and back again ■ Testing
6.3 Associating events with their schemas: Some modest proposals ■ A self-describing event for Plum ■ Plum's schema registry

7 Archiving events
7.1 The archivist's manifesto: Resilience ■ Reprocessing ■ Refinement
7.2 A design for archiving: What to archive ■ Where to archive ■ How to archive
7.3 Archiving Kafka with Secor: Warming up Kafka ■ Creating our event archive ■ Setting up Secor
7.4 Batch processing our archive: Batch processing 101 ■ Designing our batch processing job ■ Writing our job in Apache Spark ■ Running our job on Elastic MapReduce

8 Railway-oriented processing
8.1 Leaving the happy path: Failure and Unix programs ■ Failure and Java ■ Failure and the log-industrial complex
8.2 Failure and the unified log: A design for failure ■ Modeling failures as events ■ Composing our happy path across jobs
8.3 Failure composition with Scalaz: Planning for failure ■ Setting up our Scala project ■ From Java to Scala ■ Better failure handling through Scalaz ■ Composing failures
8.4 Implementing railway-oriented processing: Introducing railway-oriented processing ■ Building the railway

9 Commands
9.1 Commands and the unified log: Events and commands ■ Implicit vs. explicit commands ■ Working with commands in a unified log
9.2 Making decisions: Introducing commands at Plum ■ Modeling commands ■ Writing our alert schema ■ Defining our alert schema
9.3 Consuming our commands: The right tool for the job ■ Reading our commands ■ Parsing our commands ■ Stitching it all together ■ Testing
9.4 Executing our commands: Signing up for Mailgun ■ Completing our executor ■ Final testing
9.5 Scaling up commands: One stream of commands, or many? ■ Handling command-execution failures ■ Command hierarchies

PART 3 EVENT ANALYTICS

10 Analytics-on-read
10.1 Analytics-on-read, analytics-on-write: Analytics-on-read ■ Analytics-on-write ■ Choosing an approach
10.2 The OOPS event stream: Delivery truck events and entities ■ Delivery driver events and entities ■ The OOPS event model ■ The OOPS events archive
10.3 Getting started with Amazon Redshift: Introducing Redshift ■ Setting up Redshift ■ Designing an event warehouse ■ Creating our fat events table
10.4 ETL, ELT: Loading our events ■ Dimension widening ■ A detour on data volatility
10.5 Finally, some analysis: Analysis 1: Who does the most oil changes? ■ Analysis 2: Who is our most unreliable customer?

11 Analytics-on-write
11.1 Back to OOPS: Kinesis setup ■ Requirements gathering ■ Our analytics-on-write algorithm
11.2 Building our Lambda function: Setting up DynamoDB ■ Introduction to AWS Lambda ■ Lambda setup and event modeling ■ Revisiting our analytics-on-write algorithm ■ Conditional writes to DynamoDB ■ Finalizing our Lambda
11.3 Running our Lambda function: Deploying our Lambda function ■ Testing our Lambda function

appendix AWS primer
index
preface

A continuous stream of real-world and digital events already powers the company where you work, even though you probably don't think in those terms. Instead, you likely think about your daily work in terms of the people or things that you interact with, the software or hardware you use to get stuff done, or your own microcosm of a to-do list of tasks. Computers can't think like this! Instead, computers see a company as an organization that generates a response to a continuous stream of events. We believe that reframing your business in terms of a continuous stream of events offers huge benefits. This is a young but hugely important field, and there is a lot still to discuss.

Event Streams in Action is all about events: how to define events, how to send streams of events into unified log technologies like Apache Kafka and Amazon Kinesis, and how to write applications that process those event streams. We're going to cover a lot of ground in this book: Kafka and Kinesis, stream processing frameworks like Samza and Spark Streaming, event-friendly databases like Amazon Redshift, and more. This book will give you confidence to identify, model, and process event streams wherever you find them, and we guarantee that by the end of this book, you will be seeing event streams everywhere! Above all, we hope that this book acts as a springboard for a broader conversation about how we, as software engineers, should work with events.
acknowledgments

I would like to thank my wife Charis for her support through the long process of writing this book, as well as my parents for their lifelong encouragement. And many thanks to my cofounder at Snowplow Analytics, Yali Sassoon, for giving me the "air cover" to work on this book even while we were trying to get our tech startup off the ground.

On the Manning side, I will always be appreciative to commissioning editor Frank Pohlmann for believing I had a book in me. Thanks too to Cynthia Kane, Jennifer Stout, and Rebecca Rinehart for their patience and support through the difficult and lengthy gestation. I am grateful to my coauthor, Valentin Crettaz, for his contributions and his laser focus on getting this book completed.

Special thanks also to all the reviewers whose feedback and insight greatly helped to improve this book, including Alex Nelson, Alexander Myltsev, Azatar Solowiej, Bachir Chihani, Charles Chan, Chris Snow, Cosimo Attanasi, Earl Bingham, Ernesto Garcia, Gerd Klevesaat, Jeff Lim, Jerry Tan, Lourens Steyn, Miguel Eduardo Gil Biraud, Nat Luengnaruemitchai, Odysseas Pentakalos, Rodrigo Abreu, Roger Meli, Sanket Naik, Shobha Iyer, Sumit Pal, Thomas Lockney, Thorsten Weber, Tischliar Ronald, Tomasz Borek, and Vitaly Bragilevsky.

Finally, I'd like to thank Jay Kreps, CEO of Confluent and creator of Apache Kafka, for his monograph "The Log," published back in December 2013, which started me on the journey of writing this book in addition to informing so much of my work at Snowplow.

—ALEXANDER DEAN
First and foremost, I'd like to thank my family for having to deal daily with a father and husband who is so passionate about his work that he sometimes (read: often) forgets to give his keyboards and mice a break. I would never have been able to fulfill my dreams without your unconditional support and understanding.

I've worked with Manning on many different book projects over a long period of time now. But this one was special: not only a nice technological adventure, but also a human one. I can't emphasize the human part enough, as writing books is not only about content, grammar rules, typos, and phrasing, but also about collaborating and empathizing with human beings, understanding their context and their sensibilities, and sharing one chapter of your life with them. For all this, I'd like to thank Michael Stephens, Jennifer Stout, and Rebecca Rinehart for taking the time and effort to persuade me to take on this project. It wasn't easy (it never is and never should be), but it was a great deal of fun and highly instructive.

Finally, I'd like to thank Alex for being such a good writer and for always managing to mix an entertaining writing style with illustrative examples and figures to make complex subjects and concepts easy for the reader to grasp.

—VALENTIN CRETTAZ
about this book

Writing real-world applications in a data-rich environment can feel like being caught in the crossfire of a paintball battle. Any action may require you to combine event streams, batch archives, and live user or system requests in real time. Unified log processing is a coherent data processing architecture designed to encompass batch and near-real-time stream data, event logging and aggregation, and data processing on the resulting unified event stream. By efficiently creating a single log of events from multiple data sources, unified log processing makes it possible to design large-scale data-driven applications that are easier to design, deploy, and maintain.

Who should read this book

This book is written for readers who have experience writing some Java code. Scala and Python experience may be helpful for understanding some concepts in the book but is not required.

How this book is organized: a roadmap

This book has 11 chapters divided into three parts.

Part 1 defines event streams and unified logs, providing a wide-ranging look:

■ Chapter 1 provides a ground-level foundation by offering definitions and examples of events and continuous event streams, and takes a brief look at unifying event streams with a unified log.
■ Chapter 2 dives deep into the key attributes of a unified log, and walks you through setting up, sending, and reading events in Apache Kafka.
■ Chapter 3 introduces event stream processing, and how to write applications that process individual events while also validating and enriching events.
■ Chapter 4 focuses on event stream processing with Amazon Kinesis, a fully managed unified log service.
■ Chapter 5 looks at stateful stream processing, using the most popular stream processing frameworks to process multiple events from a stream using state.

Part 2 dives deep into the quality of events being fed into a unified log:

■ Chapter 6 covers event schemas and schema technologies, focusing on using Apache Avro to represent self-describing events.
■ Chapter 7 covers event archiving, providing a deep look into why archiving a unified log is so important and the best practices for doing so.
■ Chapter 8 looks at how to handle failure in Unix programs, Java exceptions, and error logging, and how to design for failure inside and across stream processing applications.
■ Chapter 9 covers the role of commands in the unified log, using Apache Avro to define schemas and process commands.

Part 3 takes an analysis-first look at the unified log, leading with the two main methodologies for unified log analytics, and then applying various database and stream processing technologies to analyze our event streams:

■ Chapter 10 uses Amazon Redshift, a horizontally scalable columnar database, to cover analytics-on-read versus analytics-on-write and techniques for storing and widening events.
■ Chapter 11 provides simple algorithms for analytics-on-write event streams, and shows how to deploy and test an AWS Lambda function.
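The unified log attributes the roadmap keeps returning to (unified, append-only, ordered, with consumers tracking their own positions) can be illustrated with a toy sketch. This is not code from the book: it is a hypothetical, single-process, single-partition stand-in for Kafka or Kinesis, with invented names (`UnifiedLog`, `append`, `read`), showing how producers append events at monotonically increasing offsets while independent consumers checkpoint their own read positions:

```python
import json


class UnifiedLog:
    """Toy single-partition unified log: append-only and ordered."""

    def __init__(self):
        self._events = []  # events are only ever appended, never mutated or deleted

    def append(self, event: dict) -> int:
        """Append an event and return its offset in the stream."""
        self._events.append(json.dumps(event))  # serialize, as a real log stores bytes
        return len(self._events) - 1

    def read(self, offset: int, max_events: int = 10) -> list:
        """Read up to max_events starting at offset; the log keeps no consumer state."""
        return [json.loads(e) for e in self._events[offset:offset + max_events]]


# Producers append; the log assigns each event the next offset.
log = UnifiedLog()
log.append({"event": "SHOPPER_ADDED_ITEM", "item": "kettle"})
log.append({"event": "SHOPPER_PLACED_ORDER", "order_id": 123})

# A consumer starts from offset 0 and checkpoints its own position.
batch = log.read(0)
next_offset = len(batch)
```

Because the log itself holds no per-consumer state, a second consumer can replay from offset 0 at any time, which is the property that makes reprocessing and event archiving (Part 2 of the book) possible.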
Sometimes code is also in bold to high- light code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code. In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts. Source code for the examples in this book is available for download from the pub- lisher’s website at www.manning.com/books/event-streams-in-action.
liveBook discussion forum

Purchase of Event Streams in Action includes free access to a private web forum run by Manning Publications, where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum, go to https://livebook.manning.com/#!/book/event-streams-in-action/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

Manning's commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher's website as long as the book is in print.
about the authors

ALEXANDER DEAN is cofounder and technical lead of Snowplow Analytics, an open source event processing and analytics platform.

VALENTIN CRETTAZ is an independent IT consultant who's been working for the past 25 years on many challenging projects across the globe. His expertise ranges from software engineering and architecture to data science and business intelligence. His daily job boils down to using the latest and most cutting-edge web, data, and streaming technologies to implement IT solutions that will help reduce the cultural gap between IT and business people.