Kafka in Action
Dylan Scott, Viktor Gamov, and Dave Klein
Foreword by Jun Rao
Manning
Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department Manning Publications Co. 20 Baldwin Road PO Box 761 Shelter Island, NY 11964 Email: orders@manning.com ©2022 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine. The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein. Manning Publications Co. 
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Toni Arritola
Technical development editors: Raphael Villela, Nickie Buckner
Review editor: Aleksandar Dragosavljević
Production editor: Andy Marinkovich
Copy editor: Frances Buran
Proofreader: Katie Tennant
Technical proofreaders: Felipe Esteban Vildoso Castillo, Mayur Patil, Sumant Tambe, Valentin Crettaz, and William Rudenmalm
Typesetter and cover designer: Marija Tudor

ISBN 9781617295232
Printed in the United States of America
Dylan: I dedicate this work to Harper, who makes me so proud every day, and to Noelle, who brings even more joy to our family every day. I would also like to dedicate this book to my parents, sister, and wife, who are always my biggest supporters.

Viktor: I dedicate this work to my wife, Maria, for her support during the process of writing this book. It’s a time-consuming task, time that I needed to carve out here and there. Without your encouragement, nothing would have ever happened. I love you. Also, I would like to dedicate this book to (and thank) my children, Andrew and Michael, for being so naïve and straightforward. When people asked where daddy is working, they would say, “Daddy is working in Kafka.”

Dave: I dedicate this work to my wife, Debbie, and our children, Zachary, Abigail, Benjamin, Sarah, Solomon, Hannah, Joanna, Rebekah, Susanna, Noah, Samuel, Gideon, Joshua, and Daniel. Ultimately, everything I do, I do for the honor of my Creator and Savior, Jesus Christ.
brief contents

PART 1 GETTING STARTED
1 ■ Introduction to Kafka
2 ■ Getting to know Kafka

PART 2 APPLYING KAFKA
3 ■ Designing a Kafka project
4 ■ Producers: Sourcing data
5 ■ Consumers: Unlocking data
6 ■ Brokers
7 ■ Topics and partitions
8 ■ Kafka storage
9 ■ Management: Tools and logging

PART 3 GOING FURTHER
10 ■ Protecting Kafka
11 ■ Schema registry
12 ■ Stream processing with Kafka Streams and ksqlDB
contents

foreword
preface
acknowledgments
about this book
about the authors
about the cover illustration

PART 1 GETTING STARTED

1 Introduction to Kafka
1.1 What is Kafka?
1.2 Kafka usage
    Kafka for the developer ■ Explaining Kafka to your manager
1.3 Kafka myths
    Kafka only works with Hadoop® ■ Kafka is the same as other message brokers
1.4 Kafka in the real world
    Early examples ■ Later examples ■ When Kafka might not be the right fit
1.5 Online resources to get started
References

2 Getting to know Kafka
2.1 Producing and consuming a message
2.2 What are brokers?
2.3 Tour of Kafka
    Producers and consumers ■ Topics overview ■ ZooKeeper usage ■ Kafka’s high-level architecture ■ The commit log
2.4 Various source code packages and what they do
    Kafka Streams ■ Kafka Connect ■ AdminClient package ■ ksqlDB
2.5 Confluent clients
2.6 Stream processing and terminology
    Stream processing ■ What exactly-once means
References

PART 2 APPLYING KAFKA

3 Designing a Kafka project
3.1 Designing a Kafka project
    Taking over an existing data architecture ■ A first change ■ Built-in features ■ Data for our invoices
3.2 Sensor event design
    Existing issues ■ Why Kafka is the right fit ■ Thought starters on our design ■ User data requirements ■ High-level plan for applying our questions ■ Reviewing our blueprint
3.3 Format of your data
    Plan for data ■ Dependency setup
References

4 Producers: Sourcing data
4.1 An example
    Producer notes
4.2 Producer options
    Configuring the broker list ■ How to go fast (or go safer) ■ Timestamps
4.3 Generating code for our requirements
    Client and broker versions
References

5 Consumers: Unlocking data
5.1 An example
    Consumer options ■ Understanding our coordinates
5.2 How consumers interact
5.3 Tracking
    Group coordinator ■ Partition assignment strategy
5.4 Marking our place
5.5 Reading from a compacted topic
5.6 Retrieving code for our factory requirements
    Reading options ■ Requirements
References

6 Brokers
6.1 Introducing the broker
6.2 Role of ZooKeeper
6.3 Options at the broker level
    Kafka’s other logs: Application logs ■ Server log ■ Managing state
6.4 Partition replica leaders and their role
    Losing data
6.5 Peeking into Kafka
    Cluster maintenance ■ Adding a broker ■ Upgrading your cluster ■ Upgrading your clients ■ Backups
6.6 A note on stateful systems
6.7 Exercise
References

7 Topics and partitions
7.1 Topics
    Topic-creation options ■ Replication factors
7.2 Partitions
    Partition location ■ Viewing our logs
7.3 Testing with EmbeddedKafkaCluster
    Using Kafka Testcontainers
7.4 Topic compaction
References

8 Kafka storage
8.1 How long to store data
8.2 Data movement
    Keeping the original event ■ Moving away from a batch mindset
8.3 Tools
    Apache Flume ■ Red Hat® Debezium™ ■ Secor ■ Example use case for data storage
8.4 Bringing data back into Kafka
    Tiered storage
8.5 Architectures with Kafka
    Lambda architecture ■ Kappa architecture
8.6 Multiple cluster setups
    Scaling by adding clusters
8.7 Cloud- and container-based storage options
    Kubernetes clusters
References

9 Management: Tools and logging
9.1 Administration clients
    Administration in code with AdminClient ■ kcat ■ Confluent REST Proxy API
9.2 Running Kafka as a systemd service
9.3 Logging
    Kafka application logs ■ ZooKeeper logs
9.4 Firewalls
    Advertised listeners
9.5 Metrics
    JMX console
9.6 Tracing option
    Producer logic ■ Consumer logic ■ Overriding clients
9.7 General monitoring tools
References

PART 3 GOING FURTHER

10 Protecting Kafka
10.1 Security basics
    Encryption with SSL ■ SSL between brokers and clients ■ SSL between brokers
10.2 Kerberos and the Simple Authentication and Security Layer (SASL)
10.3 Authorization in Kafka
    Access control lists (ACLs) ■ Role-based access control (RBAC)
10.4 ZooKeeper
    Kerberos setup
10.5 Quotas
    Network bandwidth quota ■ Request rate quotas
10.6 Data at rest
    Managed options
References

11 Schema registry
11.1 A proposed Kafka maturity model
    Level 0 ■ Level 1 ■ Level 2 ■ Level 3
11.2 The Schema Registry
    Installing the Confluent Schema Registry ■ Registry configuration
11.3 Schema features
    REST API ■ Client library
11.4 Compatibility rules
    Validating schema modifications
11.5 Alternative to a schema registry
References

12 Stream processing with Kafka Streams and ksqlDB
12.1 Kafka Streams
    KStreams API DSL ■ KTable API ■ GlobalKTable API ■ Processor API ■ Kafka Streams setup
12.2 ksqlDB: An event-streaming database
    Queries ■ Local development ■ ksqlDB architecture
12.3 Going further
    Kafka Improvement Proposals (KIPs) ■ Kafka projects you can explore ■ Community Slack channel
References

appendix A Installation
appendix B Client example
index
foreword

Beginning with its first release in 2011, Apache Kafka® has helped create a new category of data-in-motion systems, and it’s now the foundation of countless modern event-driven applications. This book, Kafka in Action, written by Dylan Scott, Viktor Gamov, and Dave Klein, equips you with the skills to design and implement event-based applications built on Apache Kafka. The authors have had many years of real-world experience using Kafka, and this book’s on-the-ground feel really sets it apart.

Let’s take a moment to ask the question, “Why do we need Kafka in the first place?” Historically, most applications were built on data-at-rest systems. When some interesting events happened in the world, they were stored in these systems immediately, but the utilization of those events happened later, either when the user explicitly asked for the information, or from some batch-processing jobs that would eventually kick in.

With data-in-motion systems, applications are built by predefining what they want to do when new events occur. When new events happen, they are reflected in the application automatically in near-real time. Such event-driven applications are appealing because they allow enterprises to derive new insights from their data much quicker. Switching to event-driven applications requires a change of mindset, however, which may not always be easy. This book offers a comprehensive resource for understanding event-driven thinking, along with realistic hands-on examples for you to try out.

Kafka in Action explains how Kafka works, with a focus on how a developer can build end-to-end event-driven applications with Kafka. You’ll learn the components needed to build a basic Kafka application and also how to create more advanced applications using libraries such as Kafka Streams and ksqlDB. And once your application is built, this book also covers how to run it in production, including key topics such as monitoring and security.
I hope that you enjoy this book as much as I have. Happy event streaming! —JUN RAO, CONFLUENT COFOUNDER
preface

One of the questions we often get when talking about working on a technical book is, why the written format? For Dylan, at least, reading has always been part of his preferred learning style. Another factor is the nostalgia in remembering the first practical programming book he ever really read, Elements of Programming with Perl by Andrew L. Johnson (Manning, 2000). The content was something that registered with him, and it was a joy to work through each page with the other authors. We hope to capture some of that practical content regarding working with and reading about Apache Kafka.

The excitement of learning something new touched each of us when we started to work with Kafka for the first time. In our opinion, Kafka was unlike any other message broker or enterprise service bus (ESB) that we had used before. The speed to get started developing producers and consumers, the ability to reprocess data, and the pace of independent consumers moving quickly without removing the data from other consumer applications were options that solved pain points we had seen in past development and impressed us most as we started looking at Kafka.

We see Kafka as changing the standard for data platforms; it can help move batch and ETL workflows toward near-real-time data feeds. Because this foundation is likely a shift from past data architectures that many enterprise users are familiar with, we wanted to take a user with no prior knowledge of Kafka and develop their ability to work with Kafka producers and consumers, and perform basic Kafka developer and administrative tasks. By the end of this book, we hope you will feel comfortable digging into more advanced Kafka topics such as cluster monitoring, metrics, and multi-site data replication with your new core Kafka knowledge.

Always remember, this book captures a moment in time of how Kafka looks today. It will likely change and, hopefully, get even better by the time you read this work. We hope this book sets you up for an enjoyable path of learning about the foundations of Apache Kafka.
acknowledgments

DYLAN: I would like to acknowledge first, my family: thank you. The support and love shown every day is something that I can never be thankful enough for. I love you all. Dan and Debbie, I appreciate that you have always been my biggest supporters and number one fans. Sarah, Harper, and Noelle, I can’t do justice in these few words to the amount of love and pride I have for you all and the support you have given me. To the DG family, thanks for always being there for me. Thank you, as well, JC.

Also, a special thanks to Viktor Gamov and Dave Klein for being coauthors of this work! I also had a team of work colleagues and technical friends whom I need to mention for helping motivate me to move this project forward: Team Serenity (Becky Campbell, Adam Doman, Jason Fehr, and Dan Russell), Robert Abeyta, and Jeremy Castle. And thank you, Jabulani Simplisio Chibaya, for not only reviewing, but for your kind words.

VIKTOR: I would like to acknowledge my wife and thank her for all her support. Thanks also go to the Developer Relations and Community Team at Confluent: Ale Murray, Yeva Byzek, Robin Moffatt, and Tim Berglund. You are all doing incredible work for the greater Apache Kafka community!

DAVE: I would like to acknowledge and thank Dylan and Viktor for allowing me to tag along on this exciting journey.

The group would like to acknowledge our editor at Manning, Toni Arritola, whose experience and coaching helped make this book a reality. Thanks also go to Kristen Watterson, who was the first editor before Toni took over, and to our technical editors, Raphael Villela, Nickie Buckner, Felipe Esteban Vildoso Castillo, Mayur Patil, Valentin Crettaz, and William Rudenmalm. We also express our gratitude to Chuck Larson for the immense help with the graphics, and to Sumant Tambe for the technical proofread of the code.

The Manning team helped in so many ways, from production to promotion. With all the edits, revisions, and deadlines involved, typos and issues can still make their way into the content and source code (at least we haven’t ever seen a book without errata!), but this team certainly helped to minimize those errors.

Thanks go also to Nathan Marz, Michael Noll, Janakiram MSV, Bill Bejeck, Gunnar Morling, Robin Moffatt, Henry Cai, Martin Fowler, Alexander Dean, Valentin Crettaz, and Anyi Li. This group was so helpful in allowing us to talk about their work and in providing such great suggestions and feedback.

Jun Rao, we are honored that you were willing to take the time to write the foreword to this book. Thank you so much!

We owe a big thank you to the entire Apache Kafka community (including, of course, Jay Kreps, Neha Narkhede, and Jun Rao) and to the team at Confluent that pushes Kafka forward and granted permission to use the material that helped inform this book. At the very least, we can only hope that this work encourages developers to take a look at Kafka.

Finally, to all the reviewers: Bryce Darling, Christopher Bailey, Cicero Zandona, Conor Redmond, Dan Russell, David Krief, Felipe Esteban Vildoso Castillo, Finn Newick, Florin-Gabriel Barbuceanu, Gregor Rayman, Jason Fehr, Javier Collado Cabeza, Jon Moore, Jorge Esteban Quilcate Otoya, Joshua Horwitz, Madhanmohan Savadamuthu, Michele Mauro, Peter Perlepes, Roman Levchenko, Sanket Naik, Shobha Iyer, Sumant Tambe, Viton Vitanis, and William Rudenmalm. Your suggestions helped make this a better book. It is likely we are leaving some names out and, if so, we can only ask you to forgive us for our error. We do appreciate you.