Catherine Nelson
Software Engineering for Data Scientists
From Notebooks to Scalable Systems
DATA

“Catherine’s book demystifies how to scale your individual work to production capacity. Whether you are a data scientist, developer, or executive, she makes data services at scale accessible.”
—Carol Willing, Core Developer of Python and 2017 ACM Software System Award recipient for Jupyter’s lasting influence

“This book...offers a clear, actionable guide that fills the crucial skill gap many data scientists face in software engineering, elevating their coding practices to new heights.”
—Gabriela de Queiroz, Director of AI, Microsoft; Startup Advisor and Angel Investor

Software Engineering for Data Scientists

Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project’s success—and is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering, and clearly explains how to apply the best practices from software engineering to data science.

Examples are provided in Python, drawn from popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:

• Understand data structures and object-oriented programming
• Clearly and skillfully document your code
• Package and share your code
• Integrate data science code with a larger code base
• Learn how to write APIs
• Create secure code
• Apply best practices to common tasks such as testing, error handling, and logging
• Work more effectively with software engineers
• Write more efficient, maintainable, and robust code in Python
• Put your data science projects into production
• And more

Catherine Nelson is a freelance data scientist and writer. Previously, she was a Principal Data Scientist at SAP Concur, where she developed production machine learning applications and created innovative new business travel features. She’s also coauthor of O’Reilly’s Building Machine Learning Pipelines.

linkedin.com/company/oreilly-media
youtube.com/oreillymedia

US $69.99 CAN $87.99
ISBN: 978-1-098-13620-8
Praise for Software Engineering for Data Scientists

This book is the missing link data scientists have long sought, masterfully bridging the gap between data science and software engineering. It offers a clear, actionable guide that fills the crucial skill gap many data scientists face in software engineering, elevating their coding practices to new heights. Truly, this is the book we’ve been waiting for.
—Gabriela de Queiroz, Director of AI, Microsoft; Startup Advisor and Angel Investor

Catherine’s book demystifies how to scale your individual work to production capacity. Whether you are a data scientist, developer, or executive, she makes data services at scale accessible. From startup to massive corporate data, following her best practices will set your data projects up for success.
—Carol Willing, Core Developer of Python; 2017 ACM Software System Award recipient for Jupyter’s lasting influence

I love this book! It’s the missing piece on every data scientist’s shelf. For years, bootcamps, universities, and industry managers have been trying to get skilled scientists to function more like software engineers. No book bridges that gap, until this one.
—Shawn Ling Ramirez, CEO, eloraHQ

Software Engineering for Data Scientists is a must read if you want to take your data science skills from ideas to fully implemented systems. It’s a terrific guide to help you through the most important engineering aspects of coding. I wish I’d had this book years ago, it would have saved me countless hours! I thoroughly recommend it.
—Laurence Moroney, AI Advocacy Lead, Google
Since its beginnings, data scientists have come from a wide variety of backgrounds in education and experience. While in many ways this has been a strength of the field, often data scientists lack the software engineering skills to work closely with peers from more traditional software development backgrounds. In this book, Catherine Nelson provides a much-needed bridge between the two disciplines, giving data scientists the knowledge to level up their own work and impact. —Chris Albon, Director of Machine Learning, The Wikimedia Foundation
Catherine Nelson
Software Engineering for Data Scientists
From Notebooks to Scalable Systems
Boston Farnham Sebastopol Tokyo Beijing
978-1-098-13620-8 [LSI]

Software Engineering for Data Scientists
by Catherine Nelson

Copyright © 2024 Catherine Nelson. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Virginia Wilson
Production Editor: Christopher Faucher
Copyeditor: Piper Editorial Consulting, LLC
Proofreader: Krsta Technology Solutions
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

April 2024: First Edition

Revision History for the First Edition
2024-04-16: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098136208 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Software Engineering for Data Scientists, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents

Preface  xi

1. What Is Good Code?  1
    Why Good Code Matters  1
    Adapting to Changing Requirements  2
    Simplicity  3
    Don’t Repeat Yourself (DRY)  4
    Avoid Verbose Code  6
    Modularity  6
    Readability  7
    Standards and Conventions  8
    Names  9
    Cleaning up  9
    Documentation  9
    Performance  10
    Robustness  10
    Errors and Logging  10
    Testing  11
    Key Takeaways  12

2. Analyzing Code Performance  13
    Methods to Improve Performance  14
    Timing Your Code  15
    Profiling Your Code  18
    cProfile  18
    line_profiler  21
    Memory Profiling with Memray  22
    Time Complexity  23
    How to Estimate Time Complexity  24
    Big O Notation  25
    Key Takeaways  27

3. Using Data Structures Effectively  29
    Native Python Data Structures  30
    Lists  30
    Tuples  32
    Dictionaries  32
    Sets  34
    NumPy Arrays  35
    NumPy Array Functionality  35
    NumPy Array Performance Considerations  36
    Array Operations Using Dask  39
    Arrays in Machine Learning  41
    pandas DataFrames  42
    DataFrame Functionality  42
    DataFrame Performance Considerations  43
    Key Takeaways  45

4. Object-Oriented Programming and Functional Programming  47
    Object-Oriented Programming  48
    Classes, Methods, and Attributes  48
    Defining Your Own Classes  51
    OOP Principles  53
    Functional Programming  56
    Lambda Functions and map()  57
    Applying Functions to DataFrames  58
    Which Paradigm Should I Use?  59
    Key Takeaways  59

5. Errors, Logging, and Debugging  61
    Errors in Python  61
    Reading Python Error Messages  61
    Handling Errors  63
    Raising Errors  65
    Logging  67
    What to Log  67
    Logging Configuration  68
    How to Log  69
    Debugging  71
    Strategies for Debugging  71
    Tools for Debugging  72
    Key Takeaways  77

6. Code Formatting, Linting, and Type Checking  79
    Code Formatting and Style Guides  80
    PEP8  81
    Import Formatting  82
    Automatic Code Formatting with Black  83
    Linting  85
    Linting Tools  86
    Linting in Your IDE  88
    Type Checking  89
    Type Annotations  89
    Type Checking with mypy  91
    Key Takeaways  92

7. Testing Your Code  93
    Why You Should Write Tests  94
    When to Test  95
    How to Write and Run Tests  96
    A Basic Test  96
    Testing Unexpected Inputs  98
    Running Automated Tests with Pytest  99
    Types of Tests  101
    Unit Tests  102
    Integration Tests  102
    Data Validation  103
    Data Validation Examples  104
    Using Pandera for Data Validation  104
    Data Validation with Pydantic  105
    Testing for Machine Learning  107
    Testing Model Training  108
    Testing Model Inference  109
    Key Takeaways  109

8. Design and Refactoring  111
    Project Design and Structure  112
    Project Design Considerations  112
    An Example Machine Learning Project  113
    Code Design  115
    Modular Code  116
    A Code Design Framework  117
    Interfaces and Contracts  117
    Coupling  118
    From Notebooks to Scalable Scripts  120
    Why Use Scripts Instead of Notebooks?  120
    Creating Scripts from Notebooks  121
    Refactoring  124
    Strategies for Refactoring  124
    An Example Refactoring Workflow  125
    Key Takeaways  127

9. Documentation  129
    Documentation Within the Codebase  130
    Names  131
    Comments  133
    Docstrings  134
    Readmes, Tutorials, and Other Longer Documents  136
    Documentation in Jupyter Notebooks  137
    Documenting Machine Learning Experiments  139
    Key Takeaways  141

10. Sharing Your Code: Version Control, Dependencies, and Packaging  143
    Version Control Using Git  143
    How Does Git Work?  144
    Tracking Changes and Committing  145
    Remote and Local  147
    Branches and Pull Requests  148
    Dependencies and Virtual Environments  151
    Virtual Environments  152
    Managing Dependencies with pip  154
    Managing Dependencies with Poetry  155
    Python Packaging  157
    Packaging Basics  158
    pyproject.toml  159
    Building and Uploading Packages  160
    Key Takeaways  162

11. APIs  163
    Calling an API  164
    HTTP Methods and Status Codes  164
    Getting Data from the SDG API  165
    Creating Your Own API Using FastAPI  168
    Setting Up the API  169
    Adding Functionality to Your API  172
    Making Requests to Your API  175
    Key Takeaways  177

12. Automation and Deployment  179
    Deploying Code  180
    Automation Examples  181
    Pre-Commit Hooks  181
    GitHub Actions  184
    Cloud Deployments  189
    Containers and Docker  190
    Building a Docker Container  190
    Deploying an API on Google Cloud  192
    Deploying an API on Other Cloud Providers  194
    Key Takeaways  194

13. Security  197
    What Is Security?  197
    Security Risks  199
    Credentials, Physical Security, and Social Engineering  199
    Third-Party Packages  200
    The Python Pickle Module  200
    Version Control Risks  201
    API Security Risks  201
    Security Practices  202
    Security Reviews and Policies  202
    Secure Coding Tools  202
    Simple Code Scanning  203
    Security for Machine Learning  205
    Attacks on ML Systems  205
    Security Practices for ML Systems  208
    Key Takeaways  208

14. Working in Software  211
    Development Principles and Practices  211
    The Software Development Lifecycle  211
    Waterfall Software Development  212
    Agile Software Development  213
    Agile Data Science  214
    Roles in the Software Industry  215
    Software Engineer  215
    QA or Test Engineer  216
    Data Engineer  217
    Data Analyst  218
    Product Manager  218
    UX Researcher  219
    Designer  220
    Community  220
    Open Source  221
    Speaking at Events  222
    The Python Community  223
    Key Takeaways  224

15. Next Steps  225
    The Future of Code  226
    Your Future in Code  229
    Thank You  230

Index  231
Preface

Data science happens in code. Whether you’re building a machine learning system, exploring your data for the first time, visualizing the distribution of your data, or running a statistical analysis, your coding and computation skills are what make it happen. If you are working on production code, these skills are essential for writing successful, maintainable code. Even if you aren’t working in a production software team, you’ll find it beneficial to write more robust, reproducible code that other data scientists can use easily. And if you’re working alone, good practices will accelerate your coding and help you pick up your code after a break.

I didn’t always see the value of good engineering. Earlier in my data science career, I joined a team where I was the only data scientist. My teammates were software engineers and designers, and I was concerned that it would be hard to increase my skills with no other data scientists to learn from. I expressed my concern to my coworker, a developer. He said, “But learning to write better code will let you do more data science.” This comment stuck with me, and I’ve found since then that improving my software engineering skills has been incredibly beneficial in doing data science. It’s helped me write code that is easier for my coworkers to use and that is still easy to change when I go back to it many months later.

My aim with this book is to guide you on your journey to writing better data science code. I’ll describe best practices for common tasks including testing, error handling, and logging. I’ll explain how to write code that is easier to maintain and that will remain robust as your projects grow. I’ll show you how to make your code easy for other people to use, and by the end of this book you’ll be able to integrate your data science code with a larger codebase.

You might think that software engineering skills are less useful in the age of generative AI. Can’t ChatGPT just write your code for you?
I’d argue that the content in this book is still just as useful even when you can speed up your coding with an AI assistant. As I’ll show throughout this book, there are many choices available for every function you write, and it’s incredibly helpful to understand the principles for why you might pick one line of code over another. You’ll need to evaluate the output of any AI assistant and check that it has made a good choice for the problem you’re working on. This book will help you do that.

Who Is This Book For?

This book is aimed at data scientists, but people working in closely related fields such as data analysts, machine learning (ML) engineers, and data engineers will also find it useful. I’ll explain well-established software engineering principles that will be useful to anyone who writes code, but the examples I’ll use to illustrate these principles will be most familiar to data scientists.

I’ve aimed to make this book accessible to data scientists who are relatively new to the field. Maybe you’ve just finished a degree in data science or you’re starting your first job in industry. This book will cover the practical software engineering skills that are not always included in introductory data science courses. Or maybe you didn’t take a formal data science course. Maybe you’re self-taught or you’re moving into data science from math or another science. No matter which route you’re taking into data science, this book is for you.

More experienced data scientists will also learn a great deal, and you’ll find this book especially useful if you’re in a job where you’ll often interact with software developers. You’ll learn the skills that will help you work effectively on a larger codebase and how to write Python code that will work efficiently in production.

I’m assuming that you already know the fundamentals of data science, including data exploration, data visualization, data wrangling, basic ML, and the math skills that go along with these. I’m also assuming that you already know the basics of how to code in Python: how to write functions and control flow statements, and the basics of how to use modules including NumPy, Matplotlib, pandas, and scikit-learn.
If these are new to you, I recommend the following books:

• Python Data Science Handbook by Jake VanderPlas (O’Reilly, 2023)
• Data Science From Scratch by Joel Grus (O’Reilly, 2019)
• Learning Data Science by Sam Lau, Joseph Gonzalez, and Deborah Nolan (O’Reilly, 2023)

This is not a book for software developers who are looking to learn data science and machine learning skills. If this is your situation, I recommend AI and Machine Learning for Coders by Laurence Moroney (O’Reilly, 2020).
Software Engineering Versus Data Science

It’s useful at this point to define what I see as the distinction between data science and software engineering mindsets. Data scientists generally come from a background that emphasizes the scientific processes of exploration, discovery, and hypothesis testing. The end result of a project is not known at the beginning. Software engineering, in contrast, is a process that focuses on planning what to build, designing the best way to build, then writing the code to build what was planned. The expected outcome of the project is known at the start of the project. Software engineering practices emphasize standardization and automation. Data scientists can use aspects of the engineering mindset to improve the quality of their code, a subject I will discuss in detail in Chapter 1.

Why Python?

All the code examples in this book are written in Python, and many of the chapters describe Python-specific tools. In recent years, Python has become the most popular programming language for data science. The following quote is from a 2021 survey of over 3,000 data scientists carried out by Anaconda:

“63% of respondents said they always or frequently use Python, making it the most popular language included in this year’s survey. In addition, 71% of educators are teaching Python, and 88% of students reported being taught Python in preparation to enter the data science/ML field.”

Python has an extremely solid set of open source libraries for data science, with good backing and a healthy community of maintainers. Large trend-setting companies have chosen Python for their main ML frameworks, including TensorFlow (Google) and PyTorch (Meta). Because of this, Python appears to be especially popular among data scientists working on production machine learning code, where good coding skills are particularly important.
In my experience, the Python community has been friendly and welcoming, with many excellent events that have helped me improve my skills. It’s my preferred programming language, so it was an easy choice for this book.

What Is Not in This Book

As I mentioned in “Who Is This Book For?” on page xii, this is not an introduction to data science or an introduction to programming. Additionally, none of the following topics appears in this book:

Installing Python: I assume that you have already installed a recent version of Python (3.9 or later) and you have some form of IDE (integrated development environment) where you can write code, such as VS Code or PyCharm. I won’t describe how to install Python, but I will explain how to set up a virtual environment in Chapter 10.

Other programming languages: This book covers only Python, for the reasons given in “Why Python?” on page xiii. I haven’t included any examples in R, Julia, SQL, MATLAB, or any other language.

Command line scripting: Command line or shell scripting is a powerful way to work with files and text. I don’t include it here because other sources cover it in great detail, including Data Science at the Command Line by Jeroen Janssens (O’Reilly, 2021).

Advanced Python: The examples in this book contain relatively simple code. For coverage of more advanced Python coding, I recommend Robust Python by Patrick Viafore (O’Reilly, 2021).

Guide to This Book

In this book, I start by walking through good practices at the level of writing individual functions and go into detail about how you can improve your coding. In later chapters, I’ll describe how you can take that code and make it easy for someone else to use, and I’ll explain some common techniques for deployment and best practices for working in software.

This book is divided into 15 chapters. Here is an overview of their contents:

Chapter 1, “What Is Good Code?”, introduces the basics of how to write code that is simple, modular, readable, efficient, and robust.

Chapter 2, “Analyzing Code Performance”, describes how to measure the performance of your code and discusses some options for making your data science code run more efficiently.

Chapter 3, “Using Data Structures Effectively”, discusses the trade-offs involved in choosing the data structures you work with. The data structure you choose can make a huge difference to the efficiency of your code.

Chapter 4, “Object-Oriented Programming and Functional Programming”, describes the basics of these styles of programming. Used correctly, they can help you write code that is well structured and efficient.
Chapter 5, “Errors, Logging, and Debugging”, walks you through what to do when your code breaks, how to raise useful errors, and strategies to identify where those errors are coming from.

Chapter 6, “Code Formatting, Linting, and Type Checking”, describes how to standardize your code using tools that can automate this process.
Chapter 7, “Testing Your Code”, covers how to make your code robust to changes in inputs through testing. This is a vital step in writing code that is easy to maintain.

Chapter 8, “Design and Refactoring”, discusses how to structure your projects in a standardized, consistent way and how to go from a notebook to a script.

Chapter 9, “Documentation”, shows you how to make your code readable for other people, including best practices for naming and commenting on your code.

Chapter 10, “Sharing Your Code: Version Control, Dependencies, and Packaging”, covers the basics of version control using Git and how to manage your project’s dependencies in virtual environments. It also shows the steps involved in turning a script into a Python package.

Chapter 11, “APIs”, introduces the concept of APIs, shows how you can use them, and includes a basic example using FastAPI.

Chapter 12, “Automation and Deployment”, describes the basics of deploying code, how to automate your code deployments using CI/CD (Continuous Integration/Continuous Deployment or Delivery) and GitHub Actions, and how to deploy your code to a cloud environment in a Docker container.

Chapter 13, “Security”, discusses common security risks, how these risks can be mitigated, and some of the security threats unique to machine learning.

Chapter 14, “Working in Software”, introduces you to common practices in software development teams including Agile ways of working, describes common roles in software teams, and introduces the wider community.

Chapter 15, “Next Steps”, wraps up with some thoughts on how coding might change in the future and some suggestions for what you can do next.

Reading Order

You don’t necessarily need to read the chapters in this book in order, but I recommend that you start by reading Chapter 1. In this chapter, I’ll explain the fundamentals of how to write good code, and I’ll introduce topics that I’ll cover in greater detail in the rest of the book.
I’ll also introduce several of the code examples that I’ll use throughout the book.

Following Chapter 1, many of the chapters can be read on their own, with these exceptions:

• You should read Chapter 2 before reading Chapter 3.
• You should read Chapters 6, 7, 10, and 11 before you read Chapter 12.
Some chapters include a section that goes deeper into a machine learning topic. These sections always include ML in the section name, and if your job doesn’t involve ML you can skip these sections without missing anything that you would need to understand the rest of the chapter.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.
Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/catherinenelson1/SEforDS.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Software Engineering for Data Scientists by Catherine Nelson (O’Reilly). Copyright 2024 Catherine Nelson, 978-1-098-13620-8.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions to sefordatascientists@gmail.com or to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/software-engineering-data-scientists.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Watch us on YouTube: https://youtube.com/oreillymedia

Acknowledgments

Sending a huge thank you to everyone who has helped me with this book! Your comments, feedback, discussions, and support have been so valuable.

It’s been an absolute pleasure working with the team at O’Reilly. Thank you to Virginia Wilson for being a superb, supportive editor. I really enjoyed working with you. Thank you to Nicole Butterfield for valuable overall direction and your help with the book proposal process. Thank you to Jeff Bleiel for thorough reviews of several of the chapters and Chris Faucher for making the production process go smoothly.

Thank you so much to my technical reviewers William Jamir Silva, Ganesh Harke, Jo Stichbury, Antony Milne, Jess Males, and Swetha Kommuri. Your feedback was super constructive, and it’s made the final book so much better. I really appreciated your attention to detail and your helpful suggestions. Thank you to Rob Masson for great feedback on the final draft and thoughtful discussions throughout the writing process.

Thank you to Carol Willing, Ricardo Martín Brualla, Chris Trudeau, Michelle Liu, Maryam Ehsani, Shivani Patel, John Sweet, Andy Ross, and Abigail Mesrenyame Dogbe for valuable technical discussions and insightful conversations. I’ve also benefited hugely from being part of the wider Python and PyLadies community; thank you to all the volunteers who give their time to it.

Finally, thank you to my amazing friends and family for all your support. Rob, Mum, Richard, Lina, Salomé, Ricardo, Chris, Kiana, and Katie—I appreciate you all so much.