Yaron Haviv & Noah Gift
Implementing MLOps in the Enterprise
A Production-First Approach
DATA

“The authors excel in presenting complex concepts in a clear and relatable manner. Their emphasis on the importance of ROI, risk management, and strategic technology adoption provides practical guidance for organizations looking to leverage ML effectively.”
—Dhanasekar Sundararaman, Researcher, Microsoft

Implementing MLOps in the Enterprise

This practical guide will help your organization bring data science to life for different real-world MLOps scenarios. Senior data scientists, MLOps engineers, and ML engineers will learn how to tackle the challenges that prevent many businesses from moving ML models to production and scaling their AI initiatives.

Authors Yaron Haviv and Noah Gift take a production-first approach. Rather than beginning with the ML model, you’ll learn how to design a continuous operational pipeline, while making sure that various components and practices can map into it. By automating as many components as possible, and making the process fast and repeatable, your pipeline can scale to match your organization’s needs. This book will show you how to generate rapid business value while answering dynamic MLOps requirements.

You’ll learn the foundations of the MLOps process, including its technological and business value, and discover how to:

• Build and structure effective MLOps pipelines
• Efficiently scale MLOps across your organization
• Explore common MLOps use cases
• Build MLOps pipelines for hybrid deployments, real-time predictions, and composite AI
• Prepare for and adapt to the future of MLOps
• Use pretrained models from providers like Hugging Face and OpenAI to complement your MLOps strategy

Yaron Haviv is a serial entrepreneur with deep technological experience in data, cloud, AI, and networking. Yaron is the cofounder and CTO of Iguazio, which was acquired by McKinsey and Company in 2023. He is an author, keynote speaker, and contributor to various AI associations, publications, and communities.

Noah Gift is the founder of Pragmatic AI Labs. He lectures in the data science programs at universities including Northwestern, Duke, UC Berkeley, UNC Charlotte, and the University of Tennessee.

Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia

US $79.99  CAN $99.99
ISBN: 978-1-098-13658-1
Yaron Haviv and Noah Gift
Implementing MLOps in the Enterprise
A Production-First Approach
Beijing • Boston • Farnham • Sebastopol • Tokyo
Implementing MLOps in the Enterprise
by Yaron Haviv and Noah Gift

Copyright © 2024 Yaron Haviv and Noah Gift. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisition Editor: Nicole Butterfield
Development Editor: Corbin Collins
Production Editor: Beth Kelly
Copyeditor: Piper Editorial Consulting, LLC
Proofreader: Heather Walley
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

December 2023: First Edition

Revision History for the First Edition
2023-11-30: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098136581 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Implementing MLOps in the Enterprise, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-13658-1
[LSI]
Table of Contents

Preface  ix

1. MLOps: What Is It and Why Do We Need It?  1
   What Is MLOps?  2
   MLOps in the Enterprise  2
   Understanding ROI in Enterprise Solutions  3
   Understanding Risk and Uncertainty in the Enterprise  5
   MLOps Versus DevOps  6
   What Isn’t MLOps?  8
   Mainstream Definitions of MLOps  8
   What Is ML Engineering?  9
   MLOps and Business Incentives  10
   MLOps in the Cloud  10
   Key Cloud Development Environments  13
   The Key Players in Cloud Computing  17
   MLOps On-Premises  21
   MLOps in Hybrid Environments  22
   Enterprise MLOps Strategy  22
   Conclusion  23
   Critical Thinking Discussion Questions  24
   Exercises  24

2. The Stages of MLOps  25
   Getting Started  25
   Choose Your Algorithm  26
   Design Your Pipelines  28
   Data Collection and Preparation  29
   Data Storage and Ingestion  30
   Data Exploration and Preparation  33
   Data Labeling  35
   Feature Stores  36
   Model Development and Training  38
   Writing and Maintaining Production ML Code  39
   Tracking and Comparing Experiment Results  42
   Distributed Training and Hyperparameter Optimization  44
   Building and Testing Models for Production  45
   Deployment (and Online ML Services)  48
   From Model Endpoints to Application Pipelines  49
   Online Data Preparation  51
   Continuous Model and Data Monitoring  52
   Monitoring Data and Concept Drift  54
   Monitoring Model Performance and Accuracy  57
   The Strategy of Pretrained Models  58
   Building an End-to-End Hugging Face Application  59
   Flow Automation (CI/CD for ML)  61
   Conclusion  64
   Critical Thinking Discussion Questions  65
   Exercises  65

3. Getting Started with Your First MLOps Project  67
   Identifying the Business Use Case and Goals  67
   Finding the AI Use Case  69
   Defining Goals and Evaluating the ROI  72
   How to Build a Successful ML Project  74
   Approving and Prototyping the Project  75
   Scaling and Productizing Projects  76
   Project Structure and Lifecycle  78
   ML Project Example from A to Z  80
   Exploratory Data Analysis  80
   Data and Model Pipeline Development  82
   Application Pipeline Development  84
   Scaling and Productizing the Project  86
   CI/CD and Continuous Operations  88
   Conclusion  90
   Critical Thinking Discussion Questions  90
   Exercises  90
4. Working with Data and Feature Stores  91
   Data Versioning and Lineage  92
   How It Works  93
   Common ML Data Versioning Tools  95
   Data Preparation and Analysis at Scale  105
   Structured and Unstructured Data Transformations  106
   Distributed Data Processing Architectures  107
   Interactive Data Processing  108
   Batch Data Processing  110
   Stream Processing  114
   Stream Processing Frameworks  115
   Feature Stores  117
   Feature Store Architecture and Usage  118
   Ingestion and Transformation Service  119
   Feature Storage  120
   Feature Retrieval (for Training and Serving)  121
   Feature Stores Solutions and Usage Example  122
   Using Feast Feature Store  123
   Using MLRun Feature Store  126
   Conclusion  130
   Critical Thinking Discussion Questions  131
   Exercises  131

5. Developing Models for Production  133
   AutoML  133
   Running, Tracking, and Comparing ML Jobs  136
   Experiment Tracking  137
   Saving Essential Metadata with the Model Artifacts  139
   Comparing ML Jobs: An Example with MLflow  140
   Hyperparameter Tuning  142
   Auto-Logging  144
   MLOps Automation: AutoMLOps  146
   Example: Running and Tracking ML Jobs Using Azure Databricks  147
   Handling Training at Scale  151
   Building and Running Multi-Stage Workflows  152
   Managing Computation Resources Efficiently  153
   Conclusion  158
   Critical Thinking Discussion Questions  158
   Exercises  159
6. Deployment of Models and AI Applications  161
   Model Registry and Management  161
   Solution Examples  163
   SageMaker Example  163
   MLflow Example  165
   MLRun Example  166
   Model Serving  168
   Amazon SageMaker  170
   Seldon Core  171
   MLRun Serving  173
   Advanced Serving and Application Pipelines  176
   Implementing Scalable Application Pipelines  177
   Model Routing and Ensembles  187
   Model Optimization and ONNX  189
   Data and Model Monitoring  190
   Integrated Model Monitoring Solutions  192
   Standalone Model Monitoring Solutions  197
   Model Retraining  200
   When to Retrain Your Models  201
   Strategies for Data Retraining  202
   Model Retraining in the MLOps Pipeline  203
   Deployment Strategies  203
   Measuring the Business Impact  206
   Conclusion  206
   Critical Thinking Discussion Questions  207
   Exercises  207

7. Building a Production Grade MLOps Project from A to Z  209
   Exploratory Data Analysis  211
   Interactive Data Preparation  220
   Preparing the Credit Transaction Dataset  220
   Preparing the User Events (Activities) Dataset  223
   Extracting Labels and Training a Model  223
   Data Ingestion and Preparation Using a Feature Store  224
   Building the Credit Transactions Data Pipeline (Feature Set)  225
   Building the User Events Data Pipeline (FeatureSet)  228
   Building the Target Labels Data Pipeline (FeatureSet)  229
   Ingesting Data into the Feature Store  229
   Model Training and Validation Pipeline  231
   Creating and Evaluating a Feature Vector  232
   Building and Running an Automated Training and Validation Pipeline  234
   Real-Time Application Pipeline  238
   Defining a Custom Model Serving Class  238
   Building an Application Pipeline with Enrichment and Ensemble  238
   Testing the Application Pipeline Locally  240
   Deploying and Testing the Real-Time Application Pipeline  241
   Model Monitoring  242
   CI/CD and Continuous Operations  243
   Conclusion  246
   Critical Thinking Discussion Questions  246
   Exercises  246

8. Building Scalable Deep Learning and Large Language Model Projects  247
   Distributed Deep Learning  248
   Horovod  249
   Ray  250
   Data Gathering, Labeling, and Monitoring in DL  251
   Data Labeling Pitfalls to Avoid  252
   Data Labeling Best Practices  253
   Data Labeling Solutions  254
   Using Foundation Models as Labelers  256
   Monitoring DL Models with Unstructured Data  257
   Build Versus Buy Deep Learning Models  258
   Foundation Models, Generative AI, LLMs  259
   Risks and Challenges with Generative AI  262
   MLOps Pipelines for Efficiently Using and Customizing LLMs  267
   Application Example: Fine-Tuning an LLM Model  269
   Conclusion  281
   Critical Thinking Discussion Questions  281
   Exercises  282

9. Solutions for Advanced Data Types  283
   ML Problem Framing with Time Series  284
   Navigating Time Series Analysis with AWS  286
   Diving into Time Series with DeepAR+  290
   Time Series with the GCP BigQuery and SQL  292
   Build Versus Buy for MLOps NLP Problems  296
   Build Versus Buy: The Hugging Face Approach  296
   Exploring Natural Language Processing with AWS  297
   Exploring NLP with OpenAI  303
   Video Analysis, Image Classification, and Generative AI  305
   Image Classification Techniques with CreateML  307
   Composite AI  308
   Getting Started with Serverless for Composite AI  309
   Use Cases of Composite AI with Serverless  312
   Conclusion  313
   Critical Thinking Discussion Questions  313
   Exercises  314

10. Implementing MLOps Using Rust  315
   The Case for Rust for MLOps  316
   Leveling Up with Rust, GitHub Copilot, and Codespaces  317
   In the Beginning Was the Command Line  321
   Getting Started with Rust for MLOps  323
   Using PyTorch and Hugging Face with Rust  326
   Using Rust to Build Tools for MLOps  330
   Building Containerized Rust Command-Line Tools  330
   GPU PyTorch Workflows  332
   Using TensorFlow Rust  335
   Doing k-means Clustering with Rust  336
   Final Notes on Rust  337
   Ruff Linter  337
   rust-new-project-template  337
   Conclusion  339
   Critical Thinking Discussion Questions  340
   Exercises  340

A. Job Interview Questions  341

B. Enterprise MLOps Interviews  349

Index  353
Preface

As MLOps veterans, we have often seen the following scenario play out across enterprises building their data science practices.

Traditionally, when enterprises built their data science practice, they would start by building a model in the lab, with a small team, often working on their laptops and with a small, manually extracted dataset. They developed the model in operational isolation, and the results were incorporated manually into applications. Then, once the model was complete and predicting with accuracy, the true struggle of trying to bring it to production, to generate real business value, began. At this point, the enterprise faced challenges such as ingestion of production data, large-scale training, real-time serving, and monitoring and management of the models in production. These hurdles would often take months to overcome, presenting a huge cost in resources and lost time.

The AI pipeline is siloed, with teams working in isolation and with many different tools and frameworks that don’t necessarily play well with each other. This results in a huge waste of resources and businesses not being able to capitalize on their investment in data science. According to Gartner, as many as 85% of data science projects fall short of expectations.

In this book, we propose a mindset shift, one that addresses the existing challenges that prevent bringing models to production. We recommend a production-first approach: starting out not with the model but rather by designing a continuous operational pipeline, and then making sure the various components and practices map into it. By automating as many components as possible and making the process fast and repeatable, the pipeline can scale along with the organization’s needs and provide rapid business value while answering dynamic enterprise MLOps needs.

Today, more businesses understand the vast potential of AI models to positively impact the business across many new use cases. And with generative AI opening up new opportunities for business innovation across industries, it seems that AI
adoption and usage are set to skyrocket in the coming years. This book explores how to bring data science to life for these real-world MLOps scenarios.

Who This Book Is For

This book is for practitioners in charge of building, managing, maintaining, and operationalizing the data science process end to end: the heads of data science, heads of ML engineering, senior data scientists, MLOps engineers, and machine learning engineers. These practitioners are familiar with the nooks and crannies (as well as the challenges and obstacles) of the data science pipeline, and they have the initial technological know-how, for example, in Python, pandas, sklearn, and others.

This book can also be valuable for other technology leaders like CIOs, CTOs, and CDOs who want to efficiently scale the use of AI across their organization, create AI applications for multiple business use cases, and bridge the organizational and technological silos that prevent them from doing so today.

The book is meant to be read in three ways. First, in one go, as a strategic guide that opens horizons to new MLOps ideas. Second, as a reference when making any strategic changes to the pipeline that require consultation and assistance, for example, when introducing real-time data into the pipeline, scaling the existing pipeline to a new data source or business use case, automating the MLOps pipeline, implementing a feature store, or introducing a new tool into the pipeline. Finally, the book can be referred to daily when running and implementing MLOps, for example, for identifying and fixing a bottleneck in the pipeline, pipeline monitoring, and managing inference.

Navigating This Book

This book is built according to the phases of the MLOps pipeline, guiding you through your first steps with MLOps up to the most advanced use cases:

• Chapters 1–3 show how organizations should approach MLOps, how data science teams can get started, and what to prepare for your first MLOps project.
• Chapters 4–7 explain the components of a resilient and scalable MLOps pipeline and how to build a machine learning pipeline that scales across the organization.
• Chapter 8 covers deep learning pipelines and also dives into GenAI and LLMs.
• Chapters 9 and 10 show how to adapt pipelines for specific verticals and use cases, like hybrid deployments, real-time predictions, composite AI, and so on.

Throughout the book, you will find real code examples to interactively try out for yourself.
After reading this book, you will be a few steps closer to being able to:

• Build an MLOps pipeline.
• Build a deep learning pipeline.
• Build application-specific solutions (for example, for NLP).
• Build use-case-specific solutions (for example, for fraud prediction).

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
   Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
   Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
   Shows commands or other text that should be typed literally by the user.

Constant width italic
   Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.
Using Code Examples

Supplemental material (code examples, exercises, and so on) is available for download at https://github.com/mlrun/demo-fraud and https://github.com/mlrun/demo-llm-tuning.

If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Implementing MLOps in the Enterprise by Yaron Haviv and Noah Gift (O’Reilly). Copyright 2024 Yaron Haviv and Noah Gift, 978-1-098-13658-1.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-829-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/mlops-in-the-enterprise.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Follow us on Twitter: https://twitter.com/oreillymedia.
Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

We’d like to thank the people behind the scenes who assisted, guided, and supported us throughout this book’s journey. Without them, this book wouldn’t have been brought to life.

Thank you to the dedicated team at O’Reilly, who provided feedback and guidance, drove the writing process of this book, and helped polish the content. We’d especially like to thank Corbin Collins for being our partner throughout the process, paying close attention to all the details and helping us meet deadlines, and to Nicole Butterfield, for her unwavering support and valuable input. We’re deeply appreciative of our tech reviewers, Dhanasekar Sundararaman, Tigran Harutyunyan, Nivas Durairaj, and Noga Cohen for their expertise and wisdom.
Yaron

I am thrilled to present my first book, a culmination of years of experience and knowledge, as I eagerly share it with readers worldwide. I am deeply grateful to my family, Dvori, Avia, Ofri, and Amit, for their love and support throughout my career and the long process of writing this book. Their patience and encouragement have meant a lot to me. Special thanks go to Sahar, who encouraged me to write this book, and to Guy and the Iguazio team, who shared their knowledge, experiences, and code examples.

Noah

It is always an honor to have the opportunity to work on an O’Reilly book. This book marks my fifth O’Reilly title and likely my last technical book as I shift to other writing and content creation forms. Thank you to everyone I worked with at O’Reilly, including current and former editors, collaborators, and authors of the recent book. Also, thanks to many of my current and former students, faculty, and staff at Duke MIDS and the Duke Artificial Intelligence Masters in Engineering, as many ideas in this book came from courses I taught and questions brought up by students. Finally, thank you to my family, Leah, Liam, and Theodore, who put up with me working on weekends and late at night to hit deadlines.
CHAPTER 1
MLOps: What Is It and Why Do We Need It?

At the root of inefficient systems is an interconnected web of incorrect decisions that compound over time. It is tempting to look for a silver bullet fix to a system that doesn’t perform well, but that strategy rarely, if ever, pays off. Consider the human body; there is no shortage of quick fixes sold to make you healthy, but the solution to health and longevity requires a systematic approach.¹

¹ Dr. Luks summarizes the systematic evidence-based strategy: “Create a caloric deficit, then stay lean. Get sleep. Eat real food. Move often, throughout the day. Push and pull heavy things. Socialize. Have a sense of purpose.”

Similarly, there is no shortage of advice on “getting rich quick.” Here again, the data conflicts with what we want to hear. In Don’t Trust Your Gut (HarperCollins, 2022), Seth Stephens-Davidowitz shows that 84% of the top 0.1% of earners receive at least some money from owning a business. Further, the average age of a business founder is about 42, and some of the most successful companies are real estate or automobile dealerships. These are hardly get-rich-quick schemes but businesses that require significant skill, expertise, and wisdom gained through life experience.

Cities are another example of complex systems that don’t have silver bullet fixes. WalletHub created a list of the best-run cities in America with San Francisco ranked 149 out of 150, despite having many theoretical advantages over other cities, like beautiful weather, being home to the top tech companies in the world, and a 2022-2023 budget of $14 billion for a population of 842,000 people. The budget is similar to that of the entire country of Panama, with a population of 4.4 million people. As the case of San Francisco shows, revenue or natural beauty alone isn’t enough to have a well-run city; there needs to be a comprehensive plan: execution and strategy matter. No single solution is going to make or break a city. The WalletHub survey points to
extensive criteria for a well-run city, including infrastructure, economy, safety, health, education, and financial stability.

Similarly, with MLOps, it is tempting to search for a single answer to getting models into production, perhaps by getting better data or using a specific deep learning framework. Instead, just as in these other domains, it is essential to have an evidence-based, comprehensive strategy.

What Is MLOps?

At the heart of MLOps is the continuous improvement of all business activity. The Japanese automobile industry refers to this concept as kaizen, meaning literally “improvement.” For building production machine learning systems, this manifests both in the noticeable aspect of improving the model’s accuracy and in the entire ecosystem supporting the model.

A great example of one of the nonobvious components of the machine learning system is the business requirements. If the company needs an accurate model to predict how much inventory to store in the warehouse, but the data science team creates a computer vision system to keep track of the inventory already in the warehouse, the wrong problem is solved. No matter how accurate the inventory tracking computer vision system is, the business asked for a different requirement, and the system cannot meet the goals of the organization as a result.

So what is MLOps? A compound of Machine Learning (ML) and Operations (Ops), MLOps is the set of processes and practices for designing, building, enabling, and supporting the efficient deployment of ML models in production, to continuously improve business activity. Similar to DevOps, MLOps is based on automation, agility, and collaboration to improve quality. If you’re thinking continuous integration/continuous delivery (CI/CD), you’re not wrong. MLOps supports CI/CD. According to Gartner, “MLOps aims to standardize the deployment and management of ML models alongside the operationalization of the ML pipeline. It supports the release, activation, monitoring, performance tracking, management, reuse, maintenance, and governance of ML artifacts.”

MLOps in the Enterprise

There are substantial differences between an enterprise company and a startup company. Entrepreneurship expert Scott Shane wrote in The Illusions of Entrepreneurship (Yale University Press, 2010), “only one percent of people work in companies less than two years old, while 60 percent work in companies more than ten years old.” Longevity is a characteristic of the enterprise company.
He also says, “it takes 43 startups to end up with just one company that employs anyone other than the founder after ten years.” In essence, the enterprise builds for scale and longevity. As a result, it is essential to consider technologies and services that support these attributes. Startups have technological advantages for users, but they also have different risk profiles for the investors versus the employees. Venture capitalists have a portfolio of many companies, diversifying their risk. According to FundersClub, a typical fund “contains 135 million” and is “spread between 30-85 startups.” Meanwhile, startup employees have their salary and equity invested in one company.

Using expected value to estimate the actual equity value at a success probability of 1/43, an enterprise offering a yearly $50K bonus returns $200K at year four, while the equivalent startup equity produces an expected value of only $4,651.16 in year four (a short back-of-the-envelope sketch of this arithmetic appears below). For most people, on average, startups are a risky decision if judged on finance alone. However, they might offer an excellent reward via an accelerated chance to learn new technology or skills, with the slight chance of a huge payout.

On the flip side, if a startup’s life is dynamic, it must pick very different technology solutions than the enterprise. If there is a 2.3% chance a startup will be around in 10 years, why care about vendor lock-in or multicloud deployment? Only the mathematically challenged startups build what they don’t yet need. Likewise, if you are a profitable enterprise looking to build upon your existing success, consider looking beyond the solutions that startups use. Other metrics like the ability to hire, enterprise support, business continuity, and price become critical key performance indicators (KPIs).

Understanding ROI in Enterprise Solutions

The appeal of a “free” solution is that you get something for nothing. In practice, this is rarely the case. Figure 1-1 presents three scenarios. In the first scenario, the solution costs nothing but delivers nothing, so the ROI is zero. In the second scenario, high value is at stake, but the cost exceeds the value, resulting in a negative ROI. In the third scenario, a value of one million with a cost of half a million delivers half a million in value. The best choice isn’t free but is the solution that delivers the highest ROI, since this ROI increases the velocity of the profitable enterprise.

Let’s expand on the concept of ROI even more by digging into bespoke solutions, which in some sense are also “free” since an employee built the solution.
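Before turning to the bespoke case, here is a minimal back-of-the-envelope sketch in Python of the arithmetic above. It assumes, as the chapter’s figures imply, that the startup equity would also be worth $200K at year four if the startup succeeds, and the numbers for the three ROI scenarios are illustrative placeholders (only the third scenario’s value and cost come from the text):

    # Back-of-the-envelope comparison: enterprise bonus versus startup equity.
    # All numbers are the chapter's illustrative figures, not real compensation data.
    p_success = 1 / 43              # roughly 1 in 43 startups survives to employ non-founders
    equity_if_success = 200_000     # assumed equity value at year four if the startup succeeds
    enterprise_bonus = 50_000 * 4   # a 50K yearly bonus accumulated over four years

    startup_expected_value = p_success * equity_if_success
    print(f"Startup expected equity at year four: ${startup_expected_value:,.2f}")  # ~$4,651.16
    print(f"Enterprise bonus at year four:        ${enterprise_bonus:,}")           # $200,000

    # The same framing applies to the three scenarios of Figure 1-1, where "ROI"
    # is used loosely to mean the net value a solution delivers (value minus cost).
    def net_value(value, cost):
        return value - cost

    print(net_value(0, 0))                  # free solution that delivers nothing -> 0
    print(net_value(500_000, 1_000_000))    # hypothetical: cost exceeds value -> negative
    print(net_value(1_000_000, 500_000))    # value of 1M at a cost of 0.5M -> 500,000

Running the sketch reproduces the figures used in this section: roughly $4,651 of expected startup equity versus $200,000 of enterprise bonus, and half a million in net value for the third scenario.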
Figure 1-1. Evaluating ROI for technology platform solutions

In Figure 1-2, a genuinely brilliant engineer convinces management to allow them to build a bespoke system that solves a particular problem for a Fortune 100 company. The engineer not only delivers quickly, but the system exceeds expectations. It would be tempting to think this is a success story, but it is actually a story of failure. One year later, the brilliant engineer gets a job offer from a trillion-dollar company and leaves. About three months later, the system breaks, and no one is smart enough to fix it. The company reluctantly replaces the entire system and retrains its staff on the new proprietary system.

Figure 1-2. Bespoke system dilemma

The ultimate cost to the organization is the loss of momentum from using a superior system for a year, alongside the training time necessary to switch from the old system to the new system. Thus, a “free” solution with positive ROI can have long-term