Deep Learning at Scale: At the Intersection of Hardware, Software, and Data

Author: Suneeta Mall

Category: Education

Bringing a deep learning project into production at scale is quite challenging. To successfully scale your project, you need a foundational understanding of full stack deep learning: the knowledge that lies at the intersection of hardware, software, data, and algorithms. This book illustrates complex concepts of full stack deep learning and reinforces them through hands-on exercises to arm you with the tools and techniques to scale your project. A scaling effort is only beneficial when it's effective and efficient; to that end, this guide explains the intricate concepts and techniques that will help you scale effectively and efficiently.

You'll gain a thorough understanding of:

• How data flows through the deep learning network and the role computation graphs play in building your model
• How accelerated computing speeds up your training and how best you can utilize the resources at your disposal
• How to train your model using distributed training paradigms, i.e., data, model, and pipeline parallelism
• How to leverage PyTorch ecosystems in conjunction with NVIDIA libraries and Triton to scale your model training
• Debugging, monitoring, and investigating the undesirable bottlenecks that slow down your model training
• How to expedite the training lifecycle and streamline your feedback loop to iterate model development
• A set of data tricks and techniques and how to apply them to scale your model training
• How to select the right tools and techniques for your deep learning project
• Options for managing the compute infrastructure when running at scale

📄 File Format: PDF
💾 File Size: 20.8 MB

📄 Text Preview (First 20 pages)

📄 Page 2
DATA

Deep Learning at Scale

Bringing a deep learning project into production at scale is quite challenging. To successfully scale your project, you require a foundational understanding of the deep learning stack—specifically, how deep learning interfaces with hardware, software, and data. Ideal for anyone interested in model development at scale, this book illustrates complex concepts of the deep learning stack and reinforces them through practical exercises. Author Suneeta Mall explains the intricate concepts, tools, and techniques to help you scale your deep learning model development and training workload effectively and efficiently.

Topics include:

• How your model is decomposed into a computation graph and how your data flows through this graph during the training process
• How accelerated computing speeds up your training and how you can best utilize the hardware resources at your disposal
• How to train your model using distributed training paradigms (e.g., data, model, pipeline, and hybrid multidimensional parallelism)
• Debugging, monitoring, and investigating bottlenecks that undesirably slow down the scale-out of model training
• How to expedite the training lifecycle and streamline your feedback loop to iterate model development, and other related tricks, tools, and techniques to scale your training workload
• How to apply data-centric techniques to efficiently train your model at scale

Suneeta Mall is head of the AI Engineering Division at harrison.ai, a clinician-led artificial intelligence medical technology company focused on addressing significant healthcare issues. She has a strong computer science and engineering background through her roles at IBM, Expedia, USyd, Nearmap, and harrison.ai.

"A new paradigm is emerging, one where our programs are written on distributed architectures through data. This requires a change of approach and a new set of challenges: leveraging data to do our bidding while facing the challenges arising from distributed computing. This book offers a comprehensive overview of what this entails, making it an interesting starting point for less experienced practitioners and a great reference for experts."
—Giovanni Alzetta, PhD, Machine Learning Engineer at Oramasearch

US $79.99  CAN $99.99
ISBN: 978-1-098-14528-6

linkedin.com/company/oreilly-media
youtube.com/oreillymedia
📄 Page 3
Suneeta Mall

Deep Learning at Scale
At the Intersection of Hardware, Software, and Data

Boston • Farnham • Sebastopol • Tokyo • Beijing
📄 Page 4
Deep Learning at Scale
by Suneeta Mall

Copyright © 2024 Suneeta Mall. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisition Editor: Nicole Butterfield
Development Editor: Sara Hunter
Production Editor: Aleeya Rahman
Copyeditor: Rachel Head
Proofreader: Kim Cofer
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

June 2024: First Edition

Revision History for the First Edition
2024-06-17: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098145286 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Deep Learning at Scale, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-14528-6
[LSI]
📄 Page 5
Table of Contents

Preface  xi

1. What Nature and History Have Taught Us About Scale  1
    The Philosophy of Scaling  1
    The General Law of Scaling  2
    History of Scaling Law  2
    Scalable Systems  4
    Nature as a Scalable System  5
    Our Visual System: A Biological Inspiration  6
    Artificial Intelligence: The Evolution of Learnable Systems  7
    It Takes Four to Tango  7
    Evolving Deep Learning Trends  16
    Scale in the Context of Deep Learning  21
    Six Development Considerations  22
    Scaling Considerations  26
    Summary  33

Part I. Foundational Concepts of Deep Learning

2. Deep Learning  37
    The Role of Data in Deep Learning  37
    Data Flow in Deep Learning  39
    Hands-On Exercise #1: Implementing Minimalistic Deep Learning  42
    Developing the Model  43
    The Embedded/Latent Space  49
    A Word of Caution  51
    The Learning Rate and Loss Landscape  52
📄 Page 6
    Scaling Consideration  54
    Profiling  55
    Hands-On Exercise #2: Getting Complex with PyTorch  57
    Model Input Data and Pipeline  58
    Model  59
    Auxiliary Utilities  60
    Putting It All Together  62
    Computation Graphs  63
    Inference  66
    Summary  67

3. The Computational Side of Deep Learning  69
    The Higgs Boson of the Digital World  70
    Floating-Point Numbers: The Faux Continuous Numbers  70
    Units of Data Measurement  74
    Data Storage Formats: The Trade-off of Latency and Throughput  75
    Computer Architecture  75
    The Birth of the Electromechanical Engine  76
    Memory and Persistence  77
    Computation and Memory Combined  81
    The Scaling Laws of Electronics  83
    Scaling Out Computation with Parallelization  85
    Threads Versus Processes: The Unit of Parallelization  85
    Hardware-Optimized Libraries for Acceleration  90
    Parallel Computer Architectures: Flynn's and Duncan's Taxonomies  90
    Accelerated Computing  91
    Popular Accelerated Devices for Deep Learning  93
    CUDA  100
    Accelerator Benchmarking  112
    Summary  112

4. Putting It All Together: Efficient Deep Learning  113
    Hands-On Exercise #1: GPT-2  114
    Exercise Objectives  114
    Model Architecture  115
    Implementation  118
    Running the Example  119
    Experiment Tracking  120
    Measuring to Understand the Limitations and Scale Out  121
    Transitioning from Language to Vision  127
    Hands-On Exercise #2: Vision Model with Convolution  128
    Model Architecture  128
📄 Page 7
    Running the Example  132
    Observations  132
    Graph Compilation Using PyTorch 2.0  132
    New Components of PyTorch 2.0  133
    Graph Execution in PyTorch 2.0  134
    Modeling Techniques to Scale Training on a Single Device  136
    Graph Compilation  136
    Reduced- and Mixed-Precision Training  138
    Memory Tricks for Efficiency  142
    Optimizer Efficiencies  144
    Model Input Pipeline Tricks  148
    Writing Custom Kernels in PyTorch 2.0 with Triton  148
    Summary  149

Part II. Distributed Training

5. Distributed Systems and Communications  153
    Distributed Systems  154
    The Eight Fallacies of Distributed Computing  155
    The Consistency, Availability, and Partition Tolerance (CAP) Theorem  156
    The Scaling Law of Distributed Systems  157
    Types of Distributed Systems  159
    Communication in Distributed Systems  162
    Communication Paradigm  162
    Communication Patterns  163
    Communication Technologies  167
    MPI  169
    Communication Initialization: Rendezvous  172
    Hands-On Exercise  173
    Scaling Compute Capacity  173
    Infrastructure Setup Options  173
    Provisioning of Accelerated Devices  176
    Workload Management  178
    Deep Learning Infrastructure Review  185
    Overview of Leading Deep Learning Clusters  185
    Similarities Between Today's Most Powerful Systems  188
    Summary  189

6. Theoretical Foundations of Distributed Deep Learning  191
    Distributed Deep Learning  191
    Centralized DDL  192
📄 Page 8
    Decentralized DDL  199
    Dimensions of Scaling Distributed Deep Learning  207
    Partitioning Dimensions of Distributed Deep Learning  207
    Types of Distributed Deep Learning Techniques  208
    Choosing a Scaling Technique  218
    Measuring Scale  220
    End-to-End Metrics and Benchmarks  221
    Measuring Incrementally in a Reproducible Environment  226
    Summary  227

7. Data Parallelism  229
    Data Partitioning  229
    Implications of Data Sampling Strategies  231
    Working with Remote Datasets  231
    Introduction to Data Parallel Techniques  232
    Hands-On Exercise #1: Centralized Parameter Server Using RPC  232
    Hands-On Exercise #2: Centralized Gradient-Partitioned Joint Worker/Server Distributed Training  236
    Hands-On Exercise #3: Decentralized Asynchronous Distributed Training  238
    Centralized Synchronous Data Parallel Strategies  240
    Data Parallel (DP)  242
    Distributed Data Parallel (DDP)  242
    Zero Redundancy Optimizer–Powered Data Parallelism (ZeRO-DP)  244
    Fault-Tolerant Training  246
    Hands-On Exercise #4: Scene Parsing with DDP  247
    Hands-On Exercise #5: Distributed Sharded DDP (ZeRO)  251
    Building Efficient Pipelines  253
    Dataset Format  253
    Local Versus Remote  254
    Staging  254
    Threads Versus Processes: Scaling Your Pipelines  255
    Memory Tricks  255
    Data Augmentations: CPU Versus GPU  255
    JIT Acceleration  255
    Hands-On Exercise #6: Pipeline Efficiency with FFCV  256
    Summary  257

8. Scaling Beyond Data Parallelism: Model, Pipeline, Tensor, and Hybrid Parallelism  259
    Questions to Ask Before Scaling Vertically  261
    Theoretical Foundations of Vertical Scaling  264
    Revisiting the Dimensions of Scaling  265
    Operators' Perspective of Parallelism Dimensions  271
📄 Page 9
    Data Flow and Communications in Vertical Scaling  271
    Basic Building Blocks for Scaling Beyond DP  284
    PyTorch Primitives for Vertical Scaling  284
    Working with Larger Models  287
    Distributed Checkpointing: Saving the Partitioned Model  288
    Summary  289

9. Gaining Practical Expertise with Scaling Across All Dimensions  291
    Hands-On Exercises: Model, Tensor, Pipeline, and Hybrid Parallelism  291
    The Dataset  291
    Hands-On Exercise #1: Baseline DeepFM  292
    Hands-On Exercise #2: Model Parallel DeepFM  293
    Hands-On Exercise #3: Pipeline Parallel DeepFM  296
    Hands-On Exercise #4: Pipeline Parallel DeepFM with RPC  297
    Hands-On Exercise #5: Tensor Parallel DeepFM  298
    Hands-On Exercise #6: Hybrid Parallel DeepFM  300
    Tools and Libraries for Vertical Scaling  301
    OneFlow  301
    FairScale  302
    DeepSpeed  302
    FSDP  305
    Overview and Comparison  305
    Hands-On Exercise #7: Automatic Vertical Scaling with DeepSpeed  307
    Observations  307
    Summary  308

Part III. Extreme Scaling

10. Data-Centric Scaling  311
    The Seven Vs of Data Through a Deep Learning Lens  312
    The Scaling Law of Data  313
    Data Quality  316
    Validity  317
    Variety  317
    Veracity  332
    Value and Volume  340
    The Data Engine and Continual Learning  346
    Volatility  347
    Velocity  348
    Summary  348
📄 Page 10
11. Scaling Experiments: Effective Planning and Management  349
    Model Development Is Iterative  350
    Planning for Experiments and Execution  351
    Simplify the Complex  351
    Fast Iteration for Fast Feedback  352
    Decoupled Iterations  352
    Feasibility Testing  353
    Developing and Scaling a Minimal Viable Solution  353
    Setting Up for Iterative Execution  354
    Techniques to Scale Your Experiments  357
    Accelerating Model Convergence  358
    Accelerating Learning Via Optimization and Automation  362
    Accelerating Learning by Increasing Expertise  372
    Learning with Scarce Supervision  380
    Hands-On Exercises  382
    Hands-On Exercise #1: Transfer Learning  383
    Hands-On Exercise #2: Hyperparameter Optimization  383
    Hands-On Exercise #3: Knowledge Distillation  384
    Hands-On Exercise #4: Mixture of Experts  386
    Hands-On Exercise #5: Contrastive Learning  388
    Hands-On Exercise #6: Meta-Learning  389
    Summary  389

12. Efficient Fine-Tuning of Large Models  391
    Review of Fine-Tuning Techniques  392
    Standard Fine Tuning  392
    Meta-Learning (Zero-/Few-Shot Learning)  393
    Adapter-Based Fine Tuning  393
    Low-Rank Tuning  394
    LoRA—Parameter-Efficient Fine Tuning  395
    Quantized LoRA (QLoRA)  396
    Hands-on Exercise: QLoRA-Based Fine Tuning  397
    Implementation Details  397
    Inference  398
    Exercise Summary  399
    Summary  399

13. Foundation Models  401
    What Are Foundation Models?  401
    The Evolution of Foundation Models  402
    Challenges Involved in Developing Foundation Models  406
    Measurement Complexity  406
📄 Page 11
    Deployment Challenges  407
    Propagation of Defects to All Downstream Models  407
    Legal and Ethical Considerations  407
    Ensuring Consistency and Coherency  408
    Multimodal Large Language Models  408
    Projection  409
    Gated Cross-Attention  410
    Query-Based Encoding  411
    Further Exploration  412
    Summary  412

Index  413
📄 Page 13
Preface

I started my professional career as a software engineer. Over the course of my time in that role, I became deeply interested and involved in running software and systems at scale. I learned a lot about distributed systems, performance, optimizations, and running them reliably at scale. Subsequently, I went on to perform many other roles, from building systems at the intersection of software and operations (DevOps) and auxiliary systems to enable intelligent software (MLOps), to running deep learning inference at scale and developing data engines for deep learning (machine learning engineering), to developing multitasking, multiobjective models for critical functions such as healthcare and business decision workflows as a data scientist and machine learning specialist.

Since I've become involved in building intelligent systems, deep learning is a big part of what I do today. The wide adoption of deep learning–based intelligent (AI) systems is motivated by its ability to solve problems at scale with efficiency. However, building such systems is complex, because deep learning is not just about algorithms and mathematics. Much of the complexity lies at the intersection of hardware, software, data, and deep learning (the algorithms and techniques, specifically). I consider myself fortunate to have gained experience in a series of roles that forced me to rapidly develop a detailed understanding of building and managing deep learning–based AI systems at scale. The knowledge that I have acquired because of the opportunities presented to me is not so easily available and consumed, because each of these domains—hardware, software, and data—is as complex as deep learning itself.

The key motivation behind this book is to democratize this knowledge so that every machine learning practitioner, engineer or not, can navigate the deep learning landscape. I've always felt that this knowledge was somewhat fragmented, and saw an opportunity to pull it together to create a coherent knowledge base. This unified knowledge base will provide theoretical and practical guidance for developing deep learning engineering knowledge so you can easily scale out your deep learning workloads without needing to go through as many explorations as I did.
📄 Page 14
Why Scaling Matters

Deep learning and scaling are correlated. Deep learning is capable of scaling your objectives from single task to multitask, from one modality to multimodality, from one class to thousands of classes. Anything is possible, provided you have scalable hardware and a large volume of data and write software that can efficiently scale to utilize all the resources available to you.

Scaling is complex, and thus not free. Developing a deep learning–based system requires a large number of layers, a large volume of data, and hardware capable of handling computationally intensive workloads. Scaling requires understanding the elasticity of your entire system—not just your model but your entire deep learning stack—and adapting to situations where elasticity nears a breaking point. Therein lies the secondary motivation of this book: to enable you to gain a deeper understanding of your system and when it might break, and how you can avoid unnecessary breaks.

Who This Book Is For

This book aims to help you develop a deeper knowledge of the deep learning stack—specifically, how deep learning interfaces with hardware, software, and data. It will serve as a valuable resource when you want to scale your deep learning model, either by expanding the hardware resources or by adding larger volumes of data or increasing the capacity of the model itself. Efficiency is a key part of any scaling operation. For this reason, consideration of efficiency is weaved in throughout the book, to provide you with the knowledge and resources you need to scale effectively.

This book is written for machine learning practitioners from all walks of life: engineers, data engineers, MLOps, deep learning scientists, machine learning engineers, and others interested in learning about model development at scale. It assumes that the reader already has a fundamental knowledge of deep learning concepts such as optimizers, learning objectives and loss functions, and model assembly and compilation, as well as some experience with model development. Familiarity with Python and PyTorch is also essential for the practical sections of the book.

Given the complexity and scope, this book primarily focuses on scale-out of model development and training, with an extensive focus on distributed training. While the first few chapters may be useful for deployment and inference use cases, scaling inference is beyond the scope of this book. The topics we will cover include:

• How your model is decomposed into a computation graph and how your data flows through this graph during the training process.
• The less told but beautiful story of floating-point numbers and how these Higgs bosons of deep learning can be used to achieve memory efficiency.
📄 Page 15
• How accelerated computing speeds up your training and how you can best utilize the hardware resources at your disposal.
• How to train your model using distributed training paradigms (i.e., data, model, pipeline, and hybrid multidimensional parallelism). You will also learn about federated learning and its challenges.
• How to leverage the PyTorch ecosystem in conjunction with NVIDIA libraries and Triton to scale your model training.
• Debugging, monitoring, and investigating bottlenecks that undesirably slow down the scale-out of model training.
• How to expedite the training lifecycle and streamline your feedback loop to iterate model development and related best practices.
• A set of data tricks and techniques and how to apply them to scale your training over limited resources.
• How to select the right tools and techniques for your deep learning project.
• Options for managing compute infrastructure when running at scale.

How This Book Is Organized

This book consists of an introductory chapter followed by a dozen chapters divided into three parts covering foundational concepts, distributed training, and extreme scaling. Each chapter builds upon the concepts, fundamentals, and principles from the preceding chapters to provide a holistic knowledge of deep learning that will enable efficient and effective scale-out of training workloads.

Introduction

Chapter 1, "What Nature and History Have Taught Us About Scale", sets out the theoretical framework for deciding when to scale and explores the high-level challenges involved in scaling out. In this chapter, you will also read about the history of deep learning and how scaling has been a key driver of its success.

Part I: Foundational Concepts of Deep Learning

Chapter 2, "Deep Learning", introduces deep learning through the lens of computational graphs and data flow. Early-stage machine learning practitioners may find this chapter helpful as it explains the inner workings of deep learning through pure Python, no-frills exercises. More experienced deep learning practitioners may choose to skip this chapter.

Chapter 3, "The Computational Side of Deep Learning", dives into the inner workings of electronic computations and hardware, exploring how compute capabilities
📄 Page 16
are achieved and scaled. It also provides detailed insights into the variety of accelerated hardware available today, to arm you with the knowledge required to choose the most suitable hardware for your project.

Chapter 4, "Putting It All Together: Efficient Deep Learning", brings the foundational knowledge of deep learning together to provide more practical guidance on how to build an efficient and effective intelligent system for your task and how to measure and monitor it. In this chapter, you will also learn about graph compilation and a series of memory tricks to provide you with the knowledge to build an efficient stack.

Part II: Distributed Training

Chapter 5, "Distributed Systems and Communications", introduces the foundations of distributed systems and provides detailed insights into the different types and the challenges associated with each one. Communication is a critical aspect of distributed systems that's explained in this chapter through the lens of deep learning. This chapter also provides insights into the options and tools that can be used to scale out your hardware resources to achieve distributed computing, along with what this means for hardware with acceleration.

Chapter 6, "Theoretical Foundations of Distributed Deep Learning", extends Chapter 5 to provide theoretical and foundational knowledge of distributed deep learning. In this chapter, you will learn about a variety of distributed deep learning training techniques and a framework for choosing one.

Chapter 7, "Data Parallelism", dives into the details of distributed data parallelism and provides a series of practical exercises demonstrating these techniques.

Chapter 8, "Scaling Beyond Data Parallelism: Model, Pipeline, Tensor, and Hybrid Parallelism", provides foundational and practical knowledge of scaling model training beyond data parallel. In this chapter, you will learn about model, pipeline, and multidimensional hybrid parallelism and experience the challenges and limitations of each of these techniques via practical exercises.

Chapter 9, "Gaining Practical Expertise with Scaling Across All Dimensions", brings all the learning of Part II together to provide knowledge and insights on how to realize multidimensional parallelism in a more effective manner.

Part III: Extreme Scaling

Chapter 10, "Data-Centric Scaling", provides a data-centric perspective and offers valuable information on assorted techniques to maximize the gain from your data. This chapter also provides useful insights on how to achieve efficiency in your data pipelines through sampling and selection techniques.
📄 Page 17
Chapter 11, "Scaling Experiments: Effective Planning and Management", focuses on scaling out of experiments and provides insights on experiment planning and management. This chapter provides useful information for when you're conducting multiple experiments and want to maximize your chances of finding the best-performing model; it covers techniques like fine tuning, mixture of experts (MoE), contrastive learning, etc.

Chapter 12, "Efficient Fine-Tuning of Large Models", explores low-rank fine tuning of large models with a practical example.

Chapter 13, "Foundation Models", lays out the conceptual framework of foundation models and provides a summary of this evolving landscape.

What You Need to Use This Book

To run the code samples in this book, you will need a working device with at least a 16-core CPU and 16 GB (ideally 32 GB) of RAM. Most of the exercises in Part II use accelerated hardware, so access to a system with more than one GPU—ideally NVIDIA—will be required for some of the exercises. Most exercises are written in a platform-agnostic way, and a Dockerfile with a list of runtime dependencies required to run the exercises is provided.

Setting Up Your Environment for Hands-on Exercises

Instructions to set up your environment for this book's practical exercises are included in the companion GitHub repository. This page includes specific guidelines to set up either a Python-based native environment or an emulated Docker environment. Instructions to set up the NVIDIA drivers and CUDA runtime are also provided, along with instructions on updating the versions and running the exercises. Some exercises in Part II will come with special instructions that will be explained in the context of those exercises.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/suneeta-mall/deep_learning_at_scale.

If you have a technical question or a problem using the code examples, please send an email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this
📄 Page 18
book does not require permission. Selling or distributing examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Deep Learning at Scale by Suneeta Mall (O'Reilly). Copyright 2024 Suneeta Mall, 978-1-098-14528-6."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.
📄 Page 19
This element indicates a warning or caution.

O'Reilly Online Learning

For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/DLAS.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.

Watch us on YouTube: https://youtube.com/oreillymedia.
📄 Page 20
Acknowledgments

To my beloved family: Your unwavering support and understanding during the creation of this book have been huge. My heartfelt thanks to my husband, whose patience and encouragement kept me going. To my incredible children, your curiosity and enthusiasm for learning inspire me every day. This book is as much yours as it is mine. Mum, Dad, and parents-in-law, your love, wisdom, unwavering belief in my abilities, and endless encouragement have been a guiding light throughout this journey. To my brother, your perseverance knows no bounds and keeps me inspired. This book is dedicated to all of you.

To the open source deep learning community: I have the deepest gratitude for the open source communities around the world that have been forthcoming with their knowledge and work to collectively and collaboratively improve the posture of AI systems in production. Your commitment to innovation and accessibility in the field of deep learning has been revolutionary. The knowledge, tools, and resources that these communities have built together have not only shaped this book, but have also transformed the landscape of machine learning. I'm deeply thankful for your contributions. This work would not have been possible without you. I take deep pleasure in dedicating this book to you!

To my dedicated tech reviewers and editorial team: I'm indebted to you for your valuable input and dedication to excellence. I would like to acknowledge and express my deepest gratitude to the technical reviewers, Tim Hauke Langer, Giovanni Alzetta, Satyarth Praveen, and Vishwesh Ravi Shrimali, and my editor, Sara Hunter, whose guidance and advice have greatly improved this book. I would also like to express my gratitude to Nicole Butterfield, my acquisitions editor, for her support and guidance in shaping the direction of the book.