
LLMOps: Managing Large Language Models in Production

Author: Abi Aryan


Are you wrestling with the complexities of deploying and managing large language models? The rapid evolution of AI technologies demands robust solutions that streamline development, enhance security, and scale effectively, yet the lack of clear guidance can make navigating this landscape daunting. This much-needed book by Abi Aryan is a vital resource poised to transform your approach to LLMOps. This comprehensive guide equips you with the essential techniques and tools to develop, deploy, and manage large language models efficiently. Whether you’re a seasoned AI practitioner or just stepping into the field, this book is your gateway to mastering LLMOps, ensuring your projects are not just functional but flourishing. By reading, you will:

• Gain a robust understanding of data versioning, experiment tracking, and model deployment
• Understand the architectures of models like OpenAI’s ChatGPT and how to fine-tune them
• Learn how to implement critical security measures and comply with privacy regulations
• Explore using Flask and Kubernetes to deploy models, optimizing for both performance and cost
• Discover how to integrate cutting-edge tools like ChatGPT and Whisper

📄 File Format: PDF
💾 File Size: 6.8 MB

📄 Text Preview (First 20 pages)

📄 Page 1
Abi Aryan
LLMOps: Managing Large Language Models in Production
📄 Page 2
ISBN: 978-1-098-15420-2 | US $79.99 | CAN $99.99 | DATA

Here’s the thing about large language models: they don’t play by the old rules. Traditional MLOps completely falls apart when you’re dealing with GenAI. The model hallucinates, security assumptions crumble, monitoring breaks, and agents can’t operate. Suddenly you’re in uncharted territory. That’s exactly why LLMOps has emerged as its own discipline.

LLMOps: Managing Large Language Models in Production is your guide to actually running these systems when real users and real money are on the line. This book isn’t about building cool demos. It’s about keeping LLM systems running smoothly in the real world.

• Navigate the new roles and processes that LLM operations require
• Monitor LLM performance when traditional metrics don’t tell the whole story
• Set up evaluations, governance, and security audits that actually matter for GenAI
• Wrangle the operational mess of agents, RAG systems, and evolving prompts
• Scale infrastructure without burning through your compute budget

“Developing with AI is getting easier, but production is where the real challenges lie. This book is the essential guide I’ll use to teach my students how to navigate the complexities of LLMOps and successfully deploy large language models in the real world. With clarity and actionable solutions, LLMOps is a vital read for transforming LLM prototypes into robust, production-ready AI systems.”
Ammar Mohanna, lead AI consultant at EDT&Partners and lecturer at the American University of Beirut

“This book demystifies LLMOps with clear, actionable guidance—a perfect resource for ML engineers, platform teams, and anyone taking LLMs from prototype to production.”
Nirmal Budhathoki, senior data scientist, Microsoft

Abi Aryan is the founder of Abide AI and a machine learning research engineer with nearly a decade of experience building production-level ML systems. A mathematician by training, she previously served as a visiting research scholar at the Cognitive Systems Lab at UCLA, under Dr. Judea Pearl, where she focused on developing intelligent agents. She’s currently advancing research in reflective intelligence in AI agents, distributed self-healing protocols for multi-agent systems, and GPU engineering for very large-scale AI systems.
📄 Page 4
978-1-098-15420-2 [LSI]

LLMOps
by Abi Aryan

Copyright © 2025 Abi Aryan and MeyerPerin Inc. All rights reserved.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Sarah Grey
Production Editor: Beth Kelly
Copyeditor: Paula L. Fleming
Proofreader: Vanessa Moore
Indexer: BIM Creatives, LLC
Cover Designer: Karen Montgomery
Cover Illustrator: Monica Kamsvaag
Interior Designer: David Futato
Interior Illustrator: Kate Dullea

July 2025: First Edition

Revision History for the First Edition
2025-07-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098154202 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. LLMOps, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
📄 Page 5
Table of Contents

Preface  xi

1. Introduction to Large Language Models  1
    Some Key Terms  2
    Transformer Models  3
    Large Language Models  5
    LLM Architectures  6
        Encoder-Only LLMs  6
        Decoder-Only LLMs  6
        Encoder–Decoder LLMs  7
        State Space Architectures  7
        Small Language Models  8
    Choosing an LLM  8
        Considerations in the Selection of an LLM  9
        The Big Debate: Open Source Versus Proprietary LLMs  10
    Enterprise Use Cases for LLMs  13
        Knowledge Retrieval  13
        Translation  14
        Speech Synthesis  14
        Recommender Systems  15
        Autonomous AI Agents  15
        Agentic Systems  15
    Ten Challenges of Building with LLMs  16
        1. Size and Complexity  17
        2. Training Scale and Duration  17
        3. Prompt Engineering  17
        4. Inference Latency and Throughput  18
        5. Ethical Considerations  18
📄 Page 6
        6. Resource Scaling and Orchestration  18
        7. Integrations and Toolkits  18
        8. Broad Applicability  18
        9. Privacy and Security  19
        10. Costs  19
    Conclusion  19
    References  19

2. Introduction to LLMOps  21
    What Are Operational Frameworks?  22
    From MLOps to LLMOps: Why Do We Need a New Framework?  23
    Four Goals for LLMOps  26
    LLMOps Teams and Roles  26
        The LLMOps Engineer Role  29
        A Day in the Life  30
        Hiring an LLMOps Engineer Externally  31
        Hiring Internally: Upskilling an MLOps Engineer into an LLMOps Engineer  34
    LLMs and Your Organization  35
    The Four Goals of LLMOps  36
        Reliability  36
        Scalability  37
        Robustness  38
        Security  38
    The LLMOps Maturity Model  39
    Conclusion  43
    References  43
    Further Reading  43

3. LLM-Based Applications  45
    Using AI Models in Applications  47
    Infrastructure Applications  49
    Agentic Workflows  49
        Model Context Protocol  52
        Agent-to-Agent Protocol  57
    The Rise of vLLMs and Multimodal LLMs  60
    The LLMOps Question  62
    Monitoring Application Performance  63
        Measuring a Consumer LLM Application’s Performance  63
        Choosing the Best Model for Your Application  67
        Other Application Metrics  69
📄 Page 7
    What Can You Control in an LLM-Based Application?  70
        Prompt Engineering Is “Hard”  71
        Did Our Prompt Engineering Produce Better Results?  72
        LLM-Based Infrastructure Systems Are “Harder”  76
    Conclusion  77
    References  77

4. Data Engineering for LLMs  79
    Data Engineering and the Rise of LLMs  79
    The DataOps Engineer Role  82
    Data Management  83
        Synthetic Data  84
    LLM Pipelines  84
    Training an LLM  85
        Data Composition  89
        Scaling Laws  90
        Data Repetition  91
        Data Quality  91
    A General Data-Preprocessing Pipeline for LLMs  94
        Step 1: Catalog Your Data  94
        Step 2: Check Privacy and Legal Compliance  94
        Step 3: Filter the Data  95
        Step 4: Perform Data Deduplication  96
        Step 5: Collect Data  97
        Step 6: Detect Encoding  97
        Step 7: Detect Languages  97
        Step 8: Chunking  97
        Step 9: Back Up Your Data  98
        Step 10: Perform Maintenance and Updates  98
    Vectorization  98
        Vector Databases  99
        Maintaining Fresh Data  101
    Generating the Fine-Tuning Dataset  101
        Automatically Generating an Instruction Fine-Tuning Dataset  103
    Conclusion  104
    References  104
    Further Reading  106

5. Model Domain Adaptation for LLM-Based Applications  107
    Training LLMs from Scratch  107
        Step 1: Pick a Task  108
📄 Page 8
        Step 2: Prepare the Data  108
        Step 3: Decide on the Model Architecture  108
        Step 4: Set Up Your Training Infrastructure  110
        Step 5: Implement Training  110
    Model Ensembling Approaches  115
        Model Averaging and Blending  115
        Weighted Ensembling  116
        Stacked Ensembling (Two-Stage Model)  116
        Diverse Ensembles for Robustness  117
        Multi-Step Decoding and Voting Mechanisms  117
        Composability  118
        Soft Actor–Critic  119
    Model Domain Adaptation  121
    Prompt Engineering  122
        One-Shot Prompting  122
        Few-Shot Prompting  123
        Chain-of-Thought Prompting  123
    Retrieval-Augmented Generation  124
        Semantic Kernel  126
    Fine-Tuning  127
        Adaptive Fine-Tuning  128
        Adapters (Single, Parallel, and Scaled Parallel)  128
        Behavioral Fine-Tuning  129
        Prefix Tuning  129
        Parameter-Efficient Fine-Tuning  130
        Instruction Tuning and Reinforcement Learning from Human Feedback  130
        Choosing Between Fine-Tuning and Prompt Engineering  131
    Mixture of Experts  132
    Model Optimization for Resource-Constrained Devices  134
    Lessons for Effective LLM Development  135
        Scaling Law  136
        Chinchilla Models  136
        Learning-Rate Optimization  137
        Speculative Sampling  138
    Conclusion  138
    References  138

6. API-First LLM Deployment  139
    Deploying Your Model  141
        Step 1: Set Up Your Environment  142
        Step 2: Containerize the LLM  142
📄 Page 9
        Step 3: Automate Pipelines with Jenkins  143
        Step 4: Workflow Orchestration  143
        Step 5: Set Up Monitoring  144
    Developing APIs for LLMs  144
        API-Led Architecture Strategies  145
        REST APIs  145
    API Implementation  146
        Step 1: Define Your API’s Endpoints  146
        Step 2: Choose an API Development Framework  146
        Step 3: Test the API  147
        Credential Management  148
        API Gateways  148
        API Versioning and Lifecycle Management  149
    LLM Deployment Architectures  150
        Modular and Monolithic Architectures  150
        Implementing a Microservices-Based Architecture  150
        Automating RAG with Retriever Re-ranker Pipelines  153
        Automating Knowledge Graph Updates  155
    Deployment Latency Optimization  157
        Orchestrating Multiple Models  158
    Optimizing RAG Pipelines  161
        Asynchronous Querying  161
        Combining Dense and Sparse Retrieval Methods  162
        Cache Embeddings  163
        Key–Value Caching  164
        Scalability and Reusability  165
    Conclusion  166

7. Evaluation for LLMs  167
    Why Evaluation Is a Hard Problem  167
    Evaluating Performance  170
    Evaluating What Breaks Before It Breaks Everything  173
        Metrics for RAG Applications  179
        Metrics for Agentic Systems  181
    General Evaluation Considerations  185
        The Value of Automated Metrics  186
        Model Drift  186
        Traditional Metrics Aren’t Enough  187
    The Observability Pipeline  188
        Preprocessing and Prompt Construction  188
        Retrieval in RAG Pipelines  189
📄 Page 10
        LLM Inference  190
        Postprocessing and Output Validation  191
        Capturing Feedback  191
    Conclusion  193
    References  194

8. Governance: Monitoring, Privacy, and Security  195
    The Data Issue: Scale and Sensitivity  196
    Security Risks  198
        Prompt Injection  198
        Jailbreaking  200
        Other Security Risks  201
    Defensive Measures: LLMSecOps  202
    Conducting an LLMSecOps Audit  202
        Step 1: Define Scope and Objectives  205
        Step 2: Gather Information  207
        Step 3: Perform Risk Analysis and Threat Modeling  208
        Step 4: Evaluate Security Controls and Compliance  210
        Step 5: Perform Penetration Testing and/or Red Teaming  211
        Step 6: Review the Training Data  212
        Step 7: Assess Model Performance and Bias  212
        Step 8: Document the Audit’s Findings and Recommendations  214
        Step 9: Plan Ongoing Monitoring and Review  215
        Step 10: Create a Communication and Remediation Plan  216
    Safety and Ethical Guardrails  217
    Conclusion  218
    References  219

9. Scaling: Hardware, Infrastructure, and Resource Management  221
    Choosing the Right Approach  221
    Scaling and Resource Allocation  222
        Monitoring  222
        A/B Testing and Shadow Testing for LLMs  225
    Automatic Infrastructure Provisioning and Management  225
        Provisioning and Management in Cloud Architectures  225
        Provisioning and Management on Owned Hardware  226
        Best Practices for Automatic Infrastructure Management  227
    Scaling Law and the Compute-Optimal Argument  228
    Optimizing LLM Infrastructure  230
        Kernel Fusion  231
        Precision Scaling  231
📄 Page 11
        Hardware Utilization  232
    Parallel and Distributed Computing for LLMs  232
        Data Parallelism  233
        Model Parallelism  233
        Pipeline Parallelism  233
        Advanced Frameworks: ZeRO and DeepSpeed  234
    Backup and Failsafe Processes for LLM Applications  235
        Types of Backup Strategies  236
        The Most Important Practice: Test Restores Regularly  236
    Conclusion  237
    References  237

10. The Future of LLMs and LLMOps  239
    Scaling Beyond Current Boundaries  242
        Hybrid Architectures: Merging Neural Networks with Symbolic AI  244
        Sparse and Mixture-of-Experts Models  245
        Memory-Augmented Models: Toward Persistent, Context-Rich AI  245
        Interpretable and Self-Optimizing Models  246
        Cross-Model Collaboration, Meta-Learning, and Multi-Modal Fine-Tuning  246
        RAG  247
    The Future of LLMOps  247
        Advances in GPU Technology  247
        Data Management and Efficiency  249
        Privacy and Security  249
        Comprehensive Evaluation Frameworks  250
    How to Succeed as an LLMOps Engineer  250
    Conclusion  251
    References  251
    Further Reading  253

Index  255
📄 Page 13
Preface

I’ve lost count of how many times I’ve been asked, “What’s the difference between an LLM/AI engineer and an LLMOps engineer?” It’s one of those questions that keep popping up, whether I’m in a meeting, at a conference, or just grabbing coffee with someone in the field.

I used to start by explaining the technical distinctions between the roles. But over time, I realized the real issue: people don’t fully grasp what it takes to keep the lights on with large language models (LLMs) in production over an extended period. As I write this in early 2025, the top models, techniques, and best practices are changing every few days. Thus, very few people understand their complexity.

Most people still think of operationalizing, or “Ops,” as deployment, but in the LLM context, Ops is really about streamlining people, processes, and technology to make these models secure, robust, and reliable in production. Enterprises and their human resource departments are scrambling to figure out what it all means for their teams and their projects, and in this book I have done my best to answer that question.

This book isn’t a tutorial on defining roles or how to build and deploy an LLM; while it touches on both of those topics, that isn’t enough anymore. Once LLM-based applications are in production, someone has to keep them optimized, or they risk becoming overengineered solutions to simple problems or, worse, badly maintained houses of cards that crumble under high demand or a prompt injection attack.

In traditional software development (or Software 2.0), you wouldn’t ask your lead developer to build and maintain your entire product. Software development engineers build, and reliability engineers maintain. Building and maintaining LLMs requires a similar separation of duties. In Software 3.0, LLM/AI engineers build and LLMOps engineers maintain!
📄 Page 14
Although machine learning operations (MLOps) are foundational to LLMOps, the MLOps skills that engineers gain from working on structured data and discriminative models don’t fully translate to generative models.

In short, I’m writing this book to help you appreciate the unique aspects of the full LLM-based application lifecycle, from data engineering to model deployment and API design to monitoring, security, and resource optimization. I want to give you a strong foundation for making decisions as you build, maintain, and optimize your LLM data, models, and applications.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
📄 Page 15
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/LLMOps.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

I would like to thank Lucas Meyer for the incredible support; his ideas have helped shape several chapters of this book. I would also like to thank my editors, Nicole and Sarah, for helping me push through the deadlines, and the technical reviewers, Lalit Chourey, Ammar Mohanna, and Nirmal Budhathoki, for their excellent feedback.

And most importantly, to my family: thank you for tolerating my obsession with this book and for your endless supply of tea. And finally, to you, the reader. Whether you’ve been in this field for years or you’re just getting started, I hope this book accelerates your path.
📄 Page 17
CHAPTER 1
Introduction to Large Language Models

The rise in popularity of large language models (LLMs) is no accident; they’re transforming how we interact with technology and pushing the boundaries of what machine learning models can do.

But here’s the catch: while these models are impressive, scaling them up and managing them in production is no walk in the park. The leap from a research project to a fully fledged, reliable tool is filled with obstacles. We’re talking about meeting enormous computational requirements, managing complex data, and ensuring that everything runs smoothly and securely, whether you are self-hosting or using proprietary models.

Before we dive into the nitty-gritty of LLM operations, it’s important to understand why and how these models came to be. Knowing their origins and trajectory helps us appreciate the challenges we face when predicting their behaviors in production.

The evolution of LLMs reflects a series of incremental innovations, each addressing specific limitations of previous models. Early models were limited in scope and required extensive human input for even basic tasks. With advancements in architecture, such as the shift from recurrent neural networks (RNNs) to transformers, and the scaling of model sizes, LLMs have become more sophisticated. This evolution has brought about new challenges, such as managing massive amounts of data and ensuring efficient training processes.

So, let’s get into it.
📄 Page 18
Some Key Terms

There are three terms we should clarify before going any further:

Foundation models
Foundation models are advanced ML architectures that serve as the foundational building blocks for creating specialized models. They are pretrained on massive datasets, often consisting of text and, more recently, other data types such as code, images, audio, and video, to develop general language comprehension and pattern-recognition capabilities. These models encode statistical relationships and linguistic structures from their training data, forming a robust starting point for further fine-tuning. This fine-tuning tailors the models to specific tasks or applications, such as powering LLMs or other AI-driven solutions.

Large language models
Large language models are specialized implementations of foundation models that have undergone additional training or fine-tuning to excel in specific language-based tasks. These models are designed to predict and generate human-like text by analyzing and emulating natural language patterns. LLMs are highly versatile, supporting several natural language processing (NLP) applications such as text generation, sentiment analysis, language translation, question answering, and more. Popular use cases include chatbots, content creation, multilingual communication, data analysis, code generation, recommendation systems, and virtual assistants. “Enterprise Use Cases for LLMs” on page 13 will look at these applications in more detail.

Generative AI models
Generative AI, or GenAI, refers to foundation models that have been trained specifically to generate content (images, text, audio, or video) based on the patterns and information they have learned. Some of the earliest generative AI models were generative adversarial networks (GANs), introduced in 2014; more recently, diffusion models, LLMs, and multimodal models like Gemini have become available. Given their generative nature, LLMs are considered a subset of generative AI models. In the context of LLMs, generative AI can generate text responses, creative stories, product descriptions, and more, based on input and learned patterns.

Confusingly, these three terms are frequently used interchangeably and loosely. For example, a popular image generation model, DALL-E, is better categorized as a generative AI model than as a large language model. Recently, however, the DALL-E image generation functionality has been integrated into the ChatGPT chatbot, one of the most popular LLM applications. Therefore, a user can ask an LLM like ChatGPT to generate images. Over time, the language seems to be evolving toward calling all of these AI models, for simplicity.
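To make the relationship between a foundation model and a fine-tuned specialization concrete, here is a minimal sketch of adapting a pretrained checkpoint to a single task. It assumes the Hugging Face transformers and datasets libraries; the checkpoint name, the two-example sentiment dataset, and the hyperparameters are illustrative placeholders, not recommendations from this book.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A pretrained foundation model: general language knowledge, no task head yet.
base = "distilbert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# A tiny illustrative dataset; real fine-tuning needs far more examples.
data = Dataset.from_dict({
    "text": ["I love this product", "Terrible experience"],
    "label": [1, 0],
}).map(lambda ex: tokenizer(ex["text"], truncation=True,
                            padding="max_length", max_length=32))

# Fine-tuning specializes the general-purpose model for one task (sentiment).
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```

The same pattern, with different task heads and datasets, is how a single foundation model yields many specialized models.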
📄 Page 19
Transformer Models

The transformer model, introduced by the paper “Attention Is All You Need,” marked one of the biggest shifts in how we approach sequence-based tasks. Transformers have set new standards in how to handle language data.

Before transformers, the most popular solution for NLP tasks was recurrent neural networks. RNNs process data sequentially, one step at a time, which makes them suitable for handling time-dependent data such as text. However, this sequential processing introduces a significant drawback: RNNs often struggle to retain information from earlier steps as they move forward in the sequence, especially over long inputs.

During neural network training, the model processes input data and generates predictions. These predictions are compared to the correct answers using a loss function, which calculates the error (how far the predictions are from the correct answers). An algorithm, such as backpropagation, calculates gradients: values that indicate how the model’s parameters (weights and biases) should be adjusted to reduce the error and improve accuracy.

However, in long sequences like those handled by RNNs, gradients can become very small as they are repeatedly multiplied during backpropagation. Over time, these small values may shrink so much that computers treat them as zero, effectively stopping the model from learning. This issue is known as the vanishing gradient problem, and it prevents the model from learning long-term dependencies in the data.

Transformers, on the other hand, overcome this limitation by using self-attention and parallel processing, allowing them to handle sequences more efficiently and capture long-range dependencies effectively. Instead of processing data one step at a time, transformers analyze all input tokens (e.g., words in a sentence) simultaneously. Self-attention is a mechanism that allows each word or token in a sequence to focus on other words in the same sequence, regardless of their position. This is achieved by calculating a set of attention weights that measure the relevance of each token in the sequence to every other token. For instance, in a sentence, self-attention can help a word like it align itself with its correct reference, even if that reference is several words away. Thus, self-attention allows the model to weigh the importance of each token relative to others in the input, enabling it to capture relationships across the entire input sequence efficiently. This parallel processing not only speeds up computation but also eliminates the issues associated with sequential processing, like the vanishing gradient problem.
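To ground the two mechanisms just described, here is a minimal NumPy sketch: a one-line illustration of how repeatedly multiplying small derivatives produces vanishing gradients, followed by scaled dot-product self-attention computed over a whole sequence at once. The array sizes, random projection matrices, and names are illustrative assumptions, not code from the book.

```python
import numpy as np

# Vanishing gradient intuition: multiplying 100 small step-to-step
# derivatives (0.5 each) drives the end-to-end gradient toward zero.
print(np.prod(np.full(100, 0.5)))  # ~7.9e-31, effectively zero

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence.

    X has shape (seq_len, d_model); all tokens are processed in parallel.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every token scored vs. every token
    weights = softmax(scores)                # attention weights; rows sum to 1
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8             # illustrative sizes
X = rng.normal(size=(seq_len, d_model))      # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (5, 8): one output per token
```

Because `scores` compares every token with every other token in a single matrix product, no information has to survive a long chain of sequential steps, which is exactly what sidesteps the vanishing gradient problem described above.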
📄 Page 20
retain sequence order, allows transformers to handle long sequences without losing context.

Some people wondered, “Well, since they can be scaled much better now, how about we throw more computing power and a lot more data at these models to see what happens?” Models like GPT-3, LLaMA, and their successors demonstrated that increasing the number of parameters can significantly improve the performance of transformer models.

Transformers have extended their influence beyond NLP into image processing with innovations like the vision transformer (ViT), which treats image patches as sequences and applies transformer models to them. ViT has shown promising results in image classification, offering a viable alternative to the previous solution, convolutional neural networks (CNNs). Additionally, in recommender systems, transformers’ ability to model complex patterns and dependencies enhances accuracy and personalization. Table 1-1 compares the abilities of the neural network models we’ve discussed.

Table 1-1. The evolution of different neural network models

• Application. CNNs: best suited for spatial tasks (e.g., images). RNNs: well suited for sequence-based tasks (e.g., NLP). Transformers: well suited for capturing all three modalities: images, NLP, and speech.
• Computation. CNNs: highly parallelizable input processing. RNNs: sequential processing. Transformers: parallel processing of inputs.
• Performance on language-specific tasks. CNNs: need a large number of stacked convolution blocks to handle long-range dependencies. RNNs: handle long-range dependencies much better than CNNs, but only up to a given length. Transformers: handle long- to very-long-range dependencies much better than other architectures such as RNNs or LSTMs.
• Scalability. CNNs: scalable. RNNs: limited scalability. Transformers: highly scalable.
• Data requirements. CNNs: work well even on small datasets. RNNs: work well even on small datasets. Transformers: don’t work well on small datasets.
• Ease of training. CNNs: easy to train and tune. RNNs: require more tuning than CNNs. Transformers: difficult to train and tune.
• Interpretability. CNNs: easy to debug. RNNs: difficult to debug. Transformers: difficult to debug.
• Deployment. CNNs: easy to deploy. RNNs: easy to deploy. Transformers: difficult to deploy.
• Small edge devices. CNNs: work well on edge devices. RNNs: work well on edge devices. Transformers: limited support for edge devices.
• Explainability. CNNs: support a wide variety of explainability techniques. RNNs: limited explainability. Transformers: very limited explainability.

This trend of throwing more compute and data at transformers is what sparked the evolution of LLMs, as well as the shift from an architecture that can do well on a single modality to one that generalizes across most modalities. Understanding this evolution can help you appreciate the differences in model architectures.
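As a small illustration of the ViT idea just mentioned, the sketch below splits an image into non-overlapping patches and flattens each patch into a vector, producing the sequence a transformer would then attend over (a real ViT additionally applies a learned linear projection and positional encodings). The image and patch sizes are illustrative assumptions.

```python
import numpy as np

def image_to_patch_sequence(img, patch=16):
    """Turn an (H, W, C) image into a (num_patches, patch*patch*C) sequence.

    Each flattened patch plays the role a token embedding plays in NLP.
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0, "image must divide evenly into patches"
    rows, cols = H // patch, W // patch
    return (img.reshape(rows, patch, cols, patch, C)
               .transpose(0, 2, 1, 3, 4)   # group pixels by patch-grid position
               .reshape(rows * cols, patch * patch * C))

img = np.random.rand(224, 224, 3)          # illustrative 224x224 RGB image
tokens = image_to_patch_sequence(img)
print(tokens.shape)                        # (196, 768): a 14x14 grid of patches
```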

