MANNING
Amit Bahree
Foreword by Eric Boyd
[Figure: Conceptual architecture of an LLM. Input text (prompt) is converted by token embedding into a numerical representation (needed for scenarios such as “Bring your own data,” search, etc.), passed through an encoder and a decoder, and returned as generated text (completion).]
[Figure: LLM – Next token predictor. For the input “The dog sat on,” the LLM maps the input token vector to a vector representation of the next output token and ranks candidate next words (for example, “the,” “mat,” “pad”) from highest probability, through second highest probability, to less likely.]
Generative AI in Action
AMIT BAHREE
FOREWORD BY ERIC BOYD
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2024 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Rebecca Johnson
Technical editor: Wee Hyong Tok
Review editor: Radmila Ercegovac
Production editor: Kathy Rossland
Copy editor: Lana Todorovic-Arndt
Proofreader: Melody Dolab
Technical proofreader: John Aziz
Typesetter and cover designer: Marija Tudor

ISBN 9781633436947
Printed in the United States of America
To my family, who patiently listened to my tech rambles, although they were no help in writing this book and will never read it, and to you, dear reader, who boldly chose to engage with these ideas— may your neurons spark joy and your circuits never short. Together, let’s build a future where AI is more brains than brawn.
brief contents
PART 1 FOUNDATIONS OF GENERATIVE AI 1
1 ■ Introduction to generative AI 3
2 ■ Introduction to large language models 26
3 ■ Working through an API: Generating text 57
4 ■ From pixels to pictures: Generating images 96
5 ■ What else can AI generate? 127
PART 2 ADVANCED TECHNIQUES AND APPLICATIONS 153
6 ■ Guide to prompt engineering 155
7 ■ Retrieval-augmented generation: The secret weapon 183
8 ■ Chatting with your data 213
9 ■ Tailoring models with model adaptation and fine-tuning 242
PART 3 DEPLOYMENT AND ETHICAL CONSIDERATIONS 281
10 ■ Application architecture for generative AI apps 283
11 ■ Scaling up: Best practices for production deployment 321
12 ■ Evaluations and benchmarks 357
13 ■ Guide to ethical GenAI: Principles, practices, and pitfalls 384
contents
foreword xii
preface xiv
acknowledgments xvi
about this book xviii
about the author xxiii
about the cover illustration xxiv
PART 1 FOUNDATIONS OF GENERATIVE AI 1
1 Introduction to generative AI 3
1.1 What is this book about? 5
1.2 What is generative AI? 6
1.3 What can we generate? 9
Entities extraction 9 ■ Generating text 10 ■ Generating images 12 ■ Generating code 12 ■ Ability to solve logic problems 14 ■ Generating music 15 ■ Generating videos 17
1.4 Enterprise use cases 17
1.5 When not to use generative AI 19
1.6 How is generative AI different from traditional AI? 19
1.7 What approach should enterprises take? 21
1.8 Architecture considerations 23
1.9 So your enterprise wants to use generative AI. Now what? 24
2 Introduction to large language models 26
2.1 Overview of foundational models 27
2.2 Overview of LLMs 29
2.3 Transformer architecture 30
2.4 Training cutoff 31
2.5 Types of LLMs 31
2.6 Small language models 33
2.7 Open source vs. commercial LLMs 35
Commercial LLMs 36 ■ Open source LLMs 36
2.8 Key concepts of LLMs 38
Prompts 39 ■ Tokens 40 ■ Counting tokens 42 ■ Embeddings 45 ■ Model configuration 47 ■ Context window 50 ■ Prompt engineering 51 ■ Model adaptation 52 ■ Emergent behavior 52
3 Working through an API: Generating text 57
3.1 Model categories 58
Dependencies 60 ■ Listing models 62
3.2 Completion API 64
Expanding completions 67 ■ Azure content safety filter 68 ■ Multiple completions 69 ■ Controlling randomness 71 ■ Controlling randomness using top_p 74
3.3 Advanced completion API options 75
Streaming completions 75 ■ Influencing token probabilities: logit_bias 77 ■ Presence and frequency penalties 80 ■ Log probabilities 82
3.4 Chat completion API 84
System role 86 ■ Finish reason 88 ■ Chat completion API for nonchat scenarios 88 ■ Managing conversation 89 ■ Best practices for managing tokens 92 ■ Additional LLM providers 93
4 From pixels to pictures: Generating images 96
4.1 Vision models 97
Variational autoencoders 100 ■ Generative adversarial networks 101 ■ Vision transformer models 102 ■ Diffusion models 104 ■ Multimodal models 106
4.2 Image generation with Stable Diffusion 109
Dependencies 109 ■ Generating an image 111
4.3 Image generation with other providers 114
OpenAI DALL-E 3 114 ■ Bing image creator 114 ■ Adobe Firefly 115
4.4 Editing and enhancing images using Stable Diffusion 116
Generating using image-to-image API 119 ■ Using the masking API 121 ■ Resize using the upscale API 124 ■ Image generation tips 125
5 What else can AI generate? 127
5.1 Code generation 128
Can I trust the code? 130 ■ GitHub Copilot 132 ■ How Copilot works 135
5.2 Additional code-related tasks 136
Code explanation 136 ■ Generate tests 138 ■ Code referencing 139 ■ Code refactoring 140
5.3 Other code generation tools 140
Amazon CodeWhisperer 141 ■ Code Llama 142 ■ Tabnine 143 ■ Check yourself 145 ■ Best practices for code generation 145
5.4 Video generation 146
5.5 Audio and music generation 149
PART 2 ADVANCED TECHNIQUES AND APPLICATIONS 153
6 Guide to prompt engineering 155
6.1 What is prompt engineering? 156
Why do we need prompt engineering? 156
6.2 The basics of prompt engineering 158
6.3 In-context learning and prompting 161
6.4 Prompt engineering techniques 163
System message 163 ■ Zero-shot, few-shot, and many-shot learning 166 ■ Use clear syntax 168 ■ Making in-context learning work 169 ■ Reasoning: Chain of Thought 170 ■ Self-consistency sampling 173
6.5 Image prompting 175
6.6 Prompt injection 176
6.7 Prompt engineering challenges 179
6.8 Best practices 180
7 Retrieval-augmented generation: The secret weapon 183
7.1 What is RAG? 184
7.2 RAG benefits 185
7.3 RAG architecture 187
7.4 Retriever system 188
7.5 Understanding vector databases 190
What is a vector index? 191 ■ Vector search 191
7.6 RAG challenges 194
7.7 Overcoming challenges for chunking 195
Chunking strategies 196 ■ Factors affecting chunking strategies 197 ■ Handling unknown complexities 200 ■ Chunking sentences 201 ■ Chunking using natural language processing 203
7.8 Chunking PDFs 208
8 Chatting with your data 213
8.1 Advantages to enterprises using their data 214
What about large context windows? 214 ■ Building a chat application using our data 215
8.2 Using a vector database 216
8.3 Planning for retrieving the information 220
8.4 Retrieving the data 227
Retriever pipeline best practices 230
8.5 Search using Redis 232
8.6 An end-to-end chat implementation powered by RAG 234
8.7 Using Azure OpenAI on your data 237
8.8 Benefits of bringing your data using RAG 240
9 Tailoring models with model adaptation and fine-tuning 242
9.1 What is model adaptation? 244
Basics of model adaptation 244 ■ Advantages and challenges for enterprises 245
9.2 When to fine-tune an LLM 247
Key stages of fine-tuning an LLM 248
9.3 Fine-tuning OpenAI models 249
Preparing a dataset for fine-tuning 250 ■ LLM evaluation 254 ■ Fine-tuning 257 ■ Fine-tuning training metrics 261 ■ Fine-tuning using Azure OpenAI 264
9.4 Deployment of a fine-tuned model 266
Inference: Fine-tuned model 267
9.5 Training an LLM 269
Pretraining 269 ■ Supervised fine-tuning 270 ■ Reward modeling 270 ■ Reinforcement learning 270 ■ Direct preference optimization 270
9.6 Model adaptation techniques 271
Low-rank adaptation 273
9.7 RLHF overview 275
Challenges with RLHF 278 ■ Scaling an RLHF implementation 279
PART 3 DEPLOYMENT AND ETHICAL CONSIDERATIONS 281
10 Application architecture for generative AI apps 283
10.1 Generative AI: Application architecture 284
Software 2.0 285 ■ The era of copilots 285
10.2 Generative AI: Application stack 286
Integrating the GenAI stack 288 ■ GenAI architecture principles 289 ■ GenAI application architecture: A detailed view 291
10.3 Orchestration layer 293
Benefits of an orchestration framework 294 ■ Orchestration frameworks 296 ■ Managing operations 297 ■ Prompt management 307
10.4 Grounding layer 308
Data integration and preprocessing 308 ■ Embeddings and vector management 310
10.5 Model layer 312
Model ensemble architecture 312 ■ Model serving 318
10.6 Response filtering 318
11 Scaling up: Best practices for production deployment 321
11.1 Challenges for production deployments 322
11.2 Deployment options 325
11.3 Managed LLMs via API 325
11.4 Best practices for production deployment 326
Metrics for LLM inference 327 ■ Latency 328 ■ Scalability 331 ■ PAYGO 333 ■ Quotas and rate limits 333 ■ Managing quota 335 ■ Observability 337 ■ Security and compliance considerations 345
11.5 GenAI operational considerations 346
Reliability and performance considerations 346 ■ Managed identities 347 ■ Caching 349
11.6 LLMOps and MLOps 352
11.7 Checklist for production deployment 354
12 Evaluations and benchmarks 357
12.1 LLM evaluations 358
12.2 Traditional evaluation metrics 359
BLEU 360 ■ ROUGE 360 ■ BERTScore 361 ■ An example of traditional metric evaluation 361
12.3 LLM task-specific benchmarks 364
G-Eval: A measuring approach for NLG evaluation 366 ■ An example of LLM-based evaluation metrics 368 ■ HELM 372 ■ HEIM 373 ■ HellaSWAG 374 ■ Massive Multitask Language Understanding 375 ■ Using Azure AI Studio for evaluations 376 ■ DeepEval: An LLM evaluation framework 377
12.4 New evaluation benchmarks 378
SWE-bench 378 ■ MMMU 379 ■ MoCa 380 ■ HaluEval 381
12.5 Human evaluation 381
13 Guide to ethical GenAI: Principles, practices, and pitfalls 384
13.1 GenAI risks 385
LLM limitations 386 ■ Hallucination 387
13.2 Understanding GenAI attacks 388
Prompt injection 389 ■ Insecure output handling example 394 ■ Model denial of service 395 ■ Data poisoning and backdoors 396 ■ Sensitive information disclosure 396 ■ Overreliance 397 ■ Model theft 398
13.3 A responsible AI lifecycle 399
Identifying harms 401 ■ Measure and evaluate harms 402 ■ Mitigate harms 403 ■ Transparency and explainability 405
13.4 Red-teaming 406
Red-teaming example 407 ■ Red-teaming tools and techniques 408
13.5 Content safety 411
Azure Content Safety 412 ■ Google Perspective API 418 ■ Evaluating content filters 420
appendix A The book’s GitHub repository 423
appendix B Responsible AI tools 424
References 429
index 433
foreword
Generative AI is a transformative force for technology and society. Generative AI in Action, written by Amit Bahree, is a must-read for anyone who wants to build the applications and services that are the future of software. This practical and engaging book introduces the basics of generative AI and dives deep into large language models, the backbone of many generative AI applications, discussing their architecture, training, and various use cases. Written for practitioners, it provides detailed guidance on working through APIs for text generation, a core application of generative AI. You’ll enjoy the examples demonstrating the generation of images, code, and even music, which showcase the versatility of these models. The prompt-engineering techniques are particularly valuable, offering readers strategies to optimize their interactions with AI models. Amit’s clear explanations and step-by-step instructions make even the advanced topics accessible and actionable.

Generative AI in Action doesn’t stop at the technical aspects. You’ll also explore the operational challenges of deploying generative AI at scale, along with best practices for production environments, including architecture considerations, performance optimization, and maintenance strategies, so that the insights are not only theoretical but also actionable. The discussions on responsible AI practices, including fairness, transparency, and security, are essential reading for anyone deploying AI technologies in real-world scenarios. Because every topic is grounded in real-world applications, the theoretical concepts become tangible and relevant.

Amit’s extensive experience and expertise in AI and machine learning are evident throughout this book. His ability to simplify complex topics makes this book an invaluable resource for newcomers and seasoned professionals alike.
In Generative AI in Action, Amit has created a comprehensive and accessible guide that makes this transformative technology approachable and practical. Whether you are a developer, data scientist, or business leader, this book will equip you with the knowledge and tools to effectively harness the power of generative AI.
ERIC BOYD
CVP ENGINEERING, AI PLATFORM, MICROSOFT
preface
With nearly 30 years of experience as a developer and applied researcher, I have been involved in fundamental technology shifts from their early days. Generative artificial intelligence (AI) is one of those areas where the hype and the fear of missing out reach stratospheric levels! Organizations are trying to understand this new technology and how to implement it. Some are trying to gain an edge; others are responding to the market and to pressure from their boards and CEOs to join the trend.

At Microsoft, I have the privilege of being part of the Azure AI platform engineering team, helping develop some of our advanced AI technologies, such as Azure OpenAI and Azure AI Services, including speech, vision, and small language models (e.g., the new Phi family of models). Part of my role has been collaborating with many Fortune 500 companies that are our clients. These companies are spread around the world, represent different industry domains, and many of them are leaders in their fields.

My experience with GenAI across these domains and applications has revealed a gap between the hype and the reality of generative AI. I’ve noticed that many users and customers are confused or intimidated by the complexity and challenges of this field. In response, I set out to write a book that bridges this gap by providing a practical and accessible guide to generative AI, one that aims to empower anyone, regardless of background, to learn and apply generative AI effectively.

The technology industry is known for its rapid pace, but the field of GenAI is moving even faster: we see changes in weeks rather than months or years. While I was writing this book, the technology advanced, and I had to update many sections of the book several times. However, the basics of GenAI and large language models (LLMs) remain crucial to grasp. They are the building blocks on which new areas are being developed. Understanding these fundamentals is not just a goal of the book but a necessity in this rapidly evolving field.

This book focuses on generative AI, especially LLMs, which underpin the most common use cases. I expect newer models with multimodal capabilities that combine vision, speech, and video to become more prevalent in the future. Here, we’ll mainly use OpenAI and Azure OpenAI, but I also show examples from other providers. Most LLM providers offer APIs similar to OpenAI’s, so the book is useful even if you use a different provider. I use Python for the examples, as it is approachable and widely used in AI. In addition, SDKs are available for most languages, and REST APIs can be called from any language.

Welcome to Generative AI in Action, a book that aims to demystify the generative AI field and help you apply it to your projects. I am excited to share insights from my own learning and to assist you on your path.
acknowledgments
First and foremost, I want to thank my parents for letting me disappear into the “computer room” to tinker with those amazing machines and for buying me my first computer. I also thank my wife, Meenakshi, for putting up with me, especially when I conveniently ignored most other things and worked the graveyard shift after long days to write the book and code. To my daughter Maya, thank you for never doubting my literary and coding abilities (even if it came with a teenager’s eye roll). This book would not be complete without my dog, Champ, who, as you will see, is a recurring theme. And finally, I thank my dear friend Somya for showing us what true courage looks like and reminding us that most of life’s dramas are just things we get ourselves worked up over.

I thank Eric Boyd for writing the foreword and for his time and collaboration on this project. Working under his guidance on the Azure AI team has been an exhilarating experience. Pushing the limits of technology and rekindling that childlike excitement in all of us reminds me why I fell in love with computers and programming in the first place.

A special thanks goes to Wee Hyong Tok, the technical editor of this book, for the incredible amount of time he spent assisting, directing, challenging, and verifying everything. Your efforts have been invaluable to my learning and to improving this book! Wee Hyong is a partner director of product at Microsoft. He has a PhD in computer science from the National University of Singapore and is a recognized expert on data and AI. He has also authored over 10 books on AI.

To all the reviewers (Amit Basnak, Andres Sacco, Arun Kandregula, Bruno Ricardo Santos, Dan Sheikh, Erim Ertürk, Gregory V, Hariskumar Panakkal, Ike Okonkwo, James Coates, Julien Pohie, Lokesh Kumar, Louis Luangkesorn, Luiz Davi, Manish Jain, Matteo Battista, Maxim Volgin, Nathan B. Crocker, Pradeep Bhattiprolu, Radhakrishna MV, Raj Kumar, Rambabu Posa, Roy Wilsker, Rui Liu, Sanjeev Jaiswal, Scott Ling, Simon Verhoeven, Sumit Pal, Sushil Singh, Swaminathan Subramanian, Swapneelkumar Deshpande, Victor Durán, and Weronika Burman): your suggestions helped make this a better book.

Finally, I would like to thank the team at Manning. I have immense empathy and gratitude for my development editor, Rebecca Johnson, and acquisitions editor, Mike Stephens. Rebecca especially deserves a medal for making sense of my initial drafts and turning gibberish into coherent content. Thank you all for your patience and dedication!
about this book
Generative AI in Action is designed to equip enterprise professionals and enthusiasts with the knowledge and skills to use generative AI technologies effectively. It provides a comprehensive understanding of generative AI, covering its fundamental principles, practical applications, and the challenges of implementing it in real-world scenarios.

The book teaches you how to create and use generative models for a range of tasks and use cases. It focuses on the practical, hands-on aspects of this technology and on how it works. It does not dive deep into the science, but it references the papers and scientific breakthroughs that have helped develop some of the technology; you can find these references at the end of the book.

The book also explores the potential of generative AI within an enterprise context. It covers foundational models, large language models, and related algorithms and architectures, offering readers a thorough grasp of these advanced technologies. Practical insights and examples help you develop and deploy generative AI models so that you can apply these concepts in real-world scenarios.

Advanced topics such as prompt engineering, retrieval-augmented generation, and model adaptation are discussed in detail, giving readers an in-depth understanding of these cutting-edge techniques. The book also highlights best practices for integrating generative AI into existing systems and workflows to ensure a smooth and efficient implementation. Furthermore, it addresses the ethical considerations, governance, and safety measures necessary for responsible AI deployment, guiding readers through the complexities of this rapidly evolving field.