Generative AI on AWS Building Context-Aware Multimodal Reasoning Applications Chris Fregly, Antje Barth & Shelbee Eigenbrode
“I am very excited about this book—it has a great mix of all-important background/theoretical info and detailed, hands-on code, scripts, and walk-throughs. I enjoyed reading it, and I know that you will too!”
—Jeff Barr, VP and Chief Evangelist @ AWS

Companies today are moving rapidly to integrate generative AI into their products and services. But there’s a great deal of hype (and misunderstanding) about the impact and promise of this technology. With this book, Chris Fregly, Antje Barth, and Shelbee Eigenbrode from AWS help CTOs, ML practitioners, application developers, business analysts, data engineers, and data scientists find practical ways to use this exciting new technology. You’ll learn the generative AI project life cycle, including use case definition, model selection, model fine-tuning, retrieval-augmented generation, reinforcement learning from human feedback, and model quantization, optimization, and deployment. And you’ll explore different types of models, including large language models (LLMs) and multimodal models such as Stable Diffusion for generating images and Flamingo/IDEFICS for answering questions about images.
• Apply generative AI to your business use cases
• Determine which generative AI models are best suited to your task
• Perform prompt engineering and in-context learning
• Fine-tune generative AI models on your datasets with low-rank adaptation (LoRA)
• Align generative AI models to human values with reinforcement learning from human feedback (RLHF)
• Augment your model with retrieval-augmented generation (RAG)
• Explore libraries such as LangChain and ReAct to develop agents and actions
• Build generative AI applications with Amazon Bedrock

US $79.99 CAN $99.99
ISBN: 978-1-098-15922-1

Chris Fregly is a Principal Solutions Architect for generative AI at Amazon Web Services and coauthor of Data Science on AWS (O’Reilly).

Antje Barth is a Principal Developer Advocate for generative AI at Amazon Web Services and coauthor of Data Science on AWS.

Shelbee Eigenbrode is a Principal Solutions Architect for generative AI at Amazon Web Services. She holds over 35 patents across various technology domains.
Praise for Generative AI on AWS

I am very excited about this book—it has a great mix of all-important background/theoretical info and detailed, hands-on code, scripts, and walk-throughs. I enjoyed reading it, and I know that you will too! Starting from the basics, you will learn about generative foundation models, prompt engineering, and much more. From there you will proceed to large language models (LLMs) and will see how to use them from within Amazon SageMaker. After you master the basics, you will have the opportunity to learn about multiple types of fine-tuning, and then you will get to the heart of the book and learn to build applications that have the power to perform context-aware reasoning with generative models of different modalities, including text and images.
—Jeff Barr, VP and Chief Evangelist @ AWS

This book is a comprehensive resource for building generative AI–based solutions on AWS. Using real-world examples, Chris, Antje, and Shelbee have done a spectacular job explaining key concepts, pitfalls, and best practices for LLMs and multimodal models. A very timely resource to accelerate your journey for building generative AI solutions from concept to production.
—Geeta Chauhan, Applied AI Leader @ Meta

In the process of developing and deploying a generative AI application, there are many complex decision points that collectively determine whether the application will produce high-quality output and can be run in a cost-efficient, scalable, and reliable manner. This book demystifies the underlying technologies and provides thoughtful guidance to help readers understand and make these decisions, and ultimately launch successful generative AI applications.
—Brent Rabowsky, Sr. Manager AI/ML Specialist SA @ AWS
It’s very rare to find a book that comprehensively covers the full end-to-end process of model development and deployment! If you’re an ML practitioner, this book is a must!
—Alejandro Herrera, Data Scientist @ Snowflake

This book goes deep into how GenAI models are actually built and used. And it covers the whole life cycle, not just prompt engineering or tuning. If you’re thinking about using GenAI for anything nontrivial, you should read this book to understand what skill sets and tools you’ll need to be successful.
—Randy DeFauw, Sr. Principal Solution Architect @ AWS

There’s no better book to get started with generative AI. With all the information on the internet about the topic, it’s extremely overwhelming for anyone. But this book is a clear and structured guide: it goes from the basics all the way to advanced topics like parameter-efficient fine-tuning and LLM deployment. It’s also very practical and covers deployment on AWS too. This book is an extremely valuable resource for any data scientist or engineer!
—Alexey Grigorev, Principal Data Scientist @ OLX Group and Founder @ DataTalks.Club

This is by far the best book I have come across that makes building generative AI very practical. Antje, Chris, and Shelbee put together an exceptional resource that will be very valuable for years—if possible, it should be converted to a learning resource for universities. Definitely a must-read for anyone building generative AI applications at scale on AWS.
—Olalekan Elesin, Director of Data Science Platform @ HRS Group

If you’re looking for a robust learning foundation for building and deploying generative AI products or services, look no further than Generative AI on AWS. Guided by the deep expertise of authors Chris Fregly, Antje Barth, and Shelbee Eigenbrode, this book will transition you from a GenAI novice to a master of the intricate nuances involved in training, fine-tuning, and application development. This manual is an indispensable guide and true necessity for every budding AI engineer, product manager, marketer, or business leader.
—Lillian Pierson, PE, Founder @ Data-Mania
Generative AI on AWS provides an in-depth look at the innovative techniques for creating applications that comprehend diverse data types and make context-driven decisions. Readers get a comprehensive view, bridging both the theoretical aspects and practical tools needed for generative AI applications. This book is a must-read for those wanting to harness the full potential of AWS in the realm of generative AI.
—Kesha Williams, Director @ Slalom Consulting and AWS Machine Learning Hero

The generative AI landscape evolves so fast that it’s incredible to see so much relevant knowledge condensed into a comprehensive book. Well done!
—Francesco Mosconi, Head of Data Science @ Catalit
Chris Fregly, Antje Barth, and Shelbee Eigenbrode

Generative AI on AWS
Building Context-Aware Multimodal Reasoning Applications

Beijing • Boston • Farnham • Sebastopol • Tokyo
Generative AI on AWS
by Chris Fregly, Antje Barth, and Shelbee Eigenbrode

Copyright © 2024 Flux Capacitor, LLC, Antje Barth, and Shelbee Eigenbrode. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Sara Hunter
Production Editor: Gregory Hyman
Copyeditor: nSight, Inc.
Proofreader: Tove Innis
Indexer: Sue Klefstad
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

November 2023: First Edition

Revision History for the First Edition
2023-11-13: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098159221 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Generative AI on AWS, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-15922-1
[LSI]
Table of Contents

Preface ........ ix

1. Generative AI Use Cases, Fundamentals, and Project Life Cycle ........ 1
   Use Cases and Tasks 1
   Foundation Models and Model Hubs 4
   Generative AI Project Life Cycle 5
   Generative AI on AWS 8
   Why Generative AI on AWS? 11
   Building Generative AI Applications on AWS 12
   Summary 13

2. Prompt Engineering and In-Context Learning ........ 15
   Prompts and Completions 15
   Tokens 16
   Prompt Engineering 16
   Prompt Structure 18
   Instruction 18
   Context 18
   In-Context Learning with Few-Shot Inference 20
   Zero-Shot Inference 21
   One-Shot Inference 21
   Few-Shot Inference 22
   In-Context Learning Gone Wrong 23
   In-Context Learning Best Practices 23
   Prompt-Engineering Best Practices 24
   Inference Configuration Parameters 29
   Summary 34

iii
3. Large-Language Foundation Models ........ 35
   Large-Language Foundation Models 36
   Tokenizers 37
   Embedding Vectors 38
   Transformer Architecture 40
   Inputs and Context Window 42
   Embedding Layer 42
   Encoder 42
   Self-Attention 42
   Decoder 44
   Softmax Output 44
   Types of Transformer-Based Foundation Models 46
   Pretraining Datasets 48
   Scaling Laws 49
   Compute-Optimal Models 51
   Summary 52

4. Memory and Compute Optimizations ........ 55
   Memory Challenges 55
   Data Types and Numerical Precision 58
   Quantization 59
   fp16 60
   bfloat16 62
   fp8 64
   int8 64
   Optimizing the Self-Attention Layers 66
   FlashAttention 67
   Grouped-Query Attention 67
   Distributed Computing 68
   Distributed Data Parallel 69
   Fully Sharded Data Parallel 70
   Performance Comparison of FSDP over DDP 72
   Distributed Computing on AWS 74
   Fully Sharded Data Parallel with Amazon SageMaker 75
   AWS Neuron SDK and AWS Trainium 77
   Summary 77

5. Fine-Tuning and Evaluation ........ 79
   Instruction Fine-Tuning 80
   Llama 2-Chat 80
   Falcon-Chat 80
   FLAN-T5 80
   Instruction Dataset 81
   Multitask Instruction Dataset 81
   FLAN: Example Multitask Instruction Dataset 82
   Prompt Template 83
   Convert a Custom Dataset into an Instruction Dataset 84
   Instruction Fine-Tuning 86
   Amazon SageMaker Studio 87
   Amazon SageMaker JumpStart 88
   Amazon SageMaker Estimator for Hugging Face 89
   Evaluation 90
   Evaluation Metrics 91
   Benchmarks and Datasets 92
   Summary 94

6. Parameter-Efficient Fine-Tuning ........ 95
   Full Fine-Tuning Versus PEFT 96
   LoRA and QLoRA 98
   LoRA Fundamentals 99
   Rank 100
   Target Modules and Layers 100
   Applying LoRA 101
   Merging LoRA Adapter with Original Model 103
   Maintaining Separate LoRA Adapters 104
   Full Fine-Tuning Versus LoRA Performance 104
   QLoRA 105
   Prompt Tuning and Soft Prompts 106
   Summary 109

7. Fine-Tuning with Reinforcement Learning from Human Feedback ........ 111
   Human Alignment: Helpful, Honest, and Harmless 112
   Reinforcement Learning Overview 112
   Train a Custom Reward Model 115
   Collect Training Dataset with Human-in-the-Loop 115
   Sample Instructions for Human Labelers 116
   Using Amazon SageMaker Ground Truth for Human Annotations 116
   Prepare Ranking Data to Train a Reward Model 118
   Train the Reward Model 121
   Existing Reward Model: Toxicity Detector by Meta 123
   Fine-Tune with Reinforcement Learning from Human Feedback 124
   Using the Reward Model with RLHF 125
   Proximal Policy Optimization RL Algorithm 126
   Perform RLHF Fine-Tuning with PPO 126
   Mitigate Reward Hacking 128
   Using Parameter-Efficient Fine-Tuning with RLHF 130
   Evaluate RLHF Fine-Tuned Model 131
   Qualitative Evaluation 131
   Quantitative Evaluation 132
   Load Evaluation Model 133
   Define Evaluation-Metric Aggregation Function 133
   Compare Evaluation Metrics Before and After 134
   Summary 135

8. Model Deployment Optimizations ........ 137
   Model Optimizations for Inference 137
   Pruning 139
   Post-Training Quantization with GPTQ 140
   Distillation 142
   Large Model Inference Container 144
   AWS Inferentia: Purpose-Built Hardware for Inference 145
   Model Update and Deployment Strategies 147
   A/B Testing 148
   Shadow Deployment 149
   Metrics and Monitoring 151
   Autoscaling 152
   Autoscaling Policies 152
   Define an Autoscaling Policy 153
   Summary 154

9. Context-Aware Reasoning Applications Using RAG and Agents ........ 155
   Large Language Model Limitations 156
   Hallucination 157
   Knowledge Cutoff 157
   Retrieval-Augmented Generation 158
   External Sources of Knowledge 159
   RAG Workflow 160
   Document Loading 161
   Chunking 162
   Document Retrieval and Reranking 163
   Prompt Augmentation 164
   RAG Orchestration and Implementation 165
   Document Loading and Chunking 166
   Embedding Vector Store and Retrieval 168
   Retrieval Chains 171
   Reranking with Maximum Marginal Relevance 173
   Agents 174
   ReAct Framework 176
   Program-Aided Language Framework 178
   Generative AI Applications 181
   FMOps: Operationalizing the Generative AI Project Life Cycle 187
   Experimentation Considerations 188
   Development Considerations 190
   Production Deployment Considerations 192
   Summary 193

10. Multimodal Foundation Models ........ 195
   Use Cases 196
   Multimodal Prompt Engineering Best Practices 197
   Image Generation and Enhancement 198
   Image Generation 198
   Image Editing and Enhancement 199
   Inpainting, Outpainting, Depth-to-Image 204
   Inpainting 204
   Outpainting 206
   Depth-to-Image 207
   Image Captioning and Visual Question Answering 209
   Image Captioning 211
   Content Moderation 211
   Visual Question Answering 211
   Model Evaluation 216
   Text-to-Image Generative Tasks 216
   Forward Diffusion 219
   Nonverbal Reasoning 219
   Diffusion Architecture Fundamentals 221
   Forward Diffusion 221
   Reverse Diffusion 222
   U-Net 223
   Stable Diffusion 2 Architecture 224
   Text Encoder 225
   U-Net and Diffusion Process 226
   Text Conditioning 228
   Cross-Attention 228
   Scheduler 229
   Image Decoder 229
   Stable Diffusion XL Architecture 230
   U-Net and Cross-Attention 230
   Refiner 230
   Conditioning 231
   Summary 233

11. Controlled Generation and Fine-Tuning with Stable Diffusion ........ 235
   ControlNet 235
   Fine-Tuning 240
   DreamBooth 241
   DreamBooth and PEFT-LoRA 243
   Textual Inversion 245
   Human Alignment with Reinforcement Learning from Human Feedback 249
   Summary 252

12. Amazon Bedrock: Managed Service for Generative AI ........ 253
   Bedrock Foundation Models 253
   Amazon Titan Foundation Models 254
   Stable Diffusion Foundation Models from Stability AI 254
   Bedrock Inference APIs 254
   Large Language Models 256
   Generate SQL Code 257
   Summarize Text 257
   Embeddings 258
   Fine-Tuning 261
   Agents 264
   Multimodal Models 267
   Create Images from Text 267
   Create Images from Images 269
   Data Privacy and Network Security 270
   Governance and Monitoring 272
   Summary 272

Index ........ 273
Preface

After reading this book, you will understand the most common generative AI use cases and tasks addressed by industry and academia today. You will gain in-depth knowledge of how these cutting-edge generative models are built, as well as practical experience to help you choose between reusing an existing generative model or building one from scratch. You will then learn to adapt these generative AI models to your domain-specific datasets, tasks, and use cases that support your business applications.

This book is meant for AI/ML enthusiasts, data scientists, and engineers who want to learn the technical foundations and best practices for generative AI model training, fine-tuning, and deployment into production. We assume that you are already familiar with Python and basic deep-learning components such as neural networks, forward propagation, activations, gradients, and backpropagation. A basic understanding of Python and deep-learning frameworks such as TensorFlow or PyTorch should be sufficient to follow the code samples used throughout the book. Familiarity with AWS is not required to learn the concepts, but it is useful for some of the AWS-specific samples.

You will dive deep into the generative AI life cycle and learn topics such as prompt engineering, few-shot in-context learning, generative model pretraining, domain adaptation, model evaluation, parameter-efficient fine-tuning (PEFT), and reinforcement learning from human feedback (RLHF).

You will get hands-on with popular large language models such as Llama 2 and Falcon as well as multimodal generative models, including Stable Diffusion and IDEFICS. You will access these foundation models through the Hugging Face Model Hub, Amazon SageMaker JumpStart, or the Amazon Bedrock managed service for generative AI.

ix
You will also learn how to implement context-aware retrieval-augmented generation (RAG)¹ and agent-based reasoning workflows.² You will explore application frameworks and libraries, including LangChain, ReAct,³ and Program-Aided Language models (PAL). You can use these frameworks and libraries to access your own custom data sources and APIs or integrate with external data sources such as web search and partner data systems.

Lastly, you will explore all of these generative concepts, frameworks, and libraries in the context of multimodal generative AI use cases across different content modalities such as text, images, audio, and video.

And don’t worry if you don’t understand all of these concepts just yet. Throughout the book, you will dive into each of these topics in much more detail. With all of this knowledge and hands-on experience, you can start building cutting-edge generative AI applications that help delight your customers, outperform your competition, and increase your revenue!

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
   Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
   Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
   Used to call attention to snippets of interest in code blocks, as well as to differentiate among multiple speakers in dialogue, or between the human user and the AI assistant.

1. Patrick Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”, arXiv, 2021.
2. Jason Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, arXiv, 2022.
3. Shunyu Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models”, arXiv, 2023.
This element signifies a tip or suggestion.

This element signifies a general note.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/generative-ai-on-aws-code.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Generative AI on AWS by Chris Fregly, Antje Barth, and Shelbee Eigenbrode (O’Reilly). Copyright 2024 Flux Capacitor, LLC, Antje Barth, and Shelbee Eigenbrode, 978-1-098-15922-1.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-829-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/generative-ai-on-aws.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Follow us on Twitter: https://twitter.com/oreillymedia
Watch us on YouTube: https://youtube.com/oreillymedia
Acknowledgments

We’d like to thank all of our reviewers, including Brent Rabowsky, Randy DeFauw, Sean Owen, Akhil Behl, and Sireesha Muppala, PhD. Your feedback was critical to the narrative that we followed in this book. Additionally, your guidance and intuition helped us modulate the technical depth of the code examples we included.

Chris
I dedicate this book to my mom, who has always inspired me to share knowledge with others. In addition, you have always listened patiently as I navigate life, question things, and seek answers.

Antje
I would like to thank my family for providing a great education and supporting me throughout my professional endeavors. In particular, I want to thank my brother, Kai, who bought me my first laptop and made sure I had the right tools for university. This was the initial catalyst to my career in computer science.

Shelbee
To my husband, Steve, and daughter, Emily, for always being “my why” and for their continued support, especially during the late nights and long weekends spent writing this book. I also want to thank my dog, Molly, for sitting patiently while I took pictures of her to use as input for some of the multimodal models in this book!