Praise for Quick Start Guide to Large Language Models

“By balancing the potential of both open- and closed-source models, Quick Start Guide to Large Language Models stands as a comprehensive guide to understanding and using LLMs, bridging the gap between theoretical concepts and practical application.”
—Giada Pistilli, Principal Ethicist at Hugging Face

“A refreshing and inspiring resource. Jam-packed with practical guidance and clear explanations that leave you smarter about this incredible new field.”
—Pete Huang, author of The Neuron

“When it comes to building Large Language Models (LLMs), it can be a daunting task to find comprehensive resources that cover all the essential aspects. However, my search for such a resource recently came to an end when I discovered this book.

“One of the stand-out features of this book is Sinan’s ability to present complex concepts in a straightforward manner. The author has done an outstanding job of breaking down intricate ideas and algorithms, ensuring that readers can grasp them without feeling overwhelmed. Each topic is carefully explained, building upon examples that serve as stepping stones for better understanding. This approach greatly enhances the learning experience, making even the most intricate aspects of LLM development accessible to readers of varying skill levels.

“Another strength of this book is the abundance of code resources. The inclusion of practical examples and code snippets is a game-changer for anyone who wants to experiment and apply the concepts they learn. These code resources provide readers with hands-on experience, allowing them to test and refine their understanding. This is an invaluable asset, as it fosters a deeper comprehension of the material and enables readers to truly engage with the content.

“In conclusion, this book is a rare find for anyone interested in building LLMs. Its exceptional quality of explanation, clear and concise writing style, abundant code resources, and comprehensive coverage of all essential aspects make it an indispensable resource. Whether you are a beginner or an experienced practitioner, this book will undoubtedly elevate your understanding and practical skills in LLM development. I highly recommend Quick Start Guide to Large Language Models to anyone looking to embark on the exciting journey of building LLM applications.”
—Pedro Marcelino, Machine Learning Engineer, Co-Founder and CEO @overfit.study

“Ozdemir’s book cuts through the noise to help readers understand where the LLM revolution has come from—and where it is going. Ozdemir breaks down complex topics into practical explanations and easy-to-follow code examples.”
—Sheila Gulati, Former GM at Microsoft and current Managing Director of Tola Capital
The Pearson Addison-Wesley Data & Analytics Series provides readers with practical knowledge for solving problems and answering questions with data. Titles in this series primarily focus on three areas:

1. Infrastructure: how to store, move, and manage data
2. Algorithms: how to mine intelligence or make predictions based on data
3. Visualizations: how to represent data and insights in a meaningful and compelling way

The series aims to tie all three of these areas together to help the reader build end-to-end systems for fighting spam; making recommendations; building personalization; detecting trends, patterns, or problems; and gaining insight from the data exhaust of systems and user interactions.

Visit informit.com/awdataseries for a complete list of available publications.

Make sure to connect with us! informit.com/connect
Quick Start Guide to Large Language Models
Strategies and Best Practices for Using ChatGPT and Other LLMs

Sinan Ozdemir

Hoboken, New Jersey
Cover image: ioat/Shutterstock

Permissions and credits appear on page 252, which is a continuation of this copyright page.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419.

For government sales inquiries, please contact governmentsales@pearsoned.com.

For questions about sales outside the U.S., please contact intlcs@pearson.com.

Visit us on the Web: informit.com/aw

Library of Congress Control Number: 2023941567

Copyright © 2024 Pearson Education, Inc.

All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearson.com/permissions.

ISBN-13: 978-0-13-819919-7
ISBN-10: 0-13-819919-1
Pearson's Commitment to Diversity, Equity, and Inclusion

Pearson is dedicated to creating bias-free content that reflects the diversity of all learners. We embrace the many dimensions of diversity, including but not limited to race, ethnicity, gender, socioeconomic status, ability, age, sexual orientation, and religious or political beliefs.

Education is a powerful force for equity and change in our world. It has the potential to deliver opportunities that improve lives and enable economic mobility. As we work with authors to create content for every product and service, we acknowledge our responsibility to demonstrate inclusivity and incorporate diverse scholarship so that everyone can achieve their potential through learning. As the world’s leading learning company, we have a duty to help drive change and live up to our purpose to help more people create a better life for themselves and to create a better world.

Our ambition is to purposefully contribute to a world where:

- Everyone has an equitable and lifelong opportunity to succeed through learning.
- Our educational products and services are inclusive and represent the rich diversity of learners.
- Our educational content accurately reflects the histories and experiences of the learners we serve.
- Our educational content prompts deeper discussions with learners and motivates them to expand their own learning (and worldview).

While we work hard to present unbiased content, we want to hear from you about any concerns or needs with this Pearson product so that we can investigate and address them. Please contact us with concerns about any potential bias at https://www.pearson.com/report-bias.html.
Contents

Foreword
Preface
Acknowledgments
About the Author

I Introduction to Large Language Models

1 Overview of Large Language Models
    What Are Large Language Models?
        Definition of LLMs
        Key Characteristics of LLMs
        How LLMs Work
    Popular Modern LLMs
        BERT
        GPT-3 and ChatGPT
        T5
        Domain-Specific LLMs
    Applications of LLMs
        Classical NLP Tasks
        Free-Text Generation
        Information Retrieval/Neural Semantic Search
        Chatbots
    Summary

2 Semantic Search with LLMs
    Introduction
    The Task
        Asymmetric Semantic Search
    Solution Overview
    The Components
        Text Embedder
        Document Chunking
        Vector Databases
        Pinecone
        Open-Source Alternatives
        Re-ranking the Retrieved Results
        API
    Putting It All Together
        Performance
        The Cost of Closed-Source Components
    Summary

3 First Steps with Prompt Engineering
    Introduction
    Prompt Engineering
        Alignment in Language Models
        Just Ask
        Few-Shot Learning
        Output Structuring
        Prompting Personas
    Working with Prompts Across Models
        ChatGPT
        Cohere
        Open-Source Prompt Engineering
    Building a Q/A Bot with ChatGPT
    Summary

II Getting the Most Out of LLMs

4 Optimizing LLMs with Customized Fine-Tuning
    Introduction
    Transfer Learning and Fine-Tuning: A Primer
        The Fine-Tuning Process Explained
        Closed-Source Pre-trained Models as a Foundation
    A Look at the OpenAI Fine-Tuning API
        The GPT-3 Fine-Tuning API
        Case Study: Amazon Review Sentiment Classification
        Guidelines and Best Practices for Data
    Preparing Custom Examples with the OpenAI CLI
        Setting Up the OpenAI CLI
        Hyperparameter Selection and Optimization
    Our First Fine-Tuned LLM
        Evaluating Fine-Tuned Models with Quantitative Metrics
        Qualitative Evaluation Techniques
        Integrating Fine-Tuned GPT-3 Models into Applications
        Case Study: Amazon Review Category Classification
    Summary

5 Advanced Prompt Engineering
    Introduction
    Prompt Injection Attacks
    Input/Output Validation
        Example: Using NLI to Build Validation Pipelines
    Batch Prompting
    Prompt Chaining
        Chaining as a Defense Against Prompt Injection
        Chaining to Prevent Prompt Stuffing
        Example: Chaining for Safety Using Multimodal LLMs
    Chain-of-Thought Prompting
        Example: Basic Arithmetic
    Revisiting Few-Shot Learning
        Example: Grade-School Arithmetic with LLMs
    Testing and Iterative Prompt Development
    Summary

6 Customizing Embeddings and Model Architectures
    Introduction
    Case Study: Building a Recommendation System
        Setting Up the Problem and the Data
        Defining the Problem of Recommendation
        A 10,000-Foot View of Our Recommendation System
        Generating a Custom Description Field to Compare Items
        Setting a Baseline with Foundation Embedders
        Preparing Our Fine-Tuning Data
        Fine-Tuning Open-Source Embedders Using Sentence Transformers
        Summary of Results
    Summary

III Advanced LLM Usage

7 Moving Beyond Foundation Models
    Introduction
    Case Study: Visual Q/A
        Introduction to Our Models: The Vision Transformer, GPT-2, and DistilBERT
        Hidden States Projection and Fusion
        Cross-Attention: What Is It, and Why Is It Critical?
        Our Custom Multimodal Model
        Our Data: Visual QA
        The VQA Training Loop
        Summary of Results
    Case Study: Reinforcement Learning from Feedback
        Our Model: FLAN-T5
        Our Reward Model: Sentiment and Grammar Correctness
        Transformer Reinforcement Learning
        The RLF Training Loop
        Summary of Results
    Summary

8 Advanced Open-Source LLM Fine-Tuning
    Introduction
    Example: Anime Genre Multilabel Classification with BERT
        Using the Jaccard Score to Measure Performance for Multilabel Genre Prediction of Anime Titles
        A Simple Fine-Tuning Loop
        General Tips for Fine-Tuning Open-Source LLMs
        Summary of Results
    Example: LaTeX Generation with GPT2
        Prompt Engineering for Open-Source Models
        Summary of Results
    Sinan's Attempt at Wise Yet Engaging Responses: SAWYER
        Step 1: Supervised Instruction Fine-Tuning
        Step 2: Reward Model Training
        Step 3: Reinforcement Learning from (Estimated) Human Feedback
        Summary of Results
    The Ever-Changing World of Fine-Tuning
    Summary

9 Moving LLMs into Production
    Introduction
    Deploying Closed-Source LLMs to Production
        Cost Projections
        API Key Management
    Deploying Open-Source LLMs to Production
        Preparing a Model for Inference
        Interoperability
        Quantization
        Pruning
        Knowledge Distillation
        Cost Projections with LLMs
        Pushing to Hugging Face
    Summary
    Your Contributions Matter
    Keep Going!

IV Appendices

A LLM FAQs
B LLM Glossary
C LLM Application Archetypes

Index
Foreword

Though the use of Large Language Models (LLMs) has been growing for the past five years, interest exploded with the release of OpenAI’s ChatGPT. The AI chatbot showcased the power of LLMs and introduced an easy-to-use interface that enabled people from all walks of life to take advantage of the game-changing tool. Now that this subset of natural language processing (NLP) has become one of the most discussed areas of machine learning, many people are looking to incorporate it into their own offerings. This technology actually feels like it could be artificial intelligence, even though it may just be predicting sequential tokens using a probabilistic model.

Quick Start Guide to Large Language Models is an excellent overview of both the concept of LLMs and how to use them on a practical level, both for programmers and non-programmers. The mix of explanations, visual representations, and practical code examples makes for an engaging and easy read that encourages you to keep turning the page. Sinan Ozdemir covers many topics in an engaging fashion, making this one of the best resources available to learn about LLMs, their capabilities, and how to engage with them to get the best results.

Sinan deftly moves between different aspects of LLMs, giving the reader all the information they need to use LLMs effectively. Starting with the discussion of where LLMs sit within NLP and the explanation of transformers and encoders, he goes on to discuss transfer learning and fine-tuning, embeddings, attention, and tokenization in an approachable manner. He then covers many other aspects of LLMs, including the trade-offs between open-source and commercial options; how to make use of vector databases (a very popular topic in its own right); writing your own APIs with FastAPI; creating embeddings; and putting LLMs into production, something that can prove challenging for any type of machine learning project.

A great part of this book is the coverage of using both visual interfaces—such as ChatGPT—and programmatic interfaces. Sinan includes helpful Python code that is approachable and clearly illustrates what is being done. His coverage of prompt engineering illuminates how to get dramatically better results from LLMs and, better yet, he demonstrates how to provide those prompts both in the visual GUI and through the Python OpenAI library.

This book is so transformative that I was tempted to use ChatGPT to write this Foreword as a demonstration of everything I had learned. That is a testament to it being so well written, engaging, and informative. While I may have felt enabled to do so, I wrote the Foreword myself to articulate my thoughts and experiences about LLMs in the most authentic and personal way I knew. Except for the last part of that last sentence, that was written by ChatGPT, just because I could.

For someone looking to learn about any of the many aspects of LLMs, this is the book. It will help you with your understanding of the models and how to effectively use them in your day-to-day life. Perhaps most importantly, you will enjoy the journey.

—Jared Lander, Series Editor
Preface

Hello! My name is Sinan Ozdemir. I’m a former theoretical mathematician turned university lecturer turned AI enthusiast turned successful startup founder/AI textbook author/venture capitalist advisor. Today I am also your tour guide through the vast museum of knowledge that is large language model (LLM) engineering and applications.

The purposes of this book are twofold: to demystify the field of LLMs and to equip you with practical knowledge to be able to start experimenting, coding, and building with LLMs. But this isn’t a classroom, and I’m not your typical professor. I’m here not to shower you with complicated terminology. Instead, my aim is to make complex concepts digestible, relatable, and more importantly, applicable.

Frankly, that’s enough about me. This book isn’t for me—it’s for you. I want to give you some tips on how to read this book, reread this book (if I did my job right), and make sure you are getting everything you need from this text.

Audience and Prerequisites

Who is this book for, you ask? Well, my answer is simple: anyone who shares a curiosity about LLMs, the willing coder, the relentless learner. Whether you’re already entrenched in machine learning or you’re on the edge, dipping your toes into this vast ocean, this book is your guide, your map to navigate the waters of LLMs.

However, I’ll level with you: To get the most out of this journey, having some experience with machine learning and Python will be incredibly beneficial. That’s not to say you won’t survive without it, but the waters might seem a bit choppy without these tools. If you’re learning on the go, that’s great, too! Some of the concepts we’ll explore don’t necessarily require heavy coding, but most do.

I’ve also tried to strike a balance in this book between deep theoretical understanding and practical hands-on skills. Each chapter is filled with analogies to make the complex simple, followed by code snippets to bring the concepts to life. In essence, I’ve written this book as your LLM lecturer + TA, aiming to simplify and demystify this fascinating field, rather than shower you with academic jargon. I want you to walk away from each chapter with a clearer understanding of the topic and knowledge of how to apply it in real-world scenarios.

How to Approach This Book

As just stated, if you have some experience with machine learning, you’ll find the journey a bit easier than if you are starting without it. Still, the path is open to anyone who can code in Python and is ready to learn. This book allows for different levels of involvement, depending on your background, your aims, and your available time. You can dive deep into the practical sections, experimenting with the code and tweaking the models, or you can engage with the theoretical parts, getting a solid understanding of how LLMs function without writing a single line of code. The choice is yours.

As you navigate through the book, remember that every chapter tends to build upon previous work. The knowledge and skills you gain in one section will become valuable tools in the subsequent ones. The challenges you will face are part of the learning process. You might find yourself puzzled, frustrated, and even stuck at times. When I was developing the visual question-answering (VQA) system for this book, I faced repeated failures. The model would spew out nonsense, the same phrases over and over again. But then, after countless iterations, it started generating meaningful output. That moment of triumph, the exhilaration of achieving a breakthrough, was worth every failed attempt. This book will offer you similar challenges and, consequently, similar triumphs.

Overview

The book is organized into four parts.

Part I: Introduction to Large Language Models

The Part I chapters provide an introduction to LLMs.

Chapter 1: Overview of Large Language Models
This chapter provides a broad overview of the world of LLMs. It covers the basics: what they are, how they work, and why they’re important. By the end of the chapter, you’ll have a solid foundation to understand the rest of the book.

Chapter 2: Semantic Search with LLMs
Building on the foundations laid in Chapter 1, Chapter 2 dives into how LLMs can be used for one of the most impactful applications of LLMs—semantic search. We will work on creating a search system that understands the meaning of your query rather than just matching keywords.

Chapter 3: First Steps with Prompt Engineering
The art and science of crafting effective prompts is essential for harnessing the power of LLMs. Chapter 3 provides a practical introduction to prompt engineering, with guidelines and techniques for getting the most out of your LLMs.

Part II: Getting the Most Out of LLMs

Part II steps things up another level.

Chapter 4: Optimizing LLMs with Customized Fine-Tuning
One size does not fit all in the world of LLMs. Chapter 4 covers how to fine-tune LLMs using your own datasets, with hands-on examples and exercises that will have you customizing models in no time.

Chapter 5: Advanced Prompt Engineering
We’ll take a deeper dive into the world of prompt engineering. Chapter 5 explores advanced strategies and techniques that can help you get even more out of your LLMs—for example, output validation and semantic few-shot learning.

Chapter 6: Customizing Embeddings and Model Architectures
In Chapter 6, we explore the more technical side of LLMs. We’ll cover how to modify model architectures and embeddings to better suit your specific use cases and requirements. We will be adapting LLM architectures to fit our needs while fine-tuning a recommendation engine that outperforms OpenAI’s models.

Part III: Advanced LLM Usage

Chapter 7: Moving Beyond Foundation Models
Chapter 7 explores some of the next-generation models and architectures that are pushing the boundaries of what’s possible with LLMs. We’ll combine multiple LLMs and establish a framework for building our own custom LLM architectures using PyTorch. This chapter also introduces the use of reinforcement learning from feedback to align LLMs to our needs.

Chapter 8: Advanced Open-Source LLM Fine-Tuning
Continuing from Chapter 7, Chapter 8 provides hands-on guidelines and examples for fine-tuning advanced open-source LLMs, with a focus on practical implementation. We’ll fine-tune LLMs using not only generic language modeling, but also advanced methods like reinforcement learning from feedback to create our very own instruction-aligned LLM—SAWYER.

Chapter 9: Moving LLMs into Production
This final chapter brings everything together by exploring the practical considerations of deploying LLMs in production environments. We’ll cover how to scale models, handle real-time requests, and ensure our models are robust and reliable.

Part IV: Appendices

The three appendices include a list of FAQs, a glossary of terms, and an LLM archetype reference.

Appendix A: LLM FAQs
As a consultant, engineer, and teacher, I get a lot of questions about LLMs on a daily basis. I compiled some of the more impactful questions here.

Appendix B: LLM Glossary
The glossary provides a high-level reference to some of the main terms used throughout this book.

Appendix C: LLM Application Archetypes
We build many applications using LLMs in this book, so Appendix C is meant to be a jumping-off point for anyone looking to build an application of their own. For some common applications of LLMs, this appendix will suggest which LLMs to focus on and which data you might need, as well as which common pitfalls you might face and how to deal with them.