Large Language Models Projects: Apply and Implement Strategies for Large Language Models
Pere Martra
Large Language Models Projects: Apply and Implement Strategies for Large Language Models

ISBN-13 (pbk): 979-8-8688-0514-1
ISBN-13 (electronic): 979-8-8688-0515-8
https://doi.org/10.1007/979-8-8688-0515-8

Copyright © 2024 by Pere Martra

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Celestin Suresh John
Development Editor: Laura Berendson
Editorial Assistant: Gryffin Winkler
Cover designed by eStudioCalamar
Cover image designed by Shubham Dhage on Unsplash

Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, Suite 4600, New York, NY 10004-1562, USA. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail booktranslations@springernature.com; for reprint, paperback, or audio rights, please e-mail bookpermissions@springernature.com.

Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub. For more detailed information, please visit https://www.apress.com/gp/services/source-code.

If disposing of this product, please recycle the paper.

Pere Martra
Barcelona, Spain
Table of Contents

About the Author
About the Technical Reviewer
Acknowledgments
Introduction

Part I: Techniques and Libraries

Chapter 1: Introduction to Large Language Models with OpenAI
  1.1 Create Your First Chatbot with OpenAI
    A Brief Introduction to the OpenAI API
    The Roles in OpenAI Messages
    Memory in Conversations with OpenAI
    Creating a Chatbot with OpenAI
    Key Takeaways and More to Learn
  1.2 Create a Simple Natural Language to SQL Using OpenAI
    Key Takeaways and More to Learn
  1.3 Influencing the Model’s Response with In-Context Learning
    Key Takeaways and More to Learn
  1.4 Summary

Chapter 2: Vector Databases and LLMs
  2.1 Brief Introduction to Hugging Face and Kaggle
    Hugging Face
    Kaggle
    Key Takeaways and More to Learn
  2.2 RAG and Vector Databases
    How Do Vector Databases Work?
    Key Takeaways
  2.3 Creating a RAG System with News Dataset
    What Technology Will We Use?
    Preparing the Dataset
    Working with Chroma
    Loading the Model and Testing the Solution
    Different Ways to Load ChromaDB
    Key Takeaways and More to Learn
  2.4 Summary

Chapter 3: LangChain and Agents
  3.1 Create a RAG System with LangChain
    Reviewing the Embeddings
    Using LangChain to Create the RAG System
    Key Takeaways and More to Learn
  3.2 Create a Moderation System Using LangChain
    Create a Self-moderated Commentary System with LangChain and OpenAI
    Create a Self-moderated Commentary System with Llama 2 and OpenAI
  3.3 Create a Data Analyst Assistant Using an LLM Agent
    Key Takeaways and More to Learn
  3.4 Create a Medical Assistant RAG System
    Loading the Data and Creating the Embeddings
    Creating the Agent
    Key Takeaways and More to Learn
  3.5 Summary

Chapter 4: Evaluating Models
  4.1 BLEU, ROUGE, and N-Grams
    N-Grams
    Measuring Translation Quality with BLEU
    Measuring Summary Quality with ROUGE
    Key Takeaways and More to Learn
  4.2 Evaluation and Tracing with LangSmith
    Evaluating LLM Summaries Using Embedding Distance with LangSmith
    Tracing a Medical Agent with LangSmith
    Key Takeaways and More to Learn
  4.3 Evaluating Language Models with Language Models
    Evaluating a RAG Solution with Giskard
    Key Takeaways and More to Learn
  4.4 An Overview of Generalist Benchmarks
    MMLU
    TruthfulQA
    Key Takeaways
  4.5 Summary

Chapter 5: Fine-Tuning Models
  5.1 A Brief Introduction to the Concept of Fine-Tuning
  5.2 Efficient Fine-Tuning with LoRA
    Brief Introduction to LoRA
    Creating a Prompt Generator with LoRA
    Key Takeaways and More to Learn
  5.3 Size Optimization and Fine-Tuning with QLoRA
    Brief Introduction to Quantization
    QLoRA: Fine-Tuning a 4-Bit Quantized Model Using LoRA
    Key Takeaways and More to Learn
  5.4 Prompt Tuning
    Prompt Tuning: Prompt Generator
    Detecting Hate Speech Using Prompt Tuning
    Key Takeaways and More to Learn
  5.5 Summary
Part II: Projects

Chapter 6: Natural Language to SQL
  6.1 Creating a Super NL2SQL Prompt for OpenAI
  6.2 Setting Up a NL2SQL Project with Azure OpenAI Studio
    Calling Azure OpenAI Services from a Notebook
    Key Takeaways and More to Learn
  6.3 Setting Up a NL2SQL Solution with AWS Bedrock
    Calling AWS Bedrock from Python
    Key Takeaways and More to Learn
  6.4 Setting Up a NL2SQL Project with Ollama
    Calling Ollama from a Notebook
    Key Takeaways and More to Learn
  6.5 Summary

Chapter 7: Creating and Publishing Your Own LLM
  7.1 Introduction to DPO: Direct Preference Optimization
    A Look at Some DPO Datasets
  7.2 Aligning a Phi-3 Model with DPO
    Save and Upload
  7.3 Summary

Part III: Enterprise Solutions

Chapter 8: Architecting a NL2SQL Project for Immense Enterprise Databases
  8.1 Brief Project Overview
  8.2 Solution Architecture
    Prompt Size Reduction
    Using Different Models to Create SQL
    Semantic Caching to Reduce LLM Access
  8.3 Summary
Chapter 9: Decoding Risk: Transforming Banks with Customer Embeddings
  9.1 Actual Client Risk System
  9.2 How Can a Large Language Model (LLM) Help Us Improve This Process and, Above All, Simplify It?
  9.3 First Picture of the Solution
  9.4 Preparatory Steps When Initiating the Project
  9.5 Conclusion

Chapter 10: Closing

Index
About the Author

Pere Martra is a seasoned IT engineer and AI enthusiast with years of experience in the financial sector. He is currently pursuing a Master’s in Research on Artificial Intelligence. Initially, he delved into the world of AI through his passion for game development. Applying reinforcement learning techniques, he infused video game characters with personality and autonomy, sparking his journey into the realm of AI.

Today, AI is not just his passion but a pivotal part of his profession. Collaborating with startups on NLP-based solutions, he plays a crucial role in defining technological stacks, architecting solutions, and guiding team inception. As the author of a course on large language models and their applications, available on GitHub, Pere shares his expertise in this cutting-edge field. He serves as a mentor in the TensorFlow Advanced Techniques Specialization at DeepLearning.AI, assisting students in solving problems within their tasks. He holds the distinction of being one of the few TensorFlow Certified Developers in Spain, complementing this achievement with an Azure Data Scientist Associate certification.

Follow Pere on Medium, where he writes about AI, emphasizing large language models and deep learning with TensorFlow, contributing valuable insights to TowardsAI.net. His top skills include Keras, artificial intelligence (AI), TensorFlow, generative AI, and large language models (LLM). Connect with Pere at www.linkedin.com/in/pere-martra/ for project collaborations or insightful discussions in the dynamic field of AI.
About the Technical Reviewer

Dilyan Grigorov is a software developer with a passion for Python software development, generative deep learning and machine learning, data structures, and algorithms. He is an advocate for open source and the Python language itself. He has 16 years of industry experience programming in Python and has spent 5 of those years researching and testing generative AI solutions.

Dilyan is a Stanford student in the Graduate Program on Artificial Intelligence, taking classes taught by the likes of Andrew Ng, Fei-Fei Li, and Christopher Manning. He has been mentored by software engineers and AI experts from Google and Nvidia. His passion for AI and ML stems from his background as an SEO specialist dealing with search engine algorithms daily.

He enjoys engaging with the software community, often giving talks at local meetups and larger conferences. In his spare time, he enjoys reading books, hiking in the mountains, taking long walks, playing with his son, and playing the piano.
Acknowledgments

I mainly want to thank my family; they have shown immense patience and endurance, watching me devote hours to this small project instead of to them, and seeing me retreat behind a screen, emerging only to smile at them.

I would also like to mention my colleagues at DeepLearning.AI. For a few months, I set aside my responsibilities as a mentor, and I was unable to help as a tester in the short courses they have been releasing.

I cannot forget my friends at Kaizen Dojo in Barcelona, whom I have abandoned for many Kyokushin Karate training sessions; the sessions I did keep helped me stay sane and keep going.
Introduction

At the end of 2022, the field of artificial intelligence garnered significant attention from many individuals. A tool emerged that, through a large language model, was capable of answering a wide variety of questions and maintaining conversations that seemed to be conducted by a human. It’s possible that even the people at OpenAI weren’t fully aware of the impact ChatGPT would have.

Although the origin of large language models can be traced back to 2017 with the publication of Google’s famous paper “Attention Is All You Need,” they had never before enjoyed the fame and attention they have received since the introduction of ChatGPT. The focus that the developer community and AI professionals have placed on these models has been so immense that a whole set of solutions, tools, and use cases have been created that didn’t exist before. Among these new tools, we find vector databases, traceability tools, and a multitude of models of all sizes. These are used to create code generators, customer chatbots, forecasting tools, text analysis tools, and more. This is just the beginning. The number of solutions currently in development and the amount of money being invested in this new area of artificial intelligence is of a magnitude difficult to measure.

In this book, I have attempted to provide an explanation that guides the reader from merely using large language models via API to defining large solutions where these models play a significant role. To achieve this, various techniques are explained, including prompt engineering, model training and evaluation, and the use of tools such as vector databases. The importance of these large language models is not only discussed, but great emphasis is also placed on the handling of embeddings, which is essentially the language understood by large language models. The book is accompanied by more than 20 notebooks where we use different models.

I would ask you to focus more on the techniques used and their purpose rather than the specific models employed. New models appear every week, but what’s truly important is understanding how they can be used and manipulated to adapt to the specific use case you have in mind.

I would say that the last two chapters of the book are of particular importance. Although they contain the least technical content, they provide
the structure of two projects that utilize different language models to work together and solve a problem. If you’ve gone through the previous chapters, I’m confident you’ll understand the project structure and, more importantly, be able to create a similar solution on your own.

Throughout this journey, you’ll create various projects that will allow you to acquire knowledge gradually, step by step:

- Chatbot Creation: Building a chatbot using the OpenAI API. (Chapter 1)
- Basic NL2SQL System: Creating a simple natural language to SQL (NL2SQL) system with OpenAI. (Chapter 1)
- RAG System with LangChain: Building a Retrieval Augmented Generation (RAG) system using LangChain, a vector database (ChromaDB), and a Hugging Face LLM (TinyLlama). (Chapter 2)
- Moderation System with LangChain: Developing a self-moderated comment response system using two OpenAI models or a Llama 2 model from Hugging Face. (Chapter 3)
- Data Analyst Assistant Agent: Creating an LLM agent capable of analyzing data from Excel spreadsheets. (Chapter 3)
- Medical Assistant RAG System: Building a medical assistant RAG system using LangChain and a vector database. (Chapter 3)
- Prompt Generator with LoRA, QLoRA, and Prompt Tuning: Fine-tuning an LLM using the LoRA, QLoRA, and Prompt Tuning techniques to make it capable of generating prompts for other models. (Chapter 5)
- Hate Speech Detector: Using Prompt Tuning, the most efficient fine-tuning technique, to adapt an LLM’s behavior. (Chapter 5)
- NL2SQL Solution in Azure and AWS: Creating an NL2SQL solution using cloud platforms like Azure OpenAI Studio and AWS Bedrock. (Chapter 6)
- NL2SQL Project with Ollama: Setting up a local server using Ollama to run an NL2SQL model. (Chapter 6)
- Publishing an LLM on Hugging Face: Creating and publishing a custom LLM on Hugging Face. (Chapter 7)
- Architecting an NL2SQL Project for Enterprise Databases: Designing an NL2SQL solution for complex databases, incorporating techniques like prompt size reduction, semantic caching, and the use of multiple models. (Chapter 8)
- Transforming Banks with Customer Embeddings: A conceptual project exploring the use of customer embeddings to enhance risk assessment and decision-making in the banking sector. (Chapter 9)

These projects are the perfect way to introduce the techniques and tools that make up the current stack for working with large language models. By developing these projects, you will work with:

- Prompt Engineering: Designing effective prompts to guide LLM responses. (Chapter 1)
- OpenAI API: Utilizing OpenAI’s API to access and interact with their powerful language models. (Chapter 1)
- Hugging Face: Leveraging the Hugging Face platform for accessing open source LLMs and the Transformers library for working with them. (Chapter 2)
- Vector Databases (ChromaDB): Employing vector databases to store and retrieve information based on semantic similarity for RAG systems. (Chapter 2)
- Kaggle: Utilizing Kaggle datasets for training and evaluating LLMs. (Chapter 2)
- LangChain: Using the LangChain framework to develop LLM-powered applications, chain multiple models, and build agents. (Chapter 3)
- Evaluation Metrics (BLEU, ROUGE): Assessing the quality of LLM-generated text using established metrics like BLEU (for translation) and ROUGE (for summarization). (Chapter 4)
- LangSmith: Employing LangSmith for tracing and evaluating LLM interactions and performance. (Chapter 4)
- Fine-tuning Techniques (LoRA, QLoRA, Prompt Tuning): Adapting pretrained LLMs to specific tasks or domains using parameter-efficient fine-tuning methods. (Chapter 5)
- Direct Preference Optimization (DPO): Aligning LLMs with human preferences through reinforcement learning techniques. (Chapter 7)
- Cloud Platforms (Azure OpenAI, AWS Bedrock): Deploying and utilizing LLMs on cloud infrastructure for enterprise-level solutions. (Chapter 6)
- Ollama: Setting up and using a local server (Ollama) to run LLMs for development and experimentation. (Chapter 6)

As you can see, this extensive list of projects provides a glimpse into a good part of this new universe of techniques and tools that have emerged around large language models. By the end of this book, you’ll be part of the small group of people capable of creating new models that meet their specific needs. Not only that, but you’ll be able to find solutions to new problems through the use of these models.

I hope you enjoy the journey.
PART I

Techniques and Libraries

In this part, you will establish the foundations upon which you will build your developments based on large language models. You are going to explore different techniques through small, practical examples that will enable you to build more advanced projects in the following parts and chapters of this book. You will learn how to use the most common libraries in the world of large language models, always with a practical focus, while drawing on published papers, established research, and methodologies.

Some of the topics and technologies covered in this part include chatbots, code generation, the OpenAI API, Hugging Face, vector databases, LangChain, PEFT fine-tuning, soft prompt tuning, LoRA, QLoRA, evaluating models, prompt engineering, and RAG, among others.

In most chapters within this part, you will find practical examples in the form of easily executable notebooks on Google Colab. Please note that some of the notebooks may require more memory than what is available in the free version of Google Colab; given that we are working with large language models, their resource requirements are typically high. If additional memory is required, you can opt for the Pro version of Colab, which provides access not only to environments with more RAM but also to more powerful GPUs. Alternatively, you could use the environment provided by Kaggle, where the available memory is greater, or run the notebooks in your local environment. The notebooks have been prepared to be executed on NVIDIA and Apple Silicon GPUs.

There’s also the option to run the notebooks in your local development environment using Jupyter Notebooks. In fact, all the notebooks you’ll find in the book can be run in both Colab and Jupyter; just keep in mind the memory and processing constraints.
The choice between the two systems will depend on the resources you have access to. If your machine has a good GPU with 16 GB or more of memory and you already have the environment set up, you may prefer to run the notebooks locally instead of in Colab. If you have a good GPU, you probably already have Jupyter installed, but if not, installing it is as simple as running pip install jupyter. To start Jupyter, run the command jupyter notebook: a tab will open in your default browser, and you can navigate through your file system to find the notebook you want to open.

The main advantage of using Google Colab is that the environment is already set up, and many of the libraries needed to run the notebooks in the book are pre-installed. The interface is very similar to Jupyter’s; in fact, it’s a version of Jupyter running in Google’s cloud, with access to their GPUs.
© Pere Martra 2024
P. Martra, Large Language Models Projects, https://doi.org/10.1007/979-8-8688-0515-8_1

CHAPTER 1

Introduction to Large Language Models with OpenAI

You will begin by understanding the workings of large language models using the OpenAI API. OpenAI enables you to use powerful models in a straightforward manner. Often, when validating the architecture of a project, a solution is initiated by employing these kinds of powerful models that are accessible via API, such as those from OpenAI or Anthropic. Once the architecture’s effectiveness is validated and the results from a state-of-the-art model are known, the next step is to explore how to achieve similar or better outcomes using open source models.

Before proceeding, you’ll need to create an OpenAI account and obtain an API key to access the OpenAI API. To obtain this, you’ll be required to provide a credit card, as OpenAI charges for the requests made to their models. Not to worry: the cost will depend on your usage and the tests you conduct. Personally, I haven’t paid more than 20 dollars to execute all the examples and tests outlined in this book, and I can assure you, there have been quite a few. I’m sure you can work through all the samples in this book for just a small fraction of that cost.

You can create your OpenAI keys here: https://platform.openai.com/api-keys. Do not forget to store each key somewhere private, since it cannot be retrieved from the OpenAI site later. I also recommend setting a usage limit (Figure 1-1) on the account. This way, if one of your keys becomes public due to an error, you can restrict the expenses that a third party can incur.
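Since the key cannot be retrieved from the OpenAI site later, a common practice is to store it in an environment variable and read it from there in your notebooks, rather than pasting it into code. A minimal sketch of that pattern (the variable name `OPENAI_API_KEY` is the usual convention, and the helper function here is illustrative, not from the book's notebooks):

```python
import os

def load_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable so it is never
    hardcoded in a notebook or committed to a repository."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

Set the variable once in your shell (for example, `export OPENAI_API_KEY=sk-...`) and every notebook can pick it up the same way.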
Figure 1-1. Configure a monthly budget and a notification threshold that fits you. https://platform.openai.com/account/limits

Another option to control spending is the pay-as-you-go option offered by OpenAI, where you load an amount into your OpenAI account. This way you ensure that you won’t spend more than the amount you have loaded. When the credit runs out, the API will return an error indicating that you have run out of credit, and you only have to go back to your account and add more balance. OpenAI also gives you the option to automatically recharge an amount you indicate every time you run out of balance, but then the purpose of spending control is somewhat lost.

Now that you have your OpenAI key ready, you can begin with the first example in the book. You’ll explore how to create a prompt for OpenAI and how to use one of its models in a conversation.

1.1 Create Your First Chatbot with OpenAI

In this chapter you are going to explore how the OpenAI API works and how you can use one of its famous models to build your own chatbot.

The supporting code is available on GitHub via the book’s product page, located at https://github.com/Apress/Large-Language-Models-Projects. The notebook for this example is called 1_1-First_Chatbot_OpenAI.ipynb.
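As a taste of what the chapter covers, OpenAI's chat models receive the conversation as a list of messages, each carrying a role and its content; the "system" role shapes the assistant's behavior, "user" holds the person's input, and "assistant" stores earlier model replies, which is how conversational memory is kept. A minimal sketch of that structure, with an illustrative helper (the `add_turn` function is my own, not from the notebook):

```python
# The chat API expects a list of messages; "system" sets the assistant's
# behavior, "user" carries the person's input, and "assistant" holds
# previous model replies (which is how conversational memory is kept).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! Who are you?"},
]

def add_turn(history, role, content):
    """Append one turn to the conversation, keeping the full history."""
    history.append({"role": role, "content": content})
    return history

# After each model response, append it so the next request sees it.
add_turn(messages, "assistant", "I'm a chatbot built with the OpenAI API.")
```

The full API call that sends this list to a model is developed step by step in the notebook.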