Building Generative AI Services with FastAPI (for True Epub) (Alireza Parandeh)（Z-Library）

Praise for Building Generative AI Services with FastAPI

A masterclass in turning cutting-edge AI into real-world impact. Ali distills the complexity of generative models and FastAPI into an approachable, empowering guide for builders at every level.
Alan King, founder, AI Your Org

A must-have for software developers and data scientists to learn to build production-grade generative AI services with FastAPI. Ali’s clear explanations and depth of technical expertise will keep you ahead in this exciting and emerging field.
Joe Rowe, head of technical assurance & compliance, Applied Data Science Partners

This book is superb at taking complicated topics and explaining them in a simple, easy-to-understand way for developers and non-developers alike. It’s a fascinating deep dive into using GenAI in your development projects, which is only going to become more and more important as time goes on!
Lee Dalchow, software engineer

A practical introduction to generative AI with valuable insights on building real-world services. This book is a good starting point for aspiring AI developers.
Julian Brendel, senior Python developer, Vitol

The book is well-structured, and the way it presents the topics gives you a solid foundation in the subject. I’ve recommended it to my colleagues!
Daniel Saad, software engineer, Mercedes-Benz Tech Innovation

Building Generative AI Services with FastAPI
A Practical Approach to Developing Context-Rich Generative AI Applications
Alireza Parandeh
Foreword by David Foster

Building Generative AI Services with FastAPI
by Alireza Parandeh

Copyright © 2025 Ali Parandeh. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Amanda Quinn
Development Editor: Rita Fernando
Production Editor: Clare Laylock
Copyeditor: Kim Wimpsett
Proofreader: Vanessa Moore
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

April 2025: First Edition

Revision History for the First Edition
2025-04-15: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098160302 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Building Generative AI Services with FastAPI, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-16030-2
[LSI]

Foreword

I remember the day Ali, our head of engineering at ADSP, walked confidently into the office and declared he wanted to write a book on building generative AI services. Knowing the mammoth-sized undertaking that is writing a technical book, I offered him a strong cup of coffee and regaled him with a few tales of my own late-night writing sessions, fueled by caffeine and the sheer will to meet a deadline. I might have even thrown in a cautionary whisper about the ever-present temptation to rewrite entire chapters at 3 a.m.

But Ali was steadfast. He had that glint in his eye: a mix of determination and a clear vision. He knew he wanted to create something special, something that would demystify the complexities of generative AI and empower others to build.

Having now read Building Generative AI Services with FastAPI, I can say he’s done far more than that. Ali has crafted a truly indispensable guide for anyone looking to move beyond theoretical discussions about AI and into the realm of practical, real-world application. And somehow, he’s made the whole process look deceptively easy.

As co-founder of an AI consultancy, I’ve seen firsthand the growing need for engineers who can not only understand how AI works but also build production-grade solutions with it. We are in a period of profound transformation, where AI is rapidly changing how we live and work. It’s no longer enough to be a passive consumer of AI-powered products. The future belongs to those who can harness the power of generative models to create, innovate, and solve real problems. This book is the perfect starting point for that journey.

Ali’s technical expertise is evident on every page. He effortlessly blends complex concepts with clear, concise explanations and practical examples. The code snippets aren’t just toy examples; they are building blocks for real applications. He guides you through the intricacies of FastAPI, authentication, authorization, and database integration with the confidence of a seasoned engineer who has spent countless hours wrestling with these challenges in the real world.

Building Generative AI Services with FastAPI is a vital resource for any engineer looking to navigate the rapidly evolving landscape of AI. It’s a testament to Ali’s technical leadership and his remarkable ability to make the complex accessible. This book isn’t just about building AI services; it’s about empowering a new generation of engineers to shape the future. And that future, thanks to works like this, looks incredibly bright.

David Foster
Partner at ADSP
Author of Generative Deep Learning (O’Reilly, 2024)

Preface

Generative AI (GenAI) has taken the world by storm since the release of technologies like ChatGPT. This new type of AI can create content in various modalities (such as text, audio, and video) by learning to mimic patterns from its training data. As GenAI capabilities advance, many businesses are investing in off-the-shelf or custom AI tools. These tools require maintainable and scalable backend services that can adapt to high demand.

AI capabilities are exciting because they open the door to endless possibilities that unlock the potential for new tools. Before generative AI, developers had to write scripts and train optimization models to build automation and data pipelines for processing unstructured data like corpora of texts. This process could be tedious, error-prone, and applicable only to limited use cases. However, with the rise of GenAI models such as large language models (LLMs), we can now digest, compare, and summarize unstructured datasets and documents; reword complex ideas; and generate visualizations and illustrations.

While most generative models such as ChatGPT are excellent at what they do on their own, can you imagine the possibilities when we connect them to the internet, our own databases, and other services? If we can just “talk” to our services in natural language or give them some image, video, or audio and get them to do things for us, it opens up so many opportunities to create newly accessible and automated applications.

Chatbots are not the only apps that we can create with such generative models. There is so much more we can do. We can create backend service agents that can perform various complex tasks requiring comprehension, logical reasoning, and analysis of texts.

By connecting our generative models to existing services and the internet, we are giving our AI services additional data to enrich their understanding of the problem at hand. For instance, a company can use an open source, in-house, fine-tuned LLM to parse purchase orders, generate invoices, and validate data against their customer database before placing an order with a payment system. This is where generative models shine. Other use cases can include content management systems that can help users with generating content and website builders that can suggest imagery, icons, and user interface (UI) components to fast-track the site’s design.

There is a catch. LLMs and other generative models require heavy processing power and memory to function, and it is not clear what deployment patterns and integration layers developers should use to leverage these models. Building generative AI services is challenging because you need to balance scalability, security, performance, and data privacy. You’ll also want the ability to moderate, retrain, and optimize these services for real-time inference. These challenges will be different for every organization, and how you build your generative AI services will depend on your existing software systems and services.

Existing resources and documentation provide the necessary information to get started with training custom models and fine-tuning large language models. However, most developers may continue to face challenges in packaging and deploying these novel generative models as part of existing software systems and services. My aim with this book is to show you how to productionize GenAI by understanding the end-to-end process of building and deploying your own AI services with tools such as the FastAPI web framework.

Objective and Approach

The objective of this book is to help you explore the challenges of developing, securing, testing, and deploying generative AI as services integrated with your own external systems and applications.

This book centers on constructing modular, type-safe generative AI services in FastAPI with seamless database schema handling support and model integration to power backends that can generate new data.

The significance of these topics stems from the growing demand for building flexible services that can adapt to changing requirements, maintain high performance, and scale efficiently using the microservice pattern.

You will also learn the process of enriching your services with contextual data from a variety of sources such as databases, the web, external systems, and files uploaded by users.

A few generative models require heavy processing power and memory to function. You will explore how to handle these models in production and how to scale your services to handle the load. You will also explore how to handle long-running tasks such as model inference.

Finally, we will discuss authentication concepts, security considerations, performance optimization, testing, and deployment of production-ready generative AI services.

Prerequisites

This book assumes no prior knowledge of generative AI and won’t require you to fully understand how generative models work. I will be covering the intuition of how such models generate data but will not dive into their underlying mathematics. However, if you want to learn more about building your own generative AI models in detail, I recommend Generative Deep Learning by David Foster (O’Reilly, 2024).

As this is a FastAPI book for generative AI applications, I do assume some familiarity with this web framework. If you need a refresher or would like to expand your understanding of FastAPI features, I recommend reading FastAPI by Bill Lubanovic (O’Reilly, 2023). However, this is not a requirement for following along with this book.

Furthermore, the book does assume some experience with Python, with Docker for deployment, with how the web works, and with communicating over HTTP. To brush up on your Python skills, I highly recommend visiting realpython.com for excellent tutorials on more advanced concepts. The official Docker website also provides an excellent practical tutorial on containerization and writing Dockerfiles. I will not be covering the fundamentals of the web in this book, but I highly recommend MDN’s documentation as a starting point.

Finally, the book won’t require knowledge of deep learning frameworks such as TensorFlow and Keras; where relevant, you’ll be introduced to them. Instead, we will mostly work with pretrained models hosted on the Hugging Face model repository.

Book Structure

The book is broken into three parts:

Part I, “Developing AI Services”
This part covers all the necessary steps to set up a FastAPI project that will power your GenAI service. You will learn to integrate various generative models into a type-safe FastAPI application and expose endpoints to interact with them.

Chapter 1, “Introduction”: This chapter discusses the importance of GenAI in the future and introduces the practical projects you’ll build throughout the book.

Chapter 2, “Getting Started with FastAPI”: This chapter introduces FastAPI, a modern framework for building AI services. You will understand its features, limitations, and how it compares to other web frameworks. By the end of this chapter, you will be able to start creating FastAPI applications, progressively organize projects, and migrate from frameworks like Flask or Django.

Chapter 3, “AI Integration and Model Serving”: This chapter covers the full process of integrating and serving various GenAI models (including language, audio, vision, and 3D models) as a FastAPI service using application lifespan. We’ll review various strategies for model serving like preloading, externalizing, and monitoring models with middleware.

Chapter 4, “Implementing Type-Safe AI Services”: This chapter introduces the concept of type safety and how Python’s type annotations and data validation tools like Pydantic can help validate and serialize data passing through your AI services.

Part II, “Communicating with External Systems”
In this part, we’ll integrate our AI services with external systems such as databases and learn how to serve concurrent users. We will also implement real-time streaming of model outputs.

Chapter 5, “Achieving Concurrency in AI Workloads”: This chapter introduces the concepts of concurrency and parallelism and compares different strategies for solving concurrency problems. We’ll review the purpose of asynchronous programming in handling long-running and blocking tasks and review the limitations of Python’s Global Interpreter Lock (GIL) when handling these asynchronous processes. To practice, we’ll implement a working “talk to the web and your documents” chatbot using a technique called retrieval-augmented generation (RAG). Finally, we’ll cover FastAPI’s background tasks feature for tackling long-running operations.

Chapter 6, “Real-Time Communication with Generative Models”: In this chapter, we will focus on enabling real-time client-server communication with generative models. As part of this, we’ll use practical examples to compare mechanisms such as WebSockets and server-sent events for streaming data to and from generative models.

Chapter 7, “Integrating Databases into AI Services”: This chapter provides an overview of database technologies suitable for GenAI services. We’ll cover best practices when working with databases using battle-tested tools such as SQLAlchemy ORM and Alembic for facilitating migrations. Finally, we’ll introduce Prisma, an upcoming tool for generating a fully typed database client and automatic handling of migrations.

Part III, “Securing, Optimizing, Testing, and Deploying AI Services”
In this part, we focus on implementing the authentication layer for user management, alongside security and optimization enhancements. We’ll then shift our focus to testing and finally deploying our AI service through containerization.

Chapter 8, “Authentication and Authorization”: In this chapter, we will cover the implementation of authentication layers for user management to secure, protect, and restrict access to AI services. We’ll review and implement various authentication strategies including basic, token-based, and OAuth. We’ll then introduce authorization models including role-based access control (RBAC) and explain the role of FastAPI’s dependency graph in the process. This will include adding restrictive permissions for users based on roles where AI service interactions can be automatically moderated.

Chapter 9, “Securing AI Services”: This chapter provides an overview of common attack vectors for
