Build AI Applications with Spring AI
RAG, MCP and Agents with Spring AI 1.1

Fu Cheng

This book is available at https://leanpub.com/spring-ai

This version was published on 2025-12-08

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.

© 2025 Fu Cheng
Also By Fu Cheng

Exploring Java 25
Text-to-SQL, Spring AI Implementation with RAG
Understanding Java Virtual Threads
From Java 21 to Java 25
From Java 17 to Java 21
Build Native Java Apps with GraalVM
From Java 11 to Java 17
ES6 Generators
A Practical Guide for Java 8 Lambdas and Streams
Lodash 4 Cookbook
JUnit 5 Cookbook
Contents

Spring AI Basics
  Getting Started
    Prerequisites
    Spring Boot Application
    Use Model Service
    Consolidate Local and Production Environment
  Chat Completion
    Model
    ChatModel
    Create Prompts
    Chat Response
    ChatClient
    Use Custom Chat Options
    Prompt Template
    Advisor
  Streaming Chat Completion
    StreamingChatModel
    Streaming Web Application
    Streaming JSON Events
  Structured Output Converter
    StructuredOutputConverter
    ListOutputConverter
    MapOutputConverter
    BeanOutputConverter
    Use ChatClient
  Multimodal Input
    Media
    Image Understanding

Retrieval-Augmented Generation
  RAG Introduction
    Reduce Hallucinations
    Naive RAG
  Embedding Model
    EmbeddingModel
    EmbeddingOptions
    BatchingStrategy
    Use EmbeddingModel
  Documents
    Create Documents
    Document Reader
    Document Transformer
    Document Writer
  Vector Store
    VectorStore
    Create VectorStore
    Add Documents
    Delete Documents
    Similarity Search
    VectorStore REST API
    SimpleVectorStore
    Pgvector
    Vector Store Cloud Services
  RAG
    Simple RAG
  Modular RAG
    Query
    Pre-Retrieval
    Retrieval
    Post-Retrieval
    Generation
    RetrievalAugmentationAdvisor
  RAG Examples
    Text-to-SQL
    PDF Q&A

MCP
  MCP Introduction
  Quick Start
    Java Development Basics
    stdio Server
    HTTP SSE Server
    MCP Client
    Spring Integration
    Spring AI Integration
  MCP Server
    Shared Models
    ServerExchange
    Prompt Templates
    Resources
    Tools
    Completions
    Logging
    Pagination
  MCP Client
    Roots
    Sampling
  MCP Examples
    File System MCP Server

Agent
  Agent Introduction
    Cooking Suggestion Agent
  Agent Components
    Profile
    Persona
    Tools
    Knowledge and memory
    Reasoning and evaluation
    Planning and feedback
  Agentic Patterns
    Task Execution
    Evaluator-Optimizer
    Parallelization Workflow
    Routing Workflow
    Chain Workflow
    Agent as Tool
    Tool as Agent
    Orchestrator-Workers Workflow
  Agent Development
    Persona
    Knowledge
    Tools
    Reasoning
    Task

Source Code and Materials
Spring AI Basics

This part covers Spring AI itself. The source code for this book can be found in the last chapter.
Getting Started

Let's start the journey with Spring AI by building a simple application.

Prerequisites

Before writing Spring AI applications, we need to prepare the local development environment. We need Java installed and configured, and we also need a large language model (LLM) ready for testing.

Java

Spring AI requires Java 17 as the minimum version. Using an LTS release such as Java 21 or Java 25 is recommended so we can leverage the power of virtual threads. The source code of this book is tested with Java 21 with virtual threads enabled.

Spring AI

This book uses Spring AI 1.1.0. Example applications in this book use Maven to manage dependencies. To simplify dependency management across related modules, the spring-ai-bom dependency can be imported to set the versions of all Spring AI dependencies.

Figure 1. Spring AI dependency management using BOM

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>${spring-ai.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
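The ${spring-ai.version} placeholder above has to be defined somewhere in the POM. A minimal sketch, assuming the property is declared in the project's <properties> section and that 1.1.0 is the version to use (matching the Spring AI release this book is based on):

<properties>
  <!-- Spring AI version consumed by the imported BOM; adjust to the release you use -->
  <spring-ai.version>1.1.0</spring-ai.version>
</properties>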
Language Model

A language model is required for development, testing, and production deployments. The model can run locally or in the cloud, as long as it provides an API endpoint to access its service.

• To run a model locally, there are many options available, including Ollama, vLLM, and LM Studio.
• To use a cloud-based model service, you need to open an account and pay for the service by tokens.

Here let's start with Ollama. Ollama is a tool for running large language models locally. Simply download Ollama and install it on your local machine. After installation, open a terminal window and use the Ollama CLI command ollama to work with it. There are many models available for use with Ollama; see Ollama's models page for the full list. We can use ollama pull to pull a model. Here we are using Qwen3.

Figure 2. Ollama pull a model

ollama pull qwen3:0.6b

The size of qwen3:0.6b is only 523 MB, which makes it a good fit for local development and testing. After the model is pulled, it can be run using ollama run.

Figure 3. Ollama run a model

ollama run qwen3:0.6b

The ollama run command pulls missing models automatically. It starts a command-line session with the LLM, where you can simply type any text to receive completions from the LLM.

Figure 4. ollama run
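To confirm which models have been pulled onto the machine, the Ollama CLI also has a listing command; a quick check might look like this (the output format can vary between Ollama versions):

ollama list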
By default, Ollama exposes its API endpoint on port 11434.

Spring Boot Application

The easiest way to create a new Spring AI application is to use Spring Initializr. When adding the project's dependencies, select Ollama; this enables Spring AI to interact with Ollama. Spring Web is also added to create a simple REST API. Below is a screenshot of the Spring Initializr UI.

Figure 5. Spring Initializr UI
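The generated project contains a standard Spring Boot entry-point class. A minimal sketch, assuming the default Initializr settings (the package and class names are placeholders and will match whatever was entered in the Initializr form):

package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Standard Spring Boot entry point created by Spring Initializr
@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}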
Now we can download the created application and open it in IntelliJ IDEA. Adding the Ollama dependency actually adds spring-ai-starter-model-ollama to the Maven project. This Spring Boot starter creates the beans necessary to work with Spring AI.

Figure 6. Spring Boot Ollama starter

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Here we need to add an application.yaml file to configure the Spring Boot application, because we want the qwen3 model to be used; by default, the Ollama integration uses the Mistral model. The property that configures the Ollama model is spring.ai.ollama.chat.options.model.

Figure 7. Spring Boot configuration for Spring AI using Ollama

spring:
  ai:
    ollama:
      chat:
        options:
          model: "qwen3:0.6b"

Now we add a REST endpoint to chat with an LLM. A ChatClient.Builder instance, provided by the Ollama Spring Boot starter, is injected into the REST controller to create ChatClient instances. A ChatClient is created from this ChatClient.Builder using the build method. chatClient.prompt().user(message).call().content() sends a request to the Ollama API endpoint and returns the output.
Figure 8. REST Controller

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder,
                          LoggingAdvisor loggingAdvisor) {
        this.chatClient = builder.defaultAdvisors(loggingAdvisor).build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam(value = "message") String message) {
        return chatClient.prompt().user(message).call().content();
    }
}

The LoggingAdvisor registered here as a default advisor is a custom advisor; advisors are covered in a later chapter.

Now we can start the Spring Boot application. Once the application is started, we can use any REST client tool to interact with the REST API. Here we use SpringDoc to expose an OpenAPI endpoint and Swagger UI to test the API.

Figure 9. SpringDoc dependency

<dependency>
  <groupId>org.springdoc</groupId>
  <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
  <version>2.8.9</version>
</dependency>

We can open a browser window and navigate to http://localhost:8080/swagger-ui/, then use Swagger UI to try the API.
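Because the endpoint is a plain HTTP GET, it can also be exercised directly from the command line. A minimal sketch using curl (the reply text depends on the model; remember to URL-encode the message if it contains spaces):

curl "http://localhost:8080/chat?message=Hello"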
Below is the result of testing the API using Swagger UI.

Figure 10. Use Swagger UI

Use Model Service

While Ollama is great for local development and testing, we usually use cloud-based model services for production. All major cloud platforms provide AI models as services, including Google, Amazon, and Microsoft. Spring AI supports the major AI model services; here OpenAI is used as an example. For Spring Boot, the easiest way is to add the Spring Boot starter dependency. For OpenAI support, the dependency is spring-ai-starter-model-openai.

Figure 11. OpenAI Spring Boot starter dependency

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

An OpenAI API key is required to use OpenAI services. In the configuration below, the OpenAI API key is read from the environment variable OPENAI_API_KEY.

Figure 12. Set OpenAI API key

spring:
  ai:
    openai:
      apiKey: ${OPENAI_API_KEY}
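When no model is specified, the starter falls back to a default OpenAI chat model. To pin a specific model, it can be configured explicitly; a sketch following the same property pattern as the Ollama example earlier (gpt-4o-mini is only an illustrative model name):

spring:
  ai:
    openai:
      chat:
        options:
          model: gpt-4o-mini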
Consolidate Local and Production Environment

If we use Ollama for local development and OpenAI for production, we would need to add both model dependencies to the project, and these two dependencies conflict with each other. We should consolidate on a single model integration: use only the OpenAI module, but point it at different API endpoints in development and production. Many model services provide an API that is compatible with OpenAI, and Ollama has one as well. After Ollama is started, this API can be accessed at the base URL http://localhost:11434/v1/.

OpenAI compatibility of Ollama is experimental and subject to major adjustments, including breaking changes. Only parts of the OpenAI API are supported.

We can use Spring profiles to apply different configurations for different environments. For the development profile, spring.ai.openai.baseUrl is set to http://localhost:11434/v1. An API key is required by the configuration but will be ignored, so the value can be anything.

Figure 13. Use Ollama OpenAI compatible API in development profile

spring:
  ai:
    openai:
      baseUrl: http://localhost:11434/v1
      apiKey: ollama

In the production profile application-prod.yaml, spring.ai.openai.baseUrl is set to https://api.openai.com/v1, the endpoint of the OpenAI API.

Figure 14. Use OpenAI in production profile

spring:
  ai:
    openai:
      baseUrl: https://api.openai.com/v1
      apiKey: ${OPENAI_API_KEY}
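As a quick sanity check of the OpenAI-compatible endpoint that the development profile points at, you can call it directly. A sketch, assuming Ollama is running and qwen3:0.6b has been pulled (the response shape may differ, since this compatibility layer is experimental):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3:0.6b",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'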
Profiles can be switched using the option -Dspring.profiles.active, e.g. -Dspring.profiles.active=prod.

Depending on whether you want to run models locally, there are two recommendations for setting up the development environment.

Cloud-based Model Services

Cloud-based model services are actually cheap to use. One option is to simply use a model service for both development and production. Spring AI provides integration modules for popular model service platforms; we only need to include the Spring AI module and configure it. Let's use Anthropic Claude as an example. In a Spring Boot application, we can add the dependency of the spring-ai-starter-model-anthropic module.

Figure 15. Anthropic module dependency

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>

Then we can configure Anthropic Claude. The prefix of the configuration properties is spring.ai.anthropic. An API key must be provided via the environment variable ANTHROPIC_API_KEY. The model claude-opus-4-0 is used.

Figure 16. Configure Anthropic Claude

spring:
  ai:
    anthropic:
      apiKey: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-opus-4-0

Use Container

If you want to run models locally, it's recommended to run them in a container. Container tools like Docker and Podman have already been used extensively in development.
You may already use containers to run databases, message brokers, and other tools. Running models in a container means that you don't need to install additional tools.

llama.cpp

A popular choice is using llama.cpp to run models. llama.cpp provides an OpenAI-compatible API to interact with the model, and model files can be downloaded from Hugging Face. In the Docker Compose file below, the model file of Qwen3-0.6B is downloaded from Hugging Face, then llama.cpp is started to serve this model.

Figure 17. Docker compose file to run models using llama.cpp

services:
  model-runner:
    image: ghcr.io/ggml-org/llama.cpp:server
    volumes:
      - model-files:/models
    command:
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8080"
      - "-n"
      - "512"
      - "-m"
      - "/models/Qwen3-0.6B-Q8_0.gguf"
    ports:
      - "8180:8080"
    depends_on:
      model-downloader:
        condition: service_completed_successfully

  model-downloader:
    image: ghcr.io/alexcheng1982/model-downloader
    restart: "no"
    volumes:
      - model-files:/models
    command:
      - "hf"
      - "download"
      - "unsloth/Qwen3-0.6B-GGUF"
      - "Qwen3-0.6B-Q8_0.gguf"
      - "--local-dir"
      - "/models"
volumes:
  model-files:

After the containers are started, the model API can be accessed at http://localhost:8180. In Spring AI, we can create a new profile that sets the configuration key spring.ai.openai.baseUrl to http://localhost:8180. The apiKey can be set to anything.

Figure 18. Use OpenAI compatible API running in the container

spring:
  ai:
    openai:
      baseUrl: http://localhost:8180
      apiKey: demo

llama.cpp also provides a web UI to interact with the model. You can access this UI at http://localhost:8180 using a browser.

Ollama

Ollama can also run in a container, which means we don't need to install Ollama on the local machine. In the Docker Compose file below, Ollama is started in a container, and another container is used to pull the qwen3:0.6b model.

Figure 19. Docker compose file to run models using Ollama

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434"]
      interval: 30s
      timeout: 10s
      retries: 5
    command: ["/bin/ollama", "serve"]

  ollama-pull-qwen3:
    image: ollama/ollama
    container_name: ollama-pull-qwen3
    volumes:
      - ollama:/root/.ollama
    depends_on:
      ollama:
        condition: service_healthy
    command: ["/bin/ollama", "pull", "qwen3:0.6b"]

volumes:
  ollama:
    driver: local
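With this Compose file in place, a typical workflow is sketched below (the commands assume a recent Docker with the compose plugin; the initial model pull may take a moment). Because the Ollama API is published on the usual port 11434, the earlier spring.ai.ollama configuration works unchanged against the containerized instance.

docker compose up -d
# optionally confirm that the model has been pulled
docker exec ollama ollama list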
Chat Completion

ChatClient is the entry point for interacting with the text completion capability of language models. It's a fluent API to build requests and handle responses. ChatClient uses ChatModel to interact with LLMs.

Model

Model is a generic interface for working with various types of AI models. It is a simple interface with only one method, call. The call method sends a request to the AI model and receives a response. The request and response types are generic.

Figure 20. Model

public interface Model<TReq extends ModelRequest<?>,
        TRes extends ModelResponse<?>> {

    TRes call(TReq request);

}

Below is the class hierarchy of the Model interface.

Figure 21. Model hierarchy
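To see how the ChatModel mentioned above fits into this hierarchy, the sketch below shows the general shape of the relationship: ChatModel is a Model whose request type is a prompt and whose response type is a chat response. The type parameters are simplified here and may differ slightly from the actual Spring AI sources.

// ChatModel specializes the generic Model contract for chat completion:
// it takes a Prompt as the request and produces a ChatResponse.
public interface ChatModel extends Model<Prompt, ChatResponse> {

    ChatResponse call(Prompt prompt);

}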