me/SpringAI/0_使用SpringAI接入AI模型.md

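## Introduction

Since OpenAI released ChatGPT, it has drawn enormous attention worldwide. The capabilities of generative AI have changed the way many people live and work, and programming-language communities are actively building generative AI support. In the Java-centric Spring community, Spring AI has published a 1.0.0-SNAPSHOT version. This article describes how to install and deploy AI models, and focuses on how to build AI services with the Spring AI framework.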

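## What is Hugging Face?

Hugging Face is an open AI model community that hosts models, datasets, and other content uploaded by developers, companies, and organizations from around the world, making it easy for anyone interested in AI to share and download them. It works like a supermarket for AI models: producers put their models on the shelves, and consumers pick the ones they are interested in to download and try out. Official site (access may require a proxy in some regions): www.huggingface.co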

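## What is Ollama?

The previous section covered where to get AI models, but a model is not an application: operating systems such as Windows, Linux, and macOS cannot run it directly. Ollama is a tool for installing and managing AI models. It may not be the best model management tool, but its strength is broad compatibility. We can install Ollama on Windows, Linux, or macOS and then use it to fetch and install models from Hugging Face.

Ollama can be downloaded and installed from https://ollama.com/download. Once it is installed, models can be pulled from the command line. Taking deepseek-r1 as an example: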

```shell
ollama run hf.co/deepseek-ai/deepseek-r1:7b
```

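The command format is `ollama run hf.co/<username>/<model>:<parameter-scale>`; the `7b` above refers to the 7-billion-parameter R1 model. If `hf.co` is unreachable without a proxy, the mirror site `hf-mirror.com` can be used instead: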

```shell
ollama run hf-mirror.com/deepseek-ai/deepseek-r1:7b
```

Ollama's other capabilities are not covered further here; video tutorials online are a good way to learn more.

## What is Spring AI?

- Official documentation: https://spring.io/projects/spring-ai
- GitHub repository: https://github.com/spring-projects/spring-ai

The official positioning:

> Spring AI is an application framework for AI engineering. Its goal is to apply to the AI domain Spring ecosystem design principles such as portability and modular design and promote using POJOs as the building blocks of an application to the AI domain.

<img src='https://images.ctfassets.net/mnrwi97vnhts/4mda205vy509Dx3vGkMwFr/af520e66dc79fb80cd1bc129a11d6d23/spring-ai-integration-diagram-3.svg'>

At its core, Spring AI addresses how enterprises integrate AI models.

## Spring AI features

- Support for AI model providers such as DeepSeek, Qwen (Alibaba), Qianfan (Baidu), Anthropic, OpenAI, Microsoft, Amazon, Google, and Ollama. Supported model types:
  + [Chat Completion](https://docs.spring.io/spring-ai/reference/api/chatmodel.html): chat models
  + [Embedding](https://docs.spring.io/spring-ai/reference/api/embeddings.html): embedding models
  + [Text to Image](https://docs.spring.io/spring-ai/reference/api/imageclient.html): text-to-image generation
  + [Audio Transcription](https://docs.spring.io/spring-ai/reference/api/audio/transcriptions.html): audio-to-text transcription
  + [Text to Speech](https://docs.spring.io/spring-ai/reference/api/audio/speech.html): text-to-speech
  + [Moderation](https://docs.spring.io/spring-ai/reference/api/index.html#api/moderation): content moderation
- [Structured output](https://docs.spring.io/spring-ai/reference/api/structured-output-converter.html): maps the AI model's output to POJOs, much as an ORM does in a traditional application.
- Support for vector databases with a portable API across vector stores, including a novel SQL-like Metadata Filter API. Supported vector databases include:
  + Apache Cassandra
  + Azure Cosmos DB
  + Azure Vector Search
  + Chroma
  + Elasticsearch
  + GemFire
  + MariaDB
  + Milvus
  + MongoDB Atlas
  + Neo4j
  + OpenSearch
  + Oracle
  + PostgreSQL/PGVector
  + Pinecone
  + Qdrant
  + Redis
  + SAP HANA
  + Typesense
  + Weaviate
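- [Tools/Function Calling](https://docs.spring.io/spring-ai/reference/api/functions.html): lets the model request the execution of client-side tools and functions, so it can access necessary real-time information on demand.
- [Observability](https://docs.spring.io/spring-ai/reference/observability/index.html): provides observability into AI-related operations.
- A document ingestion [ETL framework](https://docs.spring.io/spring-ai/reference/api/etl-pipeline.html) for data engineering.
- [AI model evaluation](https://docs.spring.io/spring-ai/reference/api/testing.html): utilities that help evaluate generated content and guard against hallucinated responses.
- [ChatClient](https://docs.spring.io/spring-ai/reference/api/chatclient.html): a fluent API for communicating with AI chat models, similar to WebClient and RestClient.
- [Advisors](https://docs.spring.io/spring-ai/reference/api/advisors.html): encapsulate recurring generative AI patterns, transform the data sent to and received from language models (LLMs), and provide compatibility and portability across models and use cases.
- [Chat Conversation Memory](https://docs.spring.io/spring-ai/reference/api/chatclient.html#_chat_memory): a component that stores and manages conversation history in a chatbot or dialogue system, which is essential for coherent, context-aware conversations. It remembers the multi-turn exchange between the user and the system and uses it in later turns to keep the conversation continuous. For example, if the user mentions their name or a product they are interested in, the system can recall it and refer to it correctly later, giving a more personalized and fluid experience. Used well, it makes conversational applications smarter and more natural. Memory can be implemented at several levels, including but not limited to:
  + **Short-term memory**: keeps only the last few turns, suited to immediate, short-lived sessions.
  + **Long-term memory**: persists long-lived data such as user preferences and personal information, supporting deeper personalization.
  + **Global memory**: keeps user data across multiple sessions, allowing behavior and preferences to be tracked across sessions.
- [Retrieval Augmented Generation (RAG)](https://docs.spring.io/spring-ai/reference/api/chatclient.html#_retrieval_augmented_generation): a technique that combines information retrieval with text generation to strengthen generative models. By pairing a retrieval component with a generation component, the generated text draws not only on the knowledge in the pretrained model but also on up-to-date or domain-specific information retrieved dynamically from documents and other data sources. In a question-answering system, for example, answers can be based on the latest material rather than being limited to what the model knew at training time. RAG is therefore an effective answer to the knowledge limits of conventional generative models, especially for tasks that need to cite specific facts or recent information.
- Spring Boot auto-configuration and starters for all AI models and vector stores; use https://start.spring.io to select the model or vector store you want.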
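
As a rough illustration of the RAG item above, the following minimal sketch shows only the final step: retrieved snippets are placed in front of the user's question before the prompt is sent to a model. It is plain Java and not Spring AI's implementation; the class name `RagPromptSketch`, the method `buildPrompt`, and the prompt wording are all assumptions made for this example.

```java
import java.util.List;

// Illustrative only: plain Java, not Spring AI's RAG implementation.
final class RagPromptSketch {

    /** Builds a prompt that lists the retrieved snippets ahead of the user's question. */
    static String buildPrompt(List<String> retrievedSnippets, String question) {
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (int i = 0; i < retrievedSnippets.size(); i++) {
            // Number each snippet so the model can refer back to it
            sb.append('[').append(i + 1).append("] ").append(retrievedSnippets.get(i)).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt(
                List.of("Spring AI supports Spring Boot 3.2.x and later.", "The minimum JDK is 17."),
                "Which JDK do I need?"));
    }
}
```

In a real application the snippets would come from a vector store lookup rather than a hard-coded list.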

## Quick Start

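Spring AI supports Spring Boot 3.2.x and later and requires JDK 17 or higher. The latest version of OpenJDK Zulu or OpenJDK Temurin is recommended. Import the Spring AI bill of materials (BOM) in pom.xml; the BOM avoids dependency conflicts and pins a tested, recommended set of versions.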

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Add the Milestone and Snapshot repositories. Spring AI has not published a release yet, so its artifacts are not available on Maven Central.

```xml
<repositories>
  <repository>
    <id>spring-milestones</id>
    <name>Spring Milestones</name>
    <url>https://repo.spring.io/milestone</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>spring-snapshots</id>
    <name>Spring Snapshots</name>
    <url>https://repo.spring.io/snapshot</url>
    <releases>
      <enabled>false</enabled>
    </releases>
  </repository>
</repositories>
```

Taking Qwen as the example, add Alibaba's starter. This is Alibaba's official dependency adapted for Spring AI, and its version matches the Spring AI version.

```xml
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
```

Apply for an API key at https://bailian.console.aliyun.com/?apiKey=1#/api-key, then add the configuration:

```yml
spring:
  ai:
    dash-scope: # root path for Alibaba's configuration
      api-key: xxx
```

Specify the chat model:

```yml
spring:
  ai:
    dash-scope:
      chat:
        options:
          model: qwen-max # model names are listed in the Alibaba Cloud documentation
```

### Sending messages with ChatClient

```java
// Inject ChatModel; to inject by bean name, use: dashScopeChatModel
private final ChatModel chatModel;
```

#### Blocking call

```java
@GetMapping("chat")
public String chat(@RequestParam String prompt) {
    return ChatClient.create(chatModel)
            .prompt()
            .user(prompt) // the user's prompt
            .call() // blocks until the response arrives; the result can be a ChatResponse, a JavaBean, or a String
            .content();
}
```

#### Streaming

```java
@GetMapping(value = "chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<String>> chat(@RequestParam String prompt) {
    return ChatClient.create(chatModel)
            .prompt()
            .user(prompt)
            .stream()
            .chatResponse()
            // Server-Sent Events (SSE) is a lightweight mechanism for one-way streaming from server to client
            .map(chatResponse -> ServerSentEvent.builder(JSONUtil.toJsonStr(chatResponse)).event("message").build());
}
```
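
For orientation, each `ServerSentEvent` emitted by an endpoint like the one above reaches the client as a small text frame in the `text/event-stream` format: an optional `event:` line, one `data:` line per line of payload, and a blank line that ends the event. The sketch below builds such a frame by hand in plain Java. The class `SseFrameSketch` and its `frame` method are hypothetical names for illustration; in a real application the web framework's encoder does this work, and its exact spacing may differ.

```java
// Illustrative only: hand-rolled SSE framing, not the encoder Spring uses.
final class SseFrameSketch {

    /** Formats one event as it would appear on a text/event-stream response. */
    static String frame(String event, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(event).append('\n');
        // A payload containing line breaks is sent as several data: lines
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // The empty line marks the end of the event
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(frame("message", "Hello"));
        System.out.print(frame("message", "line one\nline two"));
    }
}
```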

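#### ChatMemory

ChatMemory is a component that records the conversation with the user. Sending the previous few turns between the user and the model along with each new request is a very common need in chat applications. ChatMemory itself is an interface; **InMemoryChatMemory**, for example, is an implementation that keeps the history in JVM memory. Other storage backends, such as Redis or a database for persistence, can be implemented as needed.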
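
To make the idea concrete, here is a minimal sketch of an in-memory store that keeps messages per conversation and returns only the most recent ones. It is a simplified stand-in written in plain Java and does not implement Spring AI's actual interface, whose method signatures may differ; `WindowedMemorySketch`, `add`, and `lastN` are hypothetical names, and messages are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified conversation store, not Spring AI's memory interface.
final class WindowedMemorySketch {

    // Conversation id -> messages in the order they were added
    private final Map<String, List<String>> store = new HashMap<>();

    synchronized void add(String conversationId, String message) {
        store.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(message);
    }

    /** Returns at most the last n messages of the conversation, oldest first. */
    synchronized List<String> lastN(String conversationId, int n) {
        List<String> all = store.getOrDefault(conversationId, List.of());
        if (n <= 0) {
            return List.of();
        }
        int from = Math.max(0, all.size() - n);
        return List.copyOf(all.subList(from, all.size()));
    }

    public static void main(String[] args) {
        WindowedMemorySketch memory = new WindowedMemorySketch();
        for (int i = 1; i <= 8; i++) {
            memory.add("session-1", "message " + i);
        }
        // Only the six most recent messages would be replayed to the model
        System.out.println(memory.lastN("session-1", 6));
    }
}
```

Swapping the `HashMap` for Redis or a database table is what a persistent variant would change; the windowing logic stays the same.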

#### MessageChatMemoryAdvisor

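ChatMemory is only an interface for storing and retrieving past conversation messages, whereas MessageChatMemoryAdvisor is the piece that plugs that memory into ChatClient. For example: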

```java
// InMemoryChatMemory is used here as an example
private static final ChatMemory chatMemory = new InMemoryChatMemory();
```

```java
ChatClient.create(chatModel)
          .prompt()
          .user(prompt)
          // Send 6 messages from the history to the model API along with the new one. History counts toward this request's token usage, so watch for token growth
          .advisors(new MessageChatMemoryAdvisor(chatMemory, sessionId, 6))
          .stream()
          .content()
          .map(chatResponse -> ServerSentEvent.builder(chatResponse).event("message").build());
```

### ETL

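ETL stands for Extract, Transform, Load. Using the ETL framework together with [Apache Tika](https://tika.apache.org/2.9.0/formats.html), documents (.pdf, .xlsx, .docx, .pptx, .md, .json, and so on) can be imported into a vector database so that the AI model can retrieve from it when generating answers. The overall flow: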

<img width='80%' src='https://www.jarcheng.top/blog/assets/etl-pipeline-tlEpEE9G.jpg'>

#### DocumentReader: reading documents

- JsonReader: reads JSON
- TextReader: reads plain-text documents
- PagePdfDocumentReader: reads PDFs
- TikaDocumentReader: reads many file types (.pdf, .xlsx, .docx, .pptx, .md, .json)

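#### DocumentTransformer: processing documents

- TextSplitter: splits documents into smaller chunks
- ContentFormatTransformer: converts documents into key-value pairs
- SummaryMetadataEnricher: uses a large model to summarize documents
- KeywordMetadataEnricher: uses a large model to extract document keywords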

#### DocumentWriter: writing documents

- VectorStore: writes to a vector database
- FileDocumentWriter: writes to a file
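
The three roles above compose into a simple read, transform, write chain. The sketch below models them with plain `java.util.function` types and strings instead of Spring AI's `Document` class, so it shows only the shape of the pipeline; `EtlPipelineSketch` and `run` are hypothetical names, and the trimming step merely stands in for a real transformer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative only: the reader/transformer/writer roles as plain functional interfaces.
final class EtlPipelineSketch {

    /** Runs one pass: read everything, transform it, then hand it to the writer. */
    static <T> void run(Supplier<List<T>> reader,
                        Function<List<T>, List<T>> transformer,
                        Consumer<List<T>> writer) {
        writer.accept(transformer.apply(reader.get()));
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        EtlPipelineSketch.<String>run(
                () -> List.of("  first doc ", "second doc"),           // extract
                docs -> docs.stream().map(String::trim).toList(),      // transform
                sink::addAll);                                         // load
        System.out.println(sink);
    }
}
```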

#### Usage

The Document object is the core of the ETL framework; it holds a document's metadata and content.

##### Add the dependency

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
```

##### Reading from an input stream

```java
// Suited to a MultipartFile uploaded from the front end
Resource resource = new InputStreamResource(file.getInputStream());
List<Document> documents = new TikaDocumentReader(resource).read();
```

##### Reading from a local file

```java
Resource resource = new FileSystemResource("D:\\xxx.pdf");
List<Document> documents = new TikaDocumentReader(resource).read();
```

##### Reading from a URL

```java
Resource resource = new UrlResource("http://oss.com/xxx.pdf");
List<Document> documents = new TikaDocumentReader(resource).read();
```

##### Content transformation

- TokenTextSplitter splits content into smaller chunks, which can speed up responses and save tokens during RAG.
- ContentFormatTransformer turns metadata into string key-value pairs.

```java
List<Document> documents = new TikaDocumentReader(resource).read();
// TokenTextSplitter is used here for chunking
List<Document> splitDocuments = new TokenTextSplitter().apply(documents);
```
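
To show what chunking means in practice, here is a deliberately simplified sketch that splits text by character count with a small overlap between neighboring chunks. It is not how TokenTextSplitter works internally: that class operates on tokens, and both the overlap and the names `ChunkSketch` and `split` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: character-based chunking, a simplification of token-based splitting.
final class ChunkSketch {

    /** Splits text into chunks of at most chunkSize characters; neighbors share `overlap` characters. */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > overlap >= 0");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1));
    }
}
```

Smaller chunks mean fewer tokens are sent per retrieved passage, at the cost of less context in each one.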

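##### Metadata transformation

- SummaryMetadataEnricher uses a large model to summarize the document and adds a **summary** field to the metadata.
- KeywordMetadataEnricher uses a large model to extract keywords and adds a **keywords** field to the metadata.

##### Storage

This step introduces a new component, the **vector database** (VectorStore), which is the core of AI memory. The **ChatMemory** described earlier is a short-term memory component that generally applies only within the context of a chat conversation, whereas **VectorStore** is persistent storage, what people commonly call an AI knowledge base.

What is a vector? Here is an answer from Tongyi Qianwen:

In a vector database, a vector is a vector in the mathematical sense: a one-dimensional array that can hold real or complex numbers. In computer science and information technology, however, and especially in machine learning, artificial intelligence, and data retrieval, a vector usually means a feature vector. Such vectors represent the features of a data point or object, with each element holding one feature value.

For example, in text processing a document can be represented by a term frequency-inverse document frequency (TF-IDF) vector or by word embeddings (such as vectors produced by Word2Vec or GloVe); in image recognition an image can be converted into a vector describing its visual features; and in recommender systems user preferences and item attributes can also be encoded as vectors.

A vector database is a database system designed specifically to store, index, and query this kind of high-dimensional vector data. It is optimized for similarity search (for example by computing a distance between vectors, such as Euclidean distance or cosine similarity), so that the data points closest to a given vector can be found quickly. This capability is very useful for tasks such as image search, speech recognition, and natural language processing.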
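
The two similarity measures named above are short enough to write out. The following sketch computes cosine similarity and Euclidean distance for `float[]` vectors in plain Java; `VectorMath` is a hypothetical helper class for illustration, and returning 0 for a zero-length vector is a choice made here rather than a standard.

```java
// Illustrative only: the two measures mentioned above, written out for float[] vectors.
final class VectorMath {

    /** Cosine similarity: 1 for the same direction, 0 for orthogonal vectors. */
    static double cosine(float[] a, float[] b) {
        requireSameLength(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // choice made for this sketch: a zero vector is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean distance: 0 for identical vectors, larger means farther apart. */
    static double euclidean(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[] {1f, 0f}, new float[] {0f, 1f}));
        System.out.println(euclidean(new float[] {0f, 0f}, new float[] {3f, 4f}));
    }
}
```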

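Remember the **Embedding** model mentioned earlier? It is the Spring AI component that converts documents, audio, and video into vectors. Once vectorized, the data can be stored in a vector database; **Redis Stack** is used as the example below.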

##### Add the Redis dependencies

```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-redis-store</artifactId>
</dependency>
<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
</dependency>
```

##### Configure the connection

```yml
spring:
  data:
    redis:
      host: <host>
      port: <port>
      password: <password>
      repositories:
        enabled: false
```

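If the project already uses Redis for caching or distributed locks, the configurations may conflict. To avoid this, exclude RedisVectorStoreAutoConfiguration and configure the store manually.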

```java
@Configuration
@EnableAutoConfiguration(exclude = {RedisVectorStoreAutoConfiguration.class})
@EnableConfigurationProperties({RedisVectorStoreProperties.class})
@AllArgsConstructor
public class RedisVectorConfig {
    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel,
                                   RedisVectorStoreProperties properties,
                                   RedisConnectionDetails redisConnectionDetails) {
        RedisVectorStore.RedisVectorStoreConfig config = RedisVectorStore.RedisVectorStoreConfig.builder()
                .withIndexName(properties.getIndex())
                .withPrefix(properties.getPrefix())
                .build();
        // Connect with the host, port, and credentials from the Spring Boot Redis connection details
        JedisPooled jedis = new JedisPooled(
                redisConnectionDetails.getStandalone().getHost(),
                redisConnectionDetails.getStandalone().getPort(),
                redisConnectionDetails.getUsername(),
                redisConnectionDetails.getPassword());
        return new RedisVectorStore(config, embeddingModel, jedis, properties.isInitializeSchema());
    }
}
```

##### Declare the embedding model

```yml
spring:
  ai:
    dash-scope:
      embedding:
        options:
          model: text-embedding-v2
```

##### Usage example

Inject the model:

```java
private final EmbeddingModel embeddingModel;
```

Inject the VectorStore component:

```java
private final VectorStore vectorStore;
```

Vectorize text:

```java
public void embedding() {
    float[] embed = embeddingModel.embed("Hello World");
}
```

Store documents as vectors:

```java
List<Document> documents = new TikaDocumentReader(resource).read();
List<Document> splitDocuments = new TokenTextSplitter().apply(documents);
vectorStore.add(splitDocuments);
```
**vectorStore.add** automatically calls the embeddingModel to vectorize the documents and store them.

Query the stored documents:

```java
// query can be the text string entered by the user
List<Document> list = vectorStore.similaritySearch(query);
```
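
Conceptually, a similarity search embeds the query and returns the stored entries whose vectors are closest to it. The sketch below does this by brute force over an in-memory list, ranking by cosine similarity. It is only a mental model: real vector databases use indexes instead of scanning every entry, and `TopKSearchSketch`, `Entry`, and `topK` are hypothetical names that are not part of Spring AI.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative only: brute-force nearest-neighbor search, not how a vector database is implemented.
final class TopKSearchSketch {

    /** A stored text together with its (already computed) embedding. */
    record Entry(String text, float[] vector) {}

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the texts of the k entries most similar to the query vector, best match first. */
    static List<String> topK(List<Entry> entries, float[] query, int k) {
        return entries.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> cosine(e.vector(), query)).reversed())
                .limit(k)
                .map(Entry::text)
                .toList();
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("cat", new float[] {1f, 0f}),
                new Entry("dog", new float[] {0.9f, 0.1f}),
                new Entry("car", new float[] {0f, 1f}));
        System.out.println(topK(entries, new float[] {1f, 0f}, 2));
    }
}
```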