pdf-qa-server/src/main/java/com/supervision/pdfqaserver/cache/PromptCache.java

186 lines
6.7 KiB
Java

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

package com.supervision.pdfqaserver.cache;
import java.util.HashMap;
import java.util.Map;
/**
* 提示词缓存
*/
public class PromptCache {
public static final String DOERE_TEXT = "DOERE_TEXT";
public static final String DOERE_TABLE = "DOERE_TABLE";
public static final Map<String, String> promptMap = new HashMap<>();
static {
init();
}
private static void init(){
promptMap.put(DOERE_TEXT, DOERE_TEXT_PROMPT);
promptMap.put(DOERE_TABLE, DOERE_TABLE_PROMPT);
}
private static final String DOERE_TEXT_PROMPT = """
你是一个高级信息抽取引擎请从给定文本中提取以下结构化信息并以JSON格式输出
1. **节点提取**
- 识别所有实体作为节点
- 自动推断每个节点的类型
- 记录节点的所有相关属性(键值对形式)
2. **关系提取**
- 识别所有节点间的关系
- 自动推断关系类型
- 记录关系的所有相关属性(键值对形式)
3. **类型化三元组**
- 生成由 (头节点类型, 关系类型, 尾节点类型) 组成的元组
**输出要求**
- 使用如下JSON Schema
{
"nodes": [
{
"name": "",
"type": "",
"attributes": {
"1": "1",
"2": "2"
}
}
],
"relations": [
{
"source": "",
"target": "",
"type": "",
"attributes": {
"1": "1"
}
}
],
"typed_triplets": [
["", "", ""]
]
}
**处理规则**
1. 节点类型和关系类型由你根据上下文语义自动创建(如""/""/""
2. 属性字段应包含文本中明确提及或可推导的特征(如数值、时间、状态等)
3. 对同一实体的不同指代需进行合并(如""和"·"
**示例文本**
"1905"
**期望输出**
{
"nodes": [
{
"name": "",
"type": "",
"attributes": {
"": ""
}
},
{
"name": "",
"type": "",
"attributes": {
"": 1905,
"": ""
}
},
{
"name": "",
"type": "",
"attributes": {
"": ""
}
}
],
"relations": [
{
"source": "",
"target": "",
"type": "",
"attributes": {
"": 1905
}
},
{
"source": "",
"target": "",
"type": "",
"attributes": {
"": ""
}
}
],
"typed_triplets": [
["", "", ""],
["", "", ""]
]
}
请处理以下文本:
{}
""";
private static final String DOERE_TABLE_PROMPT = """
你是一个表格数据处理专家,请严格按以下要求从给出的表格中提取数据:
**处理规则:**
1. 完全保留原始表头字段名称,不做任何中英文转换或修改
2. 将每行数据转换为一个独立对象
3. 所有数值保留原始格式(包括逗号分隔符和小数点)
4. 表格第一列作为主键字段
**输出格式:**
```json
{
"table_data": [
{
"[]": "[]",
"[]": "[]",
"[]": "[]"
},
// 后续行...
]
}
```
**示例表格:**
| 账龄 | 期末余额 | 年初余额 |
| --- | --- | --- |
| 1年以内 | 310,844,201.27 | 337,641,834.84 |
| 1至2年 | 52,374,904.35 | 15,041,750.36 |
**期望输出:**
{
"table_data": [
{
"": "1",
"": "310,844,201.27",
"": "337,641,834.84"
},
{
"": "12",
"": "52,374,904.35",
"": "15,041,750.36"
}
]
}
请处理以下表格:
{}
""";
}