query package

v1.1.8 (latest) · Published: Apr 6, 2026 · License: MIT · Imports: 13 · Imported by: 0

Repository

github.com/DotNetAge/gorag

README ¶

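Query Processing

What Is Query Processing?

Query processing is the pre-retrieval stage of a RAG system. It analyzes and transforms the user's query to improve retrieval quality and accuracy.

Core Components

Component	File	Purpose
IntentRouter	classifier.go	Classifies a query into one of 5 intents
Decomposer	decomposer.go	Breaks a complex question into sub-questions
EntityExtractor	extractor.go	Extracts key entities from the query
HyDE	hyde.go	Generates a hypothetical document
Rewriter	rewriter.go	Rewrites and refines the query
StepBack	stepback.go	Generates a more abstract background question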

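Core Idea

User query "How does the semantic cache improve RAG performance?"
        ↓
    [Query processing pipeline]
        ↓
    ┌─ IntentRouter: domain_specific (domain-specific query)
    ├─ EntityExtractor: [semantic cache, RAG]
    ├─ Decomposer: split into sub-questions
    ├─ Rewriter: refine the query wording
    └─ StepBack: generate a background question
        ↓
    [Enhanced query used for vector retrieval]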

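What Does It Do?

1. IntentRouter (intent routing)

Classification: assigns each query one of 5 intents
- chat: casual conversation
- domain_specific: domain knowledge retrieval
- relational: relationship queries (graph retrieval)
- global: global summarization
- fact_check: fact checking
Dynamic routing: selects the retrieval strategy best suited to the intent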

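2. Decomposer (question decomposition)

Splitting complex questions: breaks a multi-hop question into 2-5 independent sub-questions
Parallel retrieval: sub-questions can be retrieved concurrently, which reduces latency
Full coverage: ensures every aspect of a complex query is covered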
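
The parallel retrieval mentioned above can be pictured with a small sketch. It is not the library's implementation: `retrieve` is a hypothetical stand-in for a real retriever, and `retrieveAll` is an illustrative helper that fans out one goroutine per sub-question and keeps results in input order.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// retrieve is a hypothetical stand-in for a real retriever call.
func retrieve(subQuery string) []string {
	return []string{"doc for: " + strings.ToLower(subQuery)}
}

// retrieveAll runs one retrieval per sub-query concurrently and returns
// the results in the same order as the input.
func retrieveAll(subQueries []string) [][]string {
	results := make([][]string, len(subQueries))
	var wg sync.WaitGroup
	for i, q := range subQueries {
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			results[i] = retrieve(q) // each goroutine writes only its own slot
		}(i, q)
	}
	wg.Wait()
	return results
}

func main() {
	subs := []string{
		"What company acquired GitHub?",
		"What is the revenue of that company?",
	}
	for i, docs := range retrieveAll(subs) {
		fmt.Println(i, docs)
	}
}
```

Writing into a pre-sized slice by index avoids a mutex; a real pipeline would also pass a context and collect errors per sub-question.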

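3. EntityExtractor (entity extraction)

Key entity recognition: extracts people, places, concepts, and similar entities from the query
Relationship discovery: supplies node information for graph retrieval
Retrieval enhancement: entities can be used for exact matching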

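4. HyDE (hypothetical document)

Answer pre-generation: generates a hypothetical, high-quality answer
Retrieval enhancement: uses the hypothetical answer to find similar documents
Bridging the semantic gap: the hypothetical answer may contain keywords the original query never stated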
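
To illustrate why searching with the hypothetical answer can help, here is a toy sketch. The vectors are made-up three-dimensional embeddings (an assumption for illustration only), and `cosine` and `nearest` are hypothetical helpers, not part of this package. The point is that the hypothetical answer's embedding can land closer to the relevant document than the raw query's embedding does.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two vectors of equal length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest returns the index of the document vector most similar to target,
// or -1 if docs is empty.
func nearest(target []float64, docs [][]float64) int {
	best, bestScore := -1, math.Inf(-1)
	for i, d := range docs {
		if s := cosine(target, d); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	// Toy embeddings, invented for this example.
	queryVec := []float64{1, 0, 0}         // short, vague query
	hypoVec := []float64{0.2, 0.9, 0.1}    // hypothetical answer
	docs := [][]float64{
		{1, 0.1, 0},   // superficially similar wording
		{0.1, 1, 0.2}, // the document that actually answers the question
	}
	fmt.Println("raw query picks doc", nearest(queryVec, docs))
	fmt.Println("hypothetical answer picks doc", nearest(hypoVec, docs))
}
```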

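5. Rewriter (query rewriting)

Disambiguation: turns vague wording into a clear description
Redundancy removal: strips meaningless conversational filler
Coreference resolution: resolves pronouns to concrete entities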

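6. StepBack (step-back question)

Abstraction: generates a higher-level background question
Dual retrieval: retrieves for both the original question and the background question
Avoiding omissions: ensures important foundational concepts are retrieved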
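
Dual retrieval produces two result lists that usually overlap. A minimal sketch of merging them, assuming each document carries a unique ID (the `doc` type and `mergeUnique` helper are hypothetical, not taken from this package):

```go
package main

import "fmt"

// doc is a hypothetical retrieved document with a unique ID.
type doc struct {
	ID   string
	Text string
}

// mergeUnique concatenates result lists, keeping the first occurrence of
// each document ID and preserving order.
func mergeUnique(lists ...[]doc) []doc {
	seen := make(map[string]bool)
	var out []doc
	for _, list := range lists {
		for _, d := range list {
			if !seen[d.ID] {
				seen[d.ID] = true
				out = append(out, d)
			}
		}
	}
	return out
}

func main() {
	original := []doc{{"a", "channels"}, {"b", "select"}}
	stepBack := []doc{{"b", "select"}, {"c", "goroutines"}}
	for _, d := range mergeUnique(original, stepBack) {
		fmt.Println(d.ID, d.Text)
	}
}
```

Listing the original question's results first keeps the most specific hits at the top; a production system would more likely re-rank the merged set.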

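How Does It Work?

1. Query processing flow

User query
    ↓
IntentRouter classifies
    ↓
    ├── chat/fact_check → generate the answer directly (skip retrieval)
    └── domain_specific/relational/global
            ↓
        EntityExtractor extracts entities
            ↓
        Decomposer splits the question (optional)
            ↓
        Rewriter rewrites the query (optional)
            ↓
        StepBack generates a background question (optional)
            ↓
        HyDE generates a hypothetical document (optional)
            ↓
        Multi-path retrieval
            ↓
        Answer generation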
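
The branch at the top of this flow can be sketched as a simple switch. The `intent` type and constants below are hypothetical stand-ins that reuse the five intent names from this README; they are not the `core.IntentType` values, and sending unknown intents to retrieval is an assumption.

```go
package main

import "fmt"

// intent is a hypothetical stand-in for the library's intent type.
type intent string

const (
	intentChat           intent = "chat"
	intentDomainSpecific intent = "domain_specific"
	intentRelational     intent = "relational"
	intentGlobal         intent = "global"
	intentFactCheck      intent = "fact_check"
)

// needsRetrieval mirrors the flow above: chat and fact_check go straight
// to generation, everything else goes through retrieval.
func needsRetrieval(in intent) bool {
	switch in {
	case intentChat, intentFactCheck:
		return false
	default:
		// Assumption: unknown intents are treated like retrieval intents.
		return true
	}
}

func main() {
	for _, in := range []intent{intentChat, intentDomainSpecific, intentRelational, intentGlobal, intentFactCheck} {
		fmt.Printf("%-16s retrieval=%v\n", in, needsRetrieval(in))
	}
}
```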

2. How the components cooperate

┌─────────────┐     ┌─────────────┐
│IntentRouter │────▶│  Router     │
└─────────────┘     │  Decision   │
                    └──────┬──────┘
                           │
         ┌─────────────────┼─────────────────┐
         ▼                 ▼                 ▼
┌─────────────┐     ┌─────────────┐   ┌─────────────┐
│ Decomposer  │     │   HyDE      │   │  Rewriter   │
└─────────────┘     └─────────────┘   └─────────────┘
         │                 │                 │
         └─────────────────┼─────────────────┘
                           ▼
                    ┌─────────────┐
                    │  Retrieval  │
                    └─────────────┘

How Is It Implemented?

Package layout

pkg/retrieval/query/
├── classifier.go    # IntentRouter: intent routing
├── decomposer.go    # Decomposer: question decomposition
├── extractor.go     # EntityExtractor: entity extraction
├── hyde.go          # HyDE: hypothetical document
├── rewriter.go      # Rewriter: query rewriting
└── stepback.go      # StepBack: step-back question

1. IntentRouter (classifier.go)

router := query.NewIntentRouter(llm,
    query.WithDefaultIntent(core.IntentDomainSpecific),
    query.WithMinConfidence(0.7),
)

// Classify the query
result, err := router.Classify(ctx, core.NewQuery("1", "What is RAG?", nil))
// result.Intent: domain_specific
// result.Confidence: 0.95

2. Decomposer (decomposer.go)

decomposer := query.NewDecomposer(llm,
    query.WithMaxSubQueries(5),
)

// Decompose a complex question
result, err := decomposer.Decompose(ctx, core.NewQuery("1", complexQuery, nil))
// result.SubQueries: ["sub-question 1", "sub-question 2", ...]
// result.IsComplex: true

3. EntityExtractor (extractor.go)

extractor := query.NewEntityExtractor(llm)

// Extract entities
result, err := extractor.Extract(ctx, core.NewQuery("1", "Who is the CEO of Apple?", nil))
// result.Entities: ["Apple", "CEO"]

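4. HyDE (hyde.go)

hyde := query.NewHyDE(llm)

// Generate a hypothetical document. GenerateHypotheticalDocument returns the text;
// Generate implements core.Generator and also takes a []*core.Chunk argument.
doc, err := hyde.GenerateHypotheticalDocument(ctx, core.NewQuery("1", "What is semantic cache?", nil))
// doc: "A semantic cache is a ..."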

5. Rewriter (rewriter.go)

rewriter := query.NewRewriter(llm)

// Rewrite the query
newQuery, err := rewriter.Rewrite(ctx, core.NewQuery("1", "tell me about it", nil))
// newQuery.Text: "What is semantic cache?"

6. StepBack (stepback.go)

stepback := query.NewStepBack(llm)

// Generate a step-back question
backQuery, err := stepback.GenerateStepBackQuery(ctx, core.NewQuery("1", "How does Go channel work?", nil))
// backQuery.Text: "What are the concurrency primitives in Go?"

How Do I Integrate It?

Option 1: use each component on its own

// Pick the components you need
router := query.NewIntentRouter(llm)
decomposer := query.NewDecomposer(llm)
rewriter := query.NewRewriter(llm)

// Use them independently (errors ignored here for brevity)
intent, _ := router.Classify(ctx, userQuery)
subs, _ := decomposer.Decompose(ctx, userQuery)
rewritten, _ := rewriter.Rewrite(ctx, userQuery)

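Option 2: combine them in a pipeline

// Build the query processing pipeline.
// NOTE: these step constructors do not appear in this package's index below;
// verify their names and package against the release you are using.
steps := []pipeline.Step{
    query.Classify(router),
    query.Extract(extractor),
    query.Decompose(decomposer),
    query.Rewrite(rewriter),
    query.StepBack(stepback),
    query.GenerateHyDE(hyde),
}

p := pipeline.New[*core.Query]()
for _, s := range steps {
    p.AddStep(s)
}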

Option 3: enable it in RAG

// Advanced RAG exposes query processing options
app, _ := gorag.DefaultAdvancedRAG(
    gorag.WithWorkDir("./data"),
    gorag.WithQueryDecomposition(true),   // enable question decomposition
    gorag.WithQueryRewriting(true),       // enable query rewriting
    gorag.WithStepBack(true),             // enable step-back questions
)

Test Status

Component	Test coverage	Status
IntentRouter	11 tests	✅ Passing
Decomposer	5 tests	✅ Passing
Rewriter	4 tests	✅ Passing
StepBack	3 tests	✅ Passing
EntityExtractor	No tests	⏳ To be added
HyDE	No tests	⏳ To be added

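Which Scenarios Is It Suited To?

✅ IntentRouter is a good fit for

Multi-path retrieval: different retrieval paths must be chosen based on intent
Smart routing: the query type is determined automatically
System optimization: chat queries are answered directly, skipping expensive retrieval

✅ Decomposer is a good fit for

Multi-hop QA: complex questions that need multi-step reasoning
Comprehensive retrieval: every aspect of the question must be retrieved
Compound questions: questions that split into several independent ones

✅ HyDE is a good fit for

Semantically vague queries: the query and the documents are worded very differently
Specialized domains: the query needs domain terminology added
Cold-start scenarios: few relevant documents are available

✅ Rewriter is a good fit for

Colloquial queries: conversational filler needs to be stripped
Unclear references: pronouns need to be resolved
Poor retrieval results: the original query retrieves poorly

✅ StepBack is a good fit for

Specialized questions: answers depend on background knowledge
Deeper understanding: foundational concepts must not be missed
Academic writing: thorough background research is required

❌ Not a good fit for

Simple queries: single-hop questions that can be answered directly
Strict latency requirements: each extra LLM call adds latency
Constrained resources: deployments sensitive to LLM call cost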

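Recommended Configurations

Scenario	Recommended configuration
High-quality retrieval	All components enabled
Balanced	IntentRouter + Rewriter
Fast response	IntentRouter only
Complex reasoning	Decomposer + StepBack + HyDE

// Recommended for production (balances quality and performance)
app, _ := gorag.DefaultAdvancedRAG(
    gorag.WithWorkDir("./data"),
    gorag.WithQueryDecomposition(true),
    gorag.WithQueryRewriting(true),
)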

Documentation ¶

Overview ¶

Package query provides query processing components for the RAG system. It includes query decomposition, rewriting, expansion, and classification capabilities to improve retrieval quality and handle complex user queries.

Index ¶

func NewEntityExtractor(llm chat.Client, opts ...EntityExtractorOption) *entityExtractor
func NewIntentRouter(llm chat.Client, opts ...IntentRouterOption) *intentRouter
func NewKeywordExtractor(opts ...KeywordExtractorOption) *keywordExtractor
func NewVectorExtractor(graphStore core.GraphStore, embedder embedding.Provider, ...) *vectorExtractor
type Decomposer
- func NewDecomposer(llm chat.Client, opts ...DecomposerOption) *Decomposer
- func (d *Decomposer) Decompose(ctx context.Context, query *core.Query) (*core.DecompositionResult, error)
type DecomposerOption
- func WithDecomposerCollector(collector observability.Collector) DecomposerOption
- func WithDecomposerLogger(logger logging.Logger) DecomposerOption
- func WithDecompositionPromptTemplate(tmpl string) DecomposerOption
- func WithMaxSubQueries(max int) DecomposerOption
type EntityExtractorOption
- func WithEntityExtractionPromptTemplate(tmpl string) EntityExtractorOption
- func WithEntityExtractorCollector(collector observability.Collector) EntityExtractorOption
- func WithEntityExtractorLogger(logger logging.Logger) EntityExtractorOption
type HyDE
- func NewHyDE(llm chat.Client) *HyDE
- func (h *HyDE) Generate(ctx context.Context, query *core.Query, chunks []*core.Chunk) (*core.Result, error)
- func (h *HyDE) GenerateHypotheticalDocument(ctx context.Context, query *core.Query) (string, error)
type IntentRouterOption
- func WithDefaultIntent(intent core.IntentType) IntentRouterOption
- func WithIntentPromptTemplate(tmpl string) IntentRouterOption
- func WithIntentRouterCollector(collector observability.Collector) IntentRouterOption
- func WithIntentRouterLogger(logger logging.Logger) IntentRouterOption
- func WithMinConfidence(v float32) IntentRouterOption
type KeywordExtractorOption
- func WithKeywordExtractorCollector(collector observability.Collector) KeywordExtractorOption
- func WithKeywordExtractorFilter(filter func(word string) bool) KeywordExtractorOption
- func WithKeywordExtractorLogger(logger logging.Logger) KeywordExtractorOption
- func WithKeywordExtractorMaxEntities(max int) KeywordExtractorOption
- func WithKeywordExtractorMinWordLen(minLen int) KeywordExtractorOption
- func WithKeywordExtractorStopWords(stopWords []string) KeywordExtractorOption
type Rewriter
- func NewRewriter(llm chat.Client) *Rewriter
- func (r *Rewriter) Rewrite(ctx context.Context, query *core.Query) (*core.Query, error)
type StepBack
- func NewStepBack(llm chat.Client) *StepBack
- func (s *StepBack) GenerateStepBackQuery(ctx context.Context, query *core.Query) (*core.Query, error)
type VectorExtractorOption
- func WithVectorExtractorCollector(collector observability.Collector) VectorExtractorOption
- func WithVectorExtractorLogger(logger logging.Logger) VectorExtractorOption
- func WithVectorExtractorMinLen(minLen int) VectorExtractorOption
- func WithVectorExtractorThreshold(threshold float64) VectorExtractorOption
- func WithVectorExtractorTopK(topK int) VectorExtractorOption

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func NewEntityExtractor ¶

func NewEntityExtractor(llm chat.Client, opts ...EntityExtractorOption) *entityExtractor

NewEntityExtractor creates a new entity extractor.

func NewIntentRouter ¶

func NewIntentRouter(llm chat.Client, opts ...IntentRouterOption) *intentRouter

NewIntentRouter creates a new intent router.

func NewKeywordExtractor ¶ added in v1.1.6

func NewKeywordExtractor(opts ...KeywordExtractorOption) *keywordExtractor

NewKeywordExtractor creates a new keyword-based entity extractor. It identifies potential entities using heuristics like capitalization, quoted strings, and non-stopwords, without requiring an LLM.
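
The heuristics named above (capitalization, quoted strings, stop-word filtering) can be sketched roughly as follows. This is an illustration of the idea only, not the package's code: `extractKeywords`, the tiny stop-word set, and the exact ordering of the rules are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// quoted matches text inside double quotes.
var quoted = regexp.MustCompile(`"([^"]+)"`)

// stopWords is a deliberately tiny, hypothetical stop-word set.
var stopWords = map[string]bool{
	"a": true, "the": true, "is": true, "of": true, "who": true, "what": true,
}

// extractKeywords collects quoted phrases first, then capitalized tokens
// that are not stop words, deduplicated case-insensitively and capped at max.
func extractKeywords(text string, minLen, max int) []string {
	seen := make(map[string]bool)
	var out []string
	add := func(s string) {
		key := strings.ToLower(s)
		if len(out) < max && !seen[key] {
			seen[key] = true
			out = append(out, s)
		}
	}
	for _, m := range quoted.FindAllStringSubmatch(text, -1) {
		add(m[1])
	}
	words := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		runes := []rune(w)
		if len(runes) < minLen || stopWords[strings.ToLower(w)] {
			continue
		}
		if unicode.IsUpper(runes[0]) {
			add(w)
		}
	}
	return out
}

func main() {
	fmt.Println(extractKeywords(`Who is the CEO of Apple?`, 2, 10))
	fmt.Println(extractKeywords(`What is "semantic cache" in GoRAG?`, 2, 10))
}
```

The `minLen` and `max` parameters play the role that WithKeywordExtractorMinWordLen and WithKeywordExtractorMaxEntities play in the real extractor.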

func NewVectorExtractor ¶ added in v1.1.6

func NewVectorExtractor(
	graphStore core.GraphStore,
	embedder embedding.Provider,
	opts ...VectorExtractorOption,
) *vectorExtractor

NewVectorExtractor creates a new vector-based entity extractor. It extracts candidate entities using heuristics and validates them against the graph store. If an embedder is provided, it can optionally re-rank entities by semantic similarity.

Types ¶

type Decomposer ¶

type Decomposer struct {
	// contains filtered or unexported fields
}

Decomposer implements core.QueryDecomposer to break down complex queries. It uses an LLM to analyze queries and generate simpler sub-queries that can be processed independently, enabling multi-hop retrieval for complex questions.

Query decomposition is useful for:

Multi-hop questions: "What is the revenue of the company that acquired GitHub?"
Comparative questions: "Compare the features of Product A and Product B"
Temporal questions: "How has the market changed since 2020?"

Example:

llm := openai.NewClient(apiKey)
decomposer := query.NewDecomposer(llm,
    query.WithMaxSubQueries(3),
    query.WithDecomposerLogger(logger),
)
result, err := decomposer.Decompose(ctx, q) // q is the *core.Query to decompose; named q to avoid shadowing package query
// result.SubQueries contains ["What company acquired GitHub?", "What is the revenue of that company?"]

func NewDecomposer ¶

func NewDecomposer(llm chat.Client, opts ...DecomposerOption) *Decomposer

NewDecomposer creates a new query decomposer with the given LLM client. The decomposer uses default settings unless modified by options.

Parameters:

llm: LLM client for generating decompositions
opts: Optional configuration functions

Returns:

*Decomposer: Configured decomposer instance

Example:

decomposer := query.NewDecomposer(llm,
    query.WithMaxSubQueries(3),
    query.WithDecomposerLogger(logger),
)

func (*Decomposer) Decompose ¶

func (d *Decomposer) Decompose(ctx context.Context, query *core.Query) (*core.DecompositionResult, error)

Decompose breaks down a complex query into simpler sub-queries. It uses the configured LLM to analyze the query and generate sub-queries that can be answered independently.

Parameters:

ctx: Context for cancellation and timeout
query: The query to decompose

Returns:

*core.DecompositionResult: Contains sub-queries, reasoning, and complexity flag
error: Any error that occurred during decomposition

The result includes:

SubQueries: List of simpler questions
Reasoning: Explanation of the decomposition strategy
IsComplex: Whether the original query was deemed complex

If decomposition fails or the query is simple, returns the original query as a single sub-query.
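
The fallback described above, combined with the truncation that WithMaxSubQueries documents, might look like the following sketch. `normalizeSubQueries` is a hypothetical helper written for illustration; how the real Decomposer cleans the LLM output is not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSubQueries drops blank entries, falls back to the original
// query when nothing usable remains, and truncates to max entries.
func normalizeSubQueries(original string, subs []string, max int) []string {
	cleaned := make([]string, 0, len(subs))
	for _, s := range subs {
		if s = strings.TrimSpace(s); s != "" {
			cleaned = append(cleaned, s)
		}
	}
	if len(cleaned) == 0 {
		// Fallback: the original query becomes the single sub-query.
		return []string{original}
	}
	if max > 0 && len(cleaned) > max {
		cleaned = cleaned[:max]
	}
	return cleaned
}

func main() {
	orig := "What is the revenue of the company that acquired GitHub?"
	fmt.Println(normalizeSubQueries(orig, nil, 5))
	fmt.Println(normalizeSubQueries(orig, []string{"Who acquired GitHub?", " ", "What is its revenue?"}, 5))
}
```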

type DecomposerOption ¶

type DecomposerOption func(*Decomposer)

DecomposerOption configures a Decomposer instance.
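
DecomposerOption follows Go's functional options pattern. The sketch below shows the pattern in isolation with lowercase, hypothetical names (`decomposer`, `option`, `withMaxSubQueries`); the default of 5 comes from the WithMaxSubQueries documentation, while silently ignoring non-positive values is an assumption.

```go
package main

import "fmt"

// decomposer is a hypothetical stand-in for the real Decomposer.
type decomposer struct {
	maxSubQueries int
}

// option mutates a decomposer during construction.
type option func(*decomposer)

// withMaxSubQueries returns an option that sets the sub-query limit.
func withMaxSubQueries(n int) option {
	return func(d *decomposer) {
		if n > 0 { // assumption: invalid values leave the default in place
			d.maxSubQueries = n
		}
	}
}

// newDecomposer applies defaults first, then each option in order.
func newDecomposer(opts ...option) *decomposer {
	d := &decomposer{maxSubQueries: 5}
	for _, opt := range opts {
		opt(d)
	}
	return d
}

func main() {
	fmt.Println(newDecomposer().maxSubQueries)
	fmt.Println(newDecomposer(withMaxSubQueries(3)).maxSubQueries)
}
```

Because options are applied in order, a later option overrides an earlier one that touches the same field.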

func WithDecomposerCollector ¶

func WithDecomposerCollector(collector observability.Collector) DecomposerOption

WithDecomposerCollector sets an observability collector for metrics.

Parameters:

collector: Metrics collector (if nil, no-op collector is used)

Returns:

DecomposerOption: Configuration function

func WithDecomposerLogger ¶

func WithDecomposerLogger(logger logging.Logger) DecomposerOption

WithDecomposerLogger sets a structured logger for the decomposer.

Parameters:

logger: Logger implementation (if nil, no-op logger is used)

Returns:

DecomposerOption: Configuration function

func WithDecompositionPromptTemplate ¶

func WithDecompositionPromptTemplate(tmpl string) DecomposerOption

WithDecompositionPromptTemplate sets a custom prompt template for decomposition. The template should contain one %s placeholder for the query text.

Parameters:

tmpl: Custom prompt template (must contain %s placeholder)

Returns:

DecomposerOption: Configuration function

Example:

decomposer := query.NewDecomposer(llm,
    query.WithDecompositionPromptTemplate("Custom prompt: %s"),
)
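
A template with a single %s placeholder is presumably filled with the query text via fmt-style formatting. The sketch below shows that step with an up-front placeholder check; `buildPrompt` is a hypothetical helper, and whether the library validates the template at all is not stated here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildPrompt substitutes queryText into tmpl, rejecting templates that
// do not contain exactly one %s placeholder.
func buildPrompt(tmpl, queryText string) (string, error) {
	if strings.Count(tmpl, "%s") != 1 {
		return "", errors.New("template must contain exactly one placeholder")
	}
	return fmt.Sprintf(tmpl, queryText), nil
}

func main() {
	prompt, err := buildPrompt("Custom prompt: %s", "What is RAG?")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(prompt)
}
```

Without such a check, a template that lacks the placeholder makes fmt append an `%!(EXTRA ...)` marker to the prompt instead of failing loudly.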

func WithMaxSubQueries ¶

func WithMaxSubQueries(max int) DecomposerOption

WithMaxSubQueries sets the maximum number of sub-queries to generate. Default is 5. If decomposition produces more, they are truncated.

Parameters:

max: Maximum number of sub-queries (must be > 0)

Returns:

DecomposerOption: Configuration function

type EntityExtractorOption ¶

type EntityExtractorOption func(*entityExtractor)

EntityExtractorOption configures an entityExtractor instance.

func WithEntityExtractionPromptTemplate ¶

func WithEntityExtractionPromptTemplate(tmpl string) EntityExtractorOption

WithEntityExtractionPromptTemplate overrides the default extraction prompt.

func WithEntityExtractorCollector ¶

func WithEntityExtractorCollector(collector observability.Collector) EntityExtractorOption

WithEntityExtractorCollector sets an observability collector.

func WithEntityExtractorLogger ¶

func WithEntityExtractorLogger(logger logging.Logger) EntityExtractorOption

WithEntityExtractorLogger sets a structured logger.

type HyDE ¶

type HyDE struct {
	// contains filtered or unexported fields
}

HyDE generates hypothetical answers to improve search results.

func NewHyDE ¶

func NewHyDE(llm chat.Client) *HyDE

NewHyDE creates a new HyDE generator.

func (*HyDE) Generate ¶

func (h *HyDE) Generate(ctx context.Context, query *core.Query, chunks []*core.Chunk) (*core.Result, error)

Generate implements core.Generator.

func (*HyDE) GenerateHypotheticalDocument ¶

func (h *HyDE) GenerateHypotheticalDocument(ctx context.Context, query *core.Query) (string, error)

GenerateHypotheticalDocument generates a hypothetical document.

type IntentRouterOption ¶

type IntentRouterOption func(*intentRouter)

IntentRouterOption configures an intentRouter instance.

func WithDefaultIntent ¶

func WithDefaultIntent(intent core.IntentType) IntentRouterOption

WithDefaultIntent sets the fallback intent when LLM confidence is low.

func WithIntentPromptTemplate ¶

func WithIntentPromptTemplate(tmpl string) IntentRouterOption

WithIntentPromptTemplate overrides the default intent classification prompt.

func WithIntentRouterCollector ¶

func WithIntentRouterCollector(collector observability.Collector) IntentRouterOption

WithIntentRouterCollector sets an observability collector.

func WithIntentRouterLogger ¶

func WithIntentRouterLogger(logger logging.Logger) IntentRouterOption

WithIntentRouterLogger sets a structured logger.

func WithMinConfidence ¶

func WithMinConfidence(v float32) IntentRouterOption

WithMinConfidence sets the minimum confidence threshold.
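
Read together, WithDefaultIntent and WithMinConfidence suggest a fallback rule: when the classifier's confidence falls below the threshold, the default intent is used. A minimal sketch of that rule, using a hypothetical `classification` struct and plain strings in place of `core.IntentType` (the strict less-than comparison is an assumption):

```go
package main

import "fmt"

// classification is a hypothetical stand-in for the router's result.
type classification struct {
	Intent     string
	Confidence float32
}

// resolveIntent returns the classified intent unless its confidence is
// below minConfidence, in which case it returns defaultIntent.
func resolveIntent(c classification, defaultIntent string, minConfidence float32) string {
	if c.Confidence < minConfidence {
		return defaultIntent
	}
	return c.Intent
}

func main() {
	confident := classification{Intent: "relational", Confidence: 0.95}
	unsure := classification{Intent: "chat", Confidence: 0.4}
	fmt.Println(resolveIntent(confident, "domain_specific", 0.7))
	fmt.Println(resolveIntent(unsure, "domain_specific", 0.7))
}
```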

type KeywordExtractorOption ¶ added in v1.1.6

type KeywordExtractorOption func(*keywordExtractor)

KeywordExtractorOption configures a keywordExtractor instance.

func WithKeywordExtractorCollector ¶ added in v1.1.6

func WithKeywordExtractorCollector(collector observability.Collector) KeywordExtractorOption

WithKeywordExtractorCollector sets an observability collector.

func WithKeywordExtractorFilter ¶ added in v1.1.6

func WithKeywordExtractorFilter(filter func(word string) bool) KeywordExtractorOption

WithKeywordExtractorFilter sets a custom filter function for entity candidates.

func WithKeywordExtractorLogger ¶ added in v1.1.6

func WithKeywordExtractorLogger(logger logging.Logger) KeywordExtractorOption

WithKeywordExtractorLogger sets a structured logger.

func WithKeywordExtractorMaxEntities ¶ added in v1.1.6

func WithKeywordExtractorMaxEntities(max int) KeywordExtractorOption

WithKeywordExtractorMaxEntities sets maximum entities to return. Default is 10.

func WithKeywordExtractorMinWordLen ¶ added in v1.1.6

func WithKeywordExtractorMinWordLen(minLen int) KeywordExtractorOption

WithKeywordExtractorMinWordLen sets minimum word length to consider. Default is 2.

func WithKeywordExtractorStopWords ¶ added in v1.1.6

func WithKeywordExtractorStopWords(stopWords []string) KeywordExtractorOption

WithKeywordExtractorStopWords sets custom stop words to exclude.

type Rewriter ¶

type Rewriter struct {
	// contains filtered or unexported fields
}

Rewriter uses an LLM to rewrite and improve user queries. Query rewriting enhances search quality by:

Removing conversational filler words
Resolving ambiguous pronouns and references
Making queries more specific and searchable
Expanding abbreviated terms

This is particularly useful for:

Conversational queries: "What about their revenue?" → "What is [Company X]'s revenue?"
Vague queries: "the thing" → specific entity name
Informal language: "how do I fix it" → "How do I fix [specific error]"

Example:

llm := openai.NewClient(apiKey)
rewriter := query.NewRewriter(llm)
rewritten, err := rewriter.Rewrite(ctx, originalQuery)
// rewritten.Text contains the improved query

func NewRewriter ¶

func NewRewriter(llm chat.Client) *Rewriter

NewRewriter creates a new query rewriter with the given LLM client.

Parameters:

llm: LLM client for query rewriting

Returns:

*Rewriter: Configured rewriter instance

Example:

rewriter := query.NewRewriter(llm)
rewritten, err := rewriter.Rewrite(ctx, q) // q is the *core.Query to rewrite; named q to avoid shadowing package query

func (*Rewriter) Rewrite ¶

func (r *Rewriter) Rewrite(ctx context.Context, query *core.Query) (*core.Query, error)

Rewrite rewrites the user's query to improve search quality. It uses an LLM to transform the query into a clearer, more specific form that is better suited for vector database search.

Parameters:

ctx: Context for cancellation and timeout
query: The original query to rewrite

Returns:

*core.Query: New query object with rewritten text
error: Any error that occurred during rewriting

The rewritten query:

Has a new unique ID
Contains improved, clearer text
Is better suited for semantic search

If rewriting fails, returns the original query text in a new query object.

type StepBack ¶

type StepBack struct {
	// contains filtered or unexported fields
}

StepBack abstracts queries into higher-level background questions.

func NewStepBack ¶

func NewStepBack(llm chat.Client) *StepBack

NewStepBack creates a new StepBack generator.

func (*StepBack) GenerateStepBackQuery ¶

func (s *StepBack) GenerateStepBackQuery(ctx context.Context, query *core.Query) (*core.Query, error)

GenerateStepBackQuery generates a step-back query.

type VectorExtractorOption ¶ added in v1.1.6

type VectorExtractorOption func(*vectorExtractor)

VectorExtractorOption configures a vectorExtractor instance.

func WithVectorExtractorCollector ¶ added in v1.1.6

func WithVectorExtractorCollector(collector observability.Collector) VectorExtractorOption

WithVectorExtractorCollector sets an observability collector.

func WithVectorExtractorLogger ¶ added in v1.1.6

func WithVectorExtractorLogger(logger logging.Logger) VectorExtractorOption

WithVectorExtractorLogger sets a structured logger.

func WithVectorExtractorMinLen ¶ added in v1.1.6

func WithVectorExtractorMinLen(minLen int) VectorExtractorOption

WithVectorExtractorMinLen sets minimum entity length. Default is 2.

func WithVectorExtractorThreshold ¶ added in v1.1.6

func WithVectorExtractorThreshold(threshold float64) VectorExtractorOption

WithVectorExtractorThreshold sets the similarity threshold for entity matching. Default is 0.7. Lower values return more entities but with lower precision.

func WithVectorExtractorTopK ¶ added in v1.1.6

func WithVectorExtractorTopK(topK int) VectorExtractorOption

WithVectorExtractorTopK sets the maximum number of entities to return. Default is 5.
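
The threshold and top-K options interact: the threshold filters candidates, then top-K caps how many survive. The sketch below illustrates that interaction with a hypothetical `candidate` type and `selectEntities` helper; whether the real extractor compares with >= or >, and in which order it applies the two limits, is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical entity candidate with a similarity score.
type candidate struct {
	Name  string
	Score float64
}

// selectEntities keeps candidates scoring at least threshold, sorts them
// by descending score, and returns at most topK of them.
func selectEntities(cands []candidate, threshold float64, topK int) []candidate {
	kept := make([]candidate, 0, len(cands))
	for _, c := range cands {
		if c.Score >= threshold {
			kept = append(kept, c)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if topK >= 0 && len(kept) > topK {
		kept = kept[:topK]
	}
	return kept
}

func main() {
	cands := []candidate{{"Apple", 0.9}, {"Pear", 0.5}, {"Tim Cook", 0.75}, {"CEO", 0.8}}
	for _, c := range selectEntities(cands, 0.7, 2) {
		fmt.Println(c.Name, c.Score)
	}
}
```

Lowering the threshold lets more candidates through to the top-K cut, which matches the note above that lower values return more entities at lower precision.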
