observability

package
v1.1.9
Published: Apr 6, 2026 License: MIT Imports: 13 Imported by: 0

README

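Observability (Monitoring and Distributed Tracing)

pkg/observability provides GoRAG with production-grade metrics and distributed tracing support.

Why is observability critical for RAG?

A RAG system typically makes several external calls (for example, to a vector database and an LLM API). When a user reports that an answer is "too slow" or "wrong", troubleshooting without observability is extremely difficult.

  • Metrics tell you where the system is slow and how many tokens it has consumed.
  • Tracing tells you which component introduced the latency or error for a specific request.
  • Quality Analysis automatically evaluates the faithfulness and relevance of RAG answers.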

1. Metrics

Core interface: core.Metrics

We provide a Prometheus implementation that follows cloud-native conventions.

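Exported metrics

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| gorag_queries_total | Counter | Query volume (QPS): total number of queries received | engine |
| gorag_search_duration_seconds | Histogram | Performance: distribution of vector search and generation latency | engine |
| gorag_qa_quality_score | Histogram | Quality: RAGAS metric score (0.0-1.0) | metric (faithfulness, relevance) |
| gorag_llm_tokens_total | Counter | Cost: total number of tokens consumed | model, type |
| gorag_search_result_count | Counter | Recall: number of chunks returned by each search | engine |
| gorag_indexing_duration_seconds | Histogram | Indexing: document parsing and ingestion latency | parser |
| gorag_embedding_count | Counter | Total number of embeddings generated | - |

Usage
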
// Start a metrics server at :8080/metrics for Prometheus to scrape
metrics := observability.DefaultPrometheusMetrics(":8080")

// Inject it when building the RAG
rag, _ := gorag.DefaultNativeRAG(
    gorag.WithMetrics(metrics),
)

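2. Tracing

Core interfaces: Tracer & Span

We provide an implementation based on the industry-standard OpenTelemetry (OTel).

Features

  • Distributed context propagation: the trace ID can be carried from the application layer down to the lowest-level vector database call.
  • Exporter support: export to Jaeger, Zipkin, Honeycomb, or any OTel-compatible backend.
  • Automatic semantic mapping: internal RAG steps such as chunking, retrieval, and reranking are automatically mapped to distinct stages in the trace.

Usage
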
// Initialize OTel tracing and export to a local Jaeger agent
tracer, err := observability.DefaultOpenTelemetryTracer(ctx, "localhost:4317", "MyRAGApp")
if err != nil {
    log.Fatal(err)
}

// Inject it when building the RAG
rag, _ := gorag.DefaultNativeRAG(
    gorag.WithTracer(tracer),
)

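How are the quality metrics computed?

GoRAG uses an asynchronous evaluation mechanism:

  1. After you call rag.Search and receive an answer,
  2. and if you have registered an implementation of the RAGEvaluator interface in the container,
  3. the framework starts a background goroutine that uses an LLM as a judge to compute Faithfulness (whether the answer stays true to the source text, without hallucination) and Relevance (whether it precisely answers the user's question).
  4. The results are reported to Prometheus in real time as gorag_qa_quality_score.

Tip: Monitor the P95 of gorag_qa_quality_score on a Grafana dashboard to keep an eye on the quality of your knowledge base.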
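
The asynchronous evaluation flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not GoRAG's implementation: the Evaluator interface, fixedJudge, memoryRecorder, and evaluateAsync are hypothetical names, and the real RAGEvaluator method set is not shown on this page. Only the RecordRAGEvaluation(metric, score) signature is taken from the documentation below.

```go
package main

import (
	"fmt"
	"sync"
)

// Evaluator is a hypothetical stand-in for the RAGEvaluator interface;
// its real method set is not documented on this page.
type Evaluator interface {
	Score(question, answer, retrieved string) (map[string]float32, error)
}

// Recorder mirrors the RecordRAGEvaluation method documented for this package.
type Recorder interface {
	RecordRAGEvaluation(metric string, score float32)
}

// fixedJudge returns constant scores in place of a real LLM judge.
type fixedJudge struct{}

func (fixedJudge) Score(_, _, _ string) (map[string]float32, error) {
	return map[string]float32{"faithfulness": 0.9, "relevance": 0.8}, nil
}

// memoryRecorder keeps the latest score per metric and is safe for concurrent use.
type memoryRecorder struct {
	mu     sync.Mutex
	scores map[string]float32
}

func (r *memoryRecorder) RecordRAGEvaluation(metric string, score float32) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.scores == nil {
		r.scores = make(map[string]float32)
	}
	r.scores[metric] = score
}

func (r *memoryRecorder) score(metric string) (float32, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.scores[metric]
	return v, ok
}

// evaluateAsync scores the answer in a background goroutine so the caller is
// not blocked. The returned channel is closed once reporting has finished.
func evaluateAsync(e Evaluator, r Recorder, question, answer, retrieved string) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		scores, err := e.Score(question, answer, retrieved)
		if err != nil {
			return // evaluation is best effort; the answer was already returned
		}
		for metric, s := range scores {
			r.RecordRAGEvaluation(metric, s)
		}
	}()
	return done
}

func main() {
	rec := &memoryRecorder{}
	done := evaluateAsync(fixedJudge{}, rec, "question", "answer", "retrieved chunks")
	<-done // a real caller would usually not wait for the evaluation
	f, _ := rec.score("faithfulness")
	r, _ := rec.score("relevance")
	fmt.Println("faithfulness:", f, "relevance:", r)
}
```

Returning a channel is a choice made for this sketch so that tests can wait for completion; a fire-and-forget goroutine would match the description equally well.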

Documentation

Overview

Package observability provides metrics collection and distributed tracing capabilities. It offers interfaces and implementations for monitoring RAG system performance, tracking operations, and integrating with observability platforms.

The package provides:

  • Metrics collection for search, indexing, and LLM operations
  • Distributed tracing support for debugging and performance analysis
  • No-op implementations for testing and default usage

Example usage:

// Create a metrics collector (an empty address skips the HTTP scrape server)
metrics := observability.DefaultPrometheusMetrics("")
metrics.RecordSearchDuration("vector", time.Second)
metrics.RecordQueryCount("hybrid")

// Create a tracer
tracer, err := observability.DefaultOpenTelemetryTracer(context.Background(), "localhost:4317", "MyRAGApp")
if err != nil {
	log.Fatal(err)
}
ctx, span := tracer.StartSpan(context.Background(), "search")
defer span.End()

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Collector

type Collector interface {
	// RecordDuration records the duration of an operation.
	// Use this to track performance metrics like search latency.
	//
	// Parameters:
	//   - operation: Name of the operation (e.g., "search", "index")
	//   - duration: How long the operation took
	//   - labels: Additional labels for filtering and grouping
	RecordDuration(operation string, duration time.Duration, labels map[string]string)

	// RecordCount records the count of an operation.
	// Use this to track operation frequency and success rates.
	//
	// Parameters:
	//   - operation: Name of the operation
	//   - status: Status of the operation (e.g., "success", "failure")
	//   - labels: Additional labels for filtering and grouping
	RecordCount(operation string, status string, labels map[string]string)

	// RecordValue records a custom metric value.
	// Use this for domain-specific metrics that don't fit other categories.
	//
	// Parameters:
	//   - metricName: Name of the metric
	//   - value: The metric value
	//   - labels: Additional labels for filtering and grouping
	RecordValue(metricName string, value float64, labels map[string]string)
}

Collector defines the interface for collecting metrics.

Deprecated: Use core.Metrics instead for more comprehensive RAG-specific metrics.

This interface provides basic metric collection capabilities for:

  • Recording operation durations
  • Counting operations (success/failure)
  • Recording custom metric values
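
As a rough illustration of how the three methods fit together, the sketch below wraps an operation so that its duration and its success or failure count are both recorded. The Collector interface is copied from the definition above so the example compiles on its own; memoryCollector, timed, and errTimeout are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Collector is copied from the interface documented above.
type Collector interface {
	RecordDuration(operation string, duration time.Duration, labels map[string]string)
	RecordCount(operation string, status string, labels map[string]string)
	RecordValue(metricName string, value float64, labels map[string]string)
}

// errTimeout is a hypothetical failure used by the example.
var errTimeout = errors.New("timeout")

// memoryCollector is a hypothetical in-memory Collector for tests.
// It ignores labels and is not safe for concurrent use.
type memoryCollector struct {
	durations map[string][]time.Duration
	counts    map[string]int // keyed by "operation/status"
	values    map[string]float64
}

func newMemoryCollector() *memoryCollector {
	return &memoryCollector{
		durations: make(map[string][]time.Duration),
		counts:    make(map[string]int),
		values:    make(map[string]float64),
	}
}

func (c *memoryCollector) RecordDuration(op string, d time.Duration, _ map[string]string) {
	c.durations[op] = append(c.durations[op], d)
}

func (c *memoryCollector) RecordCount(op, status string, _ map[string]string) {
	c.counts[op+"/"+status]++
}

func (c *memoryCollector) RecordValue(name string, v float64, _ map[string]string) {
	c.values[name] = v
}

// timed runs fn, then records its duration and a "success" or "failure" count.
func timed(c Collector, op string, labels map[string]string, fn func() error) error {
	start := time.Now()
	err := fn()
	c.RecordDuration(op, time.Since(start), labels)
	status := "success"
	if err != nil {
		status = "failure"
	}
	c.RecordCount(op, status, labels)
	return err
}

func main() {
	c := newMemoryCollector()
	labels := map[string]string{"engine": "vector"}
	_ = timed(c, "search", labels, func() error { return nil })
	_ = timed(c, "search", labels, func() error { return errTimeout })
	fmt.Println(c.counts["search/success"], c.counts["search/failure"], len(c.durations["search"]))
}
```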

func DefaultNoopCollector added in v1.1.3

func DefaultNoopCollector() Collector

DefaultNoopCollector creates a no-op collector that discards all metrics. Use this when metrics collection is not needed or during testing.

Returns:

  • Collector: A collector that does nothing

type NoopMetrics added in v1.1.4

type NoopMetrics struct{}

NoopMetrics is a no-op implementation of core.Metrics. It implements all metrics methods as no-ops, useful for testing or when metrics collection is disabled.

func (*NoopMetrics) RecordEmbeddingCount added in v1.1.4

func (m *NoopMetrics) RecordEmbeddingCount(int)

func (*NoopMetrics) RecordIndexingDuration added in v1.1.4

func (m *NoopMetrics) RecordIndexingDuration(string, any)

func (*NoopMetrics) RecordIndexingResult added in v1.1.4

func (m *NoopMetrics) RecordIndexingResult(string, int)

func (*NoopMetrics) RecordLLMTokenUsage added in v1.1.4

func (m *NoopMetrics) RecordLLMTokenUsage(string, int, int)

func (*NoopMetrics) RecordQueryCount added in v1.1.4

func (m *NoopMetrics) RecordQueryCount(string)

func (*NoopMetrics) RecordRAGEvaluation added in v1.1.4

func (m *NoopMetrics) RecordRAGEvaluation(string, float32)

func (*NoopMetrics) RecordSearchDuration added in v1.1.4

func (m *NoopMetrics) RecordSearchDuration(string, any)

func (*NoopMetrics) RecordSearchError added in v1.1.4

func (m *NoopMetrics) RecordSearchError(string, error)

func (*NoopMetrics) RecordSearchResult added in v1.1.4

func (m *NoopMetrics) RecordSearchResult(string, int)

func (*NoopMetrics) RecordVectorStoreOperations added in v1.1.4

func (m *NoopMetrics) RecordVectorStoreOperations(string, int)
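
One possible use of a no-op implementation in tests is to embed it and override only the method under test. The sketch below shows that pattern with local stand-ins so it compiles on its own: searchMetrics is a hypothetical subset of core.Metrics built from the signatures listed above, noopMetrics stands in for NoopMetrics, and queryCounter and search are invented for the example.

```go
package main

import "fmt"

// searchMetrics is a hypothetical subset of core.Metrics, using method
// signatures listed for NoopMetrics above.
type searchMetrics interface {
	RecordQueryCount(engine string)
	RecordSearchResult(engine string, count int)
	RecordSearchError(engine string, err error)
}

// noopMetrics is a local stand-in for observability.NoopMetrics.
type noopMetrics struct{}

func (*noopMetrics) RecordQueryCount(string)         {}
func (*noopMetrics) RecordSearchResult(string, int)  {}
func (*noopMetrics) RecordSearchError(string, error) {}

// queryCounter embeds the no-op implementation and overrides only
// RecordQueryCount; every other method stays a no-op.
type queryCounter struct {
	noopMetrics
	queries map[string]int
}

func (q *queryCounter) RecordQueryCount(engine string) {
	if q.queries == nil {
		q.queries = make(map[string]int)
	}
	q.queries[engine]++
}

// search is a hypothetical component under test that reports metrics.
func search(m searchMetrics, engine string) {
	m.RecordQueryCount(engine)
	m.RecordSearchResult(engine, 3)
}

func main() {
	qc := &queryCounter{}
	search(qc, "vector")
	search(qc, "vector")
	search(&noopMetrics{}, "hybrid") // discarded entirely
	fmt.Println("vector queries:", qc.queries["vector"])
}
```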

type PrometheusMetrics added in v1.1.3

type PrometheusMetrics struct {
	// contains filtered or unexported fields
}

PrometheusMetrics implements core.Metrics using Prometheus (duck typed: it satisfies the interface implicitly).

func DefaultPrometheusMetrics added in v1.1.3

func DefaultPrometheusMetrics(addr string) *PrometheusMetrics

DefaultPrometheusMetrics creates a new prometheus-based metrics collector and optionally starts an HTTP server for scraping at the given address if addr != "".

func (*PrometheusMetrics) RecordEmbeddingCount added in v1.1.3

func (m *PrometheusMetrics) RecordEmbeddingCount(count int)

func (*PrometheusMetrics) RecordIndexingDuration added in v1.1.3

func (m *PrometheusMetrics) RecordIndexingDuration(parser string, duration any)

func (*PrometheusMetrics) RecordIndexingResult added in v1.1.3

func (m *PrometheusMetrics) RecordIndexingResult(parser string, count int)

func (*PrometheusMetrics) RecordLLMTokenUsage added in v1.1.4

func (m *PrometheusMetrics) RecordLLMTokenUsage(model string, prompt int, completion int)

func (*PrometheusMetrics) RecordQueryCount added in v1.1.4

func (m *PrometheusMetrics) RecordQueryCount(engine string)

func (*PrometheusMetrics) RecordRAGEvaluation added in v1.1.4

func (m *PrometheusMetrics) RecordRAGEvaluation(metric string, score float32)

func (*PrometheusMetrics) RecordSearchDuration added in v1.1.3

func (m *PrometheusMetrics) RecordSearchDuration(engine string, duration any)

func (*PrometheusMetrics) RecordSearchError added in v1.1.3

func (m *PrometheusMetrics) RecordSearchError(engine string, err error)

func (*PrometheusMetrics) RecordSearchResult added in v1.1.3

func (m *PrometheusMetrics) RecordSearchResult(engine string, count int)

func (*PrometheusMetrics) RecordVectorStoreOperations added in v1.1.3

func (m *PrometheusMetrics) RecordVectorStoreOperations(op string, count int)

type Span

type Span interface {
	// SetTag sets a key-value tag on the span.
	// Tags are used to filter and query traces.
	//
	// Parameters:
	//   - key: Tag name
	//   - value: Tag value (can be any type)
	//
	// Example:
	//
	//	span.SetTag("user_id", 123)
	//	span.SetTag("operation", "search")
	SetTag(key string, value interface{})

	// LogEvent logs an event within the span.
	// Events represent point-in-time occurrences during the span's lifetime.
	//
	// Parameters:
	//   - eventName: Name of the event
	//   - fields: Additional structured data about the event
	//
	// Example:
	//
	//	span.LogEvent("cache_hit", map[string]interface{}{
	//	    "key": "query_123",
	//	    "latency_ms": 5,
	//	})
	LogEvent(eventName string, fields map[string]interface{})

	// End completes the span.
	// This should be called when the operation is finished.
	// After End() is called, the span should not be modified.
	End()
}

Span represents a single operation within a distributed trace. Spans can have tags and events attached for debugging and analysis.
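
For unit tests it can be handy to have a Span that simply records what was attached to it. The following is a minimal sketch of such an implementation, not part of this package: recordingSpan and rerank are hypothetical names, and the Span interface is copied from the definition above so that the example is self-contained. Ignoring calls made after End is a choice of this sketch, in line with the note that a span should not be modified once ended.

```go
package main

import "fmt"

// Span is copied from the interface documented above.
type Span interface {
	SetTag(key string, value interface{})
	LogEvent(eventName string, fields map[string]interface{})
	End()
}

// recordingSpan is a hypothetical in-memory Span for tests.
// It is not safe for concurrent use.
type recordingSpan struct {
	tags   map[string]interface{}
	events []string
	ended  bool
}

func newRecordingSpan() *recordingSpan {
	return &recordingSpan{tags: make(map[string]interface{})}
}

// SetTag stores the tag unless the span has already ended.
func (s *recordingSpan) SetTag(key string, value interface{}) {
	if s.ended {
		return
	}
	s.tags[key] = value
}

// LogEvent stores the event name unless the span has already ended.
func (s *recordingSpan) LogEvent(eventName string, _ map[string]interface{}) {
	if s.ended {
		return
	}
	s.events = append(s.events, eventName)
}

// End marks the span as finished.
func (s *recordingSpan) End() { s.ended = true }

// rerank is a hypothetical instrumented step that tags and ends its span.
func rerank(span Span, candidates int) {
	defer span.End()
	span.SetTag("candidates", candidates)
	span.LogEvent("rerank_done", map[string]interface{}{"kept": candidates / 2})
}

func main() {
	s := newRecordingSpan()
	rerank(s, 10)
	s.SetTag("late", true) // ignored: the span has already ended
	fmt.Println(s.tags, s.events, s.ended)
}
```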

type Tracer

type Tracer interface {
	// StartSpan starts a new span within a trace.
	// A span represents a unit of work in the system.
	//
	// Parameters:
	//   - ctx: Parent context (may contain existing trace information)
	//   - operationName: Name of the operation being traced
	//
	// Returns:
	//   - context.Context: New context containing the span
	//   - Span: The created span that should be ended when complete
	//
	// Example:
	//
	//	ctx, span := tracer.StartSpan(ctx, "vector_search")
	//	defer span.End()
	//	// ... perform search ...
	StartSpan(ctx context.Context, operationName string) (context.Context, Span)

	// GetSpan retrieves the current span from context.
	// Returns a no-op span if no span is found in the context.
	//
	// Parameters:
	//   - ctx: Context that may contain span information
	//
	// Returns:
	//   - Span: The current span or a no-op span
	GetSpan(ctx context.Context) Span
}

Tracer defines the interface for distributed tracing. Distributed tracing helps track requests across multiple services and components, making it easier to debug performance issues and understand system behavior.

Implementations should integrate with tracing backends like:

  • OpenTelemetry
  • Jaeger
  • Zipkin
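
To make the StartSpan and GetSpan contract more concrete, here is a small in-memory sketch that stores the current span in the context, links child spans to their parent, and falls back to a no-op span when the context carries none. It is a hypothetical test double, not the OpenTelemetry-backed tracer: memTracer, memSpan, noopSpan, spanKey, retrieve, and runPipeline are names made up for this example, while the two interfaces are copied from the definitions above.

```go
package main

import (
	"context"
	"fmt"
)

// Span and Tracer are copied from the interfaces documented above.
type Span interface {
	SetTag(key string, value interface{})
	LogEvent(eventName string, fields map[string]interface{})
	End()
}

type Tracer interface {
	StartSpan(ctx context.Context, operationName string) (context.Context, Span)
	GetSpan(ctx context.Context) Span
}

// noopSpan discards everything; GetSpan returns it when no span is in context.
type noopSpan struct{}

func (noopSpan) SetTag(string, interface{})              {}
func (noopSpan) LogEvent(string, map[string]interface{}) {}
func (noopSpan) End()                                    {}

// memSpan is a hypothetical in-memory span that remembers its parent's name.
type memSpan struct {
	name   string
	parent string // empty for a root span
	tags   map[string]interface{}
}

func (s *memSpan) SetTag(key string, value interface{})    { s.tags[key] = value }
func (s *memSpan) LogEvent(string, map[string]interface{}) {}
func (s *memSpan) End()                                    {}

// spanKey is the private context key under which the current span is stored.
type spanKey struct{}

// memTracer records every span it starts. It is not safe for concurrent use.
type memTracer struct{ started []*memSpan }

func (t *memTracer) StartSpan(ctx context.Context, operationName string) (context.Context, Span) {
	s := &memSpan{name: operationName, tags: make(map[string]interface{})}
	if p, ok := ctx.Value(spanKey{}).(*memSpan); ok {
		s.parent = p.name // link the new span to the span already in context
	}
	t.started = append(t.started, s)
	return context.WithValue(ctx, spanKey{}, s), s
}

func (t *memTracer) GetSpan(ctx context.Context) Span {
	if s, ok := ctx.Value(spanKey{}).(*memSpan); ok {
		return s
	}
	return noopSpan{}
}

// retrieve is a hypothetical pipeline step that opens a child span.
func retrieve(ctx context.Context, t Tracer) {
	ctx, span := t.StartSpan(ctx, "vector_search")
	defer span.End()
	t.GetSpan(ctx).SetTag("top_k", 5)
}

// runPipeline starts a root span, runs one step, and also looks up a span
// from a context that carries none.
func runPipeline() (*memTracer, Span) {
	tr := &memTracer{}
	ctx, root := tr.StartSpan(context.Background(), "rag_query")
	retrieve(ctx, tr)
	root.End()
	orphan := tr.GetSpan(context.Background())
	return tr, orphan
}

func main() {
	tr, _ := runPipeline()
	for _, s := range tr.started {
		fmt.Printf("%s parent=%q tags=%v\n", s.name, s.parent, s.tags)
	}
}
```

Because the span travels inside the context, any function that receives ctx can tag the current span without needing a reference to it.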

func DefaultNoopTracer added in v1.1.3

func DefaultNoopTracer() Tracer

DefaultNoopTracer creates a no-op tracer that discards all trace data. Use this when distributed tracing is not needed or during testing.

Returns:

  • Tracer: A tracer that creates no-op spans

func DefaultOpenTelemetryTracer added in v1.1.3

func DefaultOpenTelemetryTracer(ctx context.Context, endpoint string, serviceName string) (Tracer, error)

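DefaultOpenTelemetryTracer creates a new OpenTelemetry (OTel) tracer for the given service name that exports traces to the given endpoint (for example, "localhost:4317").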
