rag

package module
v0.0.0-...-98bc6df Latest
Published: May 17, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

README

go-rag

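A knowledge-base RAG (retrieval-augmented generation) implementation built on eino.

Storage layer

  • Elasticsearch 8 (es8) stores the vector data

Features

  • Parsing of md, pdf, and html documents
  • Web page parsing
  • Document retrieval
  • Automatic splitting of long documents into chunks
  • Rerank
  • An HTTP interface, rag-api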
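
The chunking feature in the list above can be pictured with a minimal sketch. The helper splitChunks, its size and overlap parameters, and the rune-based splitting are assumptions made for illustration only; the source does not describe how go-rag actually splits documents.

```go
package main

import "fmt"

// splitChunks is a hypothetical helper (not part of go-rag). It splits text
// into chunks of at most size runes; consecutive chunks share overlap runes.
// It returns nil when the parameters cannot make progress.
func splitChunks(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	runes := []rune(text) // work on runes so multi-byte characters stay intact
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the last chunk reached the end of the text
		}
	}
	return chunks
}

func main() {
	for i, c := range splitChunks("abcdefghij", 4, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a boundary is still partly visible from both sides.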

Roadmap

  • Store the chunk-to-document mapping in MySQL; it currently lives in the ext field in Elasticsearch

Usage

Install the dependency

go get github.com/wangle201210/go-rag@latest

Install Elasticsearch 8

docker run -d --name elasticsearch \
  -e "discovery.type=single-node" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -e "xpack.security.enabled=false" \
  -p 9200:9200 \
  -p 9300:9300 \
  elasticsearch:8.18.0
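
# Alternative: the same container without the JVM heap and xpack.security overrides.
# Run only one of the two commands; both name the container "elasticsearch".
docker run -d --name elasticsearch \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  elasticsearch:8.18.0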

Install MySQL

docker run -p 3306:3306 --name mysql \
    -v /Users/wanna/docker/mysql/log:/var/log/mysql \
    -v /Users/wanna/docker/mysql/data:/var/lib/mysql \
    --restart=always \
    -e MYSQL_ROOT_PASSWORD=123456 \
    -d mysql:8.0

Initialize the Rag object

	// Connect to the local Elasticsearch 8 instance.
	client, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{"http://localhost:9200"},
	})
	if err != nil {
		log.Printf("NewClient of es8 failed, err=%v", err)
		return
	}
	// ragSvr is a *Rag declared elsewhere (for example, a package-level variable in the test file).
	ragSvr, err = New(context.Background(), &config.Config{
		Client:    client,
		IndexName: "rag",
		APIKey:    os.Getenv("OPENAI_API_KEY"),
		BaseURL:   os.Getenv("OPENAI_BASE_URL"),
		Model:     "text-embedding-3-large",
	})
	if err != nil {
		log.Printf("New of rag failed, err=%v", err)
		return
	}

Indexing: load data from the various sources, embed it as vectors, and store it in the vector database.

func TestIndex(t *testing.T) {
	ctx := context.Background()
	uriList := []string{
		"./test_file/readme.md",
		"./test_file/readme2.md",
		"./test_file/readme.html",
		"./test_file/test.pdf",
		"https://deepchat.thinkinai.xyz/docs/guide/advanced-features/shortcuts.html",
	}
	for _, s := range uriList {
		req := &IndexReq{
			URI:           s,
			KnowledgeName: "wanna",
		}
		ids, err := ragSvr.Index(ctx, req)
		if err != nil {
			t.Fatal(err)
		}
		for _, id := range ids {
			t.Log(id)
		}
	}
}

Retrieval

func TestRetriever(t *testing.T) {
	ctx := context.Background()
	req := &RetrieveReq{
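		Query:         "这里有很多内容", // "there is a lot of content here"
		TopK:          5,
		Score:         1.2, // score threshold on the 0-2 scale described under RetrieveReq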
		KnowledgeName: "wanna",
	}
	msg, err := ragSvr.Retrieve(ctx, req)
	if err != nil {
		t.Fatal(err)
	}
	for _, m := range msg {
		t.Logf("content: %v, score: %v", m.Content, m.Score())
	}
}

See the test files for details.

Documentation

Types

type IndexReq

type IndexReq struct {
	URI           string // document location: a file path (pdf, html, md, etc.) or a web URL
	KnowledgeName string // knowledge base name
}
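
Because URI accepts both file paths and web addresses, a caller may want to tell the two apart before indexing, for example to validate input. The sketch below is one way to do that with the standard library; classifyURI is a hypothetical helper and says nothing about how go-rag itself dispatches on the URI.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// classifyURI is a hypothetical helper (not part of go-rag). It returns "web"
// for http(s) URLs; otherwise the lower-cased file extension without the dot
// (for example "md", "pdf", "html"), or "unknown" when there is no extension.
func classifyURI(uri string) string {
	u, err := url.Parse(uri)
	if err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != "" {
		return "web"
	}
	ext := strings.ToLower(strings.TrimPrefix(filepath.Ext(uri), "."))
	if ext == "" {
		return "unknown"
	}
	return ext
}

func main() {
	for _, uri := range []string{
		"./test_file/readme.md",
		"./test_file/test.pdf",
		"https://deepchat.thinkinai.xyz/docs/guide/advanced-features/shortcuts.html",
	} {
		fmt.Printf("%s -> %s\n", uri, classifyURI(uri))
	}
}
```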

type Rag

type Rag struct {
	// contains filtered or unexported fields
}

func New

func New(ctx context.Context, conf *config.Config) (*Rag, error)

func (*Rag) Index

func (x *Rag) Index(ctx context.Context, req *IndexReq) (ids []string, err error)

Index indexes the document at req.URI and returns ids, the resulting document IDs.

func (*Rag) Retrieve

func (x *Rag) Retrieve(ctx context.Context, req *RetrieveReq) (msg []*schema.Document, err error)

Retrieve runs a retrieval query against the knowledge base.

type RetrieveReq

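type RetrieveReq struct {
	Query         string  // search query
	TopK          int     // number of results to return
	Score         float64 // score threshold (0-2: 0 = exactly opposite, 1 = unrelated, 2 = identical; usually pass a value greater than 1, e.g. 1.5)
	KnowledgeName string  // knowledge base name
}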

Directories

Path Synopsis
server module
