knowledge

package

Version: v1.87.0 (latest) | Published: Jun 22, 2026 | License: Apache-2.0 | Imports: 13 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/txn2/mcp-data-platform

Documentation ¶

Overview ¶

Package knowledge is the unified read path for platform knowledge (#632).

The platform holds knowledge in several stores that an agent must today search separately (captured memory, reviewed insights, managed assets, and later the technical catalog and prompts). Each store has its own tool, its own scope rules, and its own relevance scoring, so the agent pays a discovery tax to find anything and usually declines to pay it.

This package collapses those stores behind one Provider interface and a Router that fans a single query across every registered provider, normalizes each provider's local relevance score onto a common scale, fuses the results into one ranked list, and enforces per-user scope so a shared search can never surface one user's private records to another.

The same Router is exposed two ways from one code path: as the search agent tool (pull), and later as a retriever wired into the enrichment middleware (push). PR1 (#632) builds the pull path with the memory, insights, and assets providers; the technical catalog (datahub) and prompt providers, and push injection, land in follow-up PRs.

Index ¶

Constants
type AssetsProvider
- func NewAssetsProvider(searcher assetSearcher) *AssetsProvider
- func (*AssetsProvider) Name() string
- func (*AssetsProvider) Scope() Scope
- func (p *AssetsProvider) Search(ctx context.Context, q Query) ([]Hit, error)
type Caller
- func (c Caller) Anonymous() bool
type ConnectionInfo
type ConnectionLister
type ConnectionsProvider
- func NewConnectionsProvider(lister ConnectionLister) *ConnectionsProvider
- func (*ConnectionsProvider) Name() string
- func (*ConnectionsProvider) Scope() Scope
- func (p *ConnectionsProvider) Search(_ context.Context, q Query) ([]Hit, error)
type DatahubProvider
- func NewDatahubProvider(searcher tableSearcher) *DatahubProvider
- func (*DatahubProvider) Name() string
- func (*DatahubProvider) Scope() Scope
- func (p *DatahubProvider) Search(ctx context.Context, q Query) ([]Hit, error)
type EndpointCandidate
type EndpointSearcher
type EndpointsProvider
- func NewEndpointsProvider(searchers ...EndpointSearcher) *EndpointsProvider
- func (*EndpointsProvider) Name() string
- func (*EndpointsProvider) Scope() Scope
- func (p *EndpointsProvider) Search(ctx context.Context, q Query) ([]Hit, error)
type Hit
type InsightsProvider
- func NewInsightsProvider(searcher insightSearcher) *InsightsProvider
- func (*InsightsProvider) Name() string
- func (*InsightsProvider) Scope() Scope
- func (p *InsightsProvider) Search(ctx context.Context, q Query) ([]Hit, error)
type LineageExpander
type MemoryProvider
- func NewMemoryProvider(store memorySearcher, lineage LineageExpander) *MemoryProvider
- func (*MemoryProvider) Name() string
- func (*MemoryProvider) Scope() Scope
- func (p *MemoryProvider) Search(ctx context.Context, q Query) ([]Hit, error)
type PromptsProvider
- func NewPromptsProvider(searcher promptSearcher) *PromptsProvider
- func (*PromptsProvider) Name() string
- func (*PromptsProvider) Scope() Scope
- func (p *PromptsProvider) Search(ctx context.Context, q Query) ([]Hit, error)
type Provider
type Query
type Result
type Router
- func NewRouter(embedder embedding.Provider, providers ...Provider) *Router
- func (r *Router) Providers() []Provider
- func (r *Router) Search(ctx context.Context, q Query) (Result, error)
type Scope
- func (s Scope) String() string
type SourceCoverage
type SourceGroup

Constants ¶

const SourceAssets = "assets"

SourceAssets is the provenance label for asset-provider hits.

const SourceConnections = "connections"

SourceConnections is the provenance label for connection hits.

const SourceDatahub = "datahub"

SourceDatahub is the provenance label for technical-catalog hits.

const SourceEndpoints = "endpoints"

SourceEndpoints is the provenance label for API-endpoint hits.

const SourceInsights = "insights"

SourceInsights is the provenance label for insight-provider hits.

const SourceMemory = "memory"

SourceMemory is the provenance label for memory-provider hits.

const SourcePrompts = "prompts"

SourcePrompts is the provenance label for prompt hits.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type AssetsProvider ¶

type AssetsProvider struct {
	// contains filtered or unexported fields
}

AssetsProvider exposes a caller's managed assets (saved artifacts) to the router. It is per-user: results are restricted to assets the caller owns (assets.owner_id == caller UUID), which is why it keys on Caller.UserID rather than the email the memory and insight providers use.

func NewAssetsProvider ¶

func NewAssetsProvider(searcher assetSearcher) *AssetsProvider

NewAssetsProvider builds the assets provider over an asset searcher.

func (*AssetsProvider) Name ¶

func (*AssetsProvider) Name() string

Name returns the provenance label.

func (*AssetsProvider) Scope ¶

func (*AssetsProvider) Scope() Scope

Scope marks this provider per-user; the router supplies the caller identity and must skip it when that identity is absent.

func (*AssetsProvider) Search ¶

func (p *AssetsProvider) Search(ctx context.Context, q Query) ([]Hit, error)

Search returns the caller's assets ranked by relevance. It fails closed on a missing caller UUID rather than searching across all owners.

type Caller ¶

type Caller struct {
	// UserID is the caller's UUID identity, the owner key for assets.
	UserID string
	// Email is the caller's canonical identity, the owner key for memory
	// and insights.
	Email string
	// Persona is the caller's resolved persona. It scopes entity-keyed memory
	// lookups and selects which persona-scoped prompts are visible. It is not a
	// security boundary on its own (per-user records are scoped by Email/UserID);
	// it narrows persona-targeted content.
	Persona string
}

Caller is the resolved identity of the requester. Per-user providers scope on it, and they do not all key on the same field: captured memory and insights are owned by email (memory_records.created_by), while managed assets are owned by the user's UUID (assets.owner_id). Both fields travel on every request so each provider selects the one it scopes on; a provider whose key is empty must return no results rather than query unscoped.

func (Caller) Anonymous ¶

func (c Caller) Anonymous() bool

Anonymous reports whether the caller carries no identity at all. The Router skips every per-user provider for an anonymous caller.
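
A minimal sketch of how the two owner keys could be selected per source. It assumes that "no identity at all" means both UserID and Email are empty; the helper names anonymous and ownerKey are hypothetical and do not appear in the package.

```go
package main

import "fmt"

// Caller mirrors the documented struct.
type Caller struct {
	UserID  string
	Email   string
	Persona string
}

// anonymous is a hypothetical stand-in for Caller.Anonymous. It assumes that
// Persona alone does not count as an identity.
func anonymous(c Caller) bool {
	return c.UserID == "" && c.Email == ""
}

// ownerKey returns the field a per-user source scopes on, and false when that
// field is empty so the caller of this helper can return no results.
func ownerKey(c Caller, source string) (string, bool) {
	var key string
	switch source {
	case "assets":
		key = c.UserID // assets.owner_id
	case "memory", "insights":
		key = c.Email // memory_records.created_by
	}
	return key, key != ""
}

func main() {
	c := Caller{Email: "ana@example.com"}
	fmt.Println(anonymous(c))
	_, ok := ownerKey(c, "assets")
	fmt.Println(ok)
}
```

A caller with only an email is not anonymous, yet still has no key for the assets source, which is why each provider checks its own field.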

type ConnectionInfo ¶

type ConnectionInfo struct {
	Name        string
	Kind        string
	Description string
}

ConnectionInfo is one configured data connection, reduced to the fields a relevance search needs. The knowledge package defines it (rather than importing the platform's connection types) so the federation engine stays decoupled; the platform adapts its connection registry to ConnectionLister.

type ConnectionLister ¶

type ConnectionLister interface {
	Connections() []ConnectionInfo
}

ConnectionLister enumerates the deployment's configured connections. The platform implements it over the same toolkit registry list_connections uses, so the search corpus and the connections tool stay in agreement.

type ConnectionsProvider ¶

type ConnectionsProvider struct {
	// contains filtered or unexported fields
}

ConnectionsProvider exposes configured connections to the router as a relevance search. Connections are in the default corpus by design (#645): an agent should discover that, say, a "stripe" or "warehouse" connection exists from one search, rather than having to know to call list_connections first. list_connections stays the full enumeration; this surfaces the connections relevant to a query.

It is shared: connection metadata (name, kind, description) is already globally visible through list_connections, so there is no per-user record to scope here. The security-sensitive boundary is which connections a persona may use, and that is enforced fail-closed at tool-call time on the scoped tools (trino_query, api_invoke_endpoint, ...), unchanged by this provider.

func NewConnectionsProvider ¶

func NewConnectionsProvider(lister ConnectionLister) *ConnectionsProvider

NewConnectionsProvider builds the connections provider over a lister.

func (*ConnectionsProvider) Name ¶

func (*ConnectionsProvider) Name() string

Name returns the provenance label.

func (*ConnectionsProvider) Scope ¶

func (*ConnectionsProvider) Scope() Scope

Scope marks connections shared: their metadata is already global via list_connections.

func (*ConnectionsProvider) Search ¶

func (p *ConnectionsProvider) Search(_ context.Context, q Query) ([]Hit, error)

Search returns connections whose name, kind, or description match the intent, ranked by a lexical token-overlap score. Connections carry no embeddings, so ranking is lexical; the score still feeds the allocator's per-source normalization. It responds to the text path only.
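
One plausible shape for a lexical token-overlap score is sketched below. The documentation does not give the formula, so the scoring here (fraction of distinct intent tokens found in the connection's fields) is an assumption, and tokenSet, overlapScore, and rankConnections are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// ConnectionInfo mirrors the documented struct.
type ConnectionInfo struct {
	Name        string
	Kind        string
	Description string
}

// tokenSet lowercases s and splits it on anything that is not a letter or digit.
func tokenSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	words := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		set[w] = struct{}{}
	}
	return set
}

// overlapScore is the fraction of distinct intent tokens that occur in the
// connection's name, kind, or description (an illustrative formula).
func overlapScore(intent string, c ConnectionInfo) float64 {
	query := tokenSet(intent)
	if len(query) == 0 {
		return 0
	}
	doc := tokenSet(c.Name + " " + c.Kind + " " + c.Description)
	matched := 0
	for t := range query {
		if _, ok := doc[t]; ok {
			matched++
		}
	}
	return float64(matched) / float64(len(query))
}

// rankConnections keeps connections with a non-zero score, best first.
func rankConnections(intent string, conns []ConnectionInfo) []ConnectionInfo {
	type scored struct {
		conn  ConnectionInfo
		score float64
	}
	var kept []scored
	for _, c := range conns {
		if s := overlapScore(intent, c); s > 0 {
			kept = append(kept, scored{c, s})
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].score > kept[j].score })
	out := make([]ConnectionInfo, len(kept))
	for i, k := range kept {
		out[i] = k.conn
	}
	return out
}

func main() {
	conns := []ConnectionInfo{
		{Name: "warehouse", Kind: "trino", Description: "Analytics warehouse"},
		{Name: "stripe", Kind: "api", Description: "Stripe payments API"},
	}
	for _, c := range rankConnections("stripe payments", conns) {
		fmt.Println(c.Name)
	}
}
```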

type DatahubProvider ¶

type DatahubProvider struct {
	// contains filtered or unexported fields
}

DatahubProvider exposes the technical catalog (DataHub) to the router as a relevance search. It is shared: the catalog is global, so it is queried for every request and needs no caller identity. This folds datahub_search's relevance role into search; structured catalog navigation (platform/domain/tag/entity-type filters) stays in datahub_browse.

DataHub ranks results but does not return a numeric score, so the provider derives a descending positional score from the result order; the router's per-provider normalization then places these on the common scale.
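
A descending positional score could look like the sketch below. The exact formula is not documented; (n - i) / n is one simple choice that keeps the first result at 1 and every score above zero. positionalScores and the table names are hypothetical.

```go
package main

import "fmt"

// positionalScores assigns a descending score to each of n ranked results:
// position 0 gets 1, and the last position gets 1/n. This formula is an
// assumption, chosen only to preserve the catalog's own ordering.
func positionalScores(n int) []float64 {
	scores := make([]float64, n)
	for i := range scores {
		scores[i] = float64(n-i) / float64(n)
	}
	return scores
}

func main() {
	// Result order as returned by the catalog (names are made up).
	tables := []string{"orders", "customers", "refunds", "shipments"}
	for i, s := range positionalScores(len(tables)) {
		fmt.Printf("%s %.2f\n", tables[i], s)
	}
}
```

Because the router normalizes per provider, only the ordering and spacing of these scores matter, not their absolute values.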

func NewDatahubProvider ¶

func NewDatahubProvider(searcher tableSearcher) *DatahubProvider

NewDatahubProvider builds the datahub provider over a catalog searcher.

func (*DatahubProvider) Name ¶

func (*DatahubProvider) Name() string

Name returns the provenance label.

func (*DatahubProvider) Scope ¶

func (*DatahubProvider) Scope() Scope

Scope marks the catalog shared (global, always queried).

func (*DatahubProvider) Search ¶

func (p *DatahubProvider) Search(ctx context.Context, q Query) ([]Hit, error)

Search returns catalog entities relevant to the intent. It responds to the text path only; a query with no intent yields nothing.

type EndpointCandidate ¶

type EndpointCandidate struct {
	Connection  string
	OperationID string
	Method      string
	Path        string
	Summary     string
	Spec        string
	Score       float64
}

EndpointCandidate is one API operation matched by an endpoint searcher, already ranked and scoped within its API gateway connection. The knowledge package defines it (rather than importing the apigateway concrete) so the federation engine stays decoupled from any one toolkit; the platform adapts each apigateway toolkit to EndpointSearcher.

type EndpointSearcher ¶

type EndpointSearcher interface {
	SearchEndpoints(ctx context.Context, intent string, limit int) ([]EndpointCandidate, error)
}

EndpointSearcher ranks API operations across the connections of one API gateway toolkit, applying that toolkit's per-connection route policy so a caller never sees an operation their persona could not invoke. The platform wires one EndpointSearcher per apigateway toolkit.

type EndpointsProvider ¶

type EndpointsProvider struct {
	// contains filtered or unexported fields
}

EndpointsProvider exposes API endpoints to the router as a relevance search, aggregated across every API gateway toolkit. API endpoints are in the default corpus by design (#645): an agent searching "customer retention" should see a relevant operation next to the dataset and the insight without first having to know an API gateway exists, list connections, and search each one. api_list_endpoints stays the scoped drill-down, the way datahub_browse is the scoped counterpart to catalog search.

It is shared: endpoints are global to the deployment, and each searcher enforces its own per-connection route policy fail-closed, so the provider needs no caller identity of its own.

func NewEndpointsProvider ¶

func NewEndpointsProvider(searchers ...EndpointSearcher) *EndpointsProvider

NewEndpointsProvider builds the endpoints provider over one or more endpoint searchers (one per API gateway toolkit).

func (*EndpointsProvider) Name ¶

func (*EndpointsProvider) Name() string

Name returns the provenance label.

func (*EndpointsProvider) Scope ¶

func (*EndpointsProvider) Scope() Scope

Scope marks endpoints shared (always queried); each searcher self-filters operations to those the caller's persona may invoke.

func (*EndpointsProvider) Search ¶

func (p *EndpointsProvider) Search(ctx context.Context, q Query) ([]Hit, error)

Search returns API operations relevant to the intent, aggregated across every configured API gateway. It responds to the text path only; a query with no intent yields nothing. A single searcher erroring is logged and skipped so one unhealthy gateway does not blank the endpoints group.

type Hit ¶

type Hit struct {
	Text       string   `json:"text"`
	Source     string   `json:"source"`
	Ref        string   `json:"ref"`
	Score      float64  `json:"score"`
	Status     string   `json:"status,omitempty"`
	EntityURNs []string `json:"entity_urns,omitempty"`
	Dimension  string   `json:"dimension,omitempty"`
}

Hit is one knowledge record matched by a provider. Score is the provider's own relevance score; the Router normalizes it across providers before fusing, so callers see a fused rank, not the raw provider score. Source is the provider name, surfaced as provenance. Ref is the record's stable identifier within its source (memory id, insight id, asset id) so a caller can fetch the full record.

The optional fields carry what the specialized search tools returned, so folding them into one search loses nothing: Status is a review or lifecycle state (insight pending/approved/...), EntityURNs are the linked catalog entities (provenance), and Dimension is the memory dimension or category. They are omitted when a source does not populate them.

Temporal validity (valid_from/valid_until) and a live-vs-captured freshness flag remain deferred until a provider populates them (the wiki carries season windows); adding them now would be unexercised fields.
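
As an illustration of the omitempty behavior on the optional fields, the sketch below marshals two hits with the documented struct tags. The encode helper and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hit mirrors the documented struct, including its JSON tags.
type Hit struct {
	Text       string   `json:"text"`
	Source     string   `json:"source"`
	Ref        string   `json:"ref"`
	Score      float64  `json:"score"`
	Status     string   `json:"status,omitempty"`
	EntityURNs []string `json:"entity_urns,omitempty"`
	Dimension  string   `json:"dimension,omitempty"`
}

// encode renders a hit as JSON; it returns an empty string on failure.
func encode(h Hit) string {
	b, err := json.Marshal(h)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// A catalog hit populates none of the optional fields, so they are omitted.
	fmt.Println(encode(Hit{Text: "orders table", Source: "datahub", Ref: "t1", Score: 0.75}))
	// An insight hit carries its review status.
	fmt.Println(encode(Hit{Text: "Churn is measured monthly", Source: "insights", Ref: "i7", Score: 0.5, Status: "approved"}))
}
```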

type InsightsProvider ¶

type InsightsProvider struct {
	// contains filtered or unexported fields
}

InsightsProvider exposes captured domain knowledge (insights) to the router.

Insights are knowledge-dimension memory rows owned by the caller (insight.captured_by == caller email). The underlying searcher scopes to that owner and to the knowledge dimension, so this provider covers exactly the records the MemoryProvider skips.

Scope note (#632): the epic envisions reviewed insights becoming shared across callers. The current store has no review-state-aware sharing, and searching it without an owner would expose every user's personal insights, so PR1 keeps this provider per-user. Promoting reviewed insights to ScopeShared is deferred to the write-path/review work (#633).

func NewInsightsProvider ¶

func NewInsightsProvider(searcher insightSearcher) *InsightsProvider

NewInsightsProvider builds the insights provider over an insight searcher.

func (*InsightsProvider) Name ¶

func (*InsightsProvider) Name() string

Name returns the provenance label.

func (*InsightsProvider) Scope ¶

func (*InsightsProvider) Scope() Scope

Scope marks this provider per-user; see the type doc for why reviewed-insight sharing is deferred.

func (*InsightsProvider) Search ¶

func (p *InsightsProvider) Search(ctx context.Context, q Query) ([]Hit, error)

Search returns the caller's captured insights ranked by relevance to the intent, optionally filtered by review status. Each hit carries the insight's review status and linked entity URNs as provenance. It responds to the text (Intent) path only; entity-keyed lookup is served by the memory provider, so a query with no intent yields nothing here. It fails closed on a missing caller email rather than searching across all users.

type LineageExpander ¶

type LineageExpander interface {
	Expand(ctx context.Context, urns []string) []string
}

LineageExpander optionally widens a set of entity URNs along lineage so an entity-keyed lookup also recalls knowledge about upstream and downstream datasets (the old memory_recall "graph" strategy). Implemented by an adapter over the semantic provider; a nil expander disables expansion, leaving a plain entity lookup.

type MemoryProvider ¶

type MemoryProvider struct {
	// contains filtered or unexported fields
}

MemoryProvider exposes a caller's personal memory to the knowledge router.

It is per-user: results are restricted to records the caller owns (memory_records.created_by == caller email), the same identity the portal's "my knowledge" search scopes on. It serves two query shapes: relevance search on Intent, and an exact entity-keyed lookup on EntityURNs (optionally widened along lineage when a LineageExpander is wired).

It deliberately omits the knowledge dimension on both paths. Captured insights and remembered knowledge are knowledge-dimension memory rows owned by the InsightsProvider; surfacing them here too would double-list the same record. This provider covers the caller's non-knowledge memory (preferences, events, entities, relationships).

func NewMemoryProvider ¶

func NewMemoryProvider(store memorySearcher, lineage LineageExpander) *MemoryProvider

NewMemoryProvider builds the memory provider over a memory store. lineage is optional; when nil, entity lookups are not expanded along lineage.

func (*MemoryProvider) Name ¶

func (*MemoryProvider) Name() string

Name returns the provenance label.

func (*MemoryProvider) Scope ¶

func (*MemoryProvider) Scope() Scope

Scope marks this provider per-user; the router supplies the caller identity and must skip it when that identity is absent.

func (*MemoryProvider) Search ¶

func (p *MemoryProvider) Search(ctx context.Context, q Query) ([]Hit, error)

Search returns the caller's active, non-knowledge memory. When EntityURNs are given it does an exact entity lookup (lineage-expanded when configured); when Intent is given it ranks by relevance (hybrid with an embedding, lexical otherwise). Results from both paths are merged and de-duplicated by record id. It fails closed: an empty caller email yields no results rather than an unscoped search across all users.
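
Merging the two paths and de-duplicating by record id might look like this sketch. mergeByRef is a hypothetical helper, and keeping the first occurrence (entity lookup before relevance search) is an assumption; the documentation does not say which copy wins.

```go
package main

import "fmt"

// Hit is trimmed to the fields this sketch needs.
type Hit struct {
	Text   string
	Source string
	Ref    string
	Score  float64
}

// mergeByRef concatenates hit lists in order and drops any hit whose Ref was
// already seen, so a record matched by both paths is listed once.
func mergeByRef(lists ...[]Hit) []Hit {
	seen := make(map[string]bool)
	var out []Hit
	for _, list := range lists {
		for _, h := range list {
			if seen[h.Ref] {
				continue
			}
			seen[h.Ref] = true
			out = append(out, h)
		}
	}
	return out
}

func main() {
	entityHits := []Hit{{Text: "prefers weekly rollups", Source: "memory", Ref: "m1", Score: 1}}
	intentHits := []Hit{
		{Text: "prefers weekly rollups", Source: "memory", Ref: "m1", Score: 0.7},
		{Text: "attended Q3 planning", Source: "memory", Ref: "m2", Score: 0.4},
	}
	for _, h := range mergeByRef(entityHits, intentHits) {
		fmt.Println(h.Ref, h.Score)
	}
}
```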

type PromptsProvider ¶

type PromptsProvider struct {
	// contains filtered or unexported fields
}

PromptsProvider exposes operational prompts to the router. Prompt visibility is mixed: global prompts are visible to everyone, while persona- and personal-scoped prompts are visible only to the matching caller. The underlying searcher enforces that visibility from the caller identity, so the provider is shared (always queried, returning at least the global prompts even for an anonymous caller) yet never leaks another caller's personal prompts.

func NewPromptsProvider ¶

func NewPromptsProvider(searcher promptSearcher) *PromptsProvider

NewPromptsProvider builds the prompts provider over a prompt searcher.

func (*PromptsProvider) Name ¶

func (*PromptsProvider) Name() string

Name returns the provenance label.

func (*PromptsProvider) Scope ¶

func (*PromptsProvider) Scope() Scope

Scope marks prompts shared (always queried); the searcher self-filters persona/personal prompts to the caller.

func (*PromptsProvider) Search ¶

func (p *PromptsProvider) Search(ctx context.Context, q Query) ([]Hit, error)

Search returns prompts visible to the caller, ranked by relevance to the intent. It responds to the text path only; a query with no intent yields nothing.

type Provider ¶

type Provider interface {
	Name() string
	Scope() Scope
	Search(ctx context.Context, q Query) ([]Hit, error)
}

Provider is one searchable knowledge store behind the Router. Name is the provenance label stamped on every Hit. Scope drives the Router's access rules. Search returns the provider's own ranked hits for the query; the Router owns cross-provider normalization and fusion, so a provider only needs to rank within itself.

type Query ¶

type Query struct {
	Intent     string
	Embedding  []float32
	EntityURNs []string
	Status     string
	Caller     Caller
	Limit      int
	Sources    []string
}

Query is one knowledge search. It carries two complementary ways to match, and a provider uses whichever it supports:

- Intent is natural-language text matched by relevance. Embedding is the query vector the Router computes once from Intent and shares across providers; nil selects lexical-only ranking.
- EntityURNs is an exact, entity-keyed lookup: return knowledge linked to these DataHub URNs (memory uses this, optionally expanded along lineage).

At least one of Intent or EntityURNs is set. Status optionally filters by lifecycle/review state where a provider tracks one (insight review status). Caller carries the identity per-user providers scope on. Limit caps the candidate list each provider returns before the allocator builds the balanced display set.

Sources optionally narrows the federation to a subset of provider names (e.g. ["datahub"]). It only narrows: an empty Sources queries every provider the caller can access, and a name in Sources never opts a caller into a provider their scope would otherwise exclude.
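
The "only narrows" rule can be sketched as a two-step filter: scope first, then Sources. The registered type and selectProviders helper are hypothetical, and the sketch simplifies the per-user check to a single anonymous flag, whereas the Router is described as checking the identity each provider scopes on.

```go
package main

import "fmt"

type Scope int

const (
	ScopeShared Scope = iota
	ScopePerUser
)

// registered is a hypothetical summary of one provider: its name and scope.
type registered struct {
	name  string
	scope Scope
}

// selectProviders applies scope before Sources, so naming a per-user source
// can never opt an anonymous caller into it.
func selectProviders(all []registered, sources []string, anonymous bool) []string {
	wanted := make(map[string]bool, len(sources))
	for _, s := range sources {
		wanted[s] = true
	}
	var out []string
	for _, p := range all {
		if p.scope == ScopePerUser && anonymous {
			continue // scope excludes this provider regardless of Sources
		}
		if len(wanted) > 0 && !wanted[p.name] {
			continue // Sources narrows the remaining set
		}
		out = append(out, p.name)
	}
	return out
}

func main() {
	all := []registered{{"memory", ScopePerUser}, {"datahub", ScopeShared}, {"prompts", ScopeShared}}
	fmt.Println(selectProviders(all, nil, false))
	fmt.Println(selectProviders(all, []string{"datahub"}, false))
	fmt.Println(selectProviders(all, []string{"memory"}, true))
}
```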

type Result ¶

type Result struct {
	Groups   []SourceGroup
	Coverage []SourceCoverage
	Ranking  string
}

Result is one knowledge search response: the balanced, grouped-by-source display set, the coverage summary (per-source matched vs shown counts so the agent sees breadth beyond what is displayed), and the ranking mode used to produce it.

type Router ¶

type Router struct {
	// contains filtered or unexported fields
}

Router fans one query across every registered provider, normalizes each provider's local relevance scores onto a common scale, fuses them into one ranked list, and enforces per-user scope. It is the single read path behind both the search tool and (later) push injection, so the scope and fusion rules live here once rather than in each surface.

func NewRouter ¶

func NewRouter(embedder embedding.Provider, providers ...Provider) *Router

NewRouter builds a router over an embedder and a set of providers. The embedder may be nil or the noop placeholder; the router then ranks lexically. Provider order does not affect ranking (scores are fused), only the deterministic tie-break.

func (*Router) Providers ¶

func (r *Router) Providers() []Provider

Providers returns the registered providers, for introspection and wiring checks.

func (*Router) Search ¶

func (r *Router) Search(ctx context.Context, q Query) (Result, error)

Search runs one knowledge search from a caller-built Query. It embeds the intent once (when present) and shares the vector across providers, queries every shared provider plus every per-user provider for which the caller carries an identity, fuses the results, and trims to limit. The query may be text-based (Intent), entity-keyed (EntityURNs), or both; each provider uses the parts it supports.

Provider failures are tolerated: a single provider erroring is logged and its results omitted, so one unhealthy store does not blank the whole search. An error is returned only when every queried provider failed, so an all-stores-down condition is not reported as an empty-but-successful result.

type Scope ¶

type Scope int

Scope declares whether a provider's records are visible to every caller or only to the caller who owns them. The Router uses it to decide which providers a request may touch and with what identity.

const (
	// ScopeShared marks a provider that is queried for every request, with or
	// without a caller identity, because it can always return at least some
	// content visible to everyone (the technical catalog, global prompts). A
	// shared provider may still use the caller identity to widen what it
	// returns (a prompt provider adds the caller's persona/personal prompts to
	// the global ones); "shared" means "always queried", not "ignores the
	// caller". It must never return another caller's private records.
	ScopeShared Scope = iota

	// ScopePerUser marks a provider whose records belong to individual
	// callers (personal memory, personal assets). The Router queries a
	// per-user provider only when the request carries the identity that
	// provider scopes on, and the provider must restrict results to that
	// identity. This is the security boundary that keeps one user's private
	// records out of another user's search.
	ScopePerUser
)

func (Scope) String ¶

func (s Scope) String() string

String renders a Scope for logs and test failures.

type SourceCoverage ¶

type SourceCoverage struct {
	Source  string `json:"source"`
	Matched int    `json:"matched"`
	Shown   int    `json:"shown"`
}

SourceCoverage reports, per source, how many candidates matched the query and how many of those are shown in the grouped result. Matched can exceed Shown when the balanced allocator spent its budget elsewhere; that gap is the anti-tunnel signal that tells the agent where unshown answers live ("14 datasets matched, 3 shown"). Matched is the count of candidates the provider returned for this query, capped at the per-source candidate fetch limit, not a full-corpus count.

type SourceGroup ¶

type SourceGroup struct {
	Source string `json:"source"`
	Hits   []Hit  `json:"hits"`
}

SourceGroup is the displayed hits for one source, in that source's own relevance order. Grouping by source (rather than one flat relevance list) is the anti-tunnel shape: the agent sees that answers exist across memory, the catalog, endpoints, and prompts at once, instead of a top list one strong source dominates.
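
How a balanced allocator could produce both the grouped display set and the coverage counts is sketched below. The documentation only says the allocator is balanced, so the round-robin policy here is an assumption, and allocate is a hypothetical helper.

```go
package main

import "fmt"

// Hit is trimmed to the fields this sketch needs.
type Hit struct {
	Text string `json:"text"`
	Ref  string `json:"ref"`
}

// SourceGroup and SourceCoverage mirror the documented structs.
type SourceGroup struct {
	Source string `json:"source"`
	Hits   []Hit  `json:"hits"`
}

type SourceCoverage struct {
	Source  string `json:"source"`
	Matched int    `json:"matched"`
	Shown   int    `json:"shown"`
}

// allocate spends the display budget one hit per source per round, so a
// source with many candidates cannot crowd out the others. Candidates are
// assumed to be in each source's own relevance order already.
func allocate(order []string, candidates map[string][]Hit, budget int) ([]SourceGroup, []SourceCoverage) {
	shown := make(map[string]int)
	for budget > 0 {
		progressed := false
		for _, src := range order {
			if budget == 0 {
				break
			}
			if shown[src] < len(candidates[src]) {
				shown[src]++
				budget--
				progressed = true
			}
		}
		if !progressed {
			break // every source is exhausted
		}
	}
	var groups []SourceGroup
	var coverage []SourceCoverage
	for _, src := range order {
		coverage = append(coverage, SourceCoverage{Source: src, Matched: len(candidates[src]), Shown: shown[src]})
		if shown[src] > 0 {
			groups = append(groups, SourceGroup{Source: src, Hits: candidates[src][:shown[src]]})
		}
	}
	return groups, coverage
}

func main() {
	datasets := make([]Hit, 14)
	for i := range datasets {
		datasets[i] = Hit{Text: fmt.Sprintf("dataset %d", i+1), Ref: fmt.Sprintf("t%d", i+1)}
	}
	candidates := map[string][]Hit{
		"datahub": datasets,
		"memory":  {{Text: "prefers weekly rollups", Ref: "m1"}},
	}
	_, coverage := allocate([]string{"datahub", "memory"}, candidates, 4)
	for _, c := range coverage {
		fmt.Printf("%s: %d matched, %d shown\n", c.Source, c.Matched, c.Shown)
	}
}
```

With a budget of four, this policy shows three of the fourteen matched datasets and the single memory hit, leaving the matched-versus-shown gap visible in the coverage.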

Directories ¶

Path	Synopsis
federation	Package federation adapts the platform's live toolkit registry to the knowledge package's source interfaces, so the universal search router can federate API endpoints and connections without the knowledge engine depending on any concrete toolkit.
