Documentation ¶
Overview ¶
Package knowledge is the unified read path for platform knowledge (#632).
The platform holds knowledge in several stores that an agent must today search separately (captured memory, reviewed insights, managed assets, and later the technical catalog and prompts). Each store has its own tool, its own scope rules, and its own relevance scoring, so the agent pays a discovery tax to find anything and usually declines to pay it.
This package collapses those stores behind one Provider interface and a Router that fans a single query across every registered provider, normalizes each provider's local relevance score onto a common scale, fuses the results into one ranked list, and enforces per-user scope so a shared search can never surface one user's private records to another.
The same Router is exposed two ways from one code path: as the search agent tool (pull), and later as a retriever wired into the enrichment middleware (push). PR1 (#632) builds the pull path with the memory, insights, and assets providers; the technical catalog (datahub) and prompt providers, and push injection, land in follow-up PRs.
Index ¶
- Constants
- type AssetsProvider
- type Caller
- type ConnectionInfo
- type ConnectionLister
- type ConnectionsProvider
- type DatahubProvider
- type EndpointCandidate
- type EndpointSearcher
- type EndpointsProvider
- type Hit
- type InsightsProvider
- type LineageExpander
- type MemoryProvider
- type PromptsProvider
- type Provider
- type Query
- type Result
- type Router
- type Scope
- type SourceCoverage
- type SourceGroup
Constants ¶
const SourceAssets = "assets"
SourceAssets is the provenance label for asset-provider hits.
const SourceConnections = "connections"
SourceConnections is the provenance label for connection hits.
const SourceDatahub = "datahub"
SourceDatahub is the provenance label for technical-catalog hits.
const SourceEndpoints = "endpoints"
SourceEndpoints is the provenance label for API-endpoint hits.
const SourceInsights = "insights"
SourceInsights is the provenance label for insight-provider hits.
const SourceMemory = "memory"
SourceMemory is the provenance label for memory-provider hits.
const SourcePrompts = "prompts"
SourcePrompts is the provenance label for prompt hits.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AssetsProvider ¶
type AssetsProvider struct {
// contains filtered or unexported fields
}
AssetsProvider exposes a caller's managed assets (saved artifacts) to the router. It is per-user: results are restricted to assets the caller owns (assets.owner_id == caller UUID), which is why it keys on Caller.UserID rather than the email the memory and insight providers use.
func NewAssetsProvider ¶
func NewAssetsProvider(searcher assetSearcher) *AssetsProvider
NewAssetsProvider builds the assets provider over an asset searcher.
func (*AssetsProvider) Name ¶
func (*AssetsProvider) Name() string
Name returns the provenance label.
func (*AssetsProvider) Scope ¶
func (*AssetsProvider) Scope() Scope
Scope marks this provider per-user; the router supplies the caller identity and must skip it when that identity is absent.
type Caller ¶
type Caller struct {
// UserID is the caller's UUID identity, the owner key for assets.
UserID string
// Email is the caller's canonical identity, the owner key for memory
// and insights.
Email string
// Persona is the caller's resolved persona. It scopes entity-keyed memory
// lookups and selects which persona-scoped prompts are visible. It is not a
// security boundary on its own (per-user records are scoped by Email/UserID);
// it narrows persona-targeted content.
Persona string
}
Caller is the resolved identity of the requester. Per-user providers scope on it, and they do not all key on the same field: captured memory and insights are owned by email (memory_records.created_by), while managed assets are owned by the user's UUID (assets.owner_id). Both fields travel on every request so each provider selects the one it scopes on; a provider whose key is empty must return no results rather than query unscoped.
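The fail-closed rule above can be sketched as a small key-selection helper. This is only an illustration: `ownerKey` and the source-name strings passed to it are assumptions made for the example, not part of the package API, and `Caller` is a local copy of the documented struct.

```go
package main

import "fmt"

// Caller is a local copy of the documented struct, for this sketch only.
type Caller struct {
	UserID  string
	Email   string
	Persona string
}

// ownerKey is a hypothetical helper: it picks the field a source scopes on
// and reports ok=false when that field is empty, so the provider can return
// no results instead of querying unscoped.
func ownerKey(c Caller, source string) (key string, ok bool) {
	switch source {
	case "memory", "insights":
		key = c.Email // owned by email
	case "assets":
		key = c.UserID // owned by UUID
	}
	return key, key != ""
}

func main() {
	c := Caller{Email: "ana@example.com"} // no UserID resolved
	for _, src := range []string{"memory", "assets"} {
		key, ok := ownerKey(c, src)
		fmt.Printf("%s: key=%q ok=%v\n", src, key, ok)
	}
}
```

With only an email resolved, the memory lookup proceeds and the assets lookup is skipped.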
type ConnectionInfo ¶
ConnectionInfo is one configured data connection, reduced to the fields a relevance search needs. The knowledge package defines it (rather than importing the platform's connection types) so the federation engine stays decoupled; the platform adapts its connection registry to ConnectionLister.
type ConnectionLister ¶
type ConnectionLister interface {
Connections() []ConnectionInfo
}
ConnectionLister enumerates the deployment's configured connections. The platform implements it over the same toolkit registry list_connections uses, so the search corpus and the connections tool stay in agreement.
type ConnectionsProvider ¶
type ConnectionsProvider struct {
// contains filtered or unexported fields
}
ConnectionsProvider exposes configured connections to the router as a relevance search. Connections are in the default corpus by design (#645): an agent should discover that, say, a "stripe" or "warehouse" connection exists from one search, rather than having to know to call list_connections first. list_connections stays the full enumeration; this surfaces the connections relevant to a query.
It is shared: connection metadata (name, kind, description) is already globally visible through list_connections, so there is no per-user record to scope here. The security-sensitive boundary is which connections a persona may use, and that is enforced fail-closed at tool-call time on the scoped tools (trino_query, api_invoke_endpoint, ...), unchanged by this provider.
func NewConnectionsProvider ¶
func NewConnectionsProvider(lister ConnectionLister) *ConnectionsProvider
NewConnectionsProvider builds the connections provider over a lister.
func (*ConnectionsProvider) Name ¶
func (*ConnectionsProvider) Name() string
Name returns the provenance label.
func (*ConnectionsProvider) Scope ¶
func (*ConnectionsProvider) Scope() Scope
Scope marks connections shared: their metadata is already global via list_connections.
func (*ConnectionsProvider) Search ¶
Search returns connections whose name, kind, or description match the intent, ranked by a lexical token-overlap score. Connections carry no embeddings, so ranking is lexical; the score still feeds the allocator's per-source normalization. It responds to the text path only.
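The documentation does not give the exact scoring formula. As one plausible reading, a token-overlap score can be the fraction of distinct query tokens that also occur in the connection's text. The function name `tokenOverlap` and the whitespace tokenization are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenOverlap is a hypothetical scorer: the fraction of distinct query
// tokens that also appear in the document text, compared case-insensitively.
func tokenOverlap(query, doc string) float64 {
	docTokens := make(map[string]bool)
	for _, t := range strings.Fields(strings.ToLower(doc)) {
		docTokens[t] = true
	}
	seen := make(map[string]bool)
	matched := 0
	for _, t := range strings.Fields(strings.ToLower(query)) {
		if seen[t] {
			continue // count each query token once
		}
		seen[t] = true
		if docTokens[t] {
			matched++
		}
	}
	if len(seen) == 0 {
		return 0 // an empty intent matches nothing
	}
	return float64(matched) / float64(len(seen))
}

func main() {
	fmt.Println(tokenOverlap("stripe payments", "Stripe payments API connection"))
	fmt.Println(tokenOverlap("stripe payments", "warehouse trino connection"))
}
```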
type DatahubProvider ¶
type DatahubProvider struct {
// contains filtered or unexported fields
}
DatahubProvider exposes the technical catalog (DataHub) to the router as a relevance search. It is shared: the catalog is global, so it is queried for every request and needs no caller identity. This folds datahub_search's relevance role into search; structured catalog navigation (platform/domain/tag/entity-type filters) stays in datahub_browse.
DataHub ranks results but does not return a numeric score, so the provider derives a descending positional score from the result order; the router's per-provider normalization then places these on the common scale.
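One simple way to derive a descending positional score is to map rank to a linear ramp. The formula below is an assumption chosen for illustration; the provider's actual derivation is not shown in this documentation.

```go
package main

import "fmt"

// positionalScores assigns a descending score by rank: of n results the
// first gets 1.0 and the last gets 1/n. The linear ramp is an assumed
// formula, not the provider's documented one.
func positionalScores(n int) []float64 {
	scores := make([]float64, n)
	for i := range scores {
		scores[i] = float64(n-i) / float64(n)
	}
	return scores
}

func main() {
	fmt.Println(positionalScores(4))
}
```

Only the order matters here, since the router rescales each provider's scores before comparing them with other sources.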
func NewDatahubProvider ¶
func NewDatahubProvider(searcher tableSearcher) *DatahubProvider
NewDatahubProvider builds the datahub provider over a catalog searcher.
func (*DatahubProvider) Name ¶
func (*DatahubProvider) Name() string
Name returns the provenance label.
func (*DatahubProvider) Scope ¶
func (*DatahubProvider) Scope() Scope
Scope marks the catalog shared (global, always queried).
type EndpointCandidate ¶
type EndpointCandidate struct {
Connection string
OperationID string
Method string
Path string
Summary string
Spec string
Score float64
}
EndpointCandidate is one API operation matched by an endpoint searcher, already ranked and scoped within its API gateway connection. The knowledge package defines it (rather than importing the apigateway concrete) so the federation engine stays decoupled from any one toolkit; the platform adapts each apigateway toolkit to EndpointSearcher.
type EndpointSearcher ¶
type EndpointSearcher interface {
SearchEndpoints(ctx context.Context, intent string, limit int) ([]EndpointCandidate, error)
}
EndpointSearcher ranks API operations across the connections of one API gateway toolkit, applying that toolkit's per-connection route policy so a caller never sees an operation their persona could not invoke. The platform wires one EndpointSearcher per apigateway toolkit.
type EndpointsProvider ¶
type EndpointsProvider struct {
// contains filtered or unexported fields
}
EndpointsProvider exposes API endpoints to the router as a relevance search, aggregated across every API gateway toolkit. API endpoints are in the default corpus by design (#645): an agent searching "customer retention" should see a relevant operation next to the dataset and the insight without first having to know an API gateway exists, list connections, and search each one. api_list_endpoints stays the scoped drill-down, the way datahub_browse is the scoped counterpart to catalog search.
It is shared: endpoints are global to the deployment, and each searcher enforces its own per-connection route policy fail-closed, so the provider needs no caller identity of its own.
func NewEndpointsProvider ¶
func NewEndpointsProvider(searchers ...EndpointSearcher) *EndpointsProvider
NewEndpointsProvider builds the endpoints provider over one or more endpoint searchers (one per API gateway toolkit).
func (*EndpointsProvider) Name ¶
func (*EndpointsProvider) Name() string
Name returns the provenance label.
func (*EndpointsProvider) Scope ¶
func (*EndpointsProvider) Scope() Scope
Scope marks endpoints shared (always queried); each searcher self-filters operations to those the caller's persona may invoke.
func (*EndpointsProvider) Search ¶
Search returns API operations relevant to the intent, aggregated across every configured API gateway. It responds to the text path only; a query with no intent yields nothing. A single searcher erroring is logged and skipped so one unhealthy gateway does not blank the endpoints group.
type Hit ¶
type Hit struct {
Text string `json:"text"`
Source string `json:"source"`
Ref string `json:"ref"`
Score float64 `json:"score"`
Status string `json:"status,omitempty"`
EntityURNs []string `json:"entity_urns,omitempty"`
Dimension string `json:"dimension,omitempty"`
}
Hit is one knowledge record matched by a provider. Score is the provider's own relevance score; the Router normalizes it across providers before fusing, so callers see a fused rank, not the raw provider score. Source is the provider name, surfaced as provenance. Ref is the record's stable identifier within its source (memory id, insight id, asset id) so a caller can fetch the full record.
The optional fields carry what the specialized search tools returned, so folding them into one search loses nothing: Status is a review or lifecycle state (insight pending/approved/...), EntityURNs are the linked catalog entities (provenance), and Dimension is the memory dimension or category. They are omitted when a source does not populate them.
Temporal validity (valid_from/valid_until) and a live-vs-captured freshness flag remain deferred until a provider populates them (the wiki carries season windows); adding them now would be unexercised fields.
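To make the effect of the `omitempty` tags on the optional fields concrete, the sketch below marshals a local copy of `Hit`. The `encode` helper and the sample values are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hit is a local copy of the documented struct, including its JSON tags.
type Hit struct {
	Text       string   `json:"text"`
	Source     string   `json:"source"`
	Ref        string   `json:"ref"`
	Score      float64  `json:"score"`
	Status     string   `json:"status,omitempty"`
	EntityURNs []string `json:"entity_urns,omitempty"`
	Dimension  string   `json:"dimension,omitempty"`
}

// encode is a hypothetical helper that renders a hit as JSON text.
func encode(h Hit) (string, error) {
	b, err := json.Marshal(h)
	return string(b), err
}

func main() {
	// A memory hit with no review status or linked entities: the optional
	// keys are left out of the encoded object entirely.
	s, err := encode(Hit{Text: "prefers weekly rollups", Source: "memory", Ref: "m-1", Score: 0.8})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```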
type InsightsProvider ¶
type InsightsProvider struct {
// contains filtered or unexported fields
}
InsightsProvider exposes captured domain knowledge (insights) to the router.
Insights are knowledge-dimension memory rows owned by the caller (insight.captured_by == caller email). The underlying searcher scopes to that owner and to the knowledge dimension, so this provider covers exactly the records the MemoryProvider skips.
Scope note (#632): the epic envisions reviewed insights becoming shared across callers. The current store has no review-state-aware sharing, and searching it without an owner would expose every user's personal insights, so PR1 keeps this provider per-user. Promoting reviewed insights to ScopeShared is deferred to the write-path/review work (#633).
func NewInsightsProvider ¶
func NewInsightsProvider(searcher insightSearcher) *InsightsProvider
NewInsightsProvider builds the insights provider over an insight searcher.
func (*InsightsProvider) Name ¶
func (*InsightsProvider) Name() string
Name returns the provenance label.
func (*InsightsProvider) Scope ¶
func (*InsightsProvider) Scope() Scope
Scope marks this provider per-user; see the type doc for why reviewed-insight sharing is deferred.
func (*InsightsProvider) Search ¶
Search returns the caller's captured insights ranked by relevance to the intent, optionally filtered by review status. Each hit carries the insight's review status and linked entity URNs as provenance. It responds to the text (Intent) path only; entity-keyed lookup is served by the memory provider, so a query with no intent yields nothing here. It fails closed on a missing caller email rather than searching across all users.
type LineageExpander ¶
LineageExpander optionally widens a set of entity URNs along lineage so an entity-keyed lookup also recalls knowledge about upstream and downstream datasets (the old memory_recall "graph" strategy). Implemented by an adapter over the semantic provider; a nil expander disables expansion, leaving a plain entity lookup.
type MemoryProvider ¶
type MemoryProvider struct {
// contains filtered or unexported fields
}
MemoryProvider exposes a caller's personal memory to the knowledge router.
It is per-user: results are restricted to records the caller owns (memory_records.created_by == caller email), the same identity the portal's "my knowledge" search scopes on. It serves two query shapes: relevance search on Intent, and an exact entity-keyed lookup on EntityURNs (optionally widened along lineage when a LineageExpander is wired).
It deliberately omits the knowledge dimension on both paths. Captured insights and remembered knowledge are knowledge-dimension memory rows owned by the InsightsProvider; surfacing them here too would double-list the same record. This provider covers the caller's non-knowledge memory (preferences, events, entities, relationships).
func NewMemoryProvider ¶
func NewMemoryProvider(store memorySearcher, lineage LineageExpander) *MemoryProvider
NewMemoryProvider builds the memory provider over a memory store. lineage is optional; when nil, entity lookups are not expanded along lineage.
func (*MemoryProvider) Name ¶
func (*MemoryProvider) Name() string
Name returns the provenance label.
func (*MemoryProvider) Scope ¶
func (*MemoryProvider) Scope() Scope
Scope marks this provider per-user; the router supplies the caller identity and must skip it when that identity is absent.
func (*MemoryProvider) Search ¶
Search returns the caller's active, non-knowledge memory. When EntityURNs are given it does an exact entity lookup (lineage-expanded when configured); when Intent is given it ranks by relevance (hybrid with an embedding, lexical otherwise). Results from both paths are merged and de-duplicated by record id. It fails closed: an empty caller email yields no results rather than an unscoped search across all users.
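Merging the two paths and de-duplicating by record id could be done as follows. `mergeByRef` is a hypothetical name, and keeping the entity-lookup copy when a record appears on both paths is an assumption; the documentation only states that duplicates are removed.

```go
package main

import "fmt"

// Hit is a trimmed local copy of the documented struct; Ref is the record id.
type Hit struct {
	Ref  string
	Text string
}

// mergeByRef concatenates entity-lookup hits and intent hits, keeping the
// first occurrence of each record id (entity hits win ties, by assumption).
func mergeByRef(entityHits, intentHits []Hit) []Hit {
	seen := make(map[string]bool, len(entityHits)+len(intentHits))
	var out []Hit
	for _, group := range [][]Hit{entityHits, intentHits} {
		for _, h := range group {
			if seen[h.Ref] {
				continue
			}
			seen[h.Ref] = true
			out = append(out, h)
		}
	}
	return out
}

func main() {
	entity := []Hit{{Ref: "m-1", Text: "orders table owner"}, {Ref: "m-2", Text: "orders refresh window"}}
	intent := []Hit{{Ref: "m-2", Text: "orders refresh window"}, {Ref: "m-3", Text: "prefers UTC"}}
	for _, h := range mergeByRef(entity, intent) {
		fmt.Println(h.Ref, h.Text)
	}
}
```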
type PromptsProvider ¶
type PromptsProvider struct {
// contains filtered or unexported fields
}
PromptsProvider exposes operational prompts to the router. Prompt visibility is mixed: global prompts are visible to everyone, while persona- and personal-scoped prompts are visible only to the matching caller. The underlying searcher enforces that visibility from the caller identity, so the provider is shared (always queried, returning at least the global prompts even for an anonymous caller) yet never leaks another caller's personal prompts.
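The mixed visibility rule can be illustrated with a small predicate. Everything about the `prompt` record here (its fields, the scope strings, and keying personal prompts on the caller's email) is assumed for the example; the real prompt searcher's data model is not shown in this documentation.

```go
package main

import "fmt"

// Caller is a trimmed local copy of the documented struct.
type Caller struct {
	Email   string
	Persona string
}

// prompt is a hypothetical record shape used only for this sketch.
type prompt struct {
	Name    string
	Scope   string // "global", "persona", or "personal" (assumed labels)
	Persona string // set for persona-scoped prompts
	Owner   string // set for personal prompts (assumed to be an email)
}

// visible reports whether the caller may see the prompt. Unknown scopes and
// empty identities fail closed.
func visible(p prompt, c Caller) bool {
	switch p.Scope {
	case "global":
		return true
	case "persona":
		return c.Persona != "" && p.Persona == c.Persona
	case "personal":
		return c.Email != "" && p.Owner == c.Email
	}
	return false
}

func main() {
	prompts := []prompt{
		{Name: "sql-style", Scope: "global"},
		{Name: "finance-kpis", Scope: "persona", Persona: "analyst"},
		{Name: "my-notes", Scope: "personal", Owner: "ana@example.com"},
	}
	anonymous := Caller{}
	for _, p := range prompts {
		fmt.Println(p.Name, visible(p, anonymous))
	}
}
```

An anonymous caller still sees the global prompt, which is why the provider can be queried on every request.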
func NewPromptsProvider ¶
func NewPromptsProvider(searcher promptSearcher) *PromptsProvider
NewPromptsProvider builds the prompts provider over a prompt searcher.
func (*PromptsProvider) Name ¶
func (*PromptsProvider) Name() string
Name returns the provenance label.
func (*PromptsProvider) Scope ¶
func (*PromptsProvider) Scope() Scope
Scope marks prompts shared (always queried); the searcher self-filters persona/personal prompts to the caller.
type Provider ¶
type Provider interface {
Name() string
Scope() Scope
Search(ctx context.Context, q Query) ([]Hit, error)
}
Provider is one searchable knowledge store behind the Router. Name is the provenance label stamped on every Hit. Scope drives the Router's access rules. Search returns the provider's own ranked hits for the query; the Router owns cross-provider normalization and fusion, so a provider only needs to rank within itself.
type Query ¶
type Query struct {
Intent string
Embedding []float32
EntityURNs []string
Status string
Caller Caller
Limit int
Sources []string
}
Query is one knowledge search. It carries two complementary ways to match, and a provider uses whichever it supports:
- Intent is natural-language text matched by relevance. Embedding is the query vector the Router computes once from Intent and shares across providers; nil selects lexical-only ranking.
- EntityURNs is an exact, entity-keyed lookup: return knowledge linked to these DataHub URNs (memory uses this, optionally expanded along lineage).
At least one of Intent or EntityURNs is set. Status optionally filters by lifecycle/review state where a provider tracks one (insight review status). Caller carries the identity per-user providers scope on. Limit caps the candidate list each provider returns before the allocator builds the balanced display set.
Sources optionally narrows the federation to a subset of provider names (e.g. ["datahub"]). It only narrows: an empty Sources queries every provider the caller can access, and a name in Sources never opts a caller into a provider their scope would otherwise exclude.
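The "only narrows" rule can be sketched as a selection step that applies the scope check before the Sources filter. `providerInfo` and `eligible` are hypothetical, and a single `hasIdentity` flag stands in for the per-provider identity keys the real `Caller` carries.

```go
package main

import "fmt"

// providerInfo is a hypothetical summary of a registered provider.
type providerInfo struct {
	Name    string
	PerUser bool
}

// eligible returns the providers a request may query. The scope check runs
// first, so naming a per-user provider in sources cannot opt an anonymous
// caller into it; an empty sources list keeps every accessible provider.
func eligible(providers []providerInfo, sources []string, hasIdentity bool) []string {
	wanted := make(map[string]bool, len(sources))
	for _, s := range sources {
		wanted[s] = true
	}
	var names []string
	for _, p := range providers {
		if p.PerUser && !hasIdentity {
			continue // scope excludes it regardless of sources
		}
		if len(sources) > 0 && !wanted[p.Name] {
			continue // narrowed out by the caller
		}
		names = append(names, p.Name)
	}
	return names
}

func main() {
	registered := []providerInfo{{Name: "datahub"}, {Name: "memory", PerUser: true}}
	fmt.Println(eligible(registered, nil, true))
	fmt.Println(eligible(registered, []string{"datahub"}, true))
	fmt.Println(eligible(registered, []string{"memory"}, false))
}
```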
type Result ¶
type Result struct {
Groups []SourceGroup
Coverage []SourceCoverage
Ranking string
}
Result is one knowledge search response: the balanced, grouped-by-source display set, the coverage summary (per-source matched vs shown counts so the agent sees breadth beyond what is displayed), and the ranking mode used to produce it.
type Router ¶
type Router struct {
// contains filtered or unexported fields
}
Router fans one query across every registered provider, normalizes each provider's local relevance scores onto a common scale, fuses them into one ranked list, and enforces per-user scope. It is the single read path behind both the search tool and (later) push injection, so the scope and fusion rules live here once rather than in each surface.
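The documentation does not say which normalization the Router applies. As an illustration of the idea only, min-max scaling maps each provider's local scores onto [0, 1] so that scores from different sources become comparable; `normalize` and the choice of min-max are assumptions.

```go
package main

import "fmt"

// normalize rescales one provider's scores onto [0, 1] with min-max scaling.
// When all scores are equal there is no spread to scale, so each maps to 1.
func normalize(scores []float64) []float64 {
	if len(scores) == 0 {
		return nil
	}
	lo, hi := scores[0], scores[0]
	for _, s := range scores[1:] {
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	out := make([]float64, len(scores))
	for i, s := range scores {
		if hi == lo {
			out[i] = 1
			continue
		}
		out[i] = (s - lo) / (hi - lo)
	}
	return out
}

func main() {
	// A lexical provider scoring in single digits and a positional provider
	// scoring in (0, 1] end up on the same scale.
	fmt.Println(normalize([]float64{6, 4, 2}))
	fmt.Println(normalize([]float64{1, 0.75, 0.5, 0.25}))
}
```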
func NewRouter ¶
NewRouter builds a router over an embedder and a set of providers. The embedder may be nil or the noop placeholder; the router then ranks lexically. Provider order does not affect ranking (scores are fused), only the deterministic tie-break.
func (*Router) Providers ¶
Providers returns the registered providers, for introspection and wiring checks.
func (*Router) Search ¶
Search runs one knowledge search from a caller-built Query. It embeds the intent once (when present) and shares the vector across providers, queries every shared provider plus every per-user provider for which the caller carries an identity, fuses the results, and trims to limit. The query may be text-based (Intent), entity-keyed (EntityURNs), or both; each provider uses the parts it supports.
Provider failures are tolerated: a single provider erroring is logged and its results omitted, so one unhealthy store does not blank the whole search. An error is returned only when every queried provider failed, so an all-stores-down condition is not reported as an empty-but-successful result.
type Scope ¶
type Scope int
Scope declares whether a provider's records are visible to every caller or only to the caller who owns them. The Router uses it to decide which providers a request may touch and with what identity.
const (
// ScopeShared marks a provider the Router always queries, even
// without a caller identity, because it can always return at least some
// content visible to everyone (the technical catalog, global prompts). A
// shared provider may still use the caller identity to widen what it
// returns (a prompt provider adds the caller's persona/personal prompts to
// the global ones); "shared" means "always queried", not "ignores the
// caller". It must never return another caller's private records.
ScopeShared Scope = iota
// ScopePerUser marks a provider whose records belong to individual
// callers (personal memory, personal assets). The Router queries a
// per-user provider only when the request carries the identity that
// provider scopes on, and the provider must restrict results to that
// identity. This is the security boundary that keeps one user's private
// records out of another user's search.
ScopePerUser
)
type SourceCoverage ¶
type SourceCoverage struct {
Source string `json:"source"`
Matched int `json:"matched"`
Shown int `json:"shown"`
}
SourceCoverage reports, per source, how many candidates matched the query and how many of those are shown in the grouped result. Matched can exceed Shown when the balanced allocator spent its budget elsewhere; that gap is the anti-tunnel signal that tells the agent where unshown answers live ("14 datasets matched, 3 shown"). Matched is the count of candidates the provider returned for this query, capped at the per-source candidate fetch limit, not a full-corpus count.
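The gap between Matched and Shown falls out of any allocator that spreads a fixed display budget across sources. The round-robin `allocate` below is an assumed strategy for illustration; the actual balanced allocator is not described in this documentation.

```go
package main

import "fmt"

// SourceCoverage is a local copy of the documented struct (tags omitted).
type SourceCoverage struct {
	Source  string
	Matched int
	Shown   int
}

// allocate hands out display slots one at a time, round-robin over sources
// in the given order, until the budget or the candidates run out.
func allocate(matched map[string]int, order []string, budget int) []SourceCoverage {
	shown := make(map[string]int, len(order))
	for budget > 0 {
		progressed := false
		for _, src := range order {
			if budget == 0 {
				break
			}
			if shown[src] < matched[src] {
				shown[src]++
				budget--
				progressed = true
			}
		}
		if !progressed {
			break // every source is exhausted
		}
	}
	out := make([]SourceCoverage, 0, len(order))
	for _, src := range order {
		out = append(out, SourceCoverage{Source: src, Matched: matched[src], Shown: shown[src]})
	}
	return out
}

func main() {
	matched := map[string]int{"datahub": 14, "memory": 1, "prompts": 2}
	for _, c := range allocate(matched, []string{"datahub", "memory", "prompts"}, 6) {
		fmt.Printf("%s: %d matched, %d shown\n", c.Source, c.Matched, c.Shown)
	}
}
```

With a budget of six, the strong source is capped at three shown while its matched count of fourteen still tells the agent that more candidates exist there.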
type SourceGroup ¶
SourceGroup is the displayed hits for one source, in that source's own relevance order. Grouping by source (rather than one flat relevance list) is the anti-tunnel shape: the agent sees that answers exist across memory, the catalog, endpoints, and prompts at once, instead of a top list one strong source dominates.
Directories ¶
| Path | Synopsis |
|---|---|
| | Package federation adapts the platform's live toolkit registry to the knowledge package's source interfaces, so the universal search router can federate API endpoints and connections without the knowledge engine depending on any concrete toolkit. |