Documentation ¶
Overview ¶
Package graphembedding provides the graph-embedding component for generating entity embeddings.
The graph-embedding component watches the ENTITY_STATES KV bucket and generates vector embeddings for entities, storing them in the EMBEDDINGS_CACHE KV bucket. These embeddings enable semantic similarity search and clustering.
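Semantic similarity between two stored embeddings is commonly measured with cosine similarity. The following is a minimal sketch of that measure, not the package's own implementation; the function name `cosineSimilarity` and the sample vectors are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ or either vector has zero magnitude.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Hypothetical embeddings for two entities; real vectors come from the embedder.
	entityA := []float64{0.1, 0.7, 0.2}
	entityB := []float64{0.2, 0.6, 0.1}
	fmt.Printf("similarity: %.3f\n", cosineSimilarity(entityA, entityB))
}
```

Scores close to 1 indicate vectors pointing in nearly the same direction, which is the property that similarity search and clustering rely on.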
Tier ¶
Tier: STATISTICAL (Tier 1) when using BM25, SEMANTIC (Tier 2) when using HTTP embeddings. Not used in STRUCTURAL (Tier 0) deployments.
Architecture ¶
graph-embedding is a Tier 1+ component. It is not used in Structural-only deployments, but it is required for the semantic search and community detection features.
                    ┌──────────────────┐
ENTITY_STATES ─────►│                  │
(KV watch)          │  graph-embedding ├──► EMBEDDINGS_CACHE (KV)
                    │                  │
                    └────────┬─────────┘
                             │
                             ▼
                    ┌──────────────────┐
                    │  Embedding API   │
                    │  (HTTP/BM25)     │
                    └──────────────────┘
Features ¶
- Entity text extraction from configurable fields
- HTTP embedding API integration (OpenAI-compatible)
- BM25 fallback for offline/lightweight deployments
- Batch processing for efficiency
- Caching with configurable TTL
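Batch processing can be pictured as splitting the pending entity IDs into groups no larger than the configured batch size. The sketch below shows one way to do that; the helper name `chunk` and the sample IDs are hypothetical, and the component's actual batching logic may differ.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// A non-positive size yields a single batch so the loop always advances.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		size = len(ids)
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	// Hypothetical entity IDs awaiting embedding.
	ids := []string{"e1", "e2", "e3", "e4", "e5"}
	for i, batch := range chunk(ids, 2) {
		fmt.Printf("batch %d: %v\n", i, batch)
	}
}
```

Each batch can then be sent to the embedder in a single call, which reduces per-request overhead when the HTTP embedder is in use.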
Configuration ¶
The component is configured via JSON with the following structure:
{
  "ports": {
    "inputs": [
      {"name": "entity_watch", "subject": "ENTITY_STATES", "type": "kv-watch"}
    ],
    "outputs": [
      {"name": "embeddings", "subject": "EMBEDDINGS_CACHE", "type": "kv"}
    ]
  },
  "embedder_type": "http",
  "batch_size": 50,
  "cache_ttl": "1h"
}
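To illustrate how the scalar settings above might be read and checked, here is a sketch that decodes them into a local struct. The type `embeddingConfig`, the function `parseConfig`, and the validation rules are assumptions for this example; they are not the package's Config type or its actual validation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// embeddingConfig is a hypothetical local mirror of the documented JSON keys.
// Keys it does not declare (such as "ports") are ignored by json.Unmarshal.
type embeddingConfig struct {
	EmbedderType string `json:"embedder_type"`
	BatchSize    int    `json:"batch_size"`
	CacheTTL     string `json:"cache_ttl"`
}

// parseConfig decodes raw and applies the checks assumed by this sketch.
func parseConfig(raw []byte) (embeddingConfig, time.Duration, error) {
	var cfg embeddingConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, 0, fmt.Errorf("decode config: %w", err)
	}
	if cfg.EmbedderType != "http" && cfg.EmbedderType != "bm25" {
		return cfg, 0, fmt.Errorf("unknown embedder_type %q", cfg.EmbedderType)
	}
	if cfg.BatchSize <= 0 {
		return cfg, 0, fmt.Errorf("batch_size must be positive, got %d", cfg.BatchSize)
	}
	ttl, err := time.ParseDuration(cfg.CacheTTL)
	if err != nil {
		return cfg, 0, fmt.Errorf("parse cache_ttl: %w", err)
	}
	return cfg, ttl, nil
}

func main() {
	raw := []byte(`{"embedder_type": "http", "batch_size": 50, "cache_ttl": "1h"}`)
	cfg, ttl, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.EmbedderType, cfg.BatchSize, ttl)
}
```

The cache_ttl value uses Go duration syntax, so strings such as "15m" or "1h" parse with time.ParseDuration.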
Port Definitions ¶
Inputs:
- KV watch: ENTITY_STATES - watches for entity state changes
Outputs:
- KV bucket: EMBEDDINGS_CACHE - stores vector embeddings keyed by entity ID
Embedder Types ¶
- http: Uses HTTP API (OpenAI-compatible) for embedding generation
- bm25: Uses BM25 sparse vectors for lightweight deployments
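For readers unfamiliar with BM25, the sketch below computes a sparse term-to-weight map for one document using the standard BM25 weighting with the common defaults k1 = 1.2 and b = 0.75. It is a generic illustration of the technique, not this package's embedder: the function `bm25Weights`, the whitespace tokenizer, and the sample corpus are all assumptions.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

const (
	k1 = 1.2  // term-frequency saturation
	b  = 0.75 // document-length normalization
)

// bm25Weights returns a sparse term->weight map for doc, using document
// frequencies gathered from corpus. Tokenization is a lowercase whitespace split.
func bm25Weights(doc string, corpus []string) map[string]float64 {
	df := map[string]int{} // number of corpus documents containing each term
	totalLen := 0
	for _, d := range corpus {
		tokens := strings.Fields(strings.ToLower(d))
		totalLen += len(tokens)
		seen := map[string]bool{}
		for _, t := range tokens {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	if len(corpus) == 0 || totalLen == 0 {
		return map[string]float64{}
	}
	avgLen := float64(totalLen) / float64(len(corpus))

	tokens := strings.Fields(strings.ToLower(doc))
	tf := map[string]int{}
	for _, t := range tokens {
		tf[t]++
	}

	n := float64(len(corpus))
	weights := make(map[string]float64, len(tf))
	for term, f := range tf {
		// Smoothed IDF variant that stays non-negative for common terms.
		idf := math.Log(1 + (n-float64(df[term])+0.5)/(float64(df[term])+0.5))
		norm := float64(f) + k1*(1-b+b*float64(len(tokens))/avgLen)
		weights[term] = idf * float64(f) * (k1 + 1) / norm
	}
	return weights
}

func main() {
	// Hypothetical entity texts.
	corpus := []string{"pump valve", "pump sensor", "pump motor"}
	for term, w := range bm25Weights(corpus[0], corpus) {
		fmt.Printf("%s: %.3f\n", term, w)
	}
}
```

Terms that appear in fewer documents receive larger weights, so the resulting sparse vector emphasizes what is distinctive about an entity without calling an external model.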
Usage ¶
Register the component with the component registry:
import graphembedding "github.com/c360studio/semstreams/processor/graph-embedding"

func init() {
	// registry is assumed to be an existing *component.Registry.
	if err := graphembedding.Register(registry); err != nil {
		panic(err)
	}
}
Dependencies ¶
Upstream:
- graph-ingest: produces ENTITY_STATES that this component watches
Downstream:
- graph-clustering: reads EMBEDDINGS_CACHE for semantic similarity in community detection
- graph-gateway: reads EMBEDDINGS_CACHE for semantic search queries
Package graphembedding also provides Prometheus metrics and query handlers for the graph-embedding component.
Index ¶
- func CreateGraphEmbedding(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)
- func Register(registry *component.Registry) error
- type Component
- func (c *Component) ConfigSchema() component.ConfigSchema
- func (c *Component) DataFlow() component.FlowMetrics
- func (c *Component) Health() component.HealthStatus
- func (c *Component) Initialize() error
- func (c *Component) InputPorts() []component.Port
- func (c *Component) Meta() component.Metadata
- func (c *Component) OutputPorts() []component.Port
- func (c *Component) Start(ctx context.Context) error
- func (c *Component) Stop(timeout time.Duration) error
- type Config
- type SearchRequest
- type SearchResponse
- type SearchResult
- type SimilarEntity
- type SimilarRequest
- type SimilarResponse
Functions ¶
func CreateGraphEmbedding ¶
func CreateGraphEmbedding(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)
CreateGraphEmbedding is the factory function for creating graph-embedding components
Types ¶
type Component ¶
type Component struct {
	// contains filtered or unexported fields
}
Component implements the graph-embedding processor
func (*Component) ConfigSchema ¶
func (c *Component) ConfigSchema() component.ConfigSchema
ConfigSchema returns the configuration schema
func (*Component) DataFlow ¶
func (c *Component) DataFlow() component.FlowMetrics
DataFlow returns current data flow metrics
func (*Component) Health ¶
func (c *Component) Health() component.HealthStatus
Health returns current health status
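func (*Component) Initialize ¶
func (c *Component) Initialize() error
Initialize validates configuration and sets up ports (no I/O)
func (*Component) InputPorts ¶
func (c *Component) InputPorts() []component.Port
InputPorts returns input port definitions
func (*Component) OutputPorts ¶
func (c *Component) OutputPorts() []component.Port
OutputPorts returns output port definitions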
type Config ¶
type Config struct {
	Ports        *component.PortConfig `json:"ports" schema:"type:ports,description:Port configuration,category:basic"`
	EmbedderType string                `` /* 153-byte string literal not displayed */
	BatchSize    int                   `json:"batch_size" schema:"type:int,description:Batch size for embedding generation,category:advanced"`
	CacheTTLStr  string                `json:"cache_ttl" schema:"type:string,description:Cache TTL for embeddings (e.g. 15m or 1h),category:advanced"`

	// Dependency startup configuration
	StartupAttempts int `` /* 130-byte string literal not displayed */
	StartupInterval int `` /* 134-byte string literal not displayed */
	// contains filtered or unexported fields
}
Config holds configuration for graph-embedding component
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a valid default configuration
func (*Config) ApplyDefaults ¶
func (c *Config) ApplyDefaults()
ApplyDefaults sets default values for configuration
type SearchRequest ¶
SearchRequest is the request format for text search queries
type SearchResponse ¶
type SearchResponse struct {
	Query    string         `json:"query"`
	Results  []SearchResult `json:"results"`
	Duration string         `json:"duration"`
}
SearchResponse is the response format for text search queries
type SearchResult ¶
type SearchResult struct {
	EntityID   string  `json:"entity_id"`
	Similarity float64 `json:"similarity"`
}
SearchResult represents a search result with relevance score
type SimilarEntity ¶
type SimilarEntity struct {
	EntityID   string  `json:"entity_id"`
	Similarity float64 `json:"similarity"`
}
SimilarEntity represents an entity with similarity score
type SimilarRequest ¶
SimilarRequest is the request format for similar entity queries
type SimilarResponse ¶
type SimilarResponse struct {
	EntityID string          `json:"entity_id"`
	Similar  []SimilarEntity `json:"similar"`
	Duration string          `json:"duration"`
}
SimilarResponse is the response format for similar entity queries
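As a rough illustration of consuming a similar-entity response, the sketch below decodes a JSON payload into local copies of the documented SimilarResponse and SimilarEntity types. The helper `decodeSimilar`, the entity IDs, and the duration value are made up for the example; how the payload is actually requested and delivered is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented response types, so the sketch compiles
// without importing the package.
type SimilarEntity struct {
	EntityID   string  `json:"entity_id"`
	Similarity float64 `json:"similarity"`
}

type SimilarResponse struct {
	EntityID string          `json:"entity_id"`
	Similar  []SimilarEntity `json:"similar"`
	Duration string          `json:"duration"`
}

// decodeSimilar parses a JSON payload into a SimilarResponse.
func decodeSimilar(payload []byte) (SimilarResponse, error) {
	var resp SimilarResponse
	err := json.Unmarshal(payload, &resp)
	return resp, err
}

func main() {
	// Hypothetical payload; the IDs, score, and duration are illustrative only.
	payload := []byte(`{"entity_id":"sensor-1","similar":[{"entity_id":"sensor-2","similarity":0.91}],"duration":"3ms"}`)
	resp, err := decodeSimilar(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, s := range resp.Similar {
		fmt.Printf("%s is similar to %s (score %.2f)\n", s.EntityID, resp.EntityID, s.Similarity)
	}
}
```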