GoRa - A self-hosted AI companion with long-term memory, knowledge graphs, and emotional awareness

GoRa (Go-based Retrieval-Augmented Generation) is a self-hosted AI system that goes beyond simple document Q&A. It combines Ollama for local LLM inference, Redis for vector search, and Neo4j for structured knowledge graphs - and can operate in two modes:
- Documentation mode: Chat with your local docs. Shared vector store, factual answers, no hallucinations.
- Companion mode: A persistent AI companion that remembers you across sessions, builds a knowledge graph about your life, tracks emotional context, adapts its personality over time, and reaches out when you've been away.
Architecture
GoRa uses a hybrid retrieval strategy combining three complementary memory systems:
- Vector Search (Redis): Semantic similarity search over document embeddings for context-aware retrieval.
- Knowledge Graph (Neo4j): Structured facts and relationships extracted from documents and conversations - with confidence scoring, conflict detection, and hard-fact protection.
- Rolling Memory: Session summaries that compress long conversations into concise context, preventing token overflow. Episodic memories are archived to Neo4j for long-term recall.
Requirements
To run GoRa, you need the following components:
- Go (1.25.3 or higher)
- Docker & Docker Compose (for Redis Stack & Neo4j)
- Ollama with the following models (models can be changed in config.yml):
  - mxbai-embed-large: For high-performance embeddings.
  - mistral-nemo: For synthetic question generation, knowledge extraction, summarization, and safety classification.
  - gpt-oss:20b: For generating precise, context-aware answers.
  - qwen2.5:14b: Used for knowledge graph extraction.
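If you prefer not to use the make ollama-pull target described below, the same models can be pulled directly with the Ollama CLI:

```bash
ollama pull mxbai-embed-large   # embeddings
ollama pull mistral-nemo        # extraction, summarization, safety
ollama pull gpt-oss:20b         # final answer generation
ollama pull qwen2.5:14b         # knowledge graph extraction
```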
Getting Started
We provide a Makefile to simplify all common tasks.
1. Spin up the infrastructure
Start Redis Stack and Neo4j:
make up
2. Prepare your data
Place your documentation (Markdown or Text files) into the /data directory. The system will automatically parse, chunk, and generate synthetic questions for these files to improve search accuracy.
3. Populate the databases
Convert your text into vectors and extract knowledge graphs:
make import
You can also import a specific file or directory:
go run ./cmd/database/import.go -path /path/to/file_or_dir
4. Start the conversation
Option A: Interactive CLI
make run
Option B: HTTP Server with Web UI
make http
Then open http://localhost:8080 in your browser.
Configuration
GoRa uses a config.yml file for all settings. If an env.yml file exists, it takes priority (useful for local overrides - env.yml is gitignored).
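For illustration, an env.yml override might look like the sketch below. This assumes config.yml nests the dotted keys from the tables that follow as YAML maps; the values are placeholders.

```yaml
# env.yml - gitignored local overrides, takes priority over config.yml
settings:
  debug: true
database:
  neo4j:
    password: my-local-password
```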
General Settings
| Key | Description | Default |
| --- | --- | --- |
| settings.debug | Enables verbose logging (similarity scores, prompts, graph queries). | false |
| settings.max_history_messages | Number of recent chat messages included in the LLM prompt. | 10 |
| settings.chat_history_ttl | Time-to-live for chat history in hours. 0 = persist forever. | 0 |
Logging
| Key | Description | Default |
| --- | --- | --- |
| settings.log_format | Log output format: json for machine-readable, text for human-readable. | text |
| settings.log_level | Minimum log level: debug, info, warn, error. | info |
| settings.log_path | File path for log output. Empty = per-binary default (see below). | logs/gora.log |
Environment variable overrides: GORA_LOG_FORMAT, GORA_LOG_LEVEL, GORA_LOG_PATH.
When log_path is not set or empty, each binary writes to its own default file:
- gora (CLI): ./logs/cli.log
- gora-server (HTTP): ./logs/http.log
- gora-import (Import): ./logs/import.log
Log files are automatically rotated: max 10 MB per file, 5 backups retained, 30 days max age, gzip-compressed.
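For a one-off run, the overrides can be set inline, for example:

```bash
GORA_LOG_FORMAT=json GORA_LOG_LEVEL=debug GORA_LOG_PATH=./logs/debug.log make http
```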
Database - Redis (Vector Store)
| Key | Description | Default |
| --- | --- | --- |
| database.redis.ollama_model | Model used for creating vector embeddings. | mxbai-embed-large |
| database.redis.ollama_url | Ollama URL for embedding generation. | http://127.0.0.1:11434 |
| database.redis.uri | Redis connection string. Override: GORA_REDIS_URI. | redis://localhost:6379 |
| database.redis.index_name | Base name for the search index in Redis. | gora-doc |
| database.redis.append_model_name_to_index | Appends the model name to the index (e.g., gora-doc-mxbai-embed-large). Prevents index pollution when switching models. | true |
| database.redis.embed_dimension | Dimension of the embedding vectors. Must match the model. | 1024 |
| database.redis.top_k_results | Number of documents returned by similarity search for context. | 5 |
| database.redis.hnsw_m | HNSW max edges per node. Higher = better recall, more memory. | 16 |
| database.redis.hnsw_ef_construction | HNSW exploration factor at build time. Higher = better index quality, slower builds. | 200 |
| database.redis.hnsw_ef_runtime | HNSW exploration factor at query time. Higher = better recall, slower queries. | 10 |
Database - Neo4j (Knowledge Graph)
| Key | Description | Default |
| --- | --- | --- |
| database.neo4j.ollama_model | Model used for knowledge graph extraction. | mistral-nemo |
| database.neo4j.ollama_url | Ollama URL for extraction. | http://127.0.0.1:11434 |
| database.neo4j.uri | Neo4j Bolt connection URI. Override: GORA_NEO4J_URI. | bolt://localhost:7687 |
| database.neo4j.username | Neo4j username. Override: GORA_NEO4J_USERNAME. | neo4j |
| database.neo4j.password | Neo4j password. Override: GORA_NEO4J_PASSWORD. | password |
| database.neo4j.max_connection_lifetime | Max lifetime of a connection in minutes. | 30 |
| database.neo4j.max_connection_pool_size | Max number of connections in the pool. | 50 |
| database.neo4j.graph_search_limit | Max results from graph knowledge searches. | 100 |
Setup (Document Import)
| Key | Description | Default |
| --- | --- | --- |
| setup.ollama_model_synthetic_questions | Fast model for generating synthetic questions during ingestion. | mistral-nemo |
| setup.ollama_synthetic_questions_url | Ollama URL for synthetic question generation. | http://127.0.0.1:11434 |
| setup.data_root_path | Local directory containing your documentation files. | data |
| setup.redis_chunk_size | Character count per document chunk. | 500 |
| setup.redis_chunk_overlap | Character overlap between chunks to maintain context. | 100 |
| setup.synthetic_question_worker_count | Number of parallel workers for question generation. | 8 |
Summarizer (Rolling Memory)
| Key | Description | Default |
| --- | --- | --- |
| summarizer.ollama_model | Model used for generating conversation summaries. | mistral-nemo |
| summarizer.ollama_url | Ollama URL for summarization. | http://127.0.0.1:11434 |
| summarizer.rolling_chunk_size | Number of oldest messages summarized per rolling cycle. | 10 |
| summarizer.session_timeout_minutes | Idle time in minutes before a session is archived and a fresh one starts. 0 = disabled. | 15 |
| summarizer.min_starter_gap_hours | Minimum hours since last activity before a conversation starter is generated. | 2 |
Safety
| Key | Description | Default |
| --- | --- | --- |
| safety.ollama_model | Model used for safety classification and crisis detection. | mistral-nemo |
| safety.ollama_url | Ollama URL for safety analysis. | http://127.0.0.1:11434 |
| safety.crisis_resources | Newline-separated helpline information injected during crisis events. | (see config.yml) |
HTTP Server
| Key | Description | Default |
| --- | --- | --- |
| http.port | Port for the HTTP server. | 8080 |
| http.read_timeout | Read timeout in seconds. | 120 |
| http.write_timeout | Write timeout in seconds. | 120 |
| http.sse_flush_tokens | Tokens to buffer before flushing SSE. 1 = flush every token. Higher values reduce syscalls. | 1 |
| http.trusted_proxies | List of IP/CIDR ranges allowed to set X-Forwarded-For. Empty = never trust. | [] |
| http.cors_allowed_origins | Allowed CORS origins. | ["*"] |
| http.rate_limit.enabled | Enable rate limiting. | true |
| http.rate_limit.requests_per_second | Rate limit for API requests. | 2.0 |
| http.rate_limit.burst | Maximum burst size for rate limiting. | 5 |
| http.rate_limit.cleanup_interval_s | Stale rate limit entry cleanup interval in seconds. | 300 |
| http.timing.enabled | Enable variable response delays simulating human typing (companion mode). | true |
| http.timing.min_delay_ms | Minimum delay before streaming starts in milliseconds. | 500 |
| http.timing.max_delay_ms | Maximum delay before streaming starts in milliseconds. | 3000 |
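As a sketch (same nested-key assumption as the env.yml example above, values are only examples), loosening the rate limiter and disabling the typing delay on a private instance could look like this:

```yaml
http:
  port: 8080
  rate_limit:
    enabled: true
    requests_per_second: 5.0
    burst: 10
  timing:
    enabled: false   # respond immediately instead of simulating typing
```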
AI - Main LLM
| Key | Description | Default |
| --- | --- | --- |
| ai.ollama_url | Ollama URL for the main LLM. Override: GORA_OLLAMA_URL. | http://127.0.0.1:11434 |
| ai.ollama_model | Main model for generating final answers. | gpt-oss:20b |
| ai.temperature | Creativity of the main model. 0.0 = deterministic, 2.0 = max. | 0.0 |
| ai.ollama_draft_url | Ollama URL for the draft model. | http://127.0.0.1:11434 |
| ai.ollama_draft_model | Fast model for drafts, starters, and lightweight tasks. Empty = disabled. | mistral-nemo |
| ai.draft_temperature | Temperature for the draft model. | 0.0 |
| ai.draft_timeout | Timeout for draft generation in seconds. | 10 |
| ai.num_ctx | Context window size for the main model in tokens. | 16384 |
| ai.draft_num_ctx | Context window size for the draft model in tokens. | 4096 |
| ai.ollama_vision_model | Multimodal model for image descriptions (e.g., LLaVA). Empty = disabled. | (empty) |
| ai.ollama_vision_url | Ollama URL for the vision model. | http://127.0.0.1:11434 |
| ai.vision_timeout | Timeout for vision model calls in seconds. | 60 |
AI - Persona
| Key | Description | Default |
| --- | --- | --- |
| ai.name | The name of your AI assistant. | GoRa |
| ai.gender | Gender identity (used in some persona prompts). | diverse |
| ai.persona.style | Instruction for the AI's communication style. | (see config.yml) |
| ai.persona.tone | Instruction for the AI's tone. | (see config.yml) |
| ai.persona.background | Background context the AI is given about itself. Supports {{ .name }} template. | (see config.yml) |
| ai.persona.type_of_relationship | Relationship type: friendship, romantic, mentor, or empty. | friendship |
| ai.persona.key_traits | Comma-separated personality traits. | loyal,sassy,extroverted,proactive |
AI - Prompts
| Key | Description | Default |
| --- | --- | --- |
| ai.prompts.system_path | Path to the main system prompt template. | ./prompts/system.txt |
| ai.prompts.extraction_path | Path to the knowledge extraction prompt (user messages). | ./prompts/internal/extract.txt |
| ai.prompts.extraction_ai_path | Path to the knowledge extraction prompt (AI responses). | ./prompts/internal/extract_ai.txt |
| ai.prompts.rolling_update_path | Path to the rolling summary update prompt. | ./prompts/internal/rolling_update.txt |
| ai.prompts.safety_path | Path to the safety classification prompt. | ./prompts/internal/safety.txt |
| ai.prompts.summary_safety_path | Path to the summary safety analysis prompt. | ./prompts/internal/summary_safety.txt |
| ai.prompts.emotional_residue_path | Path to the emotional residue extraction prompt. | ./prompts/internal/emotional_residue.txt |
| ai.prompts.crisis_template_path | Path to the crisis response template. | ./prompts/internal/crisis.txt |
| ai.prompts.starter_path | Path to the conversation starter prompt (companion mode). | ./prompts/internal/starter.txt |
| ai.prompts.onboarding_paths | List of prompt paths for each onboarding phase. Count must match onboarding_sessions. | [onboarding_1.txt, onboarding_2.txt, onboarding_3.txt] |
| ai.prompts.onboarding_starter_path | Path to the onboarding-specific conversation starter prompt. | ./prompts/internal/onboarding_starter.txt |
| ai.prompts.outreach_path | Path to the LLM-generated outreach message prompt. | ./prompts/internal/outreach.txt |
AI - Season Context
| Key | Description | Default |
| --- | --- | --- |
| ai.season_context | Map of date-range keys to seasonal descriptions injected into the system prompt. Keys: jan_early, jan_late, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec_advent, dec_holiday. | (see config.yml for defaults - Central European climate) |
AI - Graph
| Key | Description | Default |
| --- | --- | --- |
| ai.graph.allowed_relationships | Whitelist of relationship types the extraction model is allowed to create. | (see config.yml for full list) |
Features
| Key | Description | Default |
| --- | --- | --- |
| features.active_learning | Enables companion mode: per-user isolation, knowledge extraction, rolling memory. | false |
| features.onboarding_enabled | Enables guided onboarding for new users. Requires active_learning: true. | true |
| features.onboarding_sessions | Number of onboarding sessions before switching to the normal system prompt (1-10). | 3 |
| features.consent_before_storing | Enables per-category opt-in/opt-out before storing personal facts. | false |
| features.life_simulation | Enables GoRa's life simulation layer - mood, activity, thoughts at session start. | false |
| features.life_simulation_probability | Probability (0.0-1.0) of generating a life state per session. | 0.6 |
| features.self_disclosure | Enables reciprocal self-disclosure - GoRa occasionally shares about itself. | false |
| features.self_disclosure_max_per_session | Max self-disclosures per session. | 1 |
| features.self_disclosure_balance_threshold | User:AI disclosure ratio above which GoRa is nudged to share. | 1.5 |
| features.spontaneous_outreach | Enables non-event-based proactive messages after user inactivity. | false |
| features.spontaneous_min_gap_hours | Minimum hours since last interaction before spontaneous outreach. | 6 |
| features.backup.enabled | Enables automatic per-user data backups. | false |
| features.backup.interval_hours | Time between backup runs in hours. | 24 |
| features.backup.dir | Directory for backup files. | ./backups/users |
| features.backup.max_per_user | Max backup files per user. Oldest are pruned. | 5 |
| features.escalation_enabled | Enables emotional escalation in outreach messages (casual → self-doubt → missing). | false |
| features.escalation_min_gap_hours | Minimum hours between escalation level increases. | 6 |
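As an illustrative sketch of a companion-mode setup (nested keys assumed, values are examples only):

```yaml
features:
  active_learning: true        # enables companion mode
  onboarding_enabled: true
  onboarding_sessions: 3
  consent_before_storing: true # ask before storing personal facts
  spontaneous_outreach: true
  spontaneous_min_gap_hours: 12
```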
Asymmetric Model Strategy
GoRa allows you to use different models for different tasks to optimize performance:
- Final Generation: Use a large model (e.g., gpt-oss:20b) for high-quality reasoning.
- Knowledge Extraction / Synthetic Questions / Summarization: Use medium/fast models (e.g., mistral-nemo, qwen2.5:14b) for speed.
- Embeddings: Specialized models like mxbai-embed-large for state-of-the-art retrieval.
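Expressed in config.yml terms, the split corresponds roughly to the per-component model keys documented above (illustrative sketch, nested keys assumed):

```yaml
ai:
  ollama_model: gpt-oss:20b          # final answers
  ollama_draft_model: mistral-nemo   # drafts, starters, lightweight tasks
database:
  redis:
    ollama_model: mxbai-embed-large  # embeddings
  neo4j:
    ollama_model: mistral-nemo       # knowledge graph extraction
setup:
  ollama_model_synthetic_questions: mistral-nemo
summarizer:
  ollama_model: mistral-nemo
```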
Parallel Ingestion
The synthetic_question_worker_count setting controls how many document chunks are processed simultaneously.
Tip: If you have a high-end GPU with plenty of VRAM (48GB+), increase this value and set OLLAMA_NUM_PARALLEL in your environment to match for true hardware parallelism.
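For example (the exact numbers depend on your GPU and VRAM and are purely illustrative):

```bash
# Start the Ollama server with parallel request handling enabled
OLLAMA_NUM_PARALLEL=8 ollama serve

# Then run the import with a matching worker count
# (setup.synthetic_question_worker_count: 8 in config.yml)
make import
```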
Index Management
By setting append_model_name_to_index: true, GoRa automatically separates your data when you switch embedding models. Since embeddings from different models are not compatible, this prevents polluting your search results.
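For example, switching to a different embedding model only needs the model name and its matching vector dimension; with the suffix enabled, the old index stays untouched (sketch; nomic-embed-text and its 768 dimensions are just an example):

```yaml
database:
  redis:
    ollama_model: nomic-embed-text   # example replacement model
    embed_dimension: 768             # must match the new model's output size
    append_model_name_to_index: true # keeps gora-doc-<model> indexes separate
```

Re-run make import afterwards so the new index gets populated.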
Prompt Customization
GoRa ships with multiple prompt templates in the /prompts directory:
| File | Purpose |
| --- | --- |
| system.txt | Default technical documentation assistant. Strict, factual, no hallucinations. |
| friend.txt | Personal companion with time-awareness, memory usage, and persona adaptation. |
| onboarding_1.txt | Onboarding phase 1: First meeting - introduction, no prior context. |
| onboarding_2.txt | Onboarding phase 2: Reconnecting - getting to know interests, building familiarity. |
| onboarding_3.txt | Onboarding phase 3: Deepening the connection - relationship becomes comfortable. |
| internal/extract.txt | Knowledge graph extraction from user messages (entities & relationships). |
| internal/extract_ai.txt | Knowledge graph extraction from AI responses. |
| internal/extract_system.txt | System extraction prompt. |
| internal/rolling_update.txt | Rolling summary updates during long conversations. |
| internal/safety.txt | Safety classification prompt (sentiment, crisis level). |
| internal/summary_safety.txt | Safety analysis on rolling summaries. |
| internal/emotional_residue.txt | Emotional state extraction at session end. |
| internal/crisis.txt | Crisis response template with helpline resources. |
| internal/starter.txt | Conversation starter generation for returning users (companion mode). |
| internal/onboarding_starter.txt | Phase-aware conversation starter for onboarding sessions. |
| internal/outreach.txt | LLM-generated outreach message template (escalation levels). |
Switch the active prompt by changing ai.prompts.system_path in your config.
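For example, to run GoRa as a personal companion instead of a documentation assistant (nested keys assumed):

```yaml
ai:
  prompts:
    system_path: ./prompts/friend.txt
```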
Makefile Commands
Infrastructure
| Command | Description |
| --- | --- |
| make up | Start Docker stack (Redis & Neo4j). |
| make down | Stop Docker stack. |
| make restart | Restart Docker stack (down + up). |
| make status | Show Docker container status. |
Application
| Command | Description |
| --- | --- |
| make build | Compile all binaries into /bin. |
| make build-release-doku | Build a .deb package for server deployment (VERSION=x.y.z DEB_ARCH=amd64\|arm64). |
| make run | Start GoRa interactive CLI. |
| make http | Start GoRa HTTP server. |
| make import | Chunk documents and populate Redis + Neo4j. |
| make create-user | Create a new user for companion mode (active_learning). |
| make deps | Update all Go dependencies and tidy go.mod. |
| make clean | Remove compiled binaries. |
Testing
| Command | Description |
| --- | --- |
| make test | Run unit + integration tests. |
| make test-unit | Run unit tests only. |
| make test-integration | Run integration tests (requires Redis, Neo4j, and Ollama). |
| make test-integration-llm | Run integration tests including actual LLM calls (5 min timeout). |
| make test-coverage | Run tests with coverage and open HTML report. |
Database Management
| Command | Description |
| --- | --- |
| make wipe | Wipe both Redis and Neo4j. |
| make wipe-redis | Wipe Redis only. |
| make wipe-graph | Wipe Neo4j only. |
Ollama
| Command | Description |
| --- | --- |
| make ollama | Pull all required models, then list the locally installed ones. |
| make ollama-pull | Download models from the Ollama library. |
| make ollama-list | List all locally available Ollama models. |
Backup & Restore
| Command | Description |
| --- | --- |
| make backup | Backup both databases. |
| make backup-redis | Backup Redis only (snapshot to /backups). |
| make backup-graph | Backup Neo4j only (archive to /backups). |
| make restore | Restore both databases from latest backups. |
| make restore-redis | Restore Redis from latest backup. |
| make restore-graph | Restore Neo4j from latest backup. |
API Endpoints
Core (always available)
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/chat | Send a message, receive full response as JSON. |
| POST | /api/chat/stream | Send a message, receive response as SSE stream. |
| GET | /api/history?session_id=... | Retrieve recent chat history for a session. |
| GET | /health | Health check - returns Redis, Neo4j, and Ollama connectivity status. |
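A minimal streaming call might look like this; curl's -N flag disables output buffering so SSE events appear as they arrive, and the Authorization header is only required once an API key is configured (see Security below):

```bash
curl -N -X POST http://localhost:8080/api/chat/stream \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{"message": "How does GoRa handle embeddings?", "session_id": "my-session-id"}'
```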
Companion Mode (active_learning: true)
These endpoints are only available when companion mode is enabled.
Conversation
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/starter?session_id=... | Generate a conversation starter based on past context. |
| GET | /api/notifications | Get pending proactive notifications and follow-ups. |
Memory Management
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/memory | Retrieve all stored entities and relationships for the authenticated user. |
| DELETE | /api/memory/entity?name=... | Delete an entity and all its relationships. |
| DELETE | /api/memory/relationship?source=...&target=...&type=... | Delete a specific relationship between two entities. |
| DELETE | /api/memory/forget?keyword=... | Bulk delete entities and relationships matching a keyword. |
| PUT | /api/memory/sensitivity?source=...&target=...&type=...&level=... | Set sensitivity level (low, medium, high) on a relationship. |
| PUT | /api/memory/confirm?source=...&target=...&type=... | Mark a fact as confirmed (prevents overwriting and decay). |
| PUT | /api/memory/approve?source=...&target=...&type=... | Approve a pending consent fact. |
| DELETE | /api/memory/reject?source=...&target=...&type=... | Reject a pending consent fact. |
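For example, to bulk-delete everything matching a keyword (companion mode, per-user API key; the keyword value is just an example):

```bash
curl -X DELETE "http://localhost:8080/api/memory/forget?keyword=ex-employer" \
  -H "Authorization: Bearer <your-user-key>"
```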
Consent & Preferences
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/consent | Get user consent preferences (per-category opt-in/opt-out). |
| PUT | /api/consent | Update user consent preferences. |
| GET | /api/outreach/preference | Get spontaneous outreach preference. |
| PUT | /api/outreach/preference | Set spontaneous outreach preference. |
Data Management
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/export | Export all user data as JSON (graph, history, episodic memories, profile). |
| POST | /api/import | Import user data from JSON backup. |
| DELETE | /api/delete-account | Delete all user data permanently (right to be forgotten). |
Administration
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/admin/users | Create a new user. |
| GET | /api/admin/metrics | Get aggregated conversation quality metrics. |
Request Body (/api/chat and /api/chat/stream)
```json
{
  "message": "How does GoRa handle embeddings?",
  "session_id": "my-session-id",
  "image": "<base64-encoded image data (optional)>",
  "mime_type": "image/png (required when image is set)"
}
```
Request Body (/api/admin/users)
```json
{
  "name": "Alice",
  "email": "alice@example.com"
}
```
Health Check Response (/health)
```json
{
  "status": "ok",
  "redis": "ok",
  "neo4j": "ok",
  "ollama": "ok"
}
```
Returns 200 OK when all services are healthy, 503 Service Unavailable when degraded.
Security
API Authentication
GoRa supports two authentication modes depending on the operating mode:
Documentation mode (active_learning: false): Optional shared API key via the GORA_API_KEY environment variable. If no key is configured, the API is accessible without authentication - this is acceptable for local development but should never be used in a publicly reachable environment.
Companion mode (active_learning: true): Per-user API key authentication. Each user receives a unique API key when created via make create-user or the /api/admin/users endpoint. The first user must be created via CLI.
Setting up shared API key (documentation mode)
Generate a cryptographically secure key and export it:
export GORA_API_KEY="$(openssl rand -hex 32)"
Then start the server as usual:
make http
Alternatively, you can put the key in a .env file (GORA_API_KEY=<your-key>) and load it before starting the server:
source .env && make http
All API requests must include the key as a Bearer token in the Authorization header:
```bash
curl -X POST http://localhost:8080/api/chat \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{"message": "What is GoRa?", "session_id": "my-session"}'
```
Requests without a valid key will receive a 401 Unauthorized response. If you are running GoRa behind a reverse proxy (e.g. nginx or Traefik), configure http.trusted_proxies with your proxy IPs so that rate limiting operates on the real client IP rather than the proxy address.
GoRa sets the following security headers on all responses:
- Content-Security-Policy
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- Referrer-Policy
CORS
CORS is configured globally via http.cors_allowed_origins. Defaults to ["*"] - restrict this in production.
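A production-leaning sketch that restricts CORS and trusts a single reverse proxy might look like this (nested keys assumed; the origin and proxy address are placeholders):

```yaml
http:
  cors_allowed_origins:
    - https://gora.example.com
  trusted_proxies:
    - 10.0.0.5/32   # your nginx/Traefik instance
```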
Environment Variables
| Variable | Description |
| --- | --- |
| GORA_API_KEY | Shared API key (documentation mode) or admin key (companion mode). |
| GORA_REDIS_URI | Override Redis connection URI. |
| GORA_NEO4J_URI | Override Neo4j connection URI. |
| GORA_NEO4J_USERNAME | Override Neo4j username. |
| GORA_NEO4J_PASSWORD | Override Neo4j password. |
| GORA_OLLAMA_URL | Override main Ollama URL. |
| GORA_OLLAMA_SUMMARIZER_URL | Override Ollama URL for summarization. |
| GORA_OLLAMA_EMBED_URL | Override Ollama URL for embeddings. |
| GORA_OLLAMA_GRAPH_URL | Override Ollama URL for graph extraction. |
| GORA_OLLAMA_SYNTHETIC_QUESTIONS_URL | Override Ollama URL for synthetic question generation. |
| GORA_LOG_FORMAT | Override log format (json or text). |
| GORA_LOG_LEVEL | Override log level (debug, info, warn, error). |
| GORA_LOG_PATH | Override log file path. |
License
This project is licensed under the MIT License.