discovery

package
v0.14.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 16, 2026 License: MIT Imports: 15 Imported by: 0

README

pkg/discovery

Session discovery, metadata extraction, agent ID parsing, and hook input reading. This package finds Claude Code sessions on disk and extracts useful information from their transcripts.

Files

File Role
sessions.go ScanAllSessions, FindSessionByID — find sessions in ~/.claude/projects/
extract.go ExtractSessionMetadata, ExtractMetadataFromLines, SanitizeText — extract summaries and first user messages from transcripts
files.go ExtractAgentIDsFromMessage, IsValidAgentID — find agent IDs in transcript entries
hook.go ReadHookInputFrom — parse hook input JSON from stdin

Key Types

SessionInfo

Returned by ScanAllSessions. Contains session ID, transcript path, project path, mod time, size, summary, and first user message.

ExtractionResult

Returned by ExtractMetadataFromLines. Contains Summary (last local summary), FirstUserMessage, and SummaryLinks (summaries linking to previous sessions via leafUuid).

How to Extend

Supporting new transcript metadata fields: Modify ExtractMetadataFromLines() in extract.go. Add a new field to ExtractionResult, then extract it from the parsed JSON entries. Follow the existing pattern of checking message type and extracting from the appropriate nested field.

Supporting new agent ID formats: Update IsValidAgentID() in files.go. The current validation accepts alphanumeric chars plus hyphen and underscore, with a minimum length of 6. If new formats use different characters, update isAgentIDChar().

Adding new hook input fields: The hook input type lives in pkg/types (HookInput), not here. This package just reads and validates it.

Invariants

  • Session IDs are exactly 36 characters (UUID length). Files with other name lengths are silently skipped by ScanAllSessions. Don't change this — it's how sessions are distinguished from other JSONL files.
  • Agent files use agent- prefix. The scanner skips these when listing sessions but they're picked up by pkg/sync for syncing alongside the main transcript.
  • Metadata extraction reads at most 50 lines (MaxLinesForExtraction). This is a performance bound — transcripts can be very large. Summary and first user message are expected to appear near the beginning.
  • FirstUserMessage is truncated to 4KB (half of MaxMetadataFieldSize). This is a backend-imposed limit.
  • Scanning continues on permission errors. Users may have mixed-permission directories under ~/.claude/projects/. Failed directories are reported to stderr but don't fail the operation.

Design Decisions

Separate ExtractMetadataFromLines vs ExtractSessionMetadata. ExtractSessionMetadata reads from a file path (used during session scanning). ExtractMetadataFromLines operates on in-memory lines (used by pkg/sync during chunk processing). The extraction logic is shared; only the I/O differs.

HTML sanitization on summaries. Claude Code summaries contain HTML tags (<b>, <i>, etc.) and entities. SanitizeText strips tags, decodes entities, and normalizes whitespace so summaries are clean plain text.

Agent ID extraction checks two locations. Agent IDs appear either at the root level (message.toolUseResult.agentId) or nested in content blocks. Both paths must be checked — the format depends on the Claude Code version.

Testing

go test ./pkg/discovery/...

Tests create temporary directory structures mimicking ~/.claude/projects/ with synthetic JSONL files.

Dependencies

Uses: pkg/types (HookInput), pkg/config (GetProjectsDir), pkg/logger

Used by: cmd/ (list, save, hook handlers), pkg/sync/ (agent ID extraction, metadata)

Documentation

Index

Constants

View Source
const (
	// MinAgentIDLength is the minimum length of an agent ID string
	MinAgentIDLength = 6
	// UUIDLength is the expected length of UUID strings (with hyphens)
	UUIDLength = 36
)
View Source
const MaxLinesForExtraction = 50

MaxLinesForExtraction limits how many lines we read when extracting metadata. Summaries and first user messages typically appear in the first few lines.

View Source
const MaxMetadataFieldSize = 8 * 1024 // 8KB

MaxMetadataFieldSize is the backend-imposed limit for metadata fields like first_user_message. The server rejects metadata fields larger than this value. Messages are truncated to half this (4KB) to leave headroom in chunk uploads. If the backend limit changes, this constant must be updated accordingly.

Variables

This section is empty.

Functions

func ExtractAgentIDsFromMessage

func ExtractAgentIDsFromMessage(message map[string]interface{}) []string

ExtractAgentIDsFromMessage extracts agent IDs from a parsed JSONL message. It checks both root-level toolUseResult.agentId and nested content blocks.

func FindSessionByID

func FindSessionByID(partialID string) (fullID string, transcriptPath string, err error)

FindSessionByID finds a session transcript by full or partial ID Returns the full session ID and transcript path

func IsValidAgentID

func IsValidAgentID(s string) bool

IsValidAgentID checks if a string is a valid agent ID. Agent IDs are 6+ characters matching [a-zA-Z0-9_-]+. This covers all observed formats:

  • Pure hex (7-17+ chars): "a0074ac", "a3eaf63159a07953f"
  • Compact: "acompact-2aaa241e456ebc94"
  • Prompt suggestion: "aprompt_suggestion-ba74af"
  • Legacy 8-char hex: "abcd1234"

func ReadHookInputFrom

func ReadHookInputFrom(r io.Reader) (*types.HookInput, error)

ReadHookInputFrom reads and parses hook data from the given reader. It delegates to types.ReadHookInput and additionally validates that transcript_path is non-empty (required by SessionStart/SessionEnd hooks).

func SanitizeText

func SanitizeText(input string) string

SanitizeText removes HTML tags, decodes HTML entities, and normalizes whitespace.

Types

type ExtractionResult

type ExtractionResult struct {
	Summary          string        // Local summary for current session (no leafUuid)
	FirstUserMessage string        // First user message content
	SummaryLinks     []SummaryLink // Summaries with leafUuid (for linking to previous sessions)
}

ExtractionResult holds extracted metadata from transcript lines.

func ExtractMetadataFromLines

func ExtractMetadataFromLines(lines []string) ExtractionResult

ExtractMetadataFromLines extracts summary and first user message from transcript lines. Unlike ExtractSessionMetadata, this processes lines already in memory (from a chunk).

For summaries:

  • Summaries with leafUuid are collected in SummaryLinks (for linking to previous sessions)
  • Summaries without leafUuid are local to current session (last one wins)

For user messages:

  • First user message encountered sets FirstUserMessage

func ExtractSessionMetadata

func ExtractSessionMetadata(transcriptPath string) ExtractionResult

ExtractSessionMetadata reads a transcript file and extracts summary and first user message. It reads up to MaxLinesForExtraction lines and delegates to ExtractMetadataFromLines.

type SessionInfo

type SessionInfo struct {
	SessionID        string
	TranscriptPath   string
	ProjectPath      string // Relative path from projects dir
	ModTime          time.Time
	SizeBytes        int64
	Summary          string // First summary after first user message
	FirstUserMessage string // First user message content
}

SessionInfo holds metadata about a discovered session

func ScanAllSessions

func ScanAllSessions() ([]SessionInfo, error)

ScanAllSessions finds all session transcript files in ~/.claude/projects/ Returns sessions sorted by modification time (oldest first)

type SummaryLink struct {
	Summary  string
	LeafUUID string
}

SummaryLink represents a summary that links to a previous session via leafUuid.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL