serverclient

package
v0.35.1

This package is not in the latest version of its module.

Published: May 18, 2026 License: Apache-2.0 Imports: 17 Imported by: 0

Documentation

Overview

Package serverclient is the client-side transport adapter that lets a device-target (cmd/speechkit) or a local-target binary delegate one or more modes (Dictation, Assist, Voice Agent) to a remote SpeechKit Server-Target instead of running the Framework kernel in-process.

Each mode adapter implements the same Go interface that the kernel already exposes for in-process use:

stt.STTProvider              for Dictation       (POST /v1/dictation/transcribe)
assistpkg.Processor          for Assist          (POST /v1/assist/process)
voiceagent.LiveProvider      for Voice Agent     (POST /v1/voiceagent/sessions + WS)

Callers therefore swap a serverclient adapter for the in-process implementation without touching their orchestration code. The choice is made by config.ModeModelSelection.ResolvedModeSource() per mode; see the device-target's app_init.go for the routing.

The package is intentionally cross-platform pure Go (no build tag) so it works on Windows desktop, Linux daemons, and anywhere else a Go binary might want to embed SpeechKit "as a service".

Index

Constants

This section is empty.

Variables

var ErrServerConnectionDisabled = errors.New("serverclient: server_connection disabled")

ErrServerConnectionDisabled signals that NewFromConfig was called with cfg.Enabled = false. Callers typically branch on this and fall back to the in-process kernel.
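The fallback branch described above could look roughly like the following. This is a minimal sketch, not the package's code: errServerConnectionDisabled and newFromConfig are local stand-ins (assumptions) so the example compiles without the module, and the "remote" / "in-process" strings are placeholders for real backends.

```go
package main

import (
	"errors"
	"fmt"
)

// errServerConnectionDisabled is a local stand-in for
// serverclient.ErrServerConnectionDisabled (assumption for this sketch).
var errServerConnectionDisabled = errors.New("serverclient: server_connection disabled")

// newFromConfig is a hypothetical stand-in for serverclient.NewFromConfig.
// It models only the documented "disabled" branch.
func newFromConfig(enabled bool) (string, error) {
	if !enabled {
		return "", errServerConnectionDisabled
	}
	return "remote", nil
}

// pickBackend returns "remote" when the server connection is enabled and
// falls back to "in-process" when the sentinel error comes back.
func pickBackend(enabled bool) (string, error) {
	backend, err := newFromConfig(enabled)
	switch {
	case errors.Is(err, errServerConnectionDisabled):
		return "in-process", nil
	case err != nil:
		return "", fmt.Errorf("server connection: %w", err)
	}
	return backend, nil
}

func main() {
	fmt.Println(pickBackend(false))
	fmt.Println(pickBackend(true))
}
```

Using errors.Is rather than == keeps the branch working if the sentinel is ever wrapped on its way up.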

Functions

func IsCode

func IsCode(err error, code string) bool

IsCode reports whether the error envelope carries the given code. Useful for callers that want to branch on, say, "provider_unavailable" without a type assertion. Returns false if err is not (or does not wrap) a *ServerError.
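A helper with the semantics described above can be built on errors.As. The sketch below is an assumption about how such a check might work, not the package's implementation; serverError is a trimmed local mirror of the documented ServerError, and wrapped simulates an adapter returning the error wrapped with %w.

```go
package main

import (
	"errors"
	"fmt"
)

// serverError mirrors the two documented ServerError fields this sketch needs.
type serverError struct {
	Status int
	Code   string
}

// Error uses an illustrative format; the real format is not shown in the docs.
func (e *serverError) Error() string {
	return fmt.Sprintf("server error %d: %s", e.Status, e.Code)
}

// isCode is an assumed equivalent of IsCode: it unwraps err looking for a
// *serverError and compares its Code. It returns false for nil and for
// errors that do not wrap a *serverError.
func isCode(err error, code string) bool {
	var se *serverError
	return errors.As(err, &se) && se.Code == code
}

// wrapped simulates what a mode adapter might return.
func wrapped(code string) error {
	return fmt.Errorf("transcribe: %w", &serverError{Status: 503, Code: code})
}

func main() {
	fmt.Println(isCode(wrapped("provider_unavailable"), "provider_unavailable"))
	fmt.Println(isCode(errors.New("plain"), "provider_unavailable"))
}
```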

Types

type AssistProcessor

type AssistProcessor struct {
	// contains filtered or unexported fields
}

AssistProcessor implements the assistpkg.Processor interface (the same surface internal/server/assist.Handler depends on) against a remote SpeechKit server's POST /v1/assist/process endpoint. The kernel-side caller doesn't know whether the pipeline ran in-process or over the network; the abstract type is preserved.

func NewAssist

func NewAssist(client *Client) (*AssistProcessor, error)

NewAssist constructs an AssistProcessor.

func (*AssistProcessor) Process

func (p *AssistProcessor) Process(ctx context.Context, transcript string, opts assistpkg.ProcessOpts) (*assistpkg.Result, error)

Process implements assistpkg.Processor by forwarding the transcript + options to the server. Returned *assistpkg.Result mirrors the wire response, decoding base64 audio back to bytes when present.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is the shared transport core for all serverclient mode adapters. One Client per process; mode adapters borrow it and add their own paths + payload encoding. Safe for concurrent use.

func New

func New(opts Options) (*Client, error)

New constructs a Client from explicit Options. Options.BaseURL must be parseable as an absolute URL with an http or https scheme.

func NewFromConfig

func NewFromConfig(cfg config.ServerConnectionConfig) (*Client, error)

NewFromConfig resolves the bearer token from the env var named in cfg.BearerTokenEnv and constructs a Client. AuthMode decides whether that token is sent as Authorization: Bearer or X-Api-Key. Returns an error if the env var is not set or empty when cfg.Enabled is true. When cfg.Enabled is false the function returns (nil, ErrServerConnectionDisabled) so callers can branch cleanly.
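The token resolution and header selection described above might be sketched as follows. The helper names (resolveToken, applyAuth, headerFor) are hypothetical, and treating any mode other than "api_key" as bearer is an assumption based on the documented default.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// resolveToken reads the token from the named environment variable and
// treats an unset or empty variable as an error.
func resolveToken(envName string) (string, error) {
	tok := os.Getenv(envName)
	if tok == "" {
		return "", fmt.Errorf("env var %q is not set or empty", envName)
	}
	return tok, nil
}

// applyAuth attaches the token according to the auth mode: "api_key" sends
// X-Api-Key, anything else sends Authorization: Bearer (assumed default).
// An empty token adds no header.
func applyAuth(h http.Header, mode, token string) {
	if token == "" {
		return
	}
	if mode == "api_key" {
		h.Set("X-Api-Key", token)
		return
	}
	h.Set("Authorization", "Bearer "+token)
}

// headerFor reports which header applyAuth would set for a mode and token.
func headerFor(mode, token string) (name, value string) {
	h := http.Header{}
	applyAuth(h, mode, token)
	if v := h.Get("X-Api-Key"); v != "" {
		return "X-Api-Key", v
	}
	return "Authorization", h.Get("Authorization")
}

func main() {
	fmt.Println(headerFor("bearer", "s3cret"))
	fmt.Println(headerFor("api_key", "s3cret"))
}
```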

func (*Client) AuthMode added in v0.31.0

func (c *Client) AuthMode() string

func (*Client) BaseURL

func (c *Client) BaseURL() *url.URL

BaseURL returns the configured base URL. Useful for the Voice Agent adapter when it needs to derive WS URLs from the same origin.
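Deriving a WebSocket URL from the same origin usually means swapping the scheme (http to ws, https to wss) and appending a path. The sketch below shows one way to do that with net/url; the path "/v1/voiceagent/ws" is purely hypothetical, since the actual WebSocket path is not stated here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// wsURL maps an http(s) base URL to the matching ws(s) URL and appends path.
// It works on a copy so the caller's URL is not mutated.
func wsURL(base *url.URL, path string) (*url.URL, error) {
	u := *base
	switch base.Scheme {
	case "http":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss"
	default:
		return nil, fmt.Errorf("unsupported scheme %q", base.Scheme)
	}
	u.Path = strings.TrimRight(base.Path, "/") + "/" + strings.TrimLeft(path, "/")
	u.RawPath = ""
	return &u, nil
}

// wsURLString parses base and returns the derived WebSocket URL as a string.
func wsURLString(base, path string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	u, err := wsURL(b, path)
	if err != nil {
		return "", err
	}
	return u.String(), nil
}

func main() {
	// The path is a made-up example, not a documented endpoint.
	fmt.Println(wsURLString("https://speechkit.example.com", "/v1/voiceagent/ws"))
}
```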

func (*Client) BearerToken

func (c *Client) BearerToken() string

BearerToken returns the configured auth token. The Voice Agent adapter uses it for the initial POST /sessions; the WebSocket hop is ticket-only.

func (*Client) HTTPClient

func (c *Client) HTTPClient() *http.Client

HTTPClient returns the underlying *http.Client. Mode adapters use it directly so they can stream uploads/downloads without buffering.

func (*Client) Health

func (c *Client) Health(ctx context.Context) error

Health hits /healthz to confirm the server is reachable. Returns nil on 200, a wrapped error otherwise. Use it from /readyz dashboards on the device-target so a misconfigured ServerConnection surfaces early.
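A check with the behavior described above (nil on 200, a wrapped error otherwise) can be sketched with the standard library alone. The function names are hypothetical and the error text is illustrative; checkAgainst uses net/http/httptest to stand in for a real server.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// health issues GET <baseURL>/healthz and returns nil only on HTTP 200.
func health(ctx context.Context, hc *http.Client, baseURL string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/healthz", nil)
	if err != nil {
		return fmt.Errorf("health: build request: %w", err)
	}
	resp, err := hc.Do(req)
	if err != nil {
		return fmt.Errorf("health: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("health: unexpected status %d", resp.StatusCode)
	}
	return nil
}

// checkAgainst starts a throwaway test server that answers every request
// with the given status and runs health against it.
func checkAgainst(status int) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
	defer srv.Close()
	return health(context.Background(), srv.Client(), srv.URL)
}

func main() {
	fmt.Println(checkAgainst(http.StatusOK))
	fmt.Println(checkAgainst(http.StatusServiceUnavailable))
}
```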

func (*Client) Probe added in v0.31.0

func (c *Client) Probe(ctx context.Context) (ProbeResult, error)

type DictationProvider

type DictationProvider struct {
	// contains filtered or unexported fields
}

DictationProvider implements stt.STTProvider against a remote SpeechKit Server-Target's POST /v1/dictation/transcribe endpoint. It is a drop-in replacement for any local stt.STTProvider in the kernel's router — when ModelSelection.Dictate.ModeSource = "server", the device app constructs one of these instead of the in-process whisper.cpp / HuggingFace / etc. provider.

func NewDictation

func NewDictation(client *Client) (*DictationProvider, error)

NewDictation constructs a DictationProvider. Calling Transcribe sends the audio as base64-encoded PCM to the server so the server skips decoding and goes straight to its STT router.

func (*DictationProvider) Health

func (p *DictationProvider) Health(ctx context.Context) error

Health calls the server's /readyz which only returns 200 when the STT router is wired and at least one provider is healthy.

func (*DictationProvider) Name

func (p *DictationProvider) Name() string

Name identifies this provider in router logs. The "server:" prefix makes it obvious in mixed-provider logs that the request was delegated.

func (*DictationProvider) Transcribe

func (p *DictationProvider) Transcribe(ctx context.Context, audio []byte, opts stt.TranscribeOpts) (*stt.Result, error)

Transcribe sends audio bytes to the server and returns the transcription. audio is expected to be already-resampled 16 kHz S16 mono PCM (kernel convention). Returns a wrapped *ServerError on 4xx/5xx for callers that want to branch on error codes.
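Sending PCM as base64 inside a JSON body might look like the sketch below. The wire shape is an assumption: transcribeRequest and its field names (audio_base64, sample_rate, language) are hypothetical and do not come from the server's documented schema.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// transcribeRequest is a hypothetical request body; the real field names
// used by POST /v1/dictation/transcribe are not shown in this document.
type transcribeRequest struct {
	AudioBase64 string `json:"audio_base64"`
	SampleRate  int    `json:"sample_rate"`
	Language    string `json:"language,omitempty"`
}

// encodeRequest base64-encodes 16 kHz S16 mono PCM and wraps it in JSON.
func encodeRequest(pcm []byte, language string) ([]byte, error) {
	return json.Marshal(transcribeRequest{
		AudioBase64: base64.StdEncoding.EncodeToString(pcm),
		SampleRate:  16000,
		Language:    language,
	})
}

func main() {
	body, err := encodeRequest([]byte{0, 1, 2, 3}, "en")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(body))
}
```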

type Options

type Options struct {
	// BaseURL is the root of the speechkit-server, e.g.
	// "https://speechkit.example.com" or "http://localhost:8080".
	BaseURL string

	// BearerToken is sent verbatim as the Authorization header value
	// "Bearer <token>" by default, or as X-Api-Key when AuthMode is
	// "api_key". Leave empty for endpoints that don't require auth.
	BearerToken string

	// AuthMode controls how BearerToken is attached to HTTP requests.
	// Defaults to "bearer"; "api_key" sends the same configured token as
	// X-Api-Key for custom servers that expect header-based API keys.
	AuthMode string

	// HTTPClient overrides the default http.Client. Useful in tests with
	// httptest.Server. If nil, a sensible default with the timeout below
	// is constructed.
	HTTPClient *http.Client

	// RequestTimeout caps each non-streaming request. WS upgrades for the
	// Voice Agent ignore this and use the request context's deadline. 0
	// means no client-side timeout (server-side limits still apply).
	RequestTimeout time.Duration

	// UserAgent is sent in the User-Agent header. Defaults to
	// "speechkit-serverclient/<version>".
	UserAgent string

	// Debug, when true, logs request lines + response status to stderr.
	// Tests opt in via Options{Debug: testing.Verbose()}.
	Debug bool
}

Options configures a Client. Use NewFromConfig to wire from config.toml or construct directly for tests.

type ProbeResult added in v0.31.0

type ProbeResult struct {
	HealthStatus int
	ReadyStatus  int
}

type ServerError

type ServerError struct {
	Status  int            // HTTP status code from the response
	Code    string         // Server-side error code (e.g. "provider_unavailable")
	Message string         // Human-friendly message from the envelope
	Details map[string]any // Optional details; nil when the envelope omits them
}

ServerError is the typed representation of the server's standard {"error": {"code", "message", "details"}} envelope. Mode adapters return it (wrapped as %w) so callers can use errors.As to introspect both the HTTP status and the machine-readable error code without re-parsing the HTTP response.
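Decoding the envelope into a typed error could be done roughly as follows. This is a sketch under assumptions: serverError is a local mirror of the documented struct, decodeServerError is a hypothetical helper, and the Error() format is illustrative rather than the stable format the package promises.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverError mirrors the documented ServerError fields.
type serverError struct {
	Status  int
	Code    string
	Message string
	Details map[string]any
}

// Error uses an illustrative format, not the package's actual one.
func (e *serverError) Error() string {
	return fmt.Sprintf("server error %d (%s): %s", e.Status, e.Code, e.Message)
}

// envelope matches the documented {"error": {"code", "message", "details"}} shape.
type envelope struct {
	Error struct {
		Code    string         `json:"code"`
		Message string         `json:"message"`
		Details map[string]any `json:"details"`
	} `json:"error"`
}

// decodeServerError parses a response body into a *serverError. Details
// stays nil when the envelope omits it.
func decodeServerError(status int, body []byte) (*serverError, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, fmt.Errorf("decode error envelope: %w", err)
	}
	return &serverError{
		Status:  status,
		Code:    env.Error.Code,
		Message: env.Error.Message,
		Details: env.Error.Details,
	}, nil
}

func main() {
	body := []byte(`{"error":{"code":"provider_unavailable","message":"no provider"}}`)
	se, err := decodeServerError(503, body)
	fmt.Println(se, err)
}
```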

func (*ServerError) Error

func (e *ServerError) Error() string

Error formats the envelope for log/diagnostic use. The format is stable across the v1 surface so log greps stay durable.

type VoiceAgentOption

type VoiceAgentOption func(*VoiceAgentProvider)

VoiceAgentOption configures a VoiceAgentProvider.

func WithPersona

func WithPersona(id string) VoiceAgentOption

WithPersona presets the Start-frame persona_id. Pass an empty string to fall back to the server's default persona (typically "default").

func WithRole

func WithRole(id string) VoiceAgentOption

WithRole presets the Start-frame role_id.

func WithSequence

func WithSequence(id string) VoiceAgentOption

WithSequence presets the Start-frame sequence_id.
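VoiceAgentOption follows the functional-options pattern. The sketch below shows how that pattern typically works, using local stand-in types; the startFrame struct and its field names are assumptions, since the provider's fields are unexported.

```go
package main

import "fmt"

// startFrame is a hypothetical holder for the Start-frame presets.
type startFrame struct {
	PersonaID  string
	RoleID     string
	SequenceID string
}

// provider stands in for VoiceAgentProvider.
type provider struct {
	start startFrame
}

// option stands in for VoiceAgentOption.
type option func(*provider)

func withPersona(id string) option  { return func(p *provider) { p.start.PersonaID = id } }
func withRole(id string) option     { return func(p *provider) { p.start.RoleID = id } }
func withSequence(id string) option { return func(p *provider) { p.start.SequenceID = id } }

// newProvider applies options in order; a later option overrides an earlier
// one that sets the same field.
func newProvider(opts ...option) *provider {
	p := &provider{}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	p := newProvider(withPersona("tutor"), withRole("coach"))
	fmt.Printf("%+v\n", p.start)
}
```

Fields left unset stay empty, which matches the documented behavior of an empty persona falling back to the server's default.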

type VoiceAgentProvider

type VoiceAgentProvider struct {
	// contains filtered or unexported fields
}

VoiceAgentProvider implements voiceagent.LiveProvider against the remote SpeechKit server's POST /v1/voiceagent/sessions + WebSocket protocol. The kernel-side voiceagent.Session orchestrates Connect/Send*/Receive without knowing whether the LiveProvider is in-process Gemini Live or this network-hop adapter.

func NewVoiceAgent

func NewVoiceAgent(client *Client, opts ...VoiceAgentOption) (*VoiceAgentProvider, error)

NewVoiceAgent constructs a VoiceAgentProvider.

func (*VoiceAgentProvider) AdvanceStep added in v0.28.2

func (p *VoiceAgentProvider) AdvanceStep(reason string) error

AdvanceStep asks a server-side Voice Agent workflow to move to the next sequence step. It is intentionally outside voiceagent.LiveProvider because local realtime providers do not own server-side workflow state.

func (*VoiceAgentProvider) Close

func (p *VoiceAgentProvider) Close() error

Close terminates the WebSocket session. Idempotent; a second Close call returns nil.
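One common way to get the idempotent behavior described above is a mutex-guarded closed flag. This is a sketch of that general technique, not the package's implementation; session and closeFn are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for a provider holding a connection.
type session struct {
	mu      sync.Mutex
	closed  bool
	closeFn func() error // tears down the underlying connection
}

// Close runs closeFn at most once. Any later call returns nil.
func (s *session) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return nil
	}
	s.closed = true
	return s.closeFn()
}

// closeTwice closes a session twice and reports how often the underlying
// close function ran, plus both return values.
func closeTwice() (calls int, first, second error) {
	s := &session{closeFn: func() error { calls++; return nil }}
	first = s.Close()
	second = s.Close()
	return calls, first, second
}

func main() {
	fmt.Println(closeTwice())
}
```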

func (*VoiceAgentProvider) Connect

Connect creates the server-side session, dials the WebSocket using the returned ticket, and sends the mandatory Start frame. Errors at any of the three stages are wrapped with phase context so logs show where the failure occurred (POST vs dial vs first-frame).
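Wrapping each stage with phase context might look like the sketch below. The steps struct, the stage functions, and the exact error prefixes are hypothetical; only the three-stage order (POST, dial, Start frame) comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// steps bundles hypothetical stand-ins for the three connect stages.
type steps struct {
	createSession func() (ticket string, err error)
	dial          func(ticket string) error
	sendStart     func() error
}

// connect runs the stages in order and wraps any failure with its phase.
func connect(s steps) error {
	ticket, err := s.createSession()
	if err != nil {
		return fmt.Errorf("connect: create session (POST): %w", err)
	}
	if err := s.dial(ticket); err != nil {
		return fmt.Errorf("connect: dial websocket: %w", err)
	}
	if err := s.sendStart(); err != nil {
		return fmt.Errorf("connect: send start frame: %w", err)
	}
	return nil
}

var errDial = errors.New("handshake refused")

// failingDial simulates a session that is created but cannot be dialed.
func failingDial() error {
	return connect(steps{
		createSession: func() (string, error) { return "ticket-123", nil },
		dial:          func(string) error { return errDial },
		sendStart:     func() error { return nil },
	})
}

func main() {
	fmt.Println(failingDial())
}
```

Because each stage wraps with %w, callers can still use errors.Is or errors.As on the underlying cause.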

func (*VoiceAgentProvider) Name

func (p *VoiceAgentProvider) Name() string

Name identifies this provider in voiceagent logs.

func (*VoiceAgentProvider) Receive

Receive reads the next server-emitted frame and translates it to a voiceagent.LiveMessage. Frames the kernel doesn't care about (state, pong, sequence_step) are swallowed and the function loops to the next frame so callers don't see them.
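The swallow-and-loop behavior described above can be sketched as a read loop with a skip set. The frame type, the "text" frame kind, and fromSlice are hypothetical; only the ignored kinds (state, pong, sequence_step) are taken from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a hypothetical decoded server frame.
type frame struct {
	Type string
	Text string
}

var errStreamClosed = errors.New("stream closed")

// ignored lists the frame types the kernel does not need to see.
var ignored = map[string]bool{"state": true, "pong": true, "sequence_step": true}

// receive pulls frames from next until it finds one that is not ignored,
// or until next reports an error.
func receive(next func() (frame, error)) (frame, error) {
	for {
		f, err := next()
		if err != nil {
			return frame{}, err
		}
		if ignored[f.Type] {
			continue
		}
		return f, nil
	}
}

// fromSlice replays frames in order, then reports errStreamClosed.
func fromSlice(frames []frame) func() (frame, error) {
	i := 0
	return func() (frame, error) {
		if i >= len(frames) {
			return frame{}, errStreamClosed
		}
		f := frames[i]
		i++
		return f, nil
	}
}

func main() {
	next := fromSlice([]frame{{Type: "state"}, {Type: "pong"}, {Type: "text", Text: "hi"}})
	fmt.Println(receive(next))
}
```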

func (*VoiceAgentProvider) SendAudio

func (p *VoiceAgentProvider) SendAudio(chunk []byte) error

SendAudio writes a binary PCM 16kHz S16 mono chunk to the server. The server forwards the bytes verbatim to the underlying provider after minor framing; chunk sizes between 20 ms and 100 ms are recommended.
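For 16 kHz S16 mono audio, one millisecond is 16 samples of 2 bytes each, so the recommended 20 ms to 100 ms range corresponds to 640 to 3200 bytes per chunk. The helpers below are a hypothetical sketch of that arithmetic and of splitting a buffer accordingly.

```go
package main

import "fmt"

const (
	sampleRate     = 16000 // samples per second
	bytesPerSample = 2     // signed 16-bit, mono
)

// chunkBytesForMillis returns the number of PCM bytes covering ms milliseconds.
func chunkBytesForMillis(ms int) int {
	return ms * sampleRate / 1000 * bytesPerSample
}

// splitPCM cuts pcm into chunks of at most size bytes. The last chunk may be
// shorter. A non-positive size yields no chunks.
func splitPCM(pcm []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(pcm) > 0 {
		n := size
		if len(pcm) < n {
			n = len(pcm)
		}
		chunks = append(chunks, pcm[:n])
		pcm = pcm[n:]
	}
	return chunks
}

func main() {
	fmt.Println(chunkBytesForMillis(20), chunkBytesForMillis(100)) // 640 3200
	fmt.Println(len(splitPCM(make([]byte, 1600), chunkBytesForMillis(20))))
}
```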

func (*VoiceAgentProvider) SendAudioStreamEnd

func (p *VoiceAgentProvider) SendAudioStreamEnd() error

SendAudioStreamEnd signals end-of-turn to the server. The server-side adapter forwards it to the kernel which translates it to the provider-specific stream-end signal.

func (*VoiceAgentProvider) SendText

func (p *VoiceAgentProvider) SendText(text string) error

SendText injects a text-only turn (used by the kernel for idle reminders or when the orchestration layer wants to fake user input).

func (*VoiceAgentProvider) SendToolResponse

func (p *VoiceAgentProvider) SendToolResponse(response voiceagent.ToolResponse) error

SendToolResponse forwards the result of a host-side tool execution.
