Documentation ¶
Overview ¶
Package serverclient is the client-side transport adapter that lets a device-target (cmd/speechkit) or a local-target binary delegate one or more modes (Dictation, Assist, Voice Agent) to a remote SpeechKit Server-Target instead of running the Framework kernel in-process.
Each mode adapter implements the same Go interface that the kernel already exposes for in-process use:
- stt.STTProvider for Dictation (POST /v1/dictation/transcribe)
- assistpkg.Processor for Assist (POST /v1/assist/process)
- voiceagent.LiveProvider for Voice Agent (POST /v1/voiceagent/sessions + WS)
Callers therefore swap a serverclient adapter for the in-process implementation without touching their orchestration code. The choice is made by config.ModeModelSelection.ResolvedModeSource() per mode; see the device-target's app_init.go for the routing.
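The swap described above can be sketched with plain interfaces. This is a minimal illustration, not the package's code: Transcriber is a hypothetical, reduced stand-in for stt.STTProvider, and pick is a hypothetical stand-in for the per-mode routing in app_init.go.

```go
package main

import (
	"context"
	"fmt"
)

// Transcriber is a hypothetical, reduced stand-in for stt.STTProvider.
// The real interface has different method signatures.
type Transcriber interface {
	Name() string
	Transcribe(ctx context.Context, audio []byte) (string, error)
}

// localProvider stands in for an in-process provider.
type localProvider struct{}

func (localProvider) Name() string { return "local" }
func (localProvider) Transcribe(context.Context, []byte) (string, error) {
	return "transcribed in-process", nil
}

// remoteProvider stands in for a serverclient adapter.
type remoteProvider struct{}

func (remoteProvider) Name() string { return "server:dictation" }
func (remoteProvider) Transcribe(context.Context, []byte) (string, error) {
	return "transcribed by server", nil
}

// pick mimics the per-mode routing decision (hypothetical helper):
// a mode source of "server" selects the remote adapter.
func pick(modeSource string) Transcriber {
	if modeSource == "server" {
		return remoteProvider{}
	}
	return localProvider{}
}

// run is the orchestration code. It only sees the interface, so it does
// not change when the provider is swapped.
func run(p Transcriber) string {
	text, err := p.Transcribe(context.Background(), nil)
	if err != nil {
		return p.Name() + ": error: " + err.Error()
	}
	return p.Name() + ": " + text
}

func main() {
	fmt.Println(run(pick("local")))
	fmt.Println(run(pick("server")))
}
```

Because run depends only on the interface, the routing decision stays in one place.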
The package is intentionally cross-platform pure Go (no build tag) so it works on Windows desktop, Linux daemons, and anywhere else a Go binary might want to embed SpeechKit "as a service".
Index ¶
- Variables
- func IsCode(err error, code string) bool
- type AssistProcessor
- type Client
- type DictationProvider
- type Options
- type ServerError
- type VoiceAgentOption
- type VoiceAgentProvider
- func (p *VoiceAgentProvider) AdvanceStep(reason string) error
- func (p *VoiceAgentProvider) Close() error
- func (p *VoiceAgentProvider) Connect(ctx context.Context, cfg voiceagent.LiveConfig) error
- func (p *VoiceAgentProvider) Name() string
- func (p *VoiceAgentProvider) Receive(ctx context.Context) (*voiceagent.LiveMessage, error)
- func (p *VoiceAgentProvider) SendAudio(chunk []byte) error
- func (p *VoiceAgentProvider) SendAudioStreamEnd() error
- func (p *VoiceAgentProvider) SendText(text string) error
- func (p *VoiceAgentProvider) SendToolResponse(response voiceagent.ToolResponse) error
Constants ¶
This section is empty.
Variables ¶
var ErrServerConnectionDisabled = errors.New("serverclient: server_connection disabled")
ErrServerConnectionDisabled signals that NewFromConfig was called with cfg.Enabled = false. Callers typically branch on this and fall back to the in-process kernel.
Functions ¶
Types ¶
type AssistProcessor ¶
type AssistProcessor struct {
// contains filtered or unexported fields
}
AssistProcessor implements the assistpkg.Processor interface (the same surface internal/server/assist.Handler depends on) against a remote SpeechKit server's POST /v1/assist/process endpoint. The kernel-side caller does not know whether the pipeline ran in-process or over the network; the abstract type is preserved.
func NewAssist ¶
func NewAssist(client *Client) (*AssistProcessor, error)
NewAssist constructs an AssistProcessor.
func (*AssistProcessor) Process ¶
func (p *AssistProcessor) Process(ctx context.Context, transcript string, opts assistpkg.ProcessOpts) (*assistpkg.Result, error)
Process implements assistpkg.Processor by forwarding the transcript and options to the server. The returned *assistpkg.Result mirrors the wire response, decoding base64 audio back to bytes when present.
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client is the shared transport core for all serverclient mode adapters. One Client per process; mode adapters borrow it and add their own paths + payload encoding. Safe for concurrent use.
func New ¶
New constructs a Client from explicit Options. Options.BaseURL must be parseable as an absolute URL with an http or https scheme.
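The base-URL requirement can be checked with net/url alone. The following is a sketch of one way to do it; validateBaseURL is a hypothetical helper, and the real constructor may validate differently or report different errors.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateBaseURL is a hypothetical helper: it accepts only absolute
// URLs with an http or https scheme and a non-empty host.
func validateBaseURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse base URL %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("base URL %q: scheme must be http or https", raw)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("base URL %q: missing host", raw)
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"https://speechkit.example.com",
		"http://localhost:8080",
		"ftp://example.com",
		"/v1",
	} {
		_, err := validateBaseURL(raw)
		fmt.Printf("%-32s valid=%v\n", raw, err == nil)
	}
}
```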
func NewFromConfig ¶
func NewFromConfig(cfg config.ServerConnectionConfig) (*Client, error)
NewFromConfig resolves the bearer token from the env var named in cfg.BearerTokenEnv and constructs a Client. Returns an error if the env var is not set or empty when cfg.Enabled is true. When cfg.Enabled is false the function returns (nil, ErrServerConnectionDisabled) so callers can branch cleanly.
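The branching described above can be sketched without the package itself. Everything here is a local stand-in: connConfig mirrors only the two config fields named in the text, errDisabled plays the role of ErrServerConnectionDisabled, and resolveToken is a hypothetical helper covering the token lookup only.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errDisabled is a local stand-in for ErrServerConnectionDisabled.
var errDisabled = errors.New("serverclient: server_connection disabled")

// connConfig is a hypothetical reduction of config.ServerConnectionConfig.
type connConfig struct {
	Enabled        bool
	BearerTokenEnv string
}

// resolveToken sketches the documented behaviour: a disabled config
// yields the sentinel, an enabled config needs a non-empty env var.
func resolveToken(cfg connConfig) (string, error) {
	if !cfg.Enabled {
		return "", errDisabled
	}
	tok := os.Getenv(cfg.BearerTokenEnv)
	if tok == "" {
		return "", fmt.Errorf("env var %q is not set or empty", cfg.BearerTokenEnv)
	}
	return tok, nil
}

func main() {
	_, err := resolveToken(connConfig{Enabled: false})
	switch {
	case errors.Is(err, errDisabled):
		fmt.Println("server connection disabled; using the in-process kernel")
	case err != nil:
		fmt.Println("configuration error:", err)
	default:
		fmt.Println("delegating to the server")
	}
}
```

Matching the sentinel with errors.Is keeps the fallback path separate from genuine misconfiguration, which should usually surface to the user.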
func (*Client) BaseURL ¶
BaseURL returns the configured base URL. Useful for the Voice Agent adapter when it needs to derive WS URLs from the same origin.
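Deriving a WebSocket URL from the same origin amounts to swapping the scheme and setting a path. The sketch below assumes that mapping (http to ws, https to wss); wsURL and the path passed to it are hypothetical, since the actual session path and ticket parameter are not documented here.

```go
package main

import (
	"fmt"
	"net/url"
)

// wsURL is a hypothetical helper that keeps the host of baseURL and
// switches to the matching WebSocket scheme. wsPath is supplied by the
// caller; the path used by the real adapter is not shown in this page.
func wsURL(baseURL, wsPath string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	switch u.Scheme {
	case "http":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss"
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	u.Path = wsPath
	return u.String(), nil
}

func main() {
	// "/ws-demo" is a placeholder path for illustration only.
	s, err := wsURL("https://speechkit.example.com", "/ws-demo")
	fmt.Println(s, err)
}
```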
func (*Client) BearerToken ¶
BearerToken returns the configured bearer token. The Voice Agent adapter needs it to attach the same token to the initial POST /sessions and to pass through to the WebSocket query-string ticket logic.
func (*Client) HTTPClient ¶
HTTPClient returns the underlying *http.Client. Mode adapters use it directly so they can stream uploads/downloads without buffering.
type DictationProvider ¶
type DictationProvider struct {
// contains filtered or unexported fields
}
DictationProvider implements stt.STTProvider against a remote SpeechKit Server-Target's POST /v1/dictation/transcribe endpoint. It is a drop-in replacement for any local stt.STTProvider in the kernel's router: when ModelSelection.Dictate.ModeSource = "server", the device app constructs one of these instead of an in-process provider such as whisper.cpp or HuggingFace.
func NewDictation ¶
func NewDictation(client *Client) (*DictationProvider, error)
NewDictation constructs a DictationProvider. Calling Transcribe sends the audio as base64-encoded PCM to the server so the server skips decoding and goes straight to its STT router.
func (*DictationProvider) Health ¶
func (p *DictationProvider) Health(ctx context.Context) error
Health calls the server's /readyz endpoint, which returns 200 only when the STT router is wired and at least one provider is healthy.
func (*DictationProvider) Name ¶
func (p *DictationProvider) Name() string
Name identifies this provider in router logs. The "server:" prefix makes it obvious in mixed-provider logs that the request was delegated.
func (*DictationProvider) Transcribe ¶
func (p *DictationProvider) Transcribe(ctx context.Context, audio []byte, opts stt.TranscribeOpts) (*stt.Result, error)
Transcribe sends audio bytes to the server and returns the transcription. audio is expected to be already-resampled 16kHz S16 mono PCM (kernel convention). Returns a wrapped *ServerError on 4xx/5xx responses for callers that want to branch on error codes.
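The base64 step mentioned for the dictation request (and, in the other direction, for audio in Assist responses) can be sketched with encoding/base64. This assumes the standard padded alphabet; encodePCM and decodePCM are hypothetical helper names, and the JSON field that carries the string is not shown because it is not documented here.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodePCM turns raw PCM bytes into a base64 string suitable for a
// JSON body. Standard (padded) encoding is an assumption.
func encodePCM(pcm []byte) string {
	return base64.StdEncoding.EncodeToString(pcm)
}

// decodePCM reverses encodePCM, as a client would for audio returned
// by the server.
func decodePCM(s string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(s)
}

func main() {
	pcm := []byte{0x00, 0x01, 0x02, 0x03} // two S16 samples' worth of bytes
	enc := encodePCM(pcm)
	dec, err := decodePCM(enc)
	fmt.Println(enc, len(dec), err)
}
```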
type Options ¶
type Options struct {
// BaseURL is the root of the speechkit-server, e.g.
// "https://speechkit.example.com" or "http://localhost:8080".
BaseURL string
// BearerToken is sent verbatim as the Authorization header value
// "Bearer <token>". Leave empty for endpoints that don't require auth
// (only /healthz and /readyz qualify on the server today).
BearerToken string
// HTTPClient overrides the default http.Client. Useful in tests with
// httptest.Server. If nil, a sensible default with the timeout below
// is constructed.
HTTPClient *http.Client
// RequestTimeout caps each non-streaming request. WS upgrades for the
// Voice Agent ignore this and use the request context's deadline. 0
// means no client-side timeout (server-side limits still apply).
RequestTimeout time.Duration
// UserAgent is sent in the User-Agent header. Defaults to
// "speechkit-serverclient/<version>".
UserAgent string
// Debug, when true, logs request lines + response status to stderr.
// Tests opt in via Options{Debug: testing.Verbose()}.
Debug bool
}
Options configures a Client. Use NewFromConfig to wire from config.toml or construct directly for tests.
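How the BearerToken, UserAgent, and RequestTimeout fields typically map onto net/http can be sketched as follows. This is an illustration using only the standard library, not the package's own wiring; newReadyzRequest and newHTTPClient are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newReadyzRequest builds a GET /readyz request. The Authorization
// header is attached only when a token is configured.
func newReadyzRequest(baseURL, token, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/readyz", nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

// newHTTPClient applies a per-request timeout. In net/http a Timeout
// of 0 means no client-side timeout.
func newHTTPClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout}
}

func main() {
	req, err := newReadyzRequest("http://localhost:8080", "s3cret", "speechkit-serverclient/dev")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := newHTTPClient(10 * time.Second)
	fmt.Println(req.Method, req.URL, req.Header.Get("Authorization"), client.Timeout)
}
```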
type ServerError ¶
type ServerError struct {
Status int // HTTP status code from the response
Code string // Server-side error code (e.g. "provider_unavailable")
Message string // Human-friendly message from the envelope
Details map[string]any // Optional details; nil when the envelope omits them
}
ServerError is the typed representation of the server's standard {"error": {"code", "message", "details"}} envelope. Mode adapters return it wrapped with %w, so callers can use errors.As to inspect both the HTTP status and the machine-readable error code without re-parsing the HTTP response.
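The errors.As pattern looks roughly like this. The sketch uses a local copy of the struct so it runs on its own; the Error format shown is an assumption, and hasCode is a hypothetical helper illustrating how a code check in the spirit of IsCode could be built, not its actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// serverError is a local copy of the documented ServerError fields.
type serverError struct {
	Status  int
	Code    string
	Message string
	Details map[string]any
}

// Error uses an illustrative format; the real format is not shown here.
func (e *serverError) Error() string {
	return fmt.Sprintf("server error %d (%s): %s", e.Status, e.Code, e.Message)
}

// hasCode reports whether err wraps a serverError with the given code.
func hasCode(err error, code string) bool {
	var se *serverError
	return errors.As(err, &se) && se.Code == code
}

// transcribe simulates an adapter returning a wrapped server error.
func transcribe() error {
	cause := &serverError{Status: 503, Code: "provider_unavailable", Message: "no healthy STT provider"}
	return fmt.Errorf("dictation transcribe: %w", cause)
}

func main() {
	err := transcribe()
	var se *serverError
	if errors.As(err, &se) {
		fmt.Println("status:", se.Status, "code:", se.Code)
	}
	fmt.Println("provider unavailable:", hasCode(err, "provider_unavailable"))
}
```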
func (*ServerError) Error ¶
func (e *ServerError) Error() string
Error formats the envelope for log/diagnostic use. The format is stable across the v1 surface so log greps stay durable.
type VoiceAgentOption ¶
type VoiceAgentOption func(*VoiceAgentProvider)
VoiceAgentOption configures a VoiceAgentProvider.
func WithPersona ¶
func WithPersona(id string) VoiceAgentOption
WithPersona presets the Start-frame persona_id. Pass an empty string to fall back to the server's default persona (typically "default").
func WithSequence ¶
func WithSequence(id string) VoiceAgentOption
WithSequence presets the Start-frame sequence_id.
type VoiceAgentProvider ¶
type VoiceAgentProvider struct {
// contains filtered or unexported fields
}
VoiceAgentProvider implements voiceagent.LiveProvider against the remote SpeechKit server's POST /v1/voiceagent/sessions + WebSocket protocol. The kernel-side voiceagent.Session orchestrates Connect/Send*/Receive without knowing whether the LiveProvider is in-process Gemini Live or this network-hop adapter.
func NewVoiceAgent ¶
func NewVoiceAgent(client *Client, opts ...VoiceAgentOption) (*VoiceAgentProvider, error)
NewVoiceAgent constructs a VoiceAgentProvider.
func (*VoiceAgentProvider) AdvanceStep ¶ added in v0.28.2
func (p *VoiceAgentProvider) AdvanceStep(reason string) error
AdvanceStep asks a server-side Voice Agent workflow to move to the next sequence step. It is intentionally outside voiceagent.LiveProvider because local realtime providers do not own server-side workflow state.
func (*VoiceAgentProvider) Close ¶
func (p *VoiceAgentProvider) Close() error
Close terminates the WebSocket session. Idempotent; a second Close call returns nil.
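One common way to get this idempotent behaviour is sync.Once. The sketch below is an assumption about technique, not the adapter's actual code; session and closeFn are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// session is a hypothetical holder for a connection teardown function.
type session struct {
	once    sync.Once
	closeFn func() error
}

// Close runs the teardown at most once. Later calls skip the teardown
// and return nil.
func (s *session) Close() error {
	var err error
	s.once.Do(func() { err = s.closeFn() })
	return err
}

func main() {
	calls := 0
	s := &session{closeFn: func() error {
		calls++
		return errors.New("socket already gone")
	}}
	fmt.Println("first close:", s.Close())
	fmt.Println("second close:", s.Close())
	fmt.Println("teardown calls:", calls)
}
```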
func (*VoiceAgentProvider) Connect ¶
func (p *VoiceAgentProvider) Connect(ctx context.Context, cfg voiceagent.LiveConfig) error
Connect creates the server-side session, dials the WebSocket using the returned ticket, and sends the mandatory Start frame. Errors at any of the three stages are wrapped with phase context so logs show where the failure occurred (POST vs dial vs first-frame).
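Phase-tagged wrapping of this kind can be sketched with fmt.Errorf and %w. The three callbacks and the message prefixes are hypothetical; the real adapter's wording may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errRefused is a sample underlying failure used by the demo.
var errRefused = errors.New("connection refused")

// connect runs the three stages in order and tags any failure with the
// stage it came from. A failed stage stops the sequence.
func connect(createSession, dial, sendStart func() error) error {
	if err := createSession(); err != nil {
		return fmt.Errorf("connect: create session (POST): %w", err)
	}
	if err := dial(); err != nil {
		return fmt.Errorf("connect: dial websocket: %w", err)
	}
	if err := sendStart(); err != nil {
		return fmt.Errorf("connect: send start frame: %w", err)
	}
	return nil
}

func main() {
	ok := func() error { return nil }
	err := connect(ok, func() error { return errRefused }, ok)
	fmt.Println(err)
	fmt.Println("is refused:", errors.Is(err, errRefused))
}
```

Wrapping with %w keeps the underlying error reachable through errors.Is and errors.As while the prefix records the phase.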
func (*VoiceAgentProvider) Name ¶
func (p *VoiceAgentProvider) Name() string
Name identifies this provider in voiceagent logs.
func (*VoiceAgentProvider) Receive ¶
func (p *VoiceAgentProvider) Receive(ctx context.Context) (*voiceagent.LiveMessage, error)
Receive reads the next server-emitted frame and translates it to a voiceagent.LiveMessage. Frames the kernel doesn't care about (state, pong, sequence_step) are swallowed and the function loops to the next frame so callers don't see them.
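The swallow-and-loop behaviour can be sketched with a channel of frames. The frame struct, the "text" frame type, and the channel transport are hypothetical; only the three skipped type names come from the description above.

```go
package main

import (
	"context"
	"fmt"
	"io"
)

// frame is a hypothetical decoded server frame.
type frame struct {
	Type    string
	Payload string
}

// receive returns the next frame the caller cares about, skipping
// housekeeping frames. It stops on context cancellation or when the
// stream is closed.
func receive(ctx context.Context, frames <-chan frame) (frame, error) {
	for {
		select {
		case <-ctx.Done():
			return frame{}, ctx.Err()
		case f, ok := <-frames:
			if !ok {
				return frame{}, io.EOF
			}
			switch f.Type {
			case "state", "pong", "sequence_step":
				continue // housekeeping: read the next frame instead
			}
			return f, nil
		}
	}
}

// demo feeds three housekeeping frames followed by one payload frame.
func demo() (frame, error) {
	frames := make(chan frame, 4)
	frames <- frame{Type: "state", Payload: "listening"}
	frames <- frame{Type: "pong"}
	frames <- frame{Type: "sequence_step", Payload: "2"}
	frames <- frame{Type: "text", Payload: "hello"} // "text" is a made-up type
	close(frames)
	return receive(context.Background(), frames)
}

func main() {
	f, err := demo()
	fmt.Println(f.Type, f.Payload, err)
}
```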
func (*VoiceAgentProvider) SendAudio ¶
func (p *VoiceAgentProvider) SendAudio(chunk []byte) error
SendAudio writes a binary PCM 16kHz S16 mono chunk to the server. The server forwards the bytes verbatim to the underlying provider after minor framing; chunk sizes between 20 ms and 100 ms are recommended.
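The recommended chunk durations translate into byte counts as follows. At 16 kHz, 16-bit mono, one second of audio is 16000 × 2 = 32000 bytes, so 20 ms is 640 bytes and 100 ms is 3200 bytes. chunkBytes and splitPCM are hypothetical helpers for illustration.

```go
package main

import "fmt"

const (
	sampleRate     = 16000 // samples per second
	bytesPerSample = 2     // S16 mono
)

// chunkBytes returns the size in bytes of ms milliseconds of audio.
func chunkBytes(ms int) int {
	return sampleRate * bytesPerSample * ms / 1000
}

// splitPCM cuts pcm into chunks of ms milliseconds each. The last
// chunk may be shorter.
func splitPCM(pcm []byte, ms int) [][]byte {
	size := chunkBytes(ms)
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(pcm) > 0 {
		n := size
		if len(pcm) < n {
			n = len(pcm)
		}
		chunks = append(chunks, pcm[:n])
		pcm = pcm[n:]
	}
	return chunks
}

func main() {
	fmt.Println(chunkBytes(20), chunkBytes(100)) // 640 3200
	oneSecond := make([]byte, chunkBytes(1000))
	fmt.Println(len(splitPCM(oneSecond, 40))) // 25
}
```

Each resulting chunk would then be passed to SendAudio in order.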
func (*VoiceAgentProvider) SendAudioStreamEnd ¶
func (p *VoiceAgentProvider) SendAudioStreamEnd() error
SendAudioStreamEnd signals end-of-turn to the server. The server-side adapter forwards it to the kernel which translates it to the provider-specific stream-end signal.
func (*VoiceAgentProvider) SendText ¶
func (p *VoiceAgentProvider) SendText(text string) error
SendText injects a text-only turn (used by the kernel for idle reminders or when the orchestration layer wants to fake user input).
func (*VoiceAgentProvider) SendToolResponse ¶
func (p *VoiceAgentProvider) SendToolResponse(response voiceagent.ToolResponse) error
SendToolResponse forwards the result of a host-side tool execution.