Documentation
¶
Overview ¶
Package shellparse provides shell-quoting-aware preprocessing for hooks that pattern-match Bash command strings.
Why this exists: any hook that runs regex over a Bash command line — auto-RAG drift detection, command-allowlist gates, taint trackers — must skip content inside heredoc bodies, single-quoted strings, and double-quoted strings. The shell sees those regions as literal text; a naïve regex sees command separators (`|`, `;`, `&`) and command verbs inside them and fires false positives. This was a real failure mode discovered during workhorse's auto-RAG dogfood 2026-05-06/07: drift verbs in git commit-message heredoc bodies tripped the guard twice during commit authoring.
Origin: ported from workhorse v0.3 stable (`/Users/peiman/dev/workhorse/.claude/scripts/shell-strip.sh`, commits b0c7ee4 + 0ee2f89, 2026-05-07). The bash reference is the load-bearing implementation; this Go port is a translation, not a reinterpretation.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func StripShellQuoting ¶
StripShellQuoting returns the OUTSIDE skeleton of a Bash command string with heredoc bodies and quoted regions removed. Used by drift-detection regexes that must not match command separators or verbs that appear inside literal-quoted text.
Coverage (intentional, v0.3):
- heredoc starts: <<MARKER, <<-MARKER, <<'MARKER', <<"MARKER", <<-'MARKER', <<-"MARKER". The <<- variant strips leading tabs (only tabs, not spaces, per shell semantics) when matching the close marker.
- single-quoted strings ('...'): no escapes per shell semantics; close on next '.
- double-quoted strings ("..."): \X is an escape (so \" doesn't close); close on unescaped ".
Coverage NOT included (deferred TODOs from workhorse v0.3 spec):
- here-strings (<<<): body is single token, low false-positive risk.
- nested same-marker heredocs: exotic shell construct; not observed in workhorse workflow.
- heredocs inside $(...) command substitution: this preprocessor is line-oriented for heredoc bodies, so nested cases may produce slightly off skeletons. Acceptable for drift detection (regex is permissive about extra OUTSIDE content).
Pure function: no I/O, no allocations beyond the output builder. Byte-oriented (matches bash semantics); content bytes pass through unchanged whether dropped or copied, so multi-byte UTF-8 in quoted/heredoc bodies is handled implicitly.
Types ¶
This section is empty.