shellparse

package
v0.1.9 Latest
Published: Jun 4, 2026 License: MIT Imports: 1 Imported by: 0

Documentation

Overview

Package shellparse provides shell-quoting-aware preprocessing for hooks that pattern-match Bash command strings.

Why this exists: any hook that runs a regex over a Bash command line (auto-RAG drift detection, command-allowlist gates, taint trackers) must skip content inside heredoc bodies, single-quoted strings, and double-quoted strings. The shell treats those regions as literal text; a naïve regex sees command separators (`|`, `;`, `&`) and command verbs inside them and fires false positives. This failure mode was observed while dogfooding the companion project's auto-RAG guard on 2026-05-06/07: drift verbs in git commit-message heredoc bodies tripped the guard twice during commit authoring.
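The false positive described above can be reproduced with a few lines of Go. This is a minimal sketch: the `driftVerb` pattern and the `naiveMatch` helper are hypothetical names invented for illustration, not part of this package or of the companion project's guard.

```go
package main

import (
	"fmt"
	"regexp"
)

// driftVerb is a hypothetical guard pattern (an assumption for this example):
// it flags any command string that mentions "rm -rf" or "git push".
var driftVerb = regexp.MustCompile(`\b(rm -rf|git push)\b`)

// naiveMatch runs the guard over the raw command string, so heredoc bodies
// and quoted text are scanned as if the shell would execute them.
func naiveMatch(cmd string) bool {
	return driftVerb.MatchString(cmd)
}

func main() {
	// The only command here is "git commit"; "rm -rf" is commit-message text.
	cmd := "git commit -F - <<'EOF'\nfix: stop running rm -rf in cleanup\nEOF"
	fmt.Println(naiveMatch(cmd)) // true: the guard fired on literal text
}
```

Running the same pattern over a skeleton with the heredoc body removed would leave only `git commit -F - <<'EOF'`, which the pattern does not match.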

Origin: ported from shell-strip.sh in the companion project's v0.3 stable release (commits b0c7ee4 + 0ee2f89, 2026-05-07). The Bash reference is the authoritative implementation; this Go port is a translation, not a reinterpretation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func StripCommentsAndBlanks added in v0.1.9

func StripCommentsAndBlanks(script string) string

StripCommentsAndBlanks returns the behavioral skeleton of a shell SCRIPT: full-line comments and blank lines removed, while heredoc bodies and multi-line quoted-string contents are preserved verbatim. Two scripts with equal skeletons differ only in full-line comments or blank lines, i.e. they behave identically.

Hook-drift detection uses this so a comment-only divergence (e.g. an installed script that kept richer real-name annotations than the sanitized canonical) is NOT reported as drift, while a real CODE change still is. It is the script-level analogue of StripShellQuoting (which is command-string-level and drops the opposite regions: quoted/heredoc content, keeping comments).

Scope (intentional): only FULL-LINE comments are stripped — a line whose first non-blank byte is `#`, excluding a line-1 `#!` shebang (which selects the interpreter and so is behavioral; a `#!` on any later line is an ordinary comment and is dropped). Trailing/inline comments (`cmd # note`) are KEPT: stripping them needs word-boundary analysis and would risk false negatives, whereas keeping them at worst reports a harmless trailing-comment-only drift. A `#`-leading line inside a heredoc body or a multi-line quoted string is literal content, not a comment, so it is preserved; quote and heredoc state is tracked across lines with the same lexical model as StripShellQuoting.
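The full-line rule in the scope paragraph can be sketched as follows. This is a deliberately simplified stand-in, not the package's implementation: `stripFullLineComments` is a hypothetical name, it does NOT track heredoc or multi-line quote state, and it assumes a shebang must start at byte 0 of the first line.

```go
package main

import (
	"fmt"
	"strings"
)

// stripFullLineComments drops blank lines and lines whose first non-blank
// byte is '#', keeping a "#!" shebang on line 1. Simplification (assumption):
// heredoc bodies and multi-line quoted strings are not tracked here, so a
// '#'-leading line inside them would be dropped by this sketch.
func stripFullLineComments(script string) string {
	var kept []string
	for i, line := range strings.Split(script, "\n") {
		trimmed := strings.TrimLeft(line, " \t")
		if trimmed == "" {
			continue // blank line
		}
		if strings.HasPrefix(trimmed, "#") {
			// Only a "#!" on the very first line selects the interpreter.
			if !(i == 0 && strings.HasPrefix(line, "#!")) {
				continue
			}
		}
		kept = append(kept, line) // inline comments ("cmd # note") stay
	}
	return strings.Join(kept, "\n")
}

func main() {
	installed := "#!/bin/sh\n# richer annotation\n\necho hi # note\n"
	canonical := "#!/bin/sh\necho hi # note\n"
	changed := "#!/bin/sh\necho bye # note\n"

	// Comment-only divergence: skeletons are equal, so no drift.
	fmt.Println(stripFullLineComments(installed) == stripFullLineComments(canonical)) // true
	// Real code change: skeletons differ, so drift is reported.
	fmt.Println(stripFullLineComments(installed) == stripFullLineComments(changed)) // false
}
```

Comparing skeletons rather than raw bytes is what lets a drift check ignore annotation differences while still catching edits to executable lines.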

func StripShellQuoting

func StripShellQuoting(cmd string) string

StripShellQuoting returns the OUTSIDE skeleton of a Bash command string with heredoc bodies and quoted regions removed. Used by drift-detection regexes that must not match command separators or verbs that appear inside literal-quoted text.

Coverage (intentional, v0.3):

  • heredoc starts: <<MARKER, <<-MARKER, <<'MARKER', <<"MARKER", <<-'MARKER', <<-"MARKER". The <<- variant strips leading tabs (only tabs, not spaces, per shell semantics) when matching the close marker.
  • single-quoted strings ('...'): no escapes per shell semantics; close on next '.
  • double-quoted strings ("..."): \X is an escape (so \" doesn't close); close on unescaped ".
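The two quoting rules in the list above can be illustrated with a small byte-oriented scanner. This is a sketch under stated assumptions: `stripQuoted` is a hypothetical helper, it drops the quote delimiters along with the content, it ignores heredocs and backslashes outside quotes, and an unterminated quote swallows the rest of the input. The real function may differ on each of these points.

```go
package main

import (
	"fmt"
	"strings"
)

// stripQuoted copies cmd to the output while dropping single- and
// double-quoted regions, delimiters included (an assumption of this sketch).
func stripQuoted(cmd string) string {
	var out strings.Builder
	for i := 0; i < len(cmd); i++ {
		switch cmd[i] {
		case '\'':
			// Single quotes: no escapes, close on the next '.
			i++
			for i < len(cmd) && cmd[i] != '\'' {
				i++
			}
		case '"':
			// Double quotes: \X is an escape, so \" does not close.
			i++
			for i < len(cmd) && cmd[i] != '"' {
				if cmd[i] == '\\' {
					i++ // skip the escaped byte
				}
				i++
			}
		default:
			out.WriteByte(cmd[i])
		}
	}
	return out.String()
}

func main() {
	cmd := `git commit -m "fix; rm \"old\" | tee" && echo 'a & b'`
	fmt.Printf("%q\n", stripQuoted(cmd)) // "git commit -m  && echo "
}
```

The `;`, `|`, and `&` inside the quoted regions are gone from the result, so a separator regex run over it sees only the real `&&`.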

Coverage NOT included (deferred TODOs from the companion project v0.3 spec):

  • here-strings (<<<): the body is a single token, so the false-positive risk is low.
  • nested same-marker heredocs: exotic shell construct; not observed in the companion project workflow.
  • heredocs inside $(...) command substitution: this preprocessor is line-oriented for heredoc bodies, so nested cases may produce slightly off skeletons. Acceptable for drift detection (regex is permissive about extra OUTSIDE content).
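The line-oriented heredoc handling mentioned above, including the tabs-only rule for `<<-` and the exclusion of here-strings, might look roughly like the sketch below. The `heredocStart` regex and `dropHeredocBodies` helper are hypothetical; the sketch assumes the start line is kept, the body and close-marker lines are dropped, at most one heredoc starts per line, and quoting on the start line is ignored.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// heredocStart loosely matches <<MARKER, <<-MARKER and their quoted forms.
// The leading (^|[^<]) group keeps <<< here-strings from matching.
// Groups: 2 = optional "-", 3 = marker word.
var heredocStart = regexp.MustCompile(`(^|[^<])<<(-?)['"]?([A-Za-z_][A-Za-z0-9_]*)['"]?`)

// dropHeredocBodies removes heredoc body lines and the closing marker line.
func dropHeredocBodies(cmd string) string {
	var out []string
	marker, dash := "", false
	for _, line := range strings.Split(cmd, "\n") {
		if marker != "" {
			candidate := line
			if dash {
				// <<- strips leading tabs only, never spaces.
				candidate = strings.TrimLeft(line, "\t")
			}
			if candidate == marker {
				marker = "" // close marker reached; resume copying
			}
			continue // body and close-marker lines are dropped
		}
		if m := heredocStart.FindStringSubmatch(line); m != nil {
			dash, marker = m[2] == "-", m[3]
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

func main() {
	cmd := "cat <<-EOF\n\trm -rf /tmp/x; echo hi\n\tEOF\necho done"
	fmt.Printf("%q\n", dropHeredocBodies(cmd)) // "cat <<-EOF\necho done"
}
```

Because the scan is per line, a heredoc opened inside `$(...)` is treated like any other, which is one way the slightly off skeletons noted in the last bullet can arise.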

Pure function: no I/O, no allocations beyond the output builder. Byte-oriented (matches bash semantics); content bytes pass through unchanged whether dropped or copied, so multi-byte UTF-8 in quoted/heredoc bodies is handled implicitly.

Types

This section is empty.
