semanticfw

package module
v1.0.0
Published: Jan 11, 2026 License: MIT Imports: 17 Imported by: 0

README

Semantic Firewall

Detect logic corruption that bypasses code reviews.

Semantic Firewall generates deterministic fingerprints of your Go code's behavior, not its bytes. Rename variables, refactor loops, extract helpers—the fingerprint stays the same. Change the actual logic? The fingerprint changes instantly.


Quick Start

# Install
go install github.com/BlackVectorOps/semantic_firewall/cmd/sfw@latest

# Fingerprint a file
sfw check ./main.go

Output:

{
  "file": "./main.go",
  "functions": [
    {
      "function": "main",
      "fingerprint": "005efb52a8c9d1e3f4b6..."
    }
  ]
}

Why Use This?

Traditional Hashing            Semantic Firewall
key := rand() → Hash A         key := rand() → Hash A
entropy := rand() → Hash B     entropy := rand() → Hash A
Rename breaks the hash         Rename preserves the hash

Use cases:

  • 🔒 Supply chain security — Detect backdoors like the xz attack that pass code review
  • 🔄 Safe refactoring — Prove your refactor didn't change behavior
  • 🤖 CI/CD gates — Block PRs that alter critical function logic

GitHub Action

Add semantic fingerprinting to your CI pipeline:

- uses: BlackVectorOps/semantic_firewall@v1
  with:
    path: ./pkg/critical/

See action.yml for configuration options.


How It Works

  1. Parse — Load Go source into SSA (Static Single Assignment) form
  2. Canonicalize — Normalize variable names, branch ordering, loop structures
  3. Fingerprint — SHA-256 hash of the canonical IR

The result: semantically equivalent code produces identical fingerprints.
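The idea behind these steps can be illustrated with a deliberately simplified sketch. It works on the AST with the standard library only, not on SSA as the tool does, and the names srcA, srcB, byteHash, and renamedHash are hypothetical. It only shows why a rename changes a byte-level hash but not a hash taken after canonical renaming.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// Two snippets that differ only in a local variable name.
const srcA = `package p

func f() int {
	key := rand()
	return key + 1
}
`

const srcB = `package p

func f() int {
	entropy := rand()
	return entropy + 1
}
`

// byteHash hashes the raw source text, as a traditional checksum would.
func byteHash(src string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(src)))
}

// renamedHash parses src, renames each resolved variable to v0, v1, ... in
// order of first appearance, re-prints the file, and hashes the result.
func renamedHash(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "snippet.go", src, 0)
	if err != nil {
		return "", err
	}
	names := map[*ast.Object]string{}
	ast.Inspect(file, func(n ast.Node) bool {
		id, ok := n.(*ast.Ident)
		if !ok || id.Obj == nil || id.Obj.Kind != ast.Var {
			return true // not a variable the parser resolved
		}
		if _, seen := names[id.Obj]; !seen {
			names[id.Obj] = fmt.Sprintf("v%d", len(names))
		}
		id.Name = names[id.Obj]
		return true
	})
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, fset, file); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(buf.Bytes())), nil
}

func main() {
	a, err := renamedHash(srcA)
	if err != nil {
		panic(err)
	}
	b, err := renamedHash(srcB)
	if err != nil {
		panic(err)
	}
	fmt.Println("byte hashes equal:   ", byteHash(srcA) == byteHash(srcB))
	fmt.Println("renamed hashes equal:", a == b)
}
```

An AST-level rename like this cannot see through loop rewrites or extracted helpers, which is the reason the pipeline above canonicalizes SSA instead.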


Library Usage

import semanticfw "github.com/BlackVectorOps/semantic_firewall"

src := `package main
func Add(a, b int) int { return a + b }
`

results, err := semanticfw.FingerprintSource("example.go", src, semanticfw.DefaultLiteralPolicy)
if err != nil {
    log.Fatal(err)
}

for _, r := range results {
    fmt.Printf("%s: %s\n", r.FunctionName, r.Fingerprint)
}

Technical Deep Dive

Architecture & Theory
Abstract

Modern software supply chain security relies heavily on cryptographic signatures that verify provenance (who signed it) but fail to verify intent (what the code actually does). This gap allows malicious actors to introduce subtle logic corruption that bypasses traditional diff reviews and signature checks. This paper introduces the Semantic Attestation Authority (SAA), a framework that uses Static Single Assignment (SSA) canonicalization and Scalar Evolution (SCEV) analysis to generate deterministic fingerprints of software logic. We demonstrate that this method can attest to the semantic equivalence of refactored code while detecting logic corruption, effectively decoupling software identity from its syntactic representation.

1. Introduction: The Limits of Syntactic Verification

Current integrity mechanisms (e.g., GPG, Sigstore) operate strictly at the byte level. If a developer changes a variable name from key to entropy, the hash of the source changes entirely. This fragility means that "security" is often synonymous with "bit-perfect reproduction." That is insufficient for detecting subtle logic tampering, such as the xz backdoor, where the syntax is valid, the signature is valid, but the semantics are malicious.

This paper proposes a shift from Syntactic Integrity to Semantic Integrity, defined as:

The property whereby two programs are considered identical if and only if their control flow graphs and data dependencies produce the same side effects, regardless of register allocation, variable naming, or loop structure.

2. Architecture of the Semantic Firewall

The Semantic Firewall operates on a three-stage pipeline designed to distill raw source code into a canonical Intermediate Representation (IR). By operating on the SSA graph rather than the AST, we eliminate syntactic noise early in the pipeline.

2.1 The Canonicalization Engine

The core of the system is a deterministic transformation engine (canonicalizer.go) that normalizes Go source code.

  • Virtual Control Flow: We utilize a virtualized representation of basic blocks to enforce deterministic ordering of independent branches. This mitigates non-determinism in compiler block ordering without mutating the underlying SSA graph, preserving thread safety during analysis.
  • Register Renaming: All SSA values are mapped to canonical names (e.g., v0, v1, p0) based on topological order. This eliminates noise from developer naming choices, ensuring that func(a int) and func(b int) produce identical IR.
  • Instruction Normalization: Operations are standardized to handle commutativity. Binary operations like ADD and MUL are sorted by the hash weight of their operands, ensuring $a + b$ fingerprints identically to $b + a$.
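A minimal sketch of the commutativity rule, assuming a hypothetical Expr tree rather than real SSA instructions: operands of add and mul are ordered by the SHA-256 of their canonical string, while non-commutative operations keep their operand order.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Expr is a hypothetical expression node: a leaf when Op is empty,
// otherwise a binary operation over X and Y.
type Expr struct {
	Op   string // "add", "mul", "sub", or "" for a leaf
	Leaf string
	X, Y *Expr
}

// commutative lists the operations whose operands may be reordered.
var commutative = map[string]bool{"add": true, "mul": true}

// weight is the sort key for an operand: the hash of its canonical text.
func weight(s string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(s)))
}

// Canon renders e as a canonical string. For commutative operations the two
// operands are sorted by weight, so operand order in the input is irrelevant.
func Canon(e *Expr) string {
	if e.Op == "" {
		return e.Leaf
	}
	ops := []string{Canon(e.X), Canon(e.Y)}
	if commutative[e.Op] {
		sort.Slice(ops, func(i, j int) bool {
			return weight(ops[i]) < weight(ops[j])
		})
	}
	return fmt.Sprintf("%s(%s, %s)", e.Op, ops[0], ops[1])
}

func main() {
	a, b := &Expr{Leaf: "p0"}, &Expr{Leaf: "p1"}
	fmt.Println(Canon(&Expr{Op: "add", X: a, Y: b}) == Canon(&Expr{Op: "add", X: b, Y: a}))
	fmt.Println(Canon(&Expr{Op: "sub", X: a, Y: b}) == Canon(&Expr{Op: "sub", X: b, Y: a}))
}
```

Sorting by a hash rather than by the text itself is one way to make the order independent of naming conventions; any total order over canonical operand strings would serve the same purpose.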

2.2 Scalar Evolution (SCEV) Analysis

To handle loop variance (e.g., for i := 0; i < n; i++ vs for range), we implement a Scalar Evolution analysis engine (scev.go) capable of solving loop trip counts symbolically.

  • Induction Variable Detection: The engine identifies loops and classifies induction variables into basic Add Recurrences: $\{Start, +, Step\}$.
  • Trip Count Derivation: We statically compute loop trip counts using ceiling division, $\lceil Diff / Step \rceil = \lfloor (Diff + Step - 1) / Step \rfloor$ for positive integers. This allows the system to compare iteration counts symbolically across loops with different increment strategies (e.g., i++ vs i+=2).
  • Loop Invariant Code Motion: Invariant calls (such as len(s) inside a loop) are virtually hoisted to the pre-header, ensuring that optimization levels or manual hoisting do not alter the fingerprint.
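The trip count formula can be checked on concrete values. The sketch below assumes a loop of the shape for i := start; i < limit; i += step with a positive step and no overflow; tripCount and countByRunning are hypothetical names, not functions of this package.

```go
package main

import "fmt"

// tripCount returns how many times the body of
//   for i := start; i < limit; i += step
// executes. It reports ok == false when step is not positive, because the
// closed form below only holds for a positive stride.
func tripCount(start, limit, step int64) (n int64, ok bool) {
	if step <= 0 {
		return 0, false
	}
	if limit <= start {
		return 0, true // the loop condition fails immediately
	}
	diff := limit - start
	return (diff + step - 1) / step, true // integer form of ceil(diff / step)
}

// countByRunning executes the loop and counts iterations, for comparison.
func countByRunning(start, limit, step int64) int64 {
	var n int64
	for i := start; i < limit; i += step {
		n++
	}
	return n
}

func main() {
	for _, step := range []int64{1, 2, 3} {
		n, _ := tripCount(0, 10, step)
		fmt.Printf("step=%d closed-form=%d measured=%d\n", step, n, countByRunning(0, 10, step))
	}
}
```

The real engine works on symbolic expressions rather than constants, so this only demonstrates the arithmetic of the ceiling division.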

3. Security & Determinism

To prevent the Attestation Authority itself from becoming an attack vector, strictly enforced defensive measures are integrated into the core pipeline.

3.1 Cycle Detection & DoS Prevention

Recursive analysis of logic graphs creates a risk of stack-overflow Denial of Service (DoS) attacks via malformed cyclic graphs. The renamer detects recursion cycles during stringification (stack[v]), ensuring the analysis terminates even when processing hostile, self-referential code structures.
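One common way to get this guarantee is to track which nodes are currently on the recursion stack and emit a marker instead of recursing into them again. The sketch below assumes a hypothetical Node type and render function; it is not the renamer from this package.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical value in an expression graph that may contain cycles.
type Node struct {
	Op   string
	Args []*Node
}

// render stringifies n. onStack holds the nodes on the current recursion
// path; meeting one of them again means a cycle, which is cut with a marker.
func render(n *Node, onStack map[*Node]bool) string {
	if onStack[n] {
		return "<cycle>"
	}
	onStack[n] = true
	defer delete(onStack, n) // n leaves the path when this call returns

	parts := make([]string, len(n.Args))
	for i, a := range n.Args {
		parts[i] = render(a, onStack)
	}
	return n.Op + "(" + strings.Join(parts, ", ") + ")"
}

func main() {
	c := &Node{Op: "const"}
	phi := &Node{Op: "phi"}
	phi.Args = []*Node{c, phi} // self-reference, as in a loop-carried value
	fmt.Println(render(phi, map[*Node]bool{}))
}
```

Removing the node from the set on return keeps shared but acyclic subgraphs printable in full; only true back-references are cut.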

3.2 IR Injection Prevention

A unique class of vulnerabilities involves injecting fake IR instructions via string literals or struct tags. The Semantic Firewall sanitizes all type definitions and string constants (using quoted literals %q), preventing attackers from "breaking out" of the data layer to inject malicious control flow instructions into the canonical output.
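The effect of %q can be seen with a hostile string constant that contains a newline followed by text shaped like an instruction. The line format and the helper names emitNaive, emitQuoted, and lineCount are hypothetical; only the fmt verbs are real.

```go
package main

import (
	"fmt"
	"strings"
)

// emitNaive writes the constant's value verbatim into the IR line.
func emitNaive(name, val string) string {
	return fmt.Sprintf("%s = const %s", name, val)
}

// emitQuoted writes the value as a Go-quoted literal, escaping newlines and
// quotes so the value cannot end the line or the literal early.
func emitQuoted(name, val string) string {
	return fmt.Sprintf("%s = const %q", name, val)
}

// lineCount reports how many IR lines a rendered string occupies.
func lineCount(s string) int {
	return strings.Count(s, "\n") + 1
}

func main() {
	hostile := "ok\nv9 = call evil()"
	fmt.Println(lineCount(emitNaive("v0", hostile)))  // the payload spills onto a second line
	fmt.Println(lineCount(emitQuoted("v0", hostile))) // the payload stays inside one literal
	fmt.Println(emitQuoted("v0", hostile))
}
```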

3.3 Logic Inversion Protection

When normalizing control flow (e.g., converting a >= b to a < b), strict type checking is enforced. We limit virtual branch swapping to integers and strings. This prevents semantic corruption in floating-point operations where $NaN$ behavior makes standard inversion unsafe due to unordered comparison rules (i.e., !(a < b) does not imply a >= b if either operand is NaN).
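The NaN problem is easy to reproduce. In the sketch below (helper names are hypothetical) the two forms of the comparison agree for integers but disagree for a float64 NaN, which is why an inversion that is safe for integers cannot be applied to floats.

```go
package main

import (
	"fmt"
	"math"
)

// nan is a quiet NaN used as the problematic operand.
var nan = math.NaN()

// The two float forms are NOT equivalent when an operand is NaN.
func notLessFloat(a, b float64) bool   { return !(a < b) }
func greaterEqFloat(a, b float64) bool { return a >= b }

// The two integer forms are always equivalent.
func notLessInt(a, b int) bool   { return !(a < b) }
func greaterEqInt(a, b int) bool { return a >= b }

func main() {
	fmt.Println("int forms agree:  ", notLessInt(2, 3) == greaterEqInt(2, 3))
	fmt.Println("float forms agree:", notLessFloat(nan, 1) == greaterEqFloat(nan, 1))
}
```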

4. Case Study: Semantic Attestation

The framework's capability is verified using a controlled reference implementation of a sensitive data wipe function.

  1. Reference Implementation: The "Golden Logic."
  2. Refactored Implementation: Variables renamed (key $\rightarrow$ entropy), loops altered (range $\rightarrow$ index), and helper functions extracted.
  3. Compromised Implementation: Data wipe logic removed, but the function signature and control flow structure were superficially maintained.

Results:

  • The Refactored version produced a hash identical to the Reference version: 005efb52....
  • The Compromised version produced a divergent hash: 82281950....

This confirms the system successfully decoupled syntax from semantics, allowing for automated acceptance of safe refactors while instantaneously flagging genuine logic tampering.
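A gate built on these results only needs to compare fingerprints per function. The sketch below assumes fingerprints are already available as maps from function name to hash; changedFunctions and the sample values are hypothetical and are not the hashes reported above.

```go
package main

import (
	"fmt"
	"sort"
)

// changedFunctions returns the sorted names of reference functions whose
// fingerprint differs in the candidate, including functions that are missing
// from the candidate altogether.
func changedFunctions(ref, cand map[string]string) []string {
	var out []string
	for name, fp := range ref {
		if cand[name] != fp {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Placeholder fingerprints for illustration only.
	reference := map[string]string{"Wipe": "aaaa", "Add": "bbbb"}
	refactored := map[string]string{"Wipe": "aaaa", "Add": "bbbb"}
	compromised := map[string]string{"Wipe": "cccc", "Add": "bbbb"}

	fmt.Println("refactored: ", changedFunctions(reference, refactored))
	fmt.Println("compromised:", changedFunctions(reference, compromised))
}
```

A CI job could fail the build whenever the returned slice is non-empty for a protected package.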

5. Conclusion

The Semantic Attestation Authority provides a necessary layer of verification above standard cryptographic signing. By fingerprinting the behavior of code rather than its bytes, organizations can automate the acceptance of non-functional refactors while creating a robust "Semantic Firewall" for the software supply chain.


License

MIT License — See LICENSE for details.

Documentation

Constants

This section is empty.

Variables

var DefaultLiteralPolicy = LiteralPolicy{
	AbstractControlFlowComparisons: true,
	KeepSmallIntegerIndices:        true,
	KeepReturnStatusValues:         true,

	KeepStringLiterals: false,
	SmallIntMin:        -16,
	SmallIntMax:        16,
	AbstractOtherTypes: true,
}

DefaultLiteralPolicy represents the standard policy for fingerprinting; it preserves small integers used for indexing and status codes while masking magic numbers and large constants.

var KeepAllLiteralsPolicy = LiteralPolicy{
	AbstractControlFlowComparisons: false,
	KeepSmallIntegerIndices:        true,
	KeepReturnStatusValues:         true,
	KeepStringLiterals:             true,
	SmallIntMin:                    math.MinInt64,
	SmallIntMax:                    math.MaxInt64,
	AbstractOtherTypes:             false,
}

KeepAllLiteralsPolicy is designed for testing or exact matching by disabling most abstractions and expanding the "small" integer range to the full int64 spectrum.

Functions

func AnalyzeSCEV

func AnalyzeSCEV(info *LoopInfo)

AnalyzeSCEV is the main entry point for SCEV analysis on a LoopInfo.

func BuildSSAFromPackages

func BuildSSAFromPackages(initialPkgs []*packages.Package) (*ssa.Program, *ssa.Package, error)

BuildSSAFromPackages constructs the Static Single Assignment form from loaded Go packages. It returns the complete program and the target package for analysis.

func ReleaseCanonicalizer

func ReleaseCanonicalizer(c *Canonicalizer)

Types

type Canonicalizer

type Canonicalizer struct {
	Policy     LiteralPolicy
	StrictMode bool
	// contains filtered or unexported fields
}

Canonicalizer transforms an SSA function into a deterministic string representation.

func AcquireCanonicalizer

func AcquireCanonicalizer(policy LiteralPolicy) *Canonicalizer

func NewCanonicalizer

func NewCanonicalizer(policy LiteralPolicy) *Canonicalizer

func (*Canonicalizer) ApplyVirtualControlFlowFromState

func (c *Canonicalizer) ApplyVirtualControlFlowFromState(swappedBlocks map[*ssa.BasicBlock]bool, virtualBinOps map[*ssa.BinOp]token.Token)

func (*Canonicalizer) CanonicalizeFunction

func (c *Canonicalizer) CanonicalizeFunction(fn *ssa.Function) string

type FingerprintResult

type FingerprintResult struct {
	FunctionName string
	Fingerprint  string
	CanonicalIR  string
	Pos          token.Pos
	Line         int
	Filename     string
	// contains filtered or unexported fields
}

FingerprintResult encapsulates the output of the semantic fingerprinting process for a function.

func FingerprintPackages

func FingerprintPackages(initialPkgs []*packages.Package, policy LiteralPolicy, strictMode bool) ([]FingerprintResult, error)

FingerprintPackages iterates over loaded packages to construct SSA and generate results.

func FingerprintSource

func FingerprintSource(filename string, src string, policy LiteralPolicy) ([]FingerprintResult, error)

FingerprintSource analyzes a single Go source file provided as a string. This is the primary entry point for verifying code snippets or patch hunks.

func FingerprintSourceAdvanced

func FingerprintSourceAdvanced(filename string, src string, policy LiteralPolicy, strictMode bool) ([]FingerprintResult, error)

FingerprintSourceAdvanced provides an extended interface for source analysis with strict mode control.

func GenerateFingerprint

func GenerateFingerprint(fn *ssa.Function, policy LiteralPolicy, strictMode bool) FingerprintResult

GenerateFingerprint generates the hash and canonical string representation for an SSA function. This function uses a pooled Canonicalizer to ensure high throughput and low allocation overhead.
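The pooling pattern mentioned here is typically built on sync.Pool. The sketch below is an assumption about the general shape, not this package's implementation: scratch, acquire, release, and fingerprint are hypothetical stand-ins for the Canonicalizer and the AcquireCanonicalizer / ReleaseCanonicalizer pair.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sync"
)

// scratch is a hypothetical stand-in for per-function canonicalizer state.
type scratch struct {
	buf bytes.Buffer
}

// pool hands out reusable scratch objects and allocates only when empty.
var pool = sync.Pool{
	New: func() interface{} { return new(scratch) },
}

// acquire takes a scratch object from the pool.
func acquire() *scratch { return pool.Get().(*scratch) }

// release clears the object so no state leaks into the next use, then
// returns it to the pool. Reset keeps the buffer's capacity for reuse.
func release(s *scratch) {
	s.buf.Reset()
	pool.Put(s)
}

// fingerprint hashes ir using pooled scratch space.
func fingerprint(ir string) string {
	s := acquire()
	defer release(s)
	s.buf.WriteString(ir)
	return fmt.Sprintf("%x", sha256.Sum256(s.buf.Bytes()))
}

func main() {
	fmt.Println(fingerprint("v0 = add(p0, p1)") == fingerprint("v0 = add(p0, p1)"))
}
```

Resetting on release rather than on acquire means a forgotten reset shows up as a wrong fingerprint on the next call, which is why the reset belongs in exactly one place.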

type IVType

type IVType int
const (
	IVTypeUnknown    IVType = iota
	IVTypeBasic             // {S, +, C}
	IVTypeDerived           // Affine: A * IV + B
	IVTypeGeometric         // {S, *, C}
	IVTypePolynomial        // Step is another IV
)

type InductionVariable

type InductionVariable struct {
	Phi   *ssa.Phi
	Type  IVType
	Start SCEV // Value at iteration 0
	Step  SCEV // Update stride
}

InductionVariable describes a detected IV. Reference: Section 3.2 Classification Taxonomy.

type LiteralPolicy

type LiteralPolicy struct {
	AbstractControlFlowComparisons bool
	KeepSmallIntegerIndices        bool
	KeepReturnStatusValues         bool
	// FIX: Added flag to keep string literals.
	KeepStringLiterals bool
	SmallIntMin        int64
	SmallIntMax        int64
	AbstractOtherTypes bool
}

LiteralPolicy defines the configurable strategy for determining which literal values should be abstracted into placeholders during canonicalization. It allows fine grained control over integer abstraction in different contexts.

func (*LiteralPolicy) ShouldAbstract

func (p *LiteralPolicy) ShouldAbstract(c *ssa.Const, usageContext ssa.Instruction) bool

ShouldAbstract decides whether a given constant should be replaced by a generic placeholder. It analyzes the constant's type, value, and immediate usage context in the SSA graph.
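As an illustration of the kind of decision involved, the sketch below covers integers only and assumes one plausible reading of the policy fields: a value is kept when it is used as an index, KeepSmallIntegerIndices is set, and it lies within [SmallIntMin, SmallIntMax]. The intPolicy type, shouldAbstractInt, and the placeholder text are hypothetical; the real method inspects *ssa.Const values and their SSA context.

```go
package main

import (
	"fmt"
	"strconv"
)

// intPolicy is a hypothetical, integer-only subset of LiteralPolicy.
type intPolicy struct {
	KeepSmallIntegerIndices bool
	SmallIntMin             int64
	SmallIntMax             int64
}

// shouldAbstractInt reports whether v should be replaced by a placeholder.
// Assumption: only small integers used as indices are preserved.
func shouldAbstractInt(p intPolicy, v int64, usedAsIndex bool) bool {
	small := v >= p.SmallIntMin && v <= p.SmallIntMax
	if usedAsIndex && p.KeepSmallIntegerIndices && small {
		return false
	}
	return true
}

// renderInt shows how the decision would surface in canonical output.
func renderInt(p intPolicy, v int64, usedAsIndex bool) string {
	if shouldAbstractInt(p, v, usedAsIndex) {
		return "<int>" // hypothetical placeholder text
	}
	return strconv.FormatInt(v, 10)
}

func main() {
	p := intPolicy{KeepSmallIntegerIndices: true, SmallIntMin: -16, SmallIntMax: 16}
	fmt.Println(renderInt(p, 3, true))    // small index
	fmt.Println(renderInt(p, 1000, true)) // large index
	fmt.Println(renderInt(p, 3, false))   // small value in another context
}
```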

type Loop

type Loop struct {
	Header *ssa.BasicBlock
	Latch  *ssa.BasicBlock // Primary source of the backedge

	// Blocks contains all basic blocks within the loop body.
	Blocks map[*ssa.BasicBlock]bool
	// Exits contains blocks inside the loop that have successors outside.
	Exits []*ssa.BasicBlock

	// Hierarchy
	Parent   *Loop
	Children []*Loop

	// Semantic Analysis (populated in scev.go)
	Inductions map[*ssa.Phi]*InductionVariable
	TripCount  SCEV // Symbolic expression
}

Loop represents a natural loop in the SSA graph. Reference: Section 2.3 Natural Loops.

func (*Loop) String

func (l *Loop) String() string

type LoopInfo

type LoopInfo struct {
	Function *ssa.Function
	Loops    []*Loop // Top-level loops (roots of the hierarchy)
	// Map from Header block to Loop object for O(1) lookup
	LoopMap map[*ssa.BasicBlock]*Loop
}

LoopInfo summarizes loop analysis for a single function.

func DetectLoops

func DetectLoops(fn *ssa.Function) *LoopInfo

DetectLoops reconstructs the loop hierarchy using dominance relations. Reference: Section 2.3.1 Algorithm: Detecting Natural Loops.
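The textbook form of dominance-based loop detection can be sketched on a plain integer CFG: compute dominator sets, treat every edge u -> h where h dominates u as a back edge, and collect the loop body by walking predecessors from u until h. This is an assumption about the general technique, with hypothetical names (preds, dominators, naturalLoops), not the code behind DetectLoops.

```go
package main

import "fmt"

// preds inverts a successor list into a predecessor list.
func preds(succ [][]int) [][]int {
	p := make([][]int, len(succ))
	for u, ss := range succ {
		for _, v := range ss {
			p[v] = append(p[v], u)
		}
	}
	return p
}

// dominators returns dom[v][d] == true when block d dominates block v.
// Block 0 is the entry; every block is assumed reachable from it.
func dominators(succ [][]int) []map[int]bool {
	n, pred := len(succ), preds(succ)
	dom := make([]map[int]bool, n)
	for v := range dom {
		dom[v] = map[int]bool{}
		for d := 0; d < n; d++ {
			if v != 0 || d == 0 {
				dom[v][d] = true // entry starts as {0}, others as "all blocks"
			}
		}
	}
	for changed := true; changed; {
		changed = false
		for v := 1; v < n; v++ {
			next := map[int]bool{v: true}
			for d := 0; d < n; d++ {
				inAll := len(pred[v]) > 0
				for _, p := range pred[v] {
					inAll = inAll && dom[p][d]
				}
				if inAll {
					next[d] = true
				}
			}
			if len(next) != len(dom[v]) { // sets only shrink, so size detects change
				dom[v], changed = next, true
			}
		}
	}
	return dom
}

// naturalLoops maps each loop header to the set of blocks in its loop.
func naturalLoops(succ [][]int) map[int]map[int]bool {
	dom, pred := dominators(succ), preds(succ)
	loops := map[int]map[int]bool{}
	for u, ss := range succ {
		for _, h := range ss {
			if !dom[u][h] {
				continue // u -> h is not a back edge
			}
			if loops[h] == nil {
				loops[h] = map[int]bool{h: true}
			}
			stack := []int{u}
			for len(stack) > 0 {
				x := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				if !loops[h][x] {
					loops[h][x] = true
					stack = append(stack, pred[x]...)
				}
			}
		}
	}
	return loops
}

func main() {
	// 0 -> 1, 1 -> 2, 2 -> 1 (back edge), 1 -> 3 (exit)
	cfg := [][]int{{1}, {2, 3}, {1}, {}}
	fmt.Println(naturalLoops(cfg))
}
```

The set-based dominator computation is quadratic and chosen for readability; production analyses usually use a dominator tree instead.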

type Renamer

type Renamer func(ssa.Value) string

Renamer is a function that maps an SSA value to its canonical name. This is used to ensure deterministic output regardless of SSA register naming.
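A renamer of this kind can be written as a closure that hands out names in first-seen order. The sketch below keys on strings instead of ssa.Value so that it stays free of external dependencies; nameFunc and newRenamer are hypothetical names.

```go
package main

import "fmt"

// nameFunc maps a raw name to its canonical name. It mirrors the shape of
// Renamer but takes a string instead of an ssa.Value.
type nameFunc func(raw string) string

// newRenamer returns a nameFunc that assigns v0, v1, ... in the order raw
// names are first seen and returns the same canonical name on later lookups.
func newRenamer() nameFunc {
	names := map[string]string{}
	return func(raw string) string {
		if n, ok := names[raw]; ok {
			return n
		}
		n := fmt.Sprintf("v%d", len(names))
		names[raw] = n
		return n
	}
}

func main() {
	// Two functions whose SSA registers happen to be numbered differently.
	r1, r2 := newRenamer(), newRenamer()
	a := fmt.Sprintf("%s = add(%s, %s)", r1("t7"), r1("t2"), r1("t7"))
	b := fmt.Sprintf("%s = add(%s, %s)", r2("t0"), r2("t1"), r2("t0"))
	fmt.Println(a == b)
}
```

Because names depend only on visiting order, the traversal that calls the renamer must itself be deterministic.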

type SCEV

type SCEV interface {
	ssa.Value
	EvaluateAt(k *big.Int) *big.Int
	IsLoopInvariant(loop *Loop) bool
	String() string
	// StringWithRenamer returns a canonical string using the provided renamer
	// function to map SSA values to their canonical names (e.g., v0, v1).
	// This is critical for determinism: without it, raw SSA names (t0, t1)
	// would leak into fingerprints, breaking semantic equivalence.
	StringWithRenamer(r Renamer) string
}

SCEV represents a scalar expression.

type SCEVAddRec

type SCEVAddRec struct {
	Start SCEV
	Step  SCEV
	Loop  *Loop
}

SCEVAddRec represents an Add Recurrence: {Start, +, Step}_L. Reference: Section 4.1 The Add Recurrence Abstraction.
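For constant Start and Step, the value of an add recurrence at iteration k is Start + k * Step. The sketch below evaluates that closed form with math/big; addRec, evaluateAt, and evalInt64 are hypothetical and assume constant operands, whereas the real type holds arbitrary SCEV expressions.

```go
package main

import (
	"fmt"
	"math/big"
)

// addRec is a hypothetical constant-only add recurrence {Start, +, Step}.
type addRec struct {
	Start, Step *big.Int
}

// evaluateAt returns Start + k*Step without modifying its inputs.
func (a addRec) evaluateAt(k *big.Int) *big.Int {
	v := new(big.Int).Mul(a.Step, k)
	return v.Add(v, a.Start)
}

// evalInt64 is a convenience wrapper for small values.
func evalInt64(start, step, k int64) int64 {
	a := addRec{Start: big.NewInt(start), Step: big.NewInt(step)}
	return a.evaluateAt(big.NewInt(k)).Int64()
}

func main() {
	// {3, +, 2} visits 3, 5, 7, 9 on iterations 0..3.
	for k := int64(0); k < 4; k++ {
		fmt.Println(k, evalInt64(3, 2, k))
	}
}
```

Arbitrary-precision arithmetic avoids overflow while reasoning about values the analyzed program itself might never compute.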

func (*SCEVAddRec) EvaluateAt

func (s *SCEVAddRec) EvaluateAt(k *big.Int) *big.Int

func (*SCEVAddRec) IsLoopInvariant

func (s *SCEVAddRec) IsLoopInvariant(loop *Loop) bool

func (*SCEVAddRec) Name

func (s *SCEVAddRec) Name() string

ssa.Value Stubs

func (*SCEVAddRec) Parent

func (s *SCEVAddRec) Parent() *ssa.Function

func (*SCEVAddRec) Pos

func (s *SCEVAddRec) Pos() token.Pos

func (*SCEVAddRec) Referrers

func (s *SCEVAddRec) Referrers() *[]ssa.Instruction

func (*SCEVAddRec) String

func (s *SCEVAddRec) String() string

func (*SCEVAddRec) StringWithRenamer

func (s *SCEVAddRec) StringWithRenamer(r Renamer) string

func (*SCEVAddRec) Type

func (s *SCEVAddRec) Type() types.Type

type SCEVConstant

type SCEVConstant struct {
	Value *big.Int
}

SCEVConstant represents a literal integer constant.

func SCEVFromConst

func SCEVFromConst(c *ssa.Const) *SCEVConstant

func (*SCEVConstant) EvaluateAt

func (s *SCEVConstant) EvaluateAt(k *big.Int) *big.Int

func (*SCEVConstant) IsLoopInvariant

func (s *SCEVConstant) IsLoopInvariant(loop *Loop) bool

func (*SCEVConstant) Name

func (s *SCEVConstant) Name() string

ssa.Value Stubs

func (*SCEVConstant) Parent

func (s *SCEVConstant) Parent() *ssa.Function

func (*SCEVConstant) Pos

func (s *SCEVConstant) Pos() token.Pos

func (*SCEVConstant) Referrers

func (s *SCEVConstant) Referrers() *[]ssa.Instruction

func (*SCEVConstant) String

func (s *SCEVConstant) String() string

func (*SCEVConstant) StringWithRenamer

func (s *SCEVConstant) StringWithRenamer(r Renamer) string

func (*SCEVConstant) Type

func (s *SCEVConstant) Type() types.Type

type SCEVGenericExpr

type SCEVGenericExpr struct {
	Op token.Token
	X  SCEV
	Y  SCEV
}

SCEVGenericExpr represents binary operations like Add/Mul for formulas.

func (*SCEVGenericExpr) EvaluateAt

func (s *SCEVGenericExpr) EvaluateAt(k *big.Int) *big.Int

func (*SCEVGenericExpr) IsLoopInvariant

func (s *SCEVGenericExpr) IsLoopInvariant(loop *Loop) bool

func (*SCEVGenericExpr) Name

func (s *SCEVGenericExpr) Name() string

ssa.Value Stubs

func (*SCEVGenericExpr) Parent

func (s *SCEVGenericExpr) Parent() *ssa.Function

func (*SCEVGenericExpr) Pos

func (s *SCEVGenericExpr) Pos() token.Pos

func (*SCEVGenericExpr) Referrers

func (s *SCEVGenericExpr) Referrers() *[]ssa.Instruction

func (*SCEVGenericExpr) String

func (s *SCEVGenericExpr) String() string

func (*SCEVGenericExpr) StringWithRenamer

func (s *SCEVGenericExpr) StringWithRenamer(r Renamer) string

func (*SCEVGenericExpr) Type

func (s *SCEVGenericExpr) Type() types.Type

type SCEVUnknown

type SCEVUnknown struct {
	Value       ssa.Value
	IsInvariant bool // Explicitly tracks invariance relative to the analysis loop scope
}

SCEVUnknown represents a symbolic value (e.g., parameter or unanalyzable instr).

func (*SCEVUnknown) EvaluateAt

func (s *SCEVUnknown) EvaluateAt(k *big.Int) *big.Int

func (*SCEVUnknown) IsLoopInvariant

func (s *SCEVUnknown) IsLoopInvariant(loop *Loop) bool

func (*SCEVUnknown) Name

func (s *SCEVUnknown) Name() string

ssa.Value Stubs

func (*SCEVUnknown) Parent

func (s *SCEVUnknown) Parent() *ssa.Function

func (*SCEVUnknown) Pos

func (s *SCEVUnknown) Pos() token.Pos

func (*SCEVUnknown) Referrers

func (s *SCEVUnknown) Referrers() *[]ssa.Instruction

func (*SCEVUnknown) String

func (s *SCEVUnknown) String() string

func (*SCEVUnknown) StringWithRenamer

func (s *SCEVUnknown) StringWithRenamer(r Renamer) string

func (*SCEVUnknown) Type

func (s *SCEVUnknown) Type() types.Type

Directories

Path Synopsis
cmd
sfw command
Package main provides the sfw CLI tool for semantic fingerprinting of Go source files.