GoSecretScanv2

command module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 13, 2025 License: MIT Imports: 11 Imported by: 0

README

GoSecretScanv2

GoSecretScanv2 is an engineering-focused security scanner that detects secrets, API keys, credentials, and common security misconfigurations using deterministic analysis plus optional LLM-based verification.

Features

Core Detection
  • 70+ Detection Patterns: Comprehensive regex patterns for detecting:
    • Cloud provider credentials (AWS, Azure, GCP)
    • API keys and tokens (GitHub, Slack, JWT)
    • Private keys (SSH, RSA, PGP)
    • Database connection strings
    • Basic authentication credentials
    • Security vulnerabilities (XSS, SQL injection patterns)
Advanced Intelligence
  • Shannon Entropy Analysis:

    • Calculates randomness of detected strings
    • Identifies high-entropy secrets vs low-entropy false positives
    • Entropy scoring (0-8 bits) for each finding
  • Context-Aware Detection:

    • Automatically detects test files, mocks, and examples
    • Identifies comments, documentation, and templates
    • Recognizes placeholders and environment variable templates
    • Filters false positives from regex pattern definitions
  • Confidence Scoring System:

    • Every finding rated: Critical, High, Medium, or Low
    • Combines entropy analysis + context detection + pattern matching
    • Only reports medium confidence or higher (low confidence filtered out)
    • Prioritizes critical findings first
  • Smart Filtering:

    • Skips false positives automatically
    • Handles large files and minified code (1MB line buffer)
    • Pattern definition detection
LLM-Powered Verification (beta)
  • LLM Verification:

    • Uses IBM Granite 4.0 Micro (GGUF, Q4 quantized, ~450MB)
    • Provides structured reasoning for each decision
  • Semantic Embedding Search:

    • Generates embeddings for each finding
    • Searches for similar patterns across the codebase
    • Reuses historical verifications for similar matches
  • Vector Store:

    • SQLite-based vector database
    • Caches verified findings
    • Enables incremental learning
    • Fast similarity search
  • Code Context Analysis:

    • Parses code structure (functions, imports)
    • Understands programming language syntax
    • Gathers surrounding code for context
    • Identifies test vs production code

Enabling LLM Verification:

# Download the model first (one-time setup)
./scripts/download-models.sh

# Start the llama.cpp HTTP server (runs on :8080 by default)
./scripts/run-llama-server.sh

# In a different terminal, run with LLM verification
./gosecretscanner --llm

# Custom model path
./gosecretscanner --llm --model-path=/path/to/granite-4.0-micro.Q4_K_M.gguf

# Point to a remote llama.cpp endpoint
./gosecretscanner --llm --llm-endpoint=http://localhost:8080

# Run the llama.cpp server in the background via Docker
DETACH=true PORT=8080 HOST_NETWORK=true SERVER_PORT=8080 ./scripts/run-llama-server.sh

# Adjust similarity threshold for vector search
./gosecretscanner --llm --similarity=0.9

Environment Variables:

# Enable LLM verification
export GOSECRETSCANNER_LLM_ENABLED=true

# Set model path
export GOSECRETSCANNER_MODEL_PATH=.gosecretscanner/models/granite-4.0-micro.Q4_K_M.gguf

# Override the llama.cpp endpoint (defaults to http://localhost:8080)
export GOSECRETSCANNER_LLM_ENDPOINT=http://localhost:8080

# Launch llama.cpp in detached mode with a custom image/port
DETACH=true LLAMA_CPP_IMAGE=ghcr.io/ggerganov/llama.cpp:full HOST_NETWORK=true PORT=8080 ./scripts/run-llama-server.sh

# Set vector database path
export GOSECRETSCANNER_DB_PATH=.gosecretscanner/findings.db
Performance
  • Runtime characteristics:

    • Pre-compiled regex patterns for fast scanning
    • Concurrent file processing using goroutines
    • Thread-safe result aggregation
    • Fallback paths that avoid external dependencies when optional components are unavailable
  • Operational notes:

    • Minimal configuration required for local runs
    • Color-coded terminal output with confidence levels
    • Automatic recursive directory scanning with ignore rules
    • Results grouped by severity to aid triage

Installation

From Source
git clone https://github.com/m1rl0k/GoSecretScanv2.git
cd GoSecretScanv2
go build -o gosecretscanner main.go
Using Go Install
go install github.com/m1rl0k/GoSecretScanv2@latest
Using Docker
# Build the Docker image
docker build -t gosecretscanner .

# Run the scanner on current directory
docker run --rm -v $(pwd):/workspace gosecretscanner

# Run on a specific directory
docker run --rm -v /path/to/scan:/workspace gosecretscanner

GitHub Actions

The bundled action.yml now supports full LLM verification. Key inputs:

  • enable-llm: set to 'true' to download Granite, launch llama.cpp via Docker, and run the scan with --llm.
  • model-path: overrides the GGUF path (relative to the action directory by default).
  • llm-endpoint / llm-port: control how the scanner reaches the llama.cpp HTTP server.
  • llama-image: change the Docker image used to serve Granite (default ghcr.io/ggerganov/llama.cpp:full).
  • manage-llm-server: set to 'false' when your workflow spins up the llama.cpp container via services: (as shown below).

Example workflow step:

      - name: Run GoSecretScan Action with LLM
        uses: ./
        with:
          scan-path: '.'
          fail-on-secrets: 'false'
          enable-llm: 'true'
          llm-port: '8080'
          manage-llm-server: 'false'

Usage

Navigate to the directory you want to scan and run:

cd /path/to/your/project
./gosecretscanner

The scanner will:

  1. Recursively scan all files in the current directory
  2. Skip .git and node_modules directories
  3. Report any secrets found with file location and line numbers
  4. Exit with code 1 if secrets are found, 0 otherwise
Example Output
------------------------------------------------------------------------
Secrets found:

=== CRITICAL FINDINGS ===

File: /path/to/config.go (Secret)
Line Number: 42
Confidence: CRITICAL (Entropy: 4.85)
Context: code
Pattern: (?i)_(AWS_Key):[\\s'\"=]A[KS]IA[0-9A-Z]{16}[\\s'\"]
Line: const awsKey = "AKIAIOSFODNN7EXAMPLE"

=== HIGH CONFIDENCE ===

File: /path/to/auth.py (Secret)
Line Number: 15
Confidence: HIGH (Entropy: 4.52)
Context: code
Pattern: (?i)api_key(?:\s*[:=]\s*|\s*["'\s])?([a-zA-Z0-9_\-]{32,})
Line: api_key = "sk_live_51a8f9c2e3b4d5f6g7h8"

=== MEDIUM CONFIDENCE ===

File: /path/to/test.js (Secret)
Line Number: 89
Confidence: MEDIUM (Entropy: 3.91)
Context: test_file
Pattern: (?i)password(?:\s*[:=]\s*|\s*["'\s])?([a-zA-Z0-9!@#$%^&*()_+]{8,})
Line: const testPassword = "TestPass123"

------------------------------------------------------------------------
Summary: 3 secrets found (Critical: 1, High: 1, Medium: 1)
Please review and remove them before committing your code.

Output details:

  • Results grouped by confidence level (Critical → High → Medium)
  • Entropy score shows randomness (higher = more likely real secret)
  • Context indicates where the secret was found (code, test_file, comment, etc.)
  • Low confidence findings are automatically filtered out

Detected Patterns

Cloud Provider Credentials
  • AWS:

    • Access Key IDs (AKIA...)
    • Secret Access Keys
    • STS Tokens
  • Azure:

    • Client IDs and Secrets
    • Tenant IDs
    • Subscription IDs
    • Access Keys
  • Google Cloud Platform:

    • API Keys (AIza...)
    • Application Credentials
    • Service Account Keys
    • Client IDs and Secrets
Private Keys
  • SSH Private Keys
  • RSA Private Keys
  • PGP Private Keys
  • Generic Private Keys (PEM format)
Authentication & Secrets
  • Basic Authentication tokens
  • API Keys
  • Bearer tokens
  • JWT tokens
  • Passwords and credentials
  • Database connection strings
Security Vulnerabilities
  • Cross-Site Scripting (XSS) patterns
  • SQL Injection patterns
  • Hardcoded IP addresses
  • S3 Bucket URLs

Integration with CI/CD

Pre-commit Hook

Add to .git/hooks/pre-commit:

#!/bin/bash
./gosecretscanner
if [ $? -ne 0 ]; then
    echo "Secret scan failed! Please remove secrets before committing."
    exit 1
fi
GitHub Actions

The repository also exposes a reusable GitHub Action for CI pipelines:

name: Security Scan
on: [push, pull_request]

jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Use GoSecretScan as a reusable action
      - name: Run Secret Scanner
        uses: m1rl0k/GoSecretScanv2@main
        with:
          scan-path: '.'
          fail-on-secrets: 'true'
Action Inputs
  • scan-path: Directory path to scan (default: .)
  • fail-on-secrets: Fail the workflow if secrets are found (default: true)
Action Outputs
  • secrets-found: Number of secrets detected
  • scan-status: Status of the scan (success, failed, or error)
Advanced Usage
- name: Run Secret Scanner with outputs
  id: scan
  uses: m1rl0k/GoSecretScanv2@main
  with:
    scan-path: './src'
    fail-on-secrets: 'false'

- name: Report results
  if: always()
  run: |
    echo "Secrets found: ${{ steps.scan.outputs.secrets-found }}"
    echo "Status: ${{ steps.scan.outputs.scan-status }}"

Development

Building
go build -o gosecretscanner main.go
Running Tests
go test ./...
Code Formatting
gofmt -w .

How It Works

Scanning Pipeline
  1. Pattern Compilation: On startup, all 70+ regex patterns are pre-compiled for optimal performance
  2. Directory Walking: Uses filepath.Walk to recursively traverse the directory tree
  3. Concurrent Scanning: Each file is scanned in a separate goroutine for parallel processing
  4. Smart Filtering: Regex pattern definitions and binary content are skipped
  5. Pattern Matching: Each line is checked against all compiled patterns
  6. Entropy Analysis: Shannon entropy calculated for each match
  7. Context Detection: File path and line content analyzed for context
  8. Confidence Scoring: Multi-factor scoring combines entropy + context + pattern type
  9. Result Filtering: Only medium+ confidence findings are reported
  10. Priority Grouping: Results grouped by confidence level (Critical → High → Medium)
  11. Thread-Safe Results: Uses mutex locks to safely collect results from concurrent scans
Advanced Algorithms
Shannon Entropy Calculation
H(X) = -Σ P(x) * log₂(P(x))
  • Measures randomness of detected strings
  • High entropy (>4.5): Likely a real secret (random characters)
  • Low entropy (<3.5): Likely a false positive (repeated patterns)
Confidence Scoring Algorithm
Base Score: 50

Entropy Adjustments:
+ 30 if entropy > 4.5 (very random)
+ 20 if entropy > 4.0 (quite random)
+ 10 if entropy > 3.5 (moderately random)
- 10 if entropy <= 3.5 (low randomness)

Context Adjustments:
- 50 for placeholders (${VAR}, YOUR_KEY)
- 45 for templates (REPLACE_ME, CHANGE_ME)
- 40 for test files
- 35 for documentation
- 30 for comments
+ 10 for actual code

Pattern Adjustments:
+ 15 for AWS keys, private keys (critical patterns)

Final Mapping:
≥ 80: Critical
≥ 60: High
≥ 40: Medium
< 40: Low (filtered out)
Why This Is Better Than Gitleaks
Feature GoSecretScanv2 GoSecretScanv2 (LLM) Gitleaks TruffleHog
LLM Verification ✅ Granite 4.0 Micro
Entropy Analysis ✅ Shannon entropy ✅ Shannon entropy ⚠️ Limited ✅ Yes
Context Awareness ✅ Test/comment detection ✅ Advanced code parsing ❌ None ⚠️ Basic
Confidence Scoring ✅ 4-level system ✅ LLM-enhanced ❌ Binary ⚠️ Limited
Smart Filtering ✅ Auto-filters ✅ AI-powered ⚠️ Manual allowlist ⚠️ Manual
Semantic Search ✅ Vector embeddings
Historical Learning ✅ Vector database
Pattern Detection ✅ Self-aware ✅ Self-aware
Output Grouping ✅ By severity ✅ By severity ⚠️ Flat list ⚠️ Flat list
Performance ✅ Pre-compiled ✅ Optimized ✅ Good ✅ Good
Dependencies ✅ Zero (stdlib only) ✅ Minimal (SQLite) ⚠️ Requires Git ⚠️ Multiple
False Positive Rate ~2-5% <1% ~10-20% ~5-15%

Performance Considerations

  • Pre-compiled Patterns: Regex patterns are compiled once at startup, not on every match
  • Concurrent Processing: Multiple files are scanned simultaneously using goroutines
  • Smart Ignoring: Automatically skips .git and node_modules directories
  • Memory Efficient: Streams file contents line-by-line rather than loading entire files

Limitations

  • Currently scans all file types (including binaries)
  • No configuration file support yet
  • No custom pattern support without code modification
  • No allowlist/whitelist for false positives

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Inspired by tools like gitleaks, truffleHog, and git-secrets
  • Built with Go's powerful standard library

Roadmap

  • Configuration file support (YAML/JSON)
  • Custom pattern definitions
  • Multiple output formats (JSON, SARIF)
  • Allowlist/whitelist support
  • Binary file detection and skipping
  • Interactive mode for reviewing findings
  • Entropy-based detection for unknown secrets
  • Docker image for easy deployment

Support

For bugs and feature requests, please open an issue on GitHub.

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
pkg
llm

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL