GoSecretScanv2

command module

v1.0.0 Latest Latest Go to latest Published: Nov 13, 2025 License: MIT Imports: 11 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/m1rl0k/GoSecretScanv2

Links

Open Source Insights

README ¶

GoSecretScanv2

GoSecretScanv2 is an engineering-focused security scanner that detects secrets, API keys, credentials, and common security misconfigurations using deterministic analysis plus optional LLM-based verification.

Features

Core Detection

70+ Detection Patterns: Comprehensive regex patterns for detecting:
- Cloud provider credentials (AWS, Azure, GCP)
- API keys and tokens (GitHub, Slack, JWT)
- Private keys (SSH, RSA, PGP)
- Database connection strings
- Basic authentication credentials
- Security vulnerabilities (XSS, SQL injection patterns)

Advanced Intelligence

Shannon Entropy Analysis:
- Calculates randomness of detected strings
- Identifies high-entropy secrets vs low-entropy false positives
- Entropy scoring (0-8 bits) for each finding
Context-Aware Detection:
- Automatically detects test files, mocks, and examples
- Identifies comments, documentation, and templates
- Recognizes placeholders and environment variable templates
- Filters false positives from regex pattern definitions
Confidence Scoring System:
- Every finding rated: Critical, High, Medium, or Low
- Combines entropy analysis + context detection + pattern matching
- Only reports medium confidence or higher (low confidence filtered out)
- Prioritizes critical findings first
Smart Filtering:
- Skips false positives automatically
- Handles large files and minified code (1MB line buffer)
- Pattern definition detection

LLM-Powered Verification (beta)

LLM Verification:
- Uses IBM Granite 4.0 Micro (GGUF, Q4 quantized, ~450MB)
- Provides structured reasoning for each decision
Semantic Embedding Search:
- Generates embeddings for each finding
- Searches for similar patterns across the codebase
- Reuses historical verifications for similar matches
Vector Store:
- SQLite-based vector database
- Caches verified findings
- Enables incremental learning
- Fast similarity search
Code Context Analysis:
- Parses code structure (functions, imports)
- Understands programming language syntax
- Gathers surrounding code for context
- Identifies test vs production code

Enabling LLM Verification:

# Download the model first (one-time setup)
./scripts/download-models.sh

# Start the llama.cpp HTTP server (runs on :8080 by default)
./scripts/run-llama-server.sh

# In a different terminal, run with LLM verification
./gosecretscanner --llm

# Custom model path
./gosecretscanner --llm --model-path=/path/to/granite-4.0-micro.Q4_K_M.gguf

# Point to a remote llama.cpp endpoint
./gosecretscanner --llm --llm-endpoint=http://localhost:8080

# Run the llama.cpp server in the background via Docker
DETACH=true PORT=8080 HOST_NETWORK=true SERVER_PORT=8080 ./scripts/run-llama-server.sh

# Adjust similarity threshold for vector search
./gosecretscanner --llm --similarity=0.9

Environment Variables:

# Enable LLM verification
export GOSECRETSCANNER_LLM_ENABLED=true

# Set model path
export GOSECRETSCANNER_MODEL_PATH=.gosecretscanner/models/granite-4.0-micro.Q4_K_M.gguf

# Override the llama.cpp endpoint (defaults to http://localhost:8080)
export GOSECRETSCANNER_LLM_ENDPOINT=http://localhost:8080

# Launch llama.cpp in detached mode with a custom image/port
DETACH=true LLAMA_CPP_IMAGE=ghcr.io/ggerganov/llama.cpp:full HOST_NETWORK=true PORT=8080 ./scripts/run-llama-server.sh

# Set vector database path
export GOSECRETSCANNER_DB_PATH=.gosecretscanner/findings.db

Performance

Runtime characteristics:
- Pre-compiled regex patterns for fast scanning
- Concurrent file processing using goroutines
- Thread-safe result aggregation
- Fallback paths that avoid external dependencies when optional components are unavailable
Operational notes:
- Minimal configuration required for local runs
- Color-coded terminal output with confidence levels
- Automatic recursive directory scanning with ignore rules
- Results grouped by severity to aid triage

Installation

From Source

git clone https://github.com/m1rl0k/GoSecretScanv2.git
cd GoSecretScanv2
go build -o gosecretscanner main.go

Using Go Install

go install github.com/m1rl0k/GoSecretScanv2@latest

Using Docker

# Build the Docker image
docker build -t gosecretscanner .

# Run the scanner on current directory
docker run --rm -v $(pwd):/workspace gosecretscanner

# Run on a specific directory
docker run --rm -v /path/to/scan:/workspace gosecretscanner

GitHub Actions

The bundled action.yml now supports full LLM verification. Key inputs:

enable-llm: set to 'true' to download Granite, launch llama.cpp via Docker, and run the scan with --llm.
model-path: overrides the GGUF path (relative to the action directory by default).
llm-endpoint / llm-port: control how the scanner reaches the llama.cpp HTTP server.
llama-image: change the Docker image used to serve Granite (default ghcr.io/ggerganov/llama.cpp:full).
manage-llm-server: set to 'false' when your workflow spins up the llama.cpp container via services: (as shown below).

Example workflow step:

      - name: Run GoSecretScan Action with LLM
        uses: ./
        with:
          scan-path: '.'
          fail-on-secrets: 'false'
          enable-llm: 'true'
          llm-port: '8080'
          manage-llm-server: 'false'

Usage

Navigate to the directory you want to scan and run:

cd /path/to/your/project
./gosecretscanner

The scanner will:

Recursively scan all files in the current directory
Skip .git and node_modules directories
Report any secrets found with file location and line numbers
Exit with code 1 if secrets are found, 0 otherwise

Example Output

------------------------------------------------------------------------
Secrets found:

=== CRITICAL FINDINGS ===

File: /path/to/config.go (Secret)
Line Number: 42
Confidence: CRITICAL (Entropy: 4.85)
Context: code
Pattern: (?i)_(AWS_Key):[\\s'\"=]A[KS]IA[0-9A-Z]{16}[\\s'\"]
Line: const awsKey = "AKIAIOSFODNN7EXAMPLE"

=== HIGH CONFIDENCE ===

File: /path/to/auth.py (Secret)
Line Number: 15
Confidence: HIGH (Entropy: 4.52)
Context: code
Pattern: (?i)api_key(?:\s*[:=]\s*|\s*["'\s])?([a-zA-Z0-9_\-]{32,})
Line: api_key = "sk_live_51a8f9c2e3b4d5f6g7h8"

=== MEDIUM CONFIDENCE ===

File: /path/to/test.js (Secret)
Line Number: 89
Confidence: MEDIUM (Entropy: 3.91)
Context: test_file
Pattern: (?i)password(?:\s*[:=]\s*|\s*["'\s])?([a-zA-Z0-9!@#$%^&*()_+]{8,})
Line: const testPassword = "TestPass123"

------------------------------------------------------------------------
Summary: 3 secrets found (Critical: 1, High: 1, Medium: 1)
Please review and remove them before committing your code.

Output details:

Results grouped by confidence level (Critical → High → Medium)
Entropy score shows randomness (higher = more likely real secret)
Context indicates where the secret was found (code, test_file, comment, etc.)
Low confidence findings are automatically filtered out

Detected Patterns

Cloud Provider Credentials

AWS:
- Access Key IDs (AKIA...)
- Secret Access Keys
- STS Tokens
Azure:
- Client IDs and Secrets
- Tenant IDs
- Subscription IDs
- Access Keys
Google Cloud Platform:
- API Keys (AIza...)
- Application Credentials
- Service Account Keys
- Client IDs and Secrets

Private Keys

SSH Private Keys
RSA Private Keys
PGP Private Keys
Generic Private Keys (PEM format)

Authentication & Secrets

Basic Authentication tokens
API Keys
Bearer tokens
JWT tokens
Passwords and credentials
Database connection strings

Security Vulnerabilities

Cross-Site Scripting (XSS) patterns
SQL Injection patterns
Hardcoded IP addresses
S3 Bucket URLs

Integration with CI/CD

Pre-commit Hook

Add to .git/hooks/pre-commit:

#!/bin/bash
./gosecretscanner
if [ $? -ne 0 ]; then
    echo "Secret scan failed! Please remove secrets before committing."
    exit 1
fi

GitHub Actions

The repository also exposes a reusable GitHub Action for CI pipelines:

name: Security Scan
on: [push, pull_request]

jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Use GoSecretScan as a reusable action
      - name: Run Secret Scanner
        uses: m1rl0k/GoSecretScanv2@main
        with:
          scan-path: '.'
          fail-on-secrets: 'true'

Action Inputs

scan-path: Directory path to scan (default: .)
fail-on-secrets: Fail the workflow if secrets are found (default: true)

Action Outputs

secrets-found: Number of secrets detected
scan-status: Status of the scan (success, failed, or error)

Advanced Usage

- name: Run Secret Scanner with outputs
  id: scan
  uses: m1rl0k/GoSecretScanv2@main
  with:
    scan-path: './src'
    fail-on-secrets: 'false'

- name: Report results
  if: always()
  run: |
    echo "Secrets found: ${{ steps.scan.outputs.secrets-found }}"
    echo "Status: ${{ steps.scan.outputs.scan-status }}"

Development

Building

go build -o gosecretscanner main.go

Running Tests

go test ./...

Code Formatting

gofmt -w .

How It Works

Scanning Pipeline

Pattern Compilation: On startup, all 70+ regex patterns are pre-compiled for optimal performance
Directory Walking: Uses filepath.Walk to recursively traverse the directory tree
Concurrent Scanning: Each file is scanned in a separate goroutine for parallel processing
Smart Filtering: Regex pattern definitions and binary content are skipped
Pattern Matching: Each line is checked against all compiled patterns
Entropy Analysis: Shannon entropy calculated for each match
Context Detection: File path and line content analyzed for context
Confidence Scoring: Multi-factor scoring combines entropy + context + pattern type
Result Filtering: Only medium+ confidence findings are reported
Priority Grouping: Results grouped by confidence level (Critical → High → Medium)
Thread-Safe Results: Uses mutex locks to safely collect results from concurrent scans

Advanced Algorithms

Shannon Entropy Calculation

H(X) = -Σ P(x) * log₂(P(x))

Measures randomness of detected strings
High entropy (>4.5): Likely a real secret (random characters)
Low entropy (<3.5): Likely a false positive (repeated patterns)

Confidence Scoring Algorithm

Base Score: 50

Entropy Adjustments:
+ 30 if entropy > 4.5 (very random)
+ 20 if entropy > 4.0 (quite random)
+ 10 if entropy > 3.5 (moderately random)
- 10 if entropy <= 3.5 (low randomness)

Context Adjustments:
- 50 for placeholders (${VAR}, YOUR_KEY)
- 45 for templates (REPLACE_ME, CHANGE_ME)
- 40 for test files
- 35 for documentation
- 30 for comments
+ 10 for actual code

Pattern Adjustments:
+ 15 for AWS keys, private keys (critical patterns)

Final Mapping:
≥ 80: Critical
≥ 60: High
≥ 40: Medium
< 40: Low (filtered out)

Why This Is Better Than Gitleaks

Feature	GoSecretScanv2	GoSecretScanv2 (LLM)	Gitleaks	TruffleHog
LLM Verification	❌	✅ Granite 4.0 Micro	❌	❌
Entropy Analysis	✅ Shannon entropy	✅ Shannon entropy	⚠️ Limited	✅ Yes
Context Awareness	✅ Test/comment detection	✅ Advanced code parsing	❌ None	⚠️ Basic
Confidence Scoring	✅ 4-level system	✅ LLM-enhanced	❌ Binary	⚠️ Limited
Smart Filtering	✅ Auto-filters	✅ AI-powered	⚠️ Manual allowlist	⚠️ Manual
Semantic Search	❌	✅ Vector embeddings	❌	❌
Historical Learning	❌	✅ Vector database	❌	❌
Pattern Detection	✅ Self-aware	✅ Self-aware	❌	❌
Output Grouping	✅ By severity	✅ By severity	⚠️ Flat list	⚠️ Flat list
Performance	✅ Pre-compiled	✅ Optimized	✅ Good	✅ Good
Dependencies	✅ Zero (stdlib only)	✅ Minimal (SQLite)	⚠️ Requires Git	⚠️ Multiple
False Positive Rate	~2-5%	<1%	~10-20%	~5-15%

Performance Considerations

Pre-compiled Patterns: Regex patterns are compiled once at startup, not on every match
Concurrent Processing: Multiple files are scanned simultaneously using goroutines
Smart Ignoring: Automatically skips .git and node_modules directories
Memory Efficient: Streams file contents line-by-line rather than loading entire files