README
¶
GoSecretScanv2
GoSecretScanv2 is an engineering-focused security scanner that detects secrets, API keys, credentials, and common security misconfigurations using deterministic analysis plus optional LLM-based verification.
Features
Core Detection
- 70+ Detection Patterns: Comprehensive regex patterns for detecting:
- Cloud provider credentials (AWS, Azure, GCP)
- API keys and tokens (GitHub, Slack, JWT)
- Private keys (SSH, RSA, PGP)
- Database connection strings
- Basic authentication credentials
- Security vulnerabilities (XSS, SQL injection patterns)
Advanced Intelligence
-
Shannon Entropy Analysis:
- Calculates randomness of detected strings
- Identifies high-entropy secrets vs low-entropy false positives
- Entropy scoring (0-8 bits) for each finding
-
Context-Aware Detection:
- Automatically detects test files, mocks, and examples
- Identifies comments, documentation, and templates
- Recognizes placeholders and environment variable templates
- Filters false positives from regex pattern definitions
-
Confidence Scoring System:
- Every finding rated: Critical, High, Medium, or Low
- Combines entropy analysis + context detection + pattern matching
- Only reports medium confidence or higher (low confidence filtered out)
- Prioritizes critical findings first
-
Smart Filtering:
- Skips false positives automatically
- Handles large files and minified code (1MB line buffer)
- Pattern definition detection
LLM-Powered Verification (beta)
-
LLM Verification:
- Uses IBM Granite 4.0 Micro (GGUF, Q4 quantized, ~450MB)
- Provides structured reasoning for each decision
-
Semantic Embedding Search:
- Generates embeddings for each finding
- Searches for similar patterns across the codebase
- Reuses historical verifications for similar matches
-
Vector Store:
- SQLite-based vector database
- Caches verified findings
- Enables incremental learning
- Fast similarity search
-
Code Context Analysis:
- Parses code structure (functions, imports)
- Understands programming language syntax
- Gathers surrounding code for context
- Identifies test vs production code
Enabling LLM Verification:
# Download the model first (one-time setup)
./scripts/download-models.sh
# Start the llama.cpp HTTP server (runs on :8080 by default)
./scripts/run-llama-server.sh
# In a different terminal, run with LLM verification
./gosecretscanner --llm
# Custom model path
./gosecretscanner --llm --model-path=/path/to/granite-4.0-micro.Q4_K_M.gguf
# Point to a remote llama.cpp endpoint
./gosecretscanner --llm --llm-endpoint=http://localhost:8080
# Run the llama.cpp server in the background via Docker
DETACH=true PORT=8080 HOST_NETWORK=true SERVER_PORT=8080 ./scripts/run-llama-server.sh
# Adjust similarity threshold for vector search
./gosecretscanner --llm --similarity=0.9
Environment Variables:
# Enable LLM verification
export GOSECRETSCANNER_LLM_ENABLED=true
# Set model path
export GOSECRETSCANNER_MODEL_PATH=.gosecretscanner/models/granite-4.0-micro.Q4_K_M.gguf
# Override the llama.cpp endpoint (defaults to http://localhost:8080)
export GOSECRETSCANNER_LLM_ENDPOINT=http://localhost:8080
# Launch llama.cpp in detached mode with a custom image/port
DETACH=true LLAMA_CPP_IMAGE=ghcr.io/ggerganov/llama.cpp:full HOST_NETWORK=true PORT=8080 ./scripts/run-llama-server.sh
# Set vector database path
export GOSECRETSCANNER_DB_PATH=.gosecretscanner/findings.db
Performance
-
Runtime characteristics:
- Pre-compiled regex patterns for fast scanning
- Concurrent file processing using goroutines
- Thread-safe result aggregation
- Fallback paths that avoid external dependencies when optional components are unavailable
-
Operational notes:
- Minimal configuration required for local runs
- Color-coded terminal output with confidence levels
- Automatic recursive directory scanning with ignore rules
- Results grouped by severity to aid triage
Installation
From Source
git clone https://github.com/m1rl0k/GoSecretScanv2.git
cd GoSecretScanv2
go build -o gosecretscanner main.go
Using Go Install
go install github.com/m1rl0k/GoSecretScanv2@latest
Using Docker
# Build the Docker image
docker build -t gosecretscanner .
# Run the scanner on current directory
docker run --rm -v $(pwd):/workspace gosecretscanner
# Run on a specific directory
docker run --rm -v /path/to/scan:/workspace gosecretscanner
GitHub Actions
The bundled action.yml now supports full LLM verification. Key inputs:
enable-llm: set to'true'to download Granite, launch llama.cpp via Docker, and run the scan with--llm.model-path: overrides the GGUF path (relative to the action directory by default).llm-endpoint/llm-port: control how the scanner reaches the llama.cpp HTTP server.llama-image: change the Docker image used to serve Granite (defaultghcr.io/ggerganov/llama.cpp:full).manage-llm-server: set to'false'when your workflow spins up the llama.cpp container viaservices:(as shown below).
Example workflow step:
- name: Run GoSecretScan Action with LLM
uses: ./
with:
scan-path: '.'
fail-on-secrets: 'false'
enable-llm: 'true'
llm-port: '8080'
manage-llm-server: 'false'
Usage
Navigate to the directory you want to scan and run:
cd /path/to/your/project
./gosecretscanner
The scanner will:
- Recursively scan all files in the current directory
- Skip
.gitandnode_modulesdirectories - Report any secrets found with file location and line numbers
- Exit with code 1 if secrets are found, 0 otherwise
Example Output
------------------------------------------------------------------------
Secrets found:
=== CRITICAL FINDINGS ===
File: /path/to/config.go (Secret)
Line Number: 42
Confidence: CRITICAL (Entropy: 4.85)
Context: code
Pattern: (?i)_(AWS_Key):[\\s'\"=]A[KS]IA[0-9A-Z]{16}[\\s'\"]
Line: const awsKey = "AKIAIOSFODNN7EXAMPLE"
=== HIGH CONFIDENCE ===
File: /path/to/auth.py (Secret)
Line Number: 15
Confidence: HIGH (Entropy: 4.52)
Context: code
Pattern: (?i)api_key(?:\s*[:=]\s*|\s*["'\s])?([a-zA-Z0-9_\-]{32,})
Line: api_key = "sk_live_51a8f9c2e3b4d5f6g7h8"
=== MEDIUM CONFIDENCE ===
File: /path/to/test.js (Secret)
Line Number: 89
Confidence: MEDIUM (Entropy: 3.91)
Context: test_file
Pattern: (?i)password(?:\s*[:=]\s*|\s*["'\s])?([a-zA-Z0-9!@#$%^&*()_+]{8,})
Line: const testPassword = "TestPass123"
------------------------------------------------------------------------
Summary: 3 secrets found (Critical: 1, High: 1, Medium: 1)
Please review and remove them before committing your code.
Output details:
- Results grouped by confidence level (Critical → High → Medium)
- Entropy score shows randomness (higher = more likely real secret)
- Context indicates where the secret was found (code, test_file, comment, etc.)
- Low confidence findings are automatically filtered out
Detected Patterns
Cloud Provider Credentials
-
AWS:
- Access Key IDs (AKIA...)
- Secret Access Keys
- STS Tokens
-
Azure:
- Client IDs and Secrets
- Tenant IDs
- Subscription IDs
- Access Keys
-
Google Cloud Platform:
- API Keys (AIza...)
- Application Credentials
- Service Account Keys
- Client IDs and Secrets
Private Keys
- SSH Private Keys
- RSA Private Keys
- PGP Private Keys
- Generic Private Keys (PEM format)
Authentication & Secrets
- Basic Authentication tokens
- API Keys
- Bearer tokens
- JWT tokens
- Passwords and credentials
- Database connection strings
Security Vulnerabilities
- Cross-Site Scripting (XSS) patterns
- SQL Injection patterns
- Hardcoded IP addresses
- S3 Bucket URLs
Integration with CI/CD
Pre-commit Hook
Add to .git/hooks/pre-commit:
#!/bin/bash
./gosecretscanner
if [ $? -ne 0 ]; then
echo "Secret scan failed! Please remove secrets before committing."
exit 1
fi
GitHub Actions
The repository also exposes a reusable GitHub Action for CI pipelines:
name: Security Scan
on: [push, pull_request]
jobs:
secret-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
# Use GoSecretScan as a reusable action
- name: Run Secret Scanner
uses: m1rl0k/GoSecretScanv2@main
with:
scan-path: '.'
fail-on-secrets: 'true'
Action Inputs
scan-path: Directory path to scan (default:.)fail-on-secrets: Fail the workflow if secrets are found (default:true)
Action Outputs
secrets-found: Number of secrets detectedscan-status: Status of the scan (success,failed, orerror)
Advanced Usage
- name: Run Secret Scanner with outputs
id: scan
uses: m1rl0k/GoSecretScanv2@main
with:
scan-path: './src'
fail-on-secrets: 'false'
- name: Report results
if: always()
run: |
echo "Secrets found: ${{ steps.scan.outputs.secrets-found }}"
echo "Status: ${{ steps.scan.outputs.scan-status }}"
Development
Building
go build -o gosecretscanner main.go
Running Tests
go test ./...
Code Formatting
gofmt -w .
How It Works
Scanning Pipeline
- Pattern Compilation: On startup, all 70+ regex patterns are pre-compiled for optimal performance
- Directory Walking: Uses
filepath.Walkto recursively traverse the directory tree - Concurrent Scanning: Each file is scanned in a separate goroutine for parallel processing
- Smart Filtering: Regex pattern definitions and binary content are skipped
- Pattern Matching: Each line is checked against all compiled patterns
- Entropy Analysis: Shannon entropy calculated for each match
- Context Detection: File path and line content analyzed for context
- Confidence Scoring: Multi-factor scoring combines entropy + context + pattern type
- Result Filtering: Only medium+ confidence findings are reported
- Priority Grouping: Results grouped by confidence level (Critical → High → Medium)
- Thread-Safe Results: Uses mutex locks to safely collect results from concurrent scans
Advanced Algorithms
Shannon Entropy Calculation
H(X) = -Σ P(x) * log₂(P(x))
- Measures randomness of detected strings
- High entropy (>4.5): Likely a real secret (random characters)
- Low entropy (<3.5): Likely a false positive (repeated patterns)
Confidence Scoring Algorithm
Base Score: 50
Entropy Adjustments:
+ 30 if entropy > 4.5 (very random)
+ 20 if entropy > 4.0 (quite random)
+ 10 if entropy > 3.5 (moderately random)
- 10 if entropy <= 3.5 (low randomness)
Context Adjustments:
- 50 for placeholders (${VAR}, YOUR_KEY)
- 45 for templates (REPLACE_ME, CHANGE_ME)
- 40 for test files
- 35 for documentation
- 30 for comments
+ 10 for actual code
Pattern Adjustments:
+ 15 for AWS keys, private keys (critical patterns)
Final Mapping:
≥ 80: Critical
≥ 60: High
≥ 40: Medium
< 40: Low (filtered out)
Why This Is Better Than Gitleaks
| Feature | GoSecretScanv2 | GoSecretScanv2 (LLM) | Gitleaks | TruffleHog |
|---|---|---|---|---|
| LLM Verification | ❌ | ✅ Granite 4.0 Micro | ❌ | ❌ |
| Entropy Analysis | ✅ Shannon entropy | ✅ Shannon entropy | ⚠️ Limited | ✅ Yes |
| Context Awareness | ✅ Test/comment detection | ✅ Advanced code parsing | ❌ None | ⚠️ Basic |
| Confidence Scoring | ✅ 4-level system | ✅ LLM-enhanced | ❌ Binary | ⚠️ Limited |
| Smart Filtering | ✅ Auto-filters | ✅ AI-powered | ⚠️ Manual allowlist | ⚠️ Manual |
| Semantic Search | ❌ | ✅ Vector embeddings | ❌ | ❌ |
| Historical Learning | ❌ | ✅ Vector database | ❌ | ❌ |
| Pattern Detection | ✅ Self-aware | ✅ Self-aware | ❌ | ❌ |
| Output Grouping | ✅ By severity | ✅ By severity | ⚠️ Flat list | ⚠️ Flat list |
| Performance | ✅ Pre-compiled | ✅ Optimized | ✅ Good | ✅ Good |
| Dependencies | ✅ Zero (stdlib only) | ✅ Minimal (SQLite) | ⚠️ Requires Git | ⚠️ Multiple |
| False Positive Rate | ~2-5% | <1% | ~10-20% | ~5-15% |
Performance Considerations
- Pre-compiled Patterns: Regex patterns are compiled once at startup, not on every match
- Concurrent Processing: Multiple files are scanned simultaneously using goroutines
- Smart Ignoring: Automatically skips
.gitandnode_modulesdirectories - Memory Efficient: Streams file contents line-by-line rather than loading entire files
Limitations
- Currently scans all file types (including binaries)
- No configuration file support yet
- No custom pattern support without code modification
- No allowlist/whitelist for false positives
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by tools like gitleaks, truffleHog, and git-secrets
- Built with Go's powerful standard library
Roadmap
- Configuration file support (YAML/JSON)
- Custom pattern definitions
- Multiple output formats (JSON, SARIF)
- Allowlist/whitelist support
- Binary file detection and skipping
- Interactive mode for reviewing findings
- Entropy-based detection for unknown secrets
- Docker image for easy deployment
Support
For bugs and feature requests, please open an issue on GitHub.
Documentation
¶
There is no documentation for this package.