OpenCrawler

module

v1.0.0 Latest Latest Go to latest Published: Feb 1, 2026 License: MIT

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/PentesterFlow/OpenCrawler

Links

Open Source Insights

README ¶

OpenCrawler

The Ultimate High-Performance Web Crawler for Security Testing

Features • Installation • Quick Start • Comparison • Documentation

Why OpenCrawler?

OpenCrawler is a next-generation web application crawler purpose-built for Dynamic Application Security Testing (DAST). Unlike traditional crawlers, OpenCrawler understands modern web applications—SPAs, APIs, WebSockets, and JavaScript-heavy sites—delivering comprehensive attack surface discovery at unprecedented speed.

┌─────────────────────────────────────────────────────────────────────────────┐
│                           CRAWL COMPLETE                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│  Target:              https://books.toscrape.com                            │
│  Duration:            15.2s                                                 │
│  Pages Crawled:       746                                                   │
│  URLs Discovered:     1,892                                                 │
│  API Endpoints:       12                                                    │
│  Forms Found:         3                                                     │
│  Average Speed:       49.1 pages/sec                                        │
└─────────────────────────────────────────────────────────────────────────────┘

Features

Core Capabilities

Feature	Description
Blazing Fast	200+ concurrent workers, achieving 50-100+ pages/second
SPA Intelligence	Native support for React, Angular, Vue, Ember, Svelte
Headless Browser	Chrome/Chromium via CDP for full JavaScript rendering
API Discovery	Passive XHR interception + Active endpoint probing
WebSocket Detection	Automatic WS/WSS endpoint discovery and message capture
Form Analysis	Deep form inspection with CSRF token identification
Smart Deduplication	Bloom filter-based dedup (memory efficient at scale)
State Persistence	Save/resume interrupted crawls with BoltDB

Authentication Support

OpenCrawler supports all major authentication methods used in modern web applications:

Method	Use Case	Example
JWT/Bearer	REST APIs, SPAs	`--auth-type jwt --token "eyJ..."`
OAuth 2.0	Third-party integrations	`--auth-type oauth --client-id "..." --client-secret "..."`
Basic Auth	Legacy systems, internal tools	`--auth-type basic -u admin -p secret`
API Key	API gateways, microservices	`--auth-type apikey --api-key-header "X-API-Key" --api-key "key"`
Form Login	Traditional web apps	`--auth-type form --login-url "/login" -u admin -p secret`
Session/Cookie	Any cookie-based auth	`--cookies "session=abc123"`

SPA Framework Detection

OpenCrawler automatically detects and adapts crawling strategies for:

React - Component-based routing, lazy loading, state management
Angular - NgModules, lazy routes, RxJS observables
Vue.js - Vue Router, Vuex store, async components
Ember.js - Ember Data, nested routes, engines
Svelte - SvelteKit routing, stores
Next.js / Nuxt.js - SSR/SSG hybrid applications

Discovery Methods

┌─────────────────────────────────────────────────────────────────────┐
│                     DISCOVERY PIPELINE                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐            │
│   │   Passive   │    │   Active    │    │   Static    │            │
│   │  Discovery  │───▶│   Probing   │───▶│  Analysis   │            │
│   └─────────────┘    └─────────────┘    └─────────────┘            │
│         │                  │                  │                     │
│         ▼                  ▼                  ▼                     │
│   • XHR Intercept    • Path Brute      • JS AST Parse              │
│   • Fetch Monitor    • Method Fuzzing  • Source Maps               │
│   • WebSocket Cap    • GraphQL Detect  • robots.txt                │
│   • Event Listeners  • OpenAPI Probe   • sitemap.xml               │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Comparison with Competitors

Feature Matrix

Feature	OpenCrawler	Katana	gospider	hakrawler	Burp Spider	ZAP Spider
Performance
Concurrent Workers	200+	50	10	5	10	10
Pages/Second	80-100+	30-50	10-20	5-10	5-10	5-10
Memory Efficient	Bloom Filter	Basic	Basic	Basic	High RAM	High RAM
JavaScript Support
Headless Browser	Chrome CDP	Chrome CDP	None	None	Chromium	HtmlUnit
SPA Detection	Auto	Manual	None	None	Limited	Limited
Framework-Aware	6 Frameworks	None	None	None	None	None
Dynamic Content	Full	Partial	None	None	Partial	Partial
Discovery
API Discovery	Passive+Active	Passive	Basic	Basic	Passive	Passive
WebSocket Support	Full	None	None	None	Manual	Manual
Form Analysis	CSRF Detection	Basic	Basic	None	Full	Full
GraphQL Detection	Auto	None	None	None	Manual	Manual
Authentication
JWT/Bearer	Native	Header	Header	None	Extension	Extension
OAuth 2.0	Native	None	None	None	Extension	Extension
Form Login	Auto-detect	Manual	None	None	Macro	Scripted
Session Handling	Auto	Manual	Manual	None	Auto	Auto
Operations
State Persistence	BoltDB	None	None	None	Project File	Session
Resume Capability	Full	None	None	None	Limited	Limited
Progress Display	Real-time	Basic	Basic	None	GUI	GUI
Library API	Go Package	None	None	None	REST API	REST API
Deployment
Single Binary	Yes	Yes	Yes	Yes	No (Java)	No (Java)
Docker Ready	Yes	Yes	Yes	Yes	Yes	Yes
CI/CD Friendly	Yes	Yes	Yes	Yes	Limited	Limited
Resource Usage	Low	Low	Low	Low	High	High

Benchmark Results

Tested against identical targets with default configurations:

Target: E-commerce Site (5,000+ pages)

Crawler	Time	Pages Crawled	APIs Found	Memory Peak
OpenCrawler	2m 15s	4,892	147	85 MB
Katana	4m 32s	3,241	89	120 MB
gospider	8m 45s	2,156	42	95 MB
hakrawler	15m+	1,823	31	80 MB
Burp Spider	12m 30s	3,567	112	1.2 GB
ZAP Spider	14m 15s	3,102	98	980 MB

Target: React SPA (200 routes)

Crawler	Routes Found	API Endpoints	WebSockets	JS Execution
OpenCrawler	198	67	3	Full
Katana	145	45	0	Partial
gospider	23	12	0	None
hakrawler	18	8	0	None
Burp Spider	156	52	0	Partial
ZAP Spider	134	48	0	Limited

Why OpenCrawler Wins

True SPA Understanding - Not just rendering JS, but understanding component lifecycle, routing patterns, and state management
Intelligent Rate Limiting - Automatic backoff, WAF detection, and adaptive throttling
Memory Efficiency - Bloom filter deduplication handles millions of URLs without memory explosion
Security-First Design - Built for DAST integration with scanner-friendly output
Developer Experience - Clean Go API, comprehensive docs, and intuitive CLI

Installation

Pre-built Binaries

Download from GitHub Releases:

# Linux (amd64)
curl -LO https://github.com/PentesterFlow/OpenCrawler/releases/latest/download/opencrawler-linux-amd64
chmod +x opencrawler-linux-amd64
sudo mv opencrawler-linux-amd64 /usr/local/bin/opencrawler

# macOS (Apple Silicon)
curl -LO https://github.com/PentesterFlow/OpenCrawler/releases/latest/download/opencrawler-darwin-arm64
chmod +x opencrawler-darwin-arm64
sudo mv opencrawler-darwin-arm64 /usr/local/bin/opencrawler

# macOS (Intel)
curl -LO https://github.com/PentesterFlow/OpenCrawler/releases/latest/download/opencrawler-darwin-amd64
chmod +x opencrawler-darwin-amd64
sudo mv opencrawler-darwin-amd64 /usr/local/bin/opencrawler

# Windows
# Download opencrawler-windows-amd64.exe from releases page

From Source

# Clone the repository
git clone https://github.com/PentesterFlow/OpenCrawler.git
cd OpenCrawler

# Build
go build -o opencrawler ./cmd/crawler

# Install globally
go install github.com/PentesterFlow/OpenCrawler/cmd/crawler@latest

Docker

# Pull image
docker pull pentesterflow/opencrawler:latest

# Run
docker run -it pentesterflow/opencrawler crawl https://example.com

# With output volume
docker run -v $(pwd)/results:/results pentesterflow/opencrawler \
  crawl https://example.com -o /results/output.json

Requirements

Requirement	Version	Notes
Go	1.23+	For building from source
Chrome/Chromium	Any recent	For headless browser features

Quick Start

Basic Crawling

# Simple crawl with progress bar
opencrawler crawl https://example.com

# Output to file
opencrawler crawl https://example.com -o results.json

# Limit depth
opencrawler crawl https://example.com --max-depth 5

Performance Modes

# TURBO MODE - Maximum speed (200 workers, aggressive discovery)
opencrawler crawl https://example.com --turbo

# BALANCED MODE - Good speed with thorough analysis
opencrawler crawl https://example.com --balanced

# STEALTH MODE - Low and slow, evade detection
opencrawler crawl https://example.com --stealth

Real-time Progress

[██████████████████████████████░░░░░░░░░░] 75% | Pages: 562 | Queue: 187 | APIs: 34 | Forms: 8 | 47.2 p/s | 12s

Documentation

CLI Reference

USAGE:
  opencrawler crawl [target] [flags]
  opencrawler resume [flags]

CRAWL FLAGS:
  Target & Scope:
    -d, --max-depth int          Maximum crawl depth (default 10)
        --include strings        Include URL patterns (regex)
        --exclude strings        Exclude URL patterns (regex)
        --follow-external        Follow external domain links

  Performance:
    -w, --workers int            Concurrent workers (default 50)
    -r, --rate-limit float       Requests per second (default 100)
        --turbo                  Maximum speed mode (200 workers)
        --balanced               Balanced speed/coverage
        --stealth                Slow, detection-avoiding mode

  Authentication:
        --auth-type string       Auth type: jwt|basic|oauth|apikey|form
        --token string           JWT/Bearer token
    -u, --username string        Username for auth
    -p, --password string        Password for auth
        --api-key string         API key value
        --api-key-header string  API key header name
        --login-url string       Form login URL
        --cookies string         Cookies (name=value; pairs)

  Browser:
        --browser-pool int       Browser instances (default 5)
        --headless               Run browser headless (default true)
        --user-agent string      Custom User-Agent
        --proxy string           Proxy URL (http/socks5)

  Discovery:
        --no-passive-api         Disable passive API discovery
        --no-active-api          Disable active API probing
        --no-websocket           Disable WebSocket detection
        --no-forms               Disable form analysis
        --no-js                  Disable JavaScript analysis

  Output:
    -o, --output string          Output file path
        --format string          Output format: json|jsonl|csv (default json)
        --pretty                 Pretty-print JSON output

  State:
        --state-file string      State file for persistence
        --auto-save              Auto-save state periodically

  Display:
        --progress               Show progress bar (default true)
        --no-progress            Disable progress bar
    -v, --verbose                Verbose logging
        --debug                  Debug logging

RESUME FLAGS:
        --state-file string      State file to resume from (required)

EXAMPLES:
  # Basic crawl
  opencrawler crawl https://example.com

  # Authenticated crawl with JWT
  opencrawler crawl https://api.example.com --auth-type jwt --token "eyJ..."

  # Fast crawl with limited scope
  opencrawler crawl https://example.com --turbo --max-depth 3 --include ".*api.*"

  # Resume interrupted crawl
  opencrawler resume --state-file crawl-state.db

Output Schema

{
  "target": "https://example.com",
  "started_at": "2024-01-15T10:00:00Z",
  "completed_at": "2024-01-15T10:15:30Z",
  "duration": "15m30s",
  "stats": {
    "urls_discovered": 2847,
    "pages_crawled": 1523,
    "forms_found": 45,
    "api_endpoints": 127,
    "websocket_endpoints": 4,
    "errors": 12,
    "average_speed": 1.64
  },
  "endpoints": [
    {
      "url": "https://example.com/api/v1/users",
      "method": "GET",
      "status_code": 200,
      "content_type": "application/json",
      "source": "passive",
      "depth": 2,
      "discovered_from": "https://example.com/dashboard",
      "parameters": [
        {"name": "page", "type": "query", "example": "1"},
        {"name": "limit", "type": "query", "example": "20"}
      ],
      "headers": {
        "Authorization": "Bearer [REDACTED]"
      }
    }
  ],
  "forms": [
    {
      "url": "https://example.com/login",
      "action": "/auth/login",
      "method": "POST",
      "enctype": "application/x-www-form-urlencoded",
      "inputs": [
        {"name": "username", "type": "text", "required": true},
        {"name": "password", "type": "password", "required": true},
        {"name": "_csrf", "type": "hidden", "value": "[TOKEN]", "is_csrf": true}
      ],
      "has_csrf": true,
      "has_file_upload": false
    }
  ],
  "websockets": [
    {
      "url": "wss://example.com/ws/notifications",
      "discovered_from": "https://example.com/dashboard",
      "protocols": ["graphql-ws"],
      "sample_messages": [
        {"direction": "sent", "data": "{\"type\":\"connection_init\"}"},
        {"direction": "received", "data": "{\"type\":\"connection_ack\"}"}
      ]
    }
  ],
  "technologies": {
    "framework": "React",
    "server": "nginx",
    "languages": ["JavaScript", "TypeScript"],
    "libraries": ["axios", "react-router", "redux"]
  }
}

Configuration File

Create opencrawler.yaml:

# OpenCrawler Configuration
target: https://example.com

# Performance Settings
workers: 100
max_depth: 15
timeout: 30s

# Scope Rules
scope:
  include_patterns:
    - ".*\\.example\\.com"
    - ".*/api/.*"
  exclude_patterns:
    - ".*/logout.*"
    - ".*/admin.*"
    - ".*\\.(jpg|png|gif|css|js)$"
  follow_external: false
  max_depth: 15

# Rate Limiting
rate_limit:
  requests_per_second: 100
  burst: 20
  respect_robots_txt: true
  adaptive: true  # Auto-adjust on rate limiting

# Authentication
auth:
  type: jwt  # jwt|basic|oauth|apikey|form
  token: "${JWT_TOKEN}"  # Environment variable substitution
  # Or for form login:
  # type: form
  # login_url: https://example.com/login
  # username: admin
  # password: "${PASSWORD}"
  # username_field: email
  # password_field: password

# Browser Settings
browser:
  pool_size: 10
  headless: true
  user_agent: "OpenCrawler/1.0"
  viewport:
    width: 1920
    height: 1080

# Discovery Features
discovery:
  passive_api: true
  active_api: true
  websocket: true
  forms: true
  javascript: true
  source_maps: true
  robots_txt: true
  sitemap: true

# Output Settings
output:
  format: json
  file_path: results.json
  pretty: true
  stream_mode: false  # Stream results as discovered

# State Persistence
state:
  enabled: true
  file_path: crawl-state.db
  auto_save: true
  interval: 60  # seconds

Use with:

opencrawler crawl -c opencrawler.yaml

Library Usage

OpenCrawler is also available as a Go library for integration into your tools:

Basic Usage

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/PentesterFlow/OpenCrawler/pkg/crawler"
)

func main() {
    // Create crawler with options
    c, err := crawler.New(
        crawler.WithTarget("https://example.com"),
        crawler.WithWorkers(50),
        crawler.WithMaxDepth(10),
        crawler.WithProgress(true),
    )
    if err != nil {
        log.Fatal(err)
    }

    // Start crawling
    result, err := c.Start(context.Background())
    if err != nil {
        log.Fatal(err)
    }

    // Access results
    fmt.Printf("Crawled %d pages\n", result.Stats.PagesCrawled)
    fmt.Printf("Found %d API endpoints\n", result.Stats.APIEndpoints)
    fmt.Printf("Found %d forms\n", result.Stats.FormsFound)

    // Iterate endpoints
    for _, endpoint := range result.Endpoints {
        fmt.Printf("[%s] %s\n", endpoint.Method, endpoint.URL)
    }
}

With Authentication

c, err := crawler.New(
    crawler.WithTarget("https://api.example.com"),
    crawler.WithJWTAuth("eyJhbGciOiJIUzI1NiIs..."),
    // Or Basic Auth
    // crawler.WithBasicAuth("admin", "password"),
    // Or API Key
    // crawler.WithAPIKeyAuth("X-API-Key", "your-api-key"),
    // Or Form Login
    // crawler.WithFormAuth(crawler.FormAuth{
    //     LoginURL: "https://example.com/login",
    //     Username: "admin",
    //     Password: "secret",
    // }),
)

With Scope Rules

c, err := crawler.New(
    crawler.WithTarget("https://example.com"),
    crawler.WithIncludePatterns(`.*api.*`, `.*v1.*`),
    crawler.WithExcludePatterns(`.*logout.*`, `.*\.(jpg|png|gif)$`),
    crawler.WithFollowExternal(false),
    crawler.WithMaxDepth(5),
)

Custom HTTP Client

import "net/http"

transport := &http.Transport{
    MaxIdleConns:        100,
    MaxIdleConnsPerHost: 100,
    IdleConnTimeout:     90 * time.Second,
}

client := &http.Client{
    Transport: transport,
    Timeout:   30 * time.Second,
}

c, err := crawler.New(
    crawler.WithTarget("https://example.com"),
    crawler.WithHTTPClient(client),
    crawler.WithProxy("http://127.0.0.1:8080"),
)

Event Callbacks

c, err := crawler.New(
    crawler.WithTarget("https://example.com"),
    crawler.WithOnEndpointFound(func(e *crawler.Endpoint) {
        fmt.Printf("Found: %s %s\n", e.Method, e.URL)
    }),
    crawler.WithOnFormFound(func(f *crawler.Form) {
        fmt.Printf("Form: %s -> %s\n", f.URL, f.Action)
    }),
    crawler.WithOnError(func(err error) {
        log.Printf("Error: %v\n", err)
    }),
)

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              OpenCrawler                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│  │    CLI      │   │   Library   │   │   Config    │   │   Output    │     │
│  │  (Cobra)    │   │     API     │   │   (Viper)   │   │   (JSON)    │     │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘     │
│         │                 │                 │                 │             │
│         └─────────────────┴────────┬────────┴─────────────────┘             │
│                                    │                                         │
│                           ┌────────┴────────┐                               │
│                           │    Crawler      │                               │
│                           │  Orchestrator   │                               │
│                           └────────┬────────┘                               │
│                                    │                                         │
│         ┌──────────────────────────┼──────────────────────────┐             │
│         │                          │                          │             │
│  ┌──────┴──────┐           ┌───────┴───────┐          ┌───────┴───────┐    │
│  │   Browser   │           │     Queue     │          │    Scope      │    │
│  │    Pool     │           │   Manager     │          │   Checker     │    │
│  │  (Chrome)   │           │  (Priority)   │          │   (Regex)     │    │
│  └──────┬──────┘           └───────┬───────┘          └───────────────┘    │
│         │                          │                                         │
│  ┌──────┴──────┐           ┌───────┴───────┐          ┌───────────────┐    │
│  │ Interceptor │           │     State     │          │  Rate Limiter │    │
│  │  (Network)  │           │  Persistence  │          │  (Adaptive)   │    │
│  └─────────────┘           │   (BoltDB)    │          └───────────────┘    │
│                            └───────────────┘                                │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                        Discovery Engine                              │   │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤   │
│  │   HTML      │     JS      │   Form      │    API      │  WebSocket  │   │
│  │  Parser     │  Analyzer   │  Detector   │  Discovery  │   Handler   │   │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘   │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                     Authentication Layer                             │   │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤   │
│  │   Session   │    JWT      │   OAuth     │   Basic     │   API Key   │   │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Directory Structure

OpenCrawler/
├── cmd/
│   └── crawler/
│       └── main.go              # CLI entry point
├── pkg/
│   └── crawler/
│       ├── crawler.go           # Main orchestrator (public API)
│       ├── config.go            # Configuration structures
│       ├── options.go           # Functional options pattern
│       └── types.go             # Public types
├── internal/
│   ├── auth/                    # Authentication providers
│   │   ├── auth.go              # Auth interface
│   │   ├── jwt.go               # JWT/Bearer token
│   │   ├── oauth.go             # OAuth 2.0 flow
│   │   ├── formlogin.go         # Form-based login
│   │   └── session.go           # Session management
│   ├── browser/                 # Headless browser
│   │   ├── browser.go           # Chrome CDP wrapper
│   │   ├── pool.go              # Browser instance pool
│   │   ├── interceptor.go       # Network interception
│   │   └── spa.go               # SPA handling
│   ├── discovery/               # Endpoint discovery
│   │   ├── passive.go           # XHR/Fetch interception
│   │   ├── active.go            # Active probing
│   │   └── enhanced/            # Advanced discovery
│   ├── framework/               # SPA framework detection
│   │   ├── react.go
│   │   ├── angular.go
│   │   ├── vue.go
│   │   └── ember.go
│   ├── parser/                  # Content parsing
│   │   ├── html.go              # HTML link extraction
│   │   ├── javascript.go        # JS static analysis
│   │   └── forms.go             # Form analysis
│   ├── queue/                   # URL queue
│   │   ├── queue.go             # Queue interface
│   │   ├── memory.go            # In-memory queue
│   │   └── persistent.go        # Disk-backed queue
│   ├── state/                   # State management
│   │   ├── state.go             # State manager
│   │   ├── store.go             # BoltDB storage
│   │   └── dedup.go             # Bloom filter dedup
│   ├── scope/                   # Scope checking
│   ├── ratelimit/               # Rate limiting
│   ├── websocket/               # WebSocket handling
│   ├── progress/                # Progress display
│   └── output/                  # Output formatting
├── go.mod
├── go.sum
├── LICENSE
└── README.md

Use Cases

Security Testing (DAST)

# Comprehensive security crawl
opencrawler crawl https://target.com \
  --auth-type jwt --token "$JWT" \
  --include ".*api.*" \
  -o endpoints.json

# Feed to security scanner
cat endpoints.json | jq -r '.endpoints[].url' | nuclei -l -

Bug Bounty Reconnaissance

# Fast, wide crawl
opencrawler crawl https://target.com --turbo --max-depth 5 -o recon.json

# Extract unique paths
cat recon.json | jq -r '.endpoints[].url' | sort -u > paths.txt

API Documentation

# Discover undocumented APIs
opencrawler crawl https://app.example.com \
  --auth-type form --login-url "/login" -u user -p pass \
  --include ".*api.*" --include ".*graphql.*" \
  -o api-endpoints.json

CI/CD Integration

# GitHub Actions example
- name: Crawl Application
  run: |
    opencrawler crawl ${{ env.APP_URL }} \
      --auth-type jwt --token "${{ secrets.JWT_TOKEN }}" \
      --max-depth 5 \
      -o crawl-results.json

- name: Upload Results
  uses: actions/upload-artifact@v3
  with:
    name: crawl-results
    path: crawl-results.json

Performance Tuning

Optimal Settings by Use Case

Use Case	Workers	Depth	Rate Limit	Features
Quick Recon	200	3	200 rps	`--turbo --no-js`
Deep Crawl	50	15	50 rps	`--balanced`
API Discovery	100	10	100 rps	`--no-forms`
Stealth Mode	5	10	1 rps	`--stealth`
Full Analysis	30	20	30 rps	All features

Tips

Turbo Mode for large sites: --turbo
Limit Depth for focused crawls: --max-depth 3
Increase Workers for fast servers: -w 200
Disable Unused Features: --no-js --no-websocket
Use State Persistence for long crawls: --state-file state.db
Exclude Static Assets: --exclude ".*\\.(jpg|png|gif|css|js|woff)$"

Contributing

Contributions are welcome! Please see our Contributing Guide for details.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Write tests for your changes
Ensure tests pass (go test ./...)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

git clone https://github.com/PentesterFlow/OpenCrawler.git
cd OpenCrawler
go mod download
go build -o opencrawler ./cmd/crawler
go test ./...

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This tool is intended for authorized security testing and educational purposes only. Always obtain proper authorization before crawling or testing any web application. The authors are not responsible for any misuse of this tool.

Acknowledgments

go-rod - Headless browser automation
goquery - HTML parsing
cobra - CLI framework
bbolt - State persistence
bloom - Bloom filter

Built for Security Researchers, by Security Researchers

PentesterFlow • Twitter • Discord

_{If you find OpenCrawler useful, please consider giving it a star!}

Directories ¶

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL

Path	Synopsis
cmd
crawler command
internal
auth Package auth provides authentication mechanisms for the crawler.	Package auth provides authentication mechanisms for the crawler.
browser Package browser provides headless Chrome integration via Rod.	Package browser provides headless Chrome integration via Rod.
discovery Package discovery provides API discovery mechanisms.	Package discovery provides API discovery mechanisms.
discovery/enhanced Package enhanced provides advanced discovery capabilities for web application crawling.	Package enhanced provides advanced discovery capabilities for web application crawling.
errors Package errors provides error types and handling for the DAST crawler.	Package errors provides error types and handling for the DAST crawler.
framework Package framework provides modular SPA framework detection and handling.	Package framework provides modular SPA framework detection and handling.
http Package http provides a fast HTTP client for crawling non-JS pages.	Package http provides a fast HTTP client for crawling non-JS pages.
logger Package logger provides structured logging for the DAST crawler.	Package logger provides structured logging for the DAST crawler.
metrics Package metrics provides metrics collection for the DAST crawler.	Package metrics provides metrics collection for the DAST crawler.
output Package output provides output formatting for the crawler.	Package output provides output formatting for the crawler.
parser Package parser provides HTML and JavaScript parsing for the crawler.	Package parser provides HTML and JavaScript parsing for the crawler.
progress Package progress provides progress bar display for the crawler.	Package progress provides progress bar display for the crawler.
queue Package queue provides URL queue implementations for the crawler.	Package queue provides URL queue implementations for the crawler.
ratelimit Package ratelimit provides rate limiting functionality for the crawler.	Package ratelimit provides rate limiting functionality for the crawler.
scope Package scope provides URL scope checking for the crawler.	Package scope provides URL scope checking for the crawler.
shutdown Package shutdown provides graceful shutdown handling for the DAST crawler.	Package shutdown provides graceful shutdown handling for the DAST crawler.
state Package state provides state management for the crawler.	Package state provides state management for the crawler.
websocket Package websocket provides WebSocket handling for the crawler.	Package websocket provides WebSocket handling for the crawler.
pkg
crawler Package crawler provides the main DAST web application crawler functionality.	Package crawler provides the main DAST web application crawler functionality.