common

package v1.3.0
Published: Dec 1, 2025 License: MIT Imports: 27 Imported by: 3

Documentation

Index

Constants

This section is empty.

Variables

var ErrOutOfScope = errors.New("out of scope")

Functions

func BuildHttpClient added in v1.0.0

func BuildHttpClient(dialer *fastdialer.Dialer, options *types.Options, redirectCallback RedirectCallback) (*retryablehttp.Client, *fastdialer.Dialer, error)

BuildHttpClient builds an HTTP client based on a profile.
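
As a rough usage sketch (the import paths, the fastdialer.NewDialer/DefaultOptions constructor, and the empty types.Options literal are assumptions, not something documented on this page):

package main

import (
	"log"
	"net/http"

	"github.com/projectdiscovery/fastdialer/fastdialer"    // assumed import path
	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path for this package
	"github.com/projectdiscovery/katana/pkg/types"         // assumed import path
)

func main() {
	// Assumption: fastdialer exposes NewDialer and DefaultOptions.
	dialer, err := fastdialer.NewDialer(fastdialer.DefaultOptions)
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder options; populate according to your configuration.
	options := &types.Options{}

	// Redirect callback with the documented signature.
	onRedirect := func(resp *http.Response, depth int) {
		log.Printf("redirect at depth %d: %s", depth, resp.Request.URL)
	}

	client, dialer, err := common.BuildHttpClient(dialer, options, onRedirect)
	if err != nil {
		log.Fatal(err)
	}
	_ = client
	_ = dialer
}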

Types

type CrawlSession added in v1.0.1

type CrawlSession struct {
	Ctx        context.Context
	CancelFunc context.CancelFunc
	URL        *url.URL
	Hostname   string
	Queue      *queue.Queue
	HttpClient *retryablehttp.Client
	Browser    *rod.Browser
}

CrawlSession represents an active crawling session for a specific target URL. It maintains the session context, cancellation function, parsed URL information, the request queue, and HTTP/browser clients needed for the crawl operation.

type DoRequestFunc added in v1.0.1

type DoRequestFunc func(crawlSession *CrawlSession, req *navigation.Request) (*navigation.Response, error)

DoRequestFunc is a function type for executing navigation requests. Implementations should perform the actual HTTP request or browser navigation and return the response or an error. This allows different crawling strategies (standard HTTP vs. headless browser) to provide their own request logic.
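
A skeleton with the required shape (import paths are assumed; the body is deliberately left as a placeholder, since the internals of navigation.Request and navigation.Response are not documented on this page):

package crawler

import (
	"errors"

	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
	"github.com/projectdiscovery/katana/pkg/navigation"    // assumed import path
)

// standardDoRequest satisfies DoRequestFunc. A real implementation would issue
// the request with crawlSession.HttpClient (standard crawling) or drive
// crawlSession.Browser (headless crawling) and build a navigation.Response.
var standardDoRequest common.DoRequestFunc = func(crawlSession *common.CrawlSession, req *navigation.Request) (*navigation.Response, error) {
	return nil, errors.New("not implemented")
}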

type RedirectCallback added in v1.0.0

type RedirectCallback func(resp *http.Response, depth int)

type Shared added in v1.0.1

type Shared struct {
	Headers    map[string]string
	KnownFiles *files.KnownFiles
	Options    *types.CrawlerOptions
	Jar        *httputil.CookieJar
}

Shared represents the shared state and configuration used across all crawl sessions. It maintains common resources like HTTP headers, cookie jars, known files database, and crawler options that are reused for efficiency across multiple crawl operations.

func NewShared added in v1.0.1

func NewShared(options *types.CrawlerOptions) (*Shared, error)

NewShared creates a new Shared instance with the provided crawler options. It initializes the HTTP headers, known files database (if configured), and an empty cookie jar. Returns an error if the HTTP client or cookie jar creation fails.
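
A minimal construction sketch (import paths and the empty CrawlerOptions literal are assumptions; a fully populated CrawlerOptions is normally produced by the crawler's own option handling):

package main

import (
	"log"

	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
	"github.com/projectdiscovery/katana/pkg/types"         // assumed import path
)

func main() {
	crawlerOptions := &types.CrawlerOptions{} // placeholder; populate before real use

	shared, err := common.NewShared(crawlerOptions)
	if err != nil {
		log.Fatal(err)
	}
	_ = shared
}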

func (*Shared) Do added in v1.0.1

func (s *Shared) Do(crawlSession *CrawlSession, doRequest DoRequestFunc) error

Do executes the main crawling loop for the given crawl session. It processes items from the queue concurrently (respecting the Concurrency limit), validates each request (URL format, path filters, scope), applies rate limiting and delays, executes the request using the provided doRequest function, writes results to output, and enqueues any newly discovered URLs from responses.

The method returns when the queue is empty or the session context is cancelled (due to timeout or manual cancellation). Returns an error if the context is cancelled.
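
For manual cancellation, a small sketch: cancelling the session's context from elsewhere makes a running Do return (everything here beyond CrawlSession.CancelFunc is illustrative):

package crawler

import (
	"time"

	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
)

// stopAfter cancels the session's context after d; a Do loop running on the
// same session will observe the cancellation and return with an error.
func stopAfter(session *common.CrawlSession, d time.Duration) *time.Timer {
	return time.AfterFunc(d, session.CancelFunc)
}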

func (*Shared) Enqueue added in v1.0.1

func (s *Shared) Enqueue(queue *queue.Queue, navigationRequests ...*navigation.Request)

Enqueue adds one or more navigation requests to the crawl queue after applying validation checks. The method performs the following checks in order:

  1. URL format validation
  2. Query parameter handling (if IgnoreQueryParams is enabled)
  3. Depth filtering - URLs exceeding MaxDepth are skipped before the uniqueness check, so they are not cached as already seen and can still be crawled if discovered later at a valid depth via a different path
  4. Uniqueness filtering - prevents duplicate URL crawling
  5. Cycle detection - identifies URLs stuck in redirect loops
  6. Scope validation - ensures URLs belong to the allowed crawl scope

For in-scope URLs, the method also handles path climbing when enabled, extracting and enqueuing parent directory paths. Out-of-scope URLs are sent to output if DisplayOutScope is enabled.
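
As a sketch of the call shape (import paths are assumptions; the navigation requests themselves would come from parsing a response, for example inside a DoRequestFunc):

package crawler

import (
	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
	"github.com/projectdiscovery/katana/pkg/navigation"    // assumed import path
)

// enqueueDiscovered hands newly discovered requests back to the session queue;
// Enqueue applies the validation steps listed above before anything is queued.
func enqueueDiscovered(s *common.Shared, session *common.CrawlSession, discovered []*navigation.Request) {
	s.Enqueue(session.Queue, discovered...)
}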

func (*Shared) NewCrawlSessionWithURL added in v1.0.1

func (s *Shared) NewCrawlSessionWithURL(URL string) (*CrawlSession, error)

NewCrawlSessionWithURL creates and initializes a new crawl session for the specified URL. It performs the following initialization steps:

  1. Creates a context with optional timeout based on CrawlDuration setting
  2. Parses the target URL and extracts the hostname
  3. Initializes the request queue with the configured strategy
  4. Enqueues the initial URL and any known files for the target
  5. Sets up the HTTP client with response parsing callbacks

Returns the initialized CrawlSession or an error if initialization fails.
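
Putting the session lifecycle together, a hedged sketch of one full crawl of a single target (only the identifiers documented on this page are taken from the package; the surrounding function and import path are illustrative):

package crawler

import (
	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
)

// crawlOne runs a single crawl session against target with the given request
// strategy, making sure the session context is always released.
func crawlOne(shared *common.Shared, target string, doRequest common.DoRequestFunc) error {
	session, err := shared.NewCrawlSessionWithURL(target)
	if err != nil {
		return err
	}
	defer session.CancelFunc()

	return shared.Do(session, doRequest)
}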

func (*Shared) Output added in v1.0.1

func (s *Shared) Output(navigationRequest *navigation.Request, navigationResponse *navigation.Response, err error)

Output writes a crawl result to the configured output writer. It creates a Result object containing the navigation request, response (if any), and error information (if any), then writes it to the output writer. If an OnResult callback is configured and output writing succeeds, the callback is invoked.
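
A sketch of how code outside of Do might report a single result (Do already does this internally for the requests it processes; import paths and the surrounding function are assumptions):

package crawler

import (
	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
	"github.com/projectdiscovery/katana/pkg/navigation"    // assumed import path
)

// executeAndReport runs one navigation request and always writes the outcome,
// successful or not, to the configured output writer.
func executeAndReport(s *common.Shared, session *common.CrawlSession, doRequest common.DoRequestFunc, req *navigation.Request) {
	resp, err := doRequest(session, req)
	s.Output(req, resp, err)
}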

func (*Shared) ValidateScope added in v1.0.1

func (s *Shared) ValidateScope(URL string, root string) bool

ValidateScope checks whether a given URL is within the allowed crawling scope based on the configured scope rules and the root hostname. Returns true if the URL passes scope validation, false otherwise.
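
For example (a sketch; the helper and import path are illustrative, and using the session hostname as the scope root follows the "root hostname" wording above):

package crawler

import (
	"github.com/projectdiscovery/katana/pkg/engine/common" // assumed import path
)

// inScope reports whether candidate should be crawled for the session's target,
// using the session hostname as the scope root.
func inScope(s *common.Shared, session *common.CrawlSession, candidate string) bool {
	return s.ValidateScope(candidate, session.Hostname)
}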
