subscraping

package
v2.12.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 12, 2026 License: MIT Imports: 16 Imported by: 205

Documentation

Overview

Package subscraping contains the logic of scraping agents

Index

Constants

const MultipleKeyPartsLength = 2

Variables

This section is empty.

Functions

func CreateApiKeys

func CreateApiKeys[T any](keys []string, provider func(k, v string) T) []T
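The signature suggests that each configured key string is split into two parts and handed to provider, with MultipleKeyPartsLength giving the expected part count. The sketch below illustrates that idea only. The `:` separator, the skipping of malformed entries, and the names buildKeys and apiKey are assumptions made for this example, not the package's confirmed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// keyParts mirrors the documented MultipleKeyPartsLength constant (2).
const keyParts = 2

// apiKey is a hypothetical credential pair used only for this illustration.
type apiKey struct {
	ID     string
	Secret string
}

// buildKeys is a hypothetical stand-in with the same shape as CreateApiKeys.
// Assumption: each entry looks like "first:second"; entries that do not split
// into exactly two parts are skipped.
func buildKeys[T any](keys []string, provider func(k, v string) T) []T {
	var out []T
	for _, key := range keys {
		parts := strings.Split(key, ":")
		if len(parts) != keyParts {
			continue
		}
		out = append(out, provider(parts[0], parts[1]))
	}
	return out
}

func main() {
	keys := buildKeys(
		[]string{"id1:secret1", "malformed", "id2:secret2"},
		func(k, v string) apiKey { return apiKey{ID: k, Secret: v} },
	)
	fmt.Println(len(keys), keys[0].ID, keys[1].Secret)
}
```

Passing the constructor as provider keeps the splitting logic independent of each source's own key type.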

func PickRandom

func PickRandom[T any](v []T, sourceName string) T
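A generic helper of this shape can be sketched as follows. The name pickOne, the zero-value return for an empty slice, and the use of sourceName purely for a diagnostic message are assumptions for illustration; the real function may behave differently.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickOne is a hypothetical stand-in with the same shape as PickRandom.
// Assumption: an empty slice yields the zero value of T, and sourceName is
// only used to say which source had nothing to pick from.
func pickOne[T any](v []T, sourceName string) T {
	var zero T
	if len(v) == 0 {
		fmt.Printf("no values configured for %s\n", sourceName)
		return zero
	}
	return v[rand.Intn(len(v))]
}

func main() {
	fmt.Println(pickOne([]string{"only-key"}, "example"))
	fmt.Printf("%q\n", pickOne([]string{}, "example"))
}
```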

Types

type BasicAuth

type BasicAuth struct {
	Username string
	Password string
}

BasicAuth holds the credentials for a request's Authorization header

type CtxArg

type CtxArg string
const (
	CtxSourceArg CtxArg = "source"
)

type CustomRateLimit

type CustomRateLimit struct {
	Custom mapsutil.SyncLockMap[string, uint]
}

type KeyRequirement added in v2.12.0

type KeyRequirement int

KeyRequirement represents the API key requirement level for a source

const (
	NoKey KeyRequirement = iota
	OptionalKey
	RequiredKey
)
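To show how the three levels might be consumed, the sketch below redeclares the type locally and adds two helpers. The helpers describe and needsKey are hypothetical, and deriving the deprecated NeedsKey from RequiredKey alone is an assumption, not something this page states.

```go
package main

import "fmt"

// Local mirror of the documented type and constants.
type KeyRequirement int

const (
	NoKey KeyRequirement = iota
	OptionalKey
	RequiredKey
)

// describe is a hypothetical helper that turns a level into a label.
func describe(k KeyRequirement) string {
	switch k {
	case NoKey:
		return "none"
	case OptionalKey:
		return "optional"
	case RequiredKey:
		return "required"
	default:
		return "unknown"
	}
}

// needsKey shows one plausible mapping onto a boolean (an assumption):
// only RequiredKey counts as needing a key.
func needsKey(k KeyRequirement) bool {
	return k == RequiredKey
}

func main() {
	for _, k := range []KeyRequirement{NoKey, OptionalKey, RequiredKey} {
		fmt.Println(describe(k), needsKey(k))
	}
}
```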

type RegexSubdomainExtractor

type RegexSubdomainExtractor struct {
	// contains filtered or unexported fields
}

RegexSubdomainExtractor is a concrete implementation of the SubdomainExtractor interface, using regex for extraction.

func NewSubdomainExtractor

func NewSubdomainExtractor(domain string) (*RegexSubdomainExtractor, error)

NewSubdomainExtractor creates a new regular expression to extract subdomains from text based on the given domain.

func (*RegexSubdomainExtractor) Extract

func (re *RegexSubdomainExtractor) Extract(text string) []string

Extract implements the SubdomainExtractor interface, using the regex to find subdomains in the given text.
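A regex-based extractor along these lines can be sketched with the standard regexp package. The pattern, the case-insensitive flag, and the names regexExtractor and newExtractor are assumptions for this example; the pattern actually used by RegexSubdomainExtractor is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexExtractor is a hypothetical stand-in for RegexSubdomainExtractor.
type regexExtractor struct {
	re *regexp.Regexp
}

// newExtractor compiles a pattern that matches one or more hostname
// characters followed by "." and the literal domain.
// Assumption: this pattern is illustrative, not the package's own.
func newExtractor(domain string) (*regexExtractor, error) {
	re, err := regexp.Compile(`(?i)[a-z0-9_.-]+\.` + regexp.QuoteMeta(domain))
	if err != nil {
		return nil, err
	}
	return &regexExtractor{re: re}, nil
}

// Extract returns every match in text, in order of appearance.
func (e *regexExtractor) Extract(text string) []string {
	return e.re.FindAllString(text, -1)
}

func main() {
	extractor, err := newExtractor("example.com")
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	text := `<a href="https://api.example.com">x</a> mail.EXAMPLE.com other.org`
	fmt.Println(extractor.Extract(text))
}
```

Quoting the domain with regexp.QuoteMeta keeps the dots in the domain from matching arbitrary characters.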

type Result

type Result struct {
	Type   ResultType
	Source string
	Value  string
	Error  error
}

Result is a result structure returned by a source

type ResultType

type ResultType int

ResultType is the type of result returned by the source

const (
	Subdomain ResultType = iota
	Error
)

Types of results returned by the source
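A consumer typically switches on Type to separate subdomains from errors. The sketch below redeclares Result and ResultType locally so it runs on its own; collect and demoResults are hypothetical helpers, and the deduplication step is an illustrative choice rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Local mirrors of the documented types.
type ResultType int

const (
	Subdomain ResultType = iota
	Error
)

type Result struct {
	Type   ResultType
	Source string
	Value  string
	Error  error
}

// collect drains the channel, keeps each subdomain once, and wraps every
// error with the name of the source that reported it.
func collect(results <-chan Result) (subs []string, errs []error) {
	seen := make(map[string]struct{})
	for r := range results {
		switch r.Type {
		case Subdomain:
			if _, ok := seen[r.Value]; !ok {
				seen[r.Value] = struct{}{}
				subs = append(subs, r.Value)
			}
		case Error:
			errs = append(errs, fmt.Errorf("%s: %w", r.Source, r.Error))
		}
	}
	return subs, errs
}

// demoResults returns a closed, pre-filled channel with sample data.
func demoResults() <-chan Result {
	ch := make(chan Result, 3)
	ch <- Result{Type: Subdomain, Source: "a", Value: "www.example.com"}
	ch <- Result{Type: Subdomain, Source: "b", Value: "www.example.com"}
	ch <- Result{Type: Error, Source: "b", Error: errors.New("rate limited")}
	close(ch)
	return ch
}

func main() {
	subs, errs := collect(demoResults())
	fmt.Println(subs, errs)
}
```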

type Session

type Session struct {
	//SubdomainExtractor
	Extractor SubdomainExtractor
	// Client is the current http client
	Client *http.Client
	// Rate limit instance
	MultiRateLimiter *ratelimit.MultiLimiter
}

Session is the option passed to the source; an option is created uniquely for each source.

func NewSession

func NewSession(domain string, proxy string, multiRateLimiter *ratelimit.MultiLimiter, timeout int) (*Session, error)

NewSession creates a new session object for a domain
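The proxy and timeout parameters map naturally onto a standard http.Client. The sketch below shows one way such a client could be built with the standard library alone. Treating timeout as a number of seconds, skipping the proxy when the string is empty, and the helper names newClient and usesProxy are all assumptions; the rate limiter and extractor setup are left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// newClient is a hypothetical helper, not the package's NewSession.
// Assumptions: timeoutSeconds is in seconds, and an empty proxy string
// means "connect directly".
func newClient(proxy string, timeoutSeconds int) (*http.Client, error) {
	transport := &http.Transport{}
	if proxy != "" {
		proxyURL, err := url.Parse(proxy)
		if err != nil {
			return nil, fmt.Errorf("parse proxy: %w", err)
		}
		transport.Proxy = http.ProxyURL(proxyURL)
	}
	return &http.Client{
		Transport: transport,
		Timeout:   time.Duration(timeoutSeconds) * time.Second,
	}, nil
}

// usesProxy reports whether the client's transport has a proxy function set.
func usesProxy(c *http.Client) bool {
	t, ok := c.Transport.(*http.Transport)
	return ok && t.Proxy != nil
}

func main() {
	client, err := newClient("http://127.0.0.1:8080", 30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(client.Timeout, usesProxy(client))
}
```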

func (*Session) Close

func (s *Session) Close()

Close closes the session

func (*Session) DiscardHTTPResponse

func (s *Session) DiscardHTTPResponse(response *http.Response)

DiscardHTTPResponse discards the response content on demand
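Discarding a response usually means reading the body to the end and closing it, which lets the underlying connection be reused. The sketch below shows that pattern with the standard library. The function discard and the trackedBody and fakeResponse helpers are hypothetical, and whether the package's method does exactly this is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// trackedBody is a test double that records whether Close was called.
type trackedBody struct {
	r      *strings.Reader
	closed bool
}

func (b *trackedBody) Read(p []byte) (int, error) { return b.r.Read(p) }

func (b *trackedBody) Close() error {
	b.closed = true
	return nil
}

// discard is a hypothetical stand-in for DiscardHTTPResponse: it drains
// whatever is left in the body and then closes it. A nil response is ignored.
func discard(resp *http.Response) {
	if resp == nil || resp.Body == nil {
		return
	}
	_, _ = io.Copy(io.Discard, resp.Body)
	_ = resp.Body.Close()
}

// fakeResponse builds a response around a tracked in-memory body.
func fakeResponse(payload string) (*http.Response, *trackedBody) {
	body := &trackedBody{r: strings.NewReader(payload)}
	return &http.Response{Body: body}, body
}

func main() {
	resp, body := fakeResponse("unread payload")
	discard(resp)
	fmt.Println(body.r.Len(), body.closed)
}
```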

func (*Session) Get

func (s *Session) Get(ctx context.Context, getURL, cookies string, headers map[string]string) (*http.Response, error)

Get makes a GET request to a URL with extended parameters

func (*Session) HTTPRequest

func (s *Session) HTTPRequest(ctx context.Context, method, requestURL, cookies string, headers map[string]string, body io.Reader, basicAuth BasicAuth) (*http.Response, error)

HTTPRequest makes any HTTP request to a URL with extended parameters
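The parameter list (cookies, headers, body, basicAuth) hints at how the outgoing request is assembled. The sketch below builds such a request with net/http without sending it. The helper names buildRequest and demoRequest, the rule that basic auth is applied only when Username is non-empty, and the example URL are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// BasicAuth mirrors the documented struct.
type BasicAuth struct {
	Username string
	Password string
}

// buildRequest is a hypothetical helper that assembles a request from the
// same kinds of inputs HTTPRequest accepts. It does not send anything.
func buildRequest(ctx context.Context, method, requestURL, cookies string,
	headers map[string]string, body io.Reader, auth BasicAuth) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, method, requestURL, body)
	if err != nil {
		return nil, err
	}
	if cookies != "" {
		req.Header.Set("Cookie", cookies)
	}
	// Assumption: credentials are attached only when a username is given.
	if auth.Username != "" {
		req.SetBasicAuth(auth.Username, auth.Password)
	}
	for name, value := range headers {
		req.Header.Set(name, value)
	}
	return req, nil
}

// demoRequest builds a sample GET request against a placeholder URL.
func demoRequest() (*http.Request, error) {
	return buildRequest(context.Background(), http.MethodGet,
		"https://api.example.com/v1/subdomains", "session=abc",
		map[string]string{"Accept": "application/json"}, nil,
		BasicAuth{Username: "user", Password: "pass"})
}

func main() {
	req, err := demoRequest()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	user, _, _ := req.BasicAuth()
	fmt.Println(req.Method, req.URL.Host, req.Header.Get("Accept"), user)
}
```

Binding the request to ctx means a cancelled context aborts the call once the request is sent by a client.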

func (*Session) Post

func (s *Session) Post(ctx context.Context, postURL, cookies string, headers map[string]string, body io.Reader) (*http.Response, error)

Post makes a POST request to a URL with extended parameters

func (*Session) SimpleGet

func (s *Session) SimpleGet(ctx context.Context, getURL string) (*http.Response, error)

SimpleGet makes a simple GET request to a URL

func (*Session) SimplePost

func (s *Session) SimplePost(ctx context.Context, postURL, contentType string, body io.Reader) (*http.Response, error)

SimplePost makes a simple POST request to a URL

type Source

type Source interface {
	// Run takes a domain as argument and a session object
	// which contains the extractor for subdomains, http client
	// and other stuff.
	Run(context.Context, string, *Session) <-chan Result

	// Name returns the name of the source. It is preferred to use lower case names.
	Name() string

	// IsDefault returns true if the current source should be
	// used as part of the default execution.
	IsDefault() bool

	// HasRecursiveSupport returns true if the current source
	// accepts subdomains (e.g. subdomain.domain.tld),
	// not just root domains.
	HasRecursiveSupport() bool

	// KeyRequirement returns the API key requirement level for this source
	KeyRequirement() KeyRequirement

	// NeedsKey returns true if the source requires an API key.
	// Deprecated: Use KeyRequirement() instead for more granular control.
	NeedsKey() bool

	AddApiKeys([]string)

	// Statistics returns the scraping statistics for the source
	Statistics() Statistics
}

Source is an interface implemented by each passive source
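To make the contract concrete, the sketch below implements the interface against trimmed local mirrors of the documented types. Everything beyond the interface's method set is an assumption: staticSource and extractorFunc are hypothetical, the mirrors carry only the fields used here, and the "scraped" text is a fixed string rather than an HTTP response.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Trimmed local mirrors of the documented types; only what this sketch uses.
type ResultType int
type KeyRequirement int
type SubdomainExtractor interface{ Extract(text string) []string }
type Session struct{ Extractor SubdomainExtractor }

const Subdomain ResultType = 0

const (
	NoKey KeyRequirement = iota
	OptionalKey
	RequiredKey
)

type Result struct {
	Type   ResultType
	Source string
	Value  string
}

type Statistics struct{ Results int }

type Source interface {
	Run(context.Context, string, *Session) <-chan Result
	Name() string
	IsDefault() bool
	HasRecursiveSupport() bool
	KeyRequirement() KeyRequirement
	NeedsKey() bool
	AddApiKeys([]string)
	Statistics() Statistics
}

// extractorFunc adapts a plain function to SubdomainExtractor (hypothetical).
type extractorFunc func(string) []string

func (f extractorFunc) Extract(text string) []string { return f(text) }

// staticSource is a hypothetical keyless source that "scrapes" a fixed string.
type staticSource struct {
	text  string
	stats Statistics
}

var _ Source = (*staticSource)(nil) // compile-time interface check

func (s *staticSource) Run(ctx context.Context, domain string, session *Session) <-chan Result {
	results := make(chan Result)
	s.stats = Statistics{}
	go func() {
		defer close(results) // closing tells the consumer the source is done
		for _, sub := range session.Extractor.Extract(s.text) {
			select {
			case <-ctx.Done():
				return
			case results <- Result{Type: Subdomain, Source: s.Name(), Value: sub}:
				s.stats.Results++
			}
		}
	}()
	return results
}

func (s *staticSource) Name() string                   { return "static" }
func (s *staticSource) IsDefault() bool                { return true }
func (s *staticSource) HasRecursiveSupport() bool      { return false }
func (s *staticSource) KeyRequirement() KeyRequirement { return NoKey }
func (s *staticSource) NeedsKey() bool                 { return s.KeyRequirement() == RequiredKey }
func (s *staticSource) AddApiKeys([]string)            {}
func (s *staticSource) Statistics() Statistics         { return s.stats }

// runDemo runs the source once, treating every whitespace-separated token
// as a subdomain, and returns what it produced.
func runDemo() ([]string, Statistics) {
	src := &staticSource{text: "a.example.com b.example.com"}
	session := &Session{Extractor: extractorFunc(strings.Fields)}
	var subs []string
	for r := range src.Run(context.Background(), "example.com", session) {
		subs = append(subs, r.Value)
	}
	return subs, src.Statistics()
}

func main() {
	subs, stats := runDemo()
	fmt.Println(subs, stats.Results)
}
```

Returning a receive-only channel and closing it from the producing goroutine lets callers range over the results, and reading the statistics after the loop ends is safe because the channel is closed only after the last counter update.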

type Statistics

type Statistics struct {
	TimeTaken time.Duration
	Requests  int
	Errors    int
	Results   int
	Skipped   bool
}

Statistics contains statistics about the scraping process

type SubdomainExtractor

type SubdomainExtractor interface {
	Extract(text string) []string
}

SubdomainExtractor is an interface that defines the contract for subdomain extraction.

Directories

Path              Synopsis
sources
  alienvault      Package alienvault logic
  anubis          Package anubis logic
  bevigil         Package bevigil logic
  bufferover      Package bufferover is a bufferover Scraping Engine in Golang
  builtwith       Package builtwith logic
  c99             Package c99 logic
  censys          Package censys logic
  certspotter     Package certspotter logic
  chaos           Package chaos logic
  commoncrawl     Package commoncrawl logic
  crtsh           Package crtsh logic
  digitorus       Package waybackarchive logic
  dnsdb           Package dnsdb logic
  dnsdumpster     Package dnsdumpster logic
  domainsproject  Package domainsproject logic
  fofa            Package fofa logic
  github          Package github GitHub search package Based on gwen001's https://github.com/gwen001/github-search github-subdomains
  hackertarget    Package hackertarget logic
  hudsonrock      Package hudsonrock logic
  intelx          Package intelx logic
  leakix          Package leakix logic
  merklemap       Package merklemap logic
  netlas          Package netlas logic
  onyphe          Package onyphe logic
  profundis       Package profundis logic
  pugrecon        Package pugrecon logic
  quake           Package quake logic
  rapiddns        Package rapiddns is a RapidDNS Scraping Engine in Golang
  reconcloud      Package reconcloud logic
  redhuntlabs     Package redhuntlabs logic
  riddler         Package riddler logic
  robtex          Package robtex logic
  securitytrails  Package securitytrails logic
  shodan          Package shodan logic
  sitedossier     Package sitedossier logic
  thc             Package thc logic
  threatbook      Package threatbook logic
  threatminer     Package threatminer logic
  virustotal      Package virustotal logic
  waybackarchive  Package waybackarchive logic
  whoisxmlapi     Package whoisxmlapi logic
  windvane        Package windvane logic