Documentation

Overview
Package crawler provides bot/crawler defenses for Lesser's public HTTP surface (e.g. robots.txt, request classification, and category-based rate limiting).
Constants
const RobotsTxt = `` /* 1208-byte string literal not displayed */
RobotsTxt is the robots.txt policy for Lesser instances.
It is intentionally restrictive by default to reduce bot-driven costs, while allowing common search engines at low rates and preserving human access.
Variables ¶
This section is empty.
Functions
func Middleware
func Middleware(mode protectionMode, logger *zap.Logger) apptheory.Middleware
Middleware classifies requests and (optionally) logs the classification.
func NewMiddleware
func NewMiddleware(logger *zap.Logger) apptheory.Middleware
NewMiddleware returns the crawler middleware configured by environment.
Modes (CRAWLER_PROTECTION_MODE):
- off (default): no classification, no logging
- observe: classify requests, log the classification, do not enforce
- limit: enforce rate limits for search engines + generic bots
- block: enforce limits + block known AI crawlers
Types
type Category
type Category int
Category represents the classification of a request for crawler controls.
const (
	CategoryHuman Category = iota
	CategoryFederation
	CategorySearchEngine
	CategoryAICrawler
	CategoryGenericBot
	CategorySuspicious
)
Classification categories.
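In observe mode the classification is logged, so a readable name for each category is useful. The String method below is an illustrative helper, not part of the documented API:

```go
package main

import "fmt"

// Category mirrors the package's classification type.
type Category int

const (
	CategoryHuman Category = iota
	CategoryFederation
	CategorySearchEngine
	CategoryAICrawler
	CategoryGenericBot
	CategorySuspicious
)

// String is an assumed helper for rendering a classification in logs;
// the package may or may not define an equivalent.
func (c Category) String() string {
	switch c {
	case CategoryHuman:
		return "human"
	case CategoryFederation:
		return "federation"
	case CategorySearchEngine:
		return "search_engine"
	case CategoryAICrawler:
		return "ai_crawler"
	case CategoryGenericBot:
		return "generic_bot"
	default:
		return "suspicious"
	}
}

func main() {
	fmt.Println(CategoryAICrawler) // → ai_crawler
}
```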
func ClassifyRequest ¶
ClassifyRequest determines the category of a request based on the user agent, Accept header, and path. It is intentionally pure (no AWS or database dependencies) so it can be used across Lambdas.
Priority rule: explicit AI crawler UA matches always win and must not be bypassable by setting ActivityPub-ish accept headers.
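The priority rule can be sketched as checking the AI-crawler user-agent list before inspecting the Accept header, so federation-style headers cannot mask a known crawler. The user-agent substrings and helper name below are assumptions for illustration; the package's actual match list is not shown in the docs.

```go
package main

import (
	"fmt"
	"strings"
)

type Category int

const (
	CategoryHuman Category = iota
	CategoryFederation
	CategoryAICrawler
)

// aiCrawlerUAs is an illustrative subset of known AI crawler user agents.
var aiCrawlerUAs = []string{"GPTBot", "CCBot", "ClaudeBot"}

// classifySketch illustrates the documented priority rule: an explicit
// AI crawler UA match wins even when the Accept header looks like
// ActivityPub, so federation headers cannot bypass the block.
func classifySketch(userAgent, accept string) Category {
	for _, ua := range aiCrawlerUAs {
		if strings.Contains(userAgent, ua) {
			return CategoryAICrawler // checked first: not bypassable
		}
	}
	if strings.Contains(accept, "application/activity+json") {
		return CategoryFederation
	}
	return CategoryHuman
}

func main() {
	// An AI crawler UA plus an ActivityPub Accept header still
	// classifies as an AI crawler.
	fmt.Println(classifySketch("GPTBot/1.0", "application/activity+json") == CategoryAICrawler)
}
```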