crawler

package
v1.1.17
Published: Feb 21, 2026 License: AGPL-3.0 Imports: 18 Imported by: 0

Documentation

Overview

Package crawler provides bot/crawler defenses for Lesser's public HTTP surface (e.g. robots.txt, request classification, and category-based rate limiting).

Index

Constants

const RobotsTxt = `` /* 1208-byte string literal not displayed */

RobotsTxt is the robots.txt policy for Lesser instances.

It is intentionally restrictive by default to reduce bot-driven costs, while allowing common search engines at low rates and preserving human access.

Variables

This section is empty.

Functions

func Middleware

func Middleware(mode protectionMode, logger *zap.Logger) apptheory.Middleware

Middleware classifies requests and (optionally) logs the classification.

func NewMiddleware

func NewMiddleware(logger *zap.Logger) apptheory.Middleware

NewMiddleware returns the crawler middleware configured from the environment.

Modes (CRAWLER_PROTECTION_MODE):

  • off (default): no classification, no logging
  • observe: classify requests, log the classification, do not enforce
  • limit: enforce rate limits for search engines + generic bots
  • block: enforce limits + block known AI crawlers

func RobotsHandler

func RobotsHandler(*apptheory.Context) (*apptheory.Response, error)

RobotsHandler returns robots.txt with aggressive caching.

Types

type Category

type Category int

Category represents the classification of a request for crawler controls.

const (
	CategoryHuman Category = iota
	CategoryFederation
	CategorySearchEngine
	CategoryAICrawler
	CategoryGenericBot
	CategorySuspicious
)

Classification categories.

func ClassifyRequest

func ClassifyRequest(userAgent, acceptHeader, path string) (Category, string)

ClassifyRequest determines the category of a request based on user agent, accept header, and path. It is intentionally pure (no AWS/DB deps) so it can be used across Lambdas.

Priority rule: explicit AI crawler UA matches always win and must not be bypassable by setting ActivityPub-ish accept headers.
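That priority rule can be illustrated with a simplified, self-contained classifier. The user-agent substrings, the reason strings, and the check order beyond "AI UA first" are illustrative assumptions, not the package's actual tables; only the `Category` values and the pure signature come from the documentation.

```go
package main

import (
	"fmt"
	"strings"
)

type Category int

const (
	CategoryHuman Category = iota
	CategoryFederation
	CategorySearchEngine
	CategoryAICrawler
	CategoryGenericBot
	CategorySuspicious
)

// classify is a reduced sketch of ClassifyRequest. The key property it
// demonstrates is the documented priority rule: AI-crawler user-agent
// matches are checked before the federation accept-header check, so an
// AI bot cannot dodge classification by sending an ActivityPub header.
func classify(userAgent, acceptHeader, path string) (Category, string) {
	ua := strings.ToLower(userAgent)

	// Hypothetical AI-crawler UA list; checked first, always wins.
	for _, bot := range []string{"gptbot", "ccbot", "claudebot"} {
		if strings.Contains(ua, bot) {
			return CategoryAICrawler, bot
		}
	}
	if strings.Contains(acceptHeader, "application/activity+json") {
		return CategoryFederation, "activitypub-accept"
	}
	// Hypothetical search-engine UA list.
	for _, se := range []string{"googlebot", "bingbot"} {
		if strings.Contains(ua, se) {
			return CategorySearchEngine, se
		}
	}
	if ua == "" {
		return CategorySuspicious, "empty-ua"
	}
	if strings.Contains(ua, "bot") {
		return CategoryGenericBot, "generic-bot-ua"
	}
	return CategoryHuman, "default"
}

func main() {
	// An AI crawler spoofing a federation accept header is still caught.
	cat, reason := classify("GPTBot/1.1", "application/activity+json", "/users/alice")
	fmt.Println(cat == CategoryAICrawler, reason)
}
```

Because the function takes only strings and returns only values, it needs no AWS or database wiring, which is what lets the same logic run in any Lambda.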

func (Category) String

func (c Category) String() string
