caddyknownagents

Version: v0.1.2

Warning: This package is not in the latest version of its module.
Published: Feb 7, 2026 | License: Apache-2.0, MIT | Imports: 13 | Imported by: 0

README

Caddy Knownagents Module

A super simple Caddy module for interacting with the Known Agents API.

Building

To compile this Caddy module, follow Caddy's Build from Source instructions and import the github.com/polykernel/caddy-knownagents module as a plugin.

Configuration

Syntax

knownagents {
  access_token <token>
  robots_txt {
    agent_types <types...>
    disallow <path>
  }
}

  • access_token sets the OAuth authorization token used to communicate with the Known Agents API. Global placeholders are supported in the argument.
  • robots_txt enables generation of robots.txt derived from agent analytics data using the Known Agents API.
    • agent_types specifies a list of agent types to be blocked by the generated robots.txt. The special token "*" is accepted as an argument and resolves to all documented agent types. Note: when "*" is passed, no further arguments may follow it.
    • disallow specifies the path to disallow for the specified agent types. Default: /.
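
The "*" rule for agent_types can be sketched in plain Go. This is a minimal illustration, not the module's actual parser: resolveAgentTypes and documentedAgentTypes are hypothetical names, and rejecting "*" anywhere in a multi-argument list is an assumed reading of the "no further arguments" note.

```go
package main

import (
	"errors"
	"fmt"
)

// documentedAgentTypes mirrors the AgentType constants listed in the
// package documentation. The variable itself is hypothetical.
var documentedAgentTypes = []string{
	"AI Assistant", "AI Data Scraper", "AI Search Crawler", "Archiver",
	"Developer Helper", "Fetcher", "Headless Browser", "Intelligence Gatherer",
	"Scraper", "Search Engine Crawler", "SEO Crawler", "Uncategorized",
	"Undocumented AI Agent",
}

// resolveAgentTypes is a hypothetical helper: it expands "*" to every
// documented agent type and rejects "*" combined with other arguments
// (assumption: the module treats any mix as an error).
func resolveAgentTypes(args []string) ([]string, error) {
	for _, a := range args {
		if a != "*" {
			continue
		}
		if len(args) != 1 {
			return nil, errors.New(`"*" must be the only agent_types argument`)
		}
		out := make([]string, len(documentedAgentTypes))
		copy(out, documentedAgentTypes)
		return out, nil
	}
	return args, nil
}

func main() {
	all, err := resolveAgentTypes([]string{"*"})
	fmt.Println(len(all), err)

	_, err = resolveAgentTypes([]string{"*", "Scraper"})
	fmt.Println(err)
}
```

Returning a copy of the list keeps callers from mutating the shared slice of documented types.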

If the robots_txt block is configured, then the special variable http.vars.dv_robots_txt in the HTTP request context will be set to the raw content of the robots.txt returned by the Known Agents API. Note: the robots.txt query is performed once during the provision phase of the module lifecycle and cached thereafter.
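
The fetch-once-then-cache behavior described above might look roughly like the following sketch. It uses only the standard library; robotsTxtCache, provision, and setVars are hypothetical names, the fetch function stands in for the real API call (whose request shape is not documented here), and a plain map stands in for Caddy's per-request variable table.

```go
package main

import (
	"errors"
	"fmt"
)

// robotsTxtCache is a hypothetical stand-in for the module's cached state.
type robotsTxtCache struct {
	content string
}

// provision runs the fetch exactly once and stores the result, mimicking
// a query performed during the provision phase.
func provision(fetch func() (string, error)) (*robotsTxtCache, error) {
	if fetch == nil {
		return nil, errors.New("no fetch function configured")
	}
	content, err := fetch()
	if err != nil {
		return nil, fmt.Errorf("fetching robots.txt: %w", err)
	}
	return &robotsTxtCache{content: content}, nil
}

// setVars copies the cached content into a request's variable table.
// The key "dv_robots_txt" corresponds to the http.vars.dv_robots_txt
// placeholder; the map is an assumption made for this sketch.
func (c *robotsTxtCache) setVars(vars map[string]any) {
	vars["dv_robots_txt"] = c.content
}

func main() {
	calls := 0
	cache, err := provision(func() (string, error) {
		calls++
		return "User-agent: ExampleBot\nDisallow: /\n", nil
	})
	if err != nil {
		panic(err)
	}

	// Simulate three requests; none of them triggers another fetch.
	for i := 0; i < 3; i++ {
		vars := map[string]any{}
		cache.setVars(vars)
	}
	fmt.Println("fetch calls:", calls) // prints "fetch calls: 1"
}
```

One consequence of this design is that a changed robots.txt on the API side is only picked up when the module is provisioned again, for example on a config reload.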

By default, the knownagents directive is ordered before header in the Caddyfile. This ensures that the visit event is built from the raw request content, although sensitive data such as cookies is still stripped. If this order does not fit your needs, you can change it using the global order directive. For example:

{
  order knownagents before handle
}

Example

A basic Caddyfile configuration is provided below:

knownagents {
  access_token {env.KA_ACCESS_TOKEN}
  robots_txt {
    agent_types "AI Assistant" "AI Data Scraper"
    disallow /
  }
}

License

Copyright (c) 2024 polykernel

The source code in this repository is made available under the MIT or Apache 2.0 license.

Documentation

Constants

const AnalyticsEndpoint = "https://api.knownagents.com/visits"

The address for the Known Agents agent analytics API endpoint.

const RobotsTxtEndpoint = "https://api.knownagents.com/robots-txts"

The address for the Known Agents robots.txt generation API endpoint.

Variables

This section is empty.

Functions

This section is empty.

Types

type AgentType

type AgentType = string

AgentTypes are groups of agents as classified by the Known Agents API.

const (
	AIAssistant          AgentType = "AI Assistant"
	AIDataScraper        AgentType = "AI Data Scraper"
	AISearchCrawler      AgentType = "AI Search Crawler"
	Archiver             AgentType = "Archiver"
	DeveloperHelper      AgentType = "Developer Helper"
	Fetcher              AgentType = "Fetcher"
	HeadlessBrowser      AgentType = "Headless Browser"
	IntelligenceGatherer AgentType = "Intelligence Gatherer"
	Scraper              AgentType = "Scraper"
	SearchEngineCrawlers AgentType = "Search Engine Crawler"
	SEOCrawler           AgentType = "SEO Crawler"
	Uncategorized        AgentType = "Uncategorized"
	UndocumentedAIAgent  AgentType = "Undocumented AI Agent"
)

type Knownagents

type Knownagents struct {
	// The access token used to authenticate to the Known Agents agent
	// analytics API endpoint.
	AccessToken string `json:"access_token"`

	// Enables generation of robots.txt derived from agent analytics data using
	// the Known Agents robots.txt generation API endpoint.
	RobotsTxt *RobotsTxt `json:"robots_txt,omitempty"`
	// contains filtered or unexported fields
}

Knownagents is a middleware that implements an HTTP handler which sends HTTP request information as visit events to the Known Agents API.

Its API is still experimental and may be subject to change.
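
As a rough illustration of the middleware pattern described above, the sketch below builds a visit event from a request while dropping sensitive headers, then passes the request on. It uses only net/http rather than Caddy's handler interfaces. The visitEvent shape, the record callback, and the helper names are all hypothetical; the real payload sent to the Known Agents API is not documented here. Only cookies are named as stripped in the README, so treating Authorization as sensitive is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// visitEvent is a hypothetical payload shape, not the real API schema.
type visitEvent struct {
	Method  string
	Path    string
	Headers map[string]string
}

// sensitiveHeaders lists headers left out of visit events.
// Cookie comes from the README; Authorization is an assumption.
var sensitiveHeaders = map[string]bool{"Cookie": true, "Authorization": true}

// sanitizeHeaders keeps the first value of each non-sensitive header.
func sanitizeHeaders(h map[string][]string) map[string]string {
	out := make(map[string]string)
	for name, values := range h {
		if len(values) == 0 || sensitiveHeaders[http.CanonicalHeaderKey(name)] {
			continue
		}
		out[name] = values[0]
	}
	return out
}

// buildVisitEvent captures the request details for one visit.
func buildVisitEvent(r *http.Request) visitEvent {
	return visitEvent{Method: r.Method, Path: r.URL.Path, Headers: sanitizeHeaders(r.Header)}
}

// visitMiddleware records a visit event, then hands the request to next.
func visitMiddleware(record func(visitEvent), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		record(buildVisitEvent(r))
		next.ServeHTTP(w, r)
	})
}

func main() {
	var got visitEvent
	h := visitMiddleware(func(e visitEvent) { got = e }, http.NotFoundHandler())

	req := httptest.NewRequest(http.MethodGet, "/docs", nil)
	req.Header.Set("User-Agent", "ExampleBot/1.0")
	req.Header.Set("Cookie", "session=secret")
	h.ServeHTTP(httptest.NewRecorder(), req)

	fmt.Println(got.Method, got.Path, got.Headers)
}
```

A production handler would likely send the event asynchronously so that a slow API call does not delay the response; the sketch records it inline to keep the flow easy to follow.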

func (Knownagents) CaddyModule

func (Knownagents) CaddyModule() caddy.ModuleInfo

CaddyModule returns the Caddy module information.

func (*Knownagents) FetchRobotsTxt

func (m *Knownagents) FetchRobotsTxt(ctx caddy.Context) error

FetchRobotsTxt queries the Known Agents robots.txt generation API endpoint and stores the returned robots.txt content.

func (*Knownagents) Provision

func (m *Knownagents) Provision(ctx caddy.Context) error

Provision implements caddy.Provisioner.

func (Knownagents) ServeHTTP

func (m Knownagents) ServeHTTP(
	w http.ResponseWriter,
	r *http.Request,
	next caddyhttp.Handler,
) error

ServeHTTP implements caddyhttp.MiddlewareHandler.

func (*Knownagents) UnmarshalCaddyfile

func (m *Knownagents) UnmarshalCaddyfile(d *caddyfile.Dispenser) error

UnmarshalCaddyfile implements caddyfile.Unmarshaler.

func (Knownagents) Validate

func (m Knownagents) Validate() error

Validate implements caddy.Validator.

type RobotsTxt

type RobotsTxt struct {
	// A list of agent types to block.
	AgentTypes []AgentType `json:"agent_types"`

	// The path to disallow access for the specified agent types.
	Disallow string `json:"disallow,omitempty"`
	// contains filtered or unexported fields
}

RobotsTxt configures automated generation of robots.txt via the Known Agents API.
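
The struct tags above imply a JSON shape for these fields. The following sketch mirrors the two documented structs with local, lowercase types (so it runs without importing Caddy or this module) and prints the encoded result. The mirror types and encodeConfig are hypothetical, and any surrounding handler wrapper that Caddy's JSON config requires is deliberately omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// robotsTxt mirrors the documented RobotsTxt fields and JSON tags.
type robotsTxt struct {
	AgentTypes []string `json:"agent_types"`
	Disallow   string   `json:"disallow,omitempty"`
}

// knownagents mirrors the documented Knownagents fields and JSON tags.
type knownagents struct {
	AccessToken string     `json:"access_token"`
	RobotsTxt   *robotsTxt `json:"robots_txt,omitempty"`
}

// encodeConfig is a hypothetical helper that renders the config as JSON.
func encodeConfig(cfg knownagents) (string, error) {
	b, err := json.Marshal(cfg)
	return string(b), err
}

func main() {
	out, err := encodeConfig(knownagents{
		AccessToken: "{env.KA_ACCESS_TOKEN}",
		RobotsTxt: &robotsTxt{
			AgentTypes: []string{"AI Assistant", "AI Data Scraper"},
			Disallow:   "/",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"access_token":"{env.KA_ACCESS_TOKEN}","robots_txt":{"agent_types":["AI Assistant","AI Data Scraper"],"disallow":"/"}}
}
```

Because robots_txt is tagged omitempty on a pointer field, leaving RobotsTxt nil drops the key entirely, which matches the README's description of robots.txt generation as optional.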
