Documentation ¶
Overview ¶
Package scraping provides primitives to interact with the openapi HTTP API.
Code generated by github.com/oapi-codegen/oapi-codegen/v2 version v2.5.1 DO NOT EDIT.
Index ¶
- func DownloadContent(ctx context.Context, uri string, securityConfig *ContentSecurityConfig, ...) (string, []byte, error)
- func GetSwagger() (swagger *openapi3.T, err error)
- func ParseDataURI(uri string) (string, []byte, error)
- func PathToRawSpec(pathToFile string) map[string]func() ([]byte, error)
- type ContentSecurityConfig
- type HTTPCredentialConfig
- type HTTPError
- type RemoteContentConfig
- type S3CredentialConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DownloadContent ¶
func DownloadContent(
	ctx context.Context,
	uri string,
	securityConfig *ContentSecurityConfig,
	s3Creds *s3.Credentials,
) (string, []byte, error)
DownloadContent downloads content from a URL and returns the content type and data. Supports data:, http://, https://, file://, and s3:// URLs.
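As a rough illustration of the scheme handling described above, the following sketch classifies a URI into one of the documented families. The helper name classifyScheme is hypothetical and is not part of this package; how DownloadContent actually dispatches internally is not shown in this documentation.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// classifyScheme is a hypothetical helper, not part of the package.
// It reports which documented URL family a URI belongs to.
func classifyScheme(uri string) (string, error) {
	// data: URIs are opaque, so detect them by prefix before parsing.
	if strings.HasPrefix(strings.ToLower(uri), "data:") {
		return "data", nil
	}
	u, err := url.Parse(uri)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", uri, err)
	}
	// url.Parse lowercases the scheme, so a plain switch is enough.
	switch u.Scheme {
	case "http", "https", "file", "s3":
		return u.Scheme, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	for _, uri := range []string{
		"data:text/plain;base64,aGk=",
		"https://example.com/a.pdf",
		"file:///tmp/a.txt",
		"s3://my-bucket/uploads/a.png",
		"ftp://example.com/a",
	} {
		scheme, err := classifyScheme(uri)
		fmt.Println(scheme, err)
	}
}
```

Rejecting unknown schemes up front keeps the security checks for each family (hosts for HTTP, paths for file:// and s3://) in one obvious place.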
func GetSwagger ¶
GetSwagger returns the Swagger specification corresponding to the generated code in this file, with external references resolved. Reference resolution is tightly coupled to the "import-mapping" feature: externally referenced files must be embedded in the corresponding Go packages. References by URL could be supported, but that work was out of scope.
func ParseDataURI ¶
ParseDataURI returns the content type and the decoded bytes of a data: URI.
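To make the return values concrete, here is a minimal sketch of data URI parsing following the RFC 2397 layout `data:[<mediatype>][;base64],<data>`. The function parseDataURISketch is a hypothetical stand-in written for illustration; the package's own ParseDataURI may validate or decode differently.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseDataURISketch is a hypothetical stand-in for illustration only.
// It returns the media type and decoded payload of a data: URI.
func parseDataURISketch(uri string) (string, []byte, error) {
	if !strings.HasPrefix(uri, "data:") {
		return "", nil, errors.New("not a data URI")
	}
	// Everything before the first comma is metadata, the rest is payload.
	meta, payload, ok := strings.Cut(uri[len("data:"):], ",")
	if !ok {
		return "", nil, errors.New("data URI has no comma separator")
	}
	isBase64 := strings.HasSuffix(meta, ";base64")
	contentType := strings.TrimSuffix(meta, ";base64")
	if contentType == "" {
		// Default media type defined by RFC 2397.
		contentType = "text/plain;charset=US-ASCII"
	}
	if isBase64 {
		data, err := base64.StdEncoding.DecodeString(payload)
		if err != nil {
			return "", nil, fmt.Errorf("decode base64 payload: %w", err)
		}
		return contentType, data, nil
	}
	// Non-base64 payloads are percent-encoded.
	text, err := url.PathUnescape(payload)
	if err != nil {
		return "", nil, fmt.Errorf("decode percent-encoded payload: %w", err)
	}
	return contentType, []byte(text), nil
}

func main() {
	contentType, data, err := parseDataURISketch("data:text/plain;base64,aGVsbG8=")
	fmt.Println(contentType, string(data), err)
}
```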
Types ¶
type ContentSecurityConfig ¶
type ContentSecurityConfig struct {
	// AllowedHosts Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
	AllowedHosts []string `json:"allowed_hosts,omitempty,omitzero"`
	// AllowedPaths Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
	AllowedPaths []string `json:"allowed_paths,omitempty,omitzero"`
	// BlockPrivateIps Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
	BlockPrivateIps bool `json:"block_private_ips,omitempty,omitzero"`
	// DownloadTimeoutSeconds Timeout for individual download operations in seconds
	DownloadTimeoutSeconds int `json:"download_timeout_seconds,omitempty,omitzero"`
	// MaxDownloadSizeBytes Maximum size of downloaded content in bytes
	MaxDownloadSizeBytes int64 `json:"max_download_size_bytes,omitempty,omitzero"`
	// MaxImageDimension Maximum image width/height in pixels (images will be resized)
	MaxImageDimension int `json:"max_image_dimension,omitempty,omitzero"`
	// UserAgent User-Agent header for HTTP downloads. Defaults to 'AntflyDB/1.0' if not set. Some servers (e.g., Wikipedia) reject requests without a User-Agent.
	UserAgent string `json:"user_agent,omitempty,omitzero"`
}
ContentSecurityConfig defines model for ContentSecurityConfig.
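The private-range blocking described for BlockPrivateIps can be pictured with the sketch below, which tests an address against the five CIDR ranges listed in the field comment. The names privateRanges and isPrivate are hypothetical, and the package's real check (including any handling of DNS resolution or IPv6) is not documented here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// privateRanges mirrors the ranges listed in the BlockPrivateIps comment.
// The variable itself is hypothetical and not part of the package.
var privateRanges = []netip.Prefix{
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("169.254.0.0/16"),
}

// isPrivate reports whether addr falls inside one of the listed ranges.
// Only the documented IPv4 ranges are covered; IPv6 loopback and
// unique-local addresses are deliberately left out of this sketch.
func isPrivate(addr string) (bool, error) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	// Unmap turns IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) into IPv4
	// so they cannot slip past the IPv4 prefixes.
	ip = ip.Unmap()
	for _, p := range privateRanges {
		if p.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, addr := range []string{"10.1.2.3", "172.32.0.1", "8.8.8.8", "::ffff:192.168.1.1"} {
		private, err := isPrivate(addr)
		fmt.Println(addr, private, err)
	}
}
```

A check like this is only meaningful when applied to the address that is actually dialed, since a public hostname can resolve to a private address.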
type HTTPCredentialConfig ¶
type HTTPCredentialConfig struct {
	// BaseUrl Base URL prefix this credential applies to.
	BaseUrl string `json:"base_url,omitempty,omitzero"`
	// Headers HTTP headers to include. Supports keystore syntax (e.g., "${secret:token}").
	Headers  map[string]string     `json:"headers,omitempty,omitzero"`
	Security ContentSecurityConfig `json:"security,omitempty,omitzero"`
}
HTTPCredentialConfig HTTP credential for authenticated endpoints.
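One way to read the BaseUrl and Headers fields is as a prefix lookup: find the credential whose base URL prefixes the request URL, then attach its headers. The sketch below assumes that reading. The type httpCredential and the function headersFor are hypothetical, the longest-prefix tie-break is an assumption, and keystore placeholders such as "${secret:token}" are passed through without expansion.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// httpCredential is a hypothetical, trimmed-down mirror of
// HTTPCredentialConfig used only for this sketch.
type httpCredential struct {
	BaseURL string
	Headers map[string]string
}

// headersFor returns the headers of the credential whose BaseURL is the
// longest prefix of target, or nil when none applies. Choosing the
// longest prefix is an assumption, not documented behavior.
func headersFor(target string, creds []httpCredential) map[string]string {
	best := -1
	for i, c := range creds {
		if c.BaseURL == "" || !strings.HasPrefix(target, c.BaseURL) {
			continue
		}
		if best == -1 || len(c.BaseURL) > len(creds[best].BaseURL) {
			best = i
		}
	}
	if best == -1 {
		return nil
	}
	return creds[best].Headers
}

func main() {
	creds := []httpCredential{
		{BaseURL: "https://api.example.com/", Headers: map[string]string{"Authorization": "${secret:token}"}},
	}
	req, err := http.NewRequest(http.MethodGet, "https://api.example.com/v2/report.pdf", nil)
	if err != nil {
		panic(err)
	}
	// Copy the matched headers onto the outgoing request.
	for k, v := range headersFor(req.URL.String(), creds) {
		req.Header.Set(k, v)
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```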
type HTTPError ¶
HTTPError represents an HTTP response with a non-200 status code. Callers can inspect StatusCode to classify the error (e.g., 404 vs 503).
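Classifying by status code might look like the following sketch. Because this page does not show the definition of HTTPError, the example uses a hypothetical local type httpError with a StatusCode field and a pointer receiver; with the real type, the errors.As target would need to match however the package returns it. The functions fetch and isRetryable are likewise hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// httpError is a hypothetical stand-in for the package's HTTPError.
// Only the StatusCode field is taken from the documentation.
type httpError struct {
	StatusCode int
}

func (e *httpError) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d", e.StatusCode)
}

// fetch simulates a download that fails with the given status and
// wraps the error, as a caller further up the stack might see it.
func fetch(status int) error {
	return fmt.Errorf("download failed: %w", &httpError{StatusCode: status})
}

// isRetryable treats 429 and 5xx responses as transient and everything
// else (for example 404) as permanent. The policy is illustrative.
func isRetryable(err error) bool {
	var he *httpError
	if errors.As(err, &he) {
		return he.StatusCode == 429 || he.StatusCode >= 500
	}
	return false
}

func main() {
	for _, status := range []int{404, 503} {
		err := fetch(status)
		fmt.Println(err, "retryable:", isRetryable(err))
	}
}
```

Using errors.As rather than a direct type assertion keeps the check working when the error has been wrapped with additional context.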
type RemoteContentConfig ¶
type RemoteContentConfig struct {
	// DefaultS3 Default S3 credential name when no bucket pattern matches.
	DefaultS3 string `json:"default_s3,omitempty,omitzero"`
	// Http Named HTTP credentials for authenticated endpoints.
	Http map[string]HTTPCredentialConfig `json:"http,omitempty,omitzero"`
	// S3 Named S3 credentials for remote content fetching.
	S3       map[string]S3CredentialConfig `json:"s3,omitempty,omitzero"`
	Security ContentSecurityConfig         `json:"security,omitempty,omitzero"`
}
RemoteContentConfig Configuration for remote content fetching (remotePDF, remoteMedia, remoteText templates). Consolidates S3 credentials and security settings separate from backup storage.
**Credential Resolution Order:**
1. Explicit `credentials="name"` parameter in template
2. First credential where `buckets` glob pattern matches URL's bucket
3. `default_s3` credential
4. Legacy fallback: `storage.s3` credentials (backward compatibility)
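The resolution order above can be sketched as a single function. The types remoteConfig and s3Credential are hypothetical mirrors of the generated structs, and resolveCredential is not part of the package. Two details are assumptions: glob matching uses path.Match, and credential names are visited in sorted order because Go map iteration is unordered, so "first credential" is otherwise not well defined.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// s3Credential and remoteConfig are hypothetical, trimmed-down mirrors
// of S3CredentialConfig and RemoteContentConfig.
type s3Credential struct {
	Buckets []string
}

type remoteConfig struct {
	DefaultS3 string
	S3        map[string]s3Credential
}

// legacyFallback labels step 4, the storage.s3 credentials.
const legacyFallback = "storage.s3"

// resolveCredential returns the name of the credential to use for
// bucket, following the four documented steps in order.
func resolveCredential(cfg remoteConfig, explicit, bucket string) string {
	// 1. An explicit credentials="name" parameter wins.
	if explicit != "" {
		return explicit
	}
	// 2. First credential whose bucket glob matches (sorted by name,
	// which is an assumption made for determinism).
	names := make([]string, 0, len(cfg.S3))
	for name := range cfg.S3 {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		for _, pattern := range cfg.S3[name].Buckets {
			if ok, err := path.Match(pattern, bucket); err == nil && ok {
				return name
			}
		}
	}
	// 3. The configured default, if any.
	if cfg.DefaultS3 != "" {
		return cfg.DefaultS3
	}
	// 4. Legacy fallback.
	return legacyFallback
}

func main() {
	cfg := remoteConfig{
		DefaultS3: "fallback",
		S3: map[string]s3Credential{
			"media": {Buckets: []string{"media-*"}},
			"logs":  {Buckets: []string{"logs"}},
		},
	}
	fmt.Println(resolveCredential(cfg, "", "media-prod"))
	fmt.Println(resolveCredential(cfg, "", "unknown-bucket"))
}
```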
type S3CredentialConfig ¶
type S3CredentialConfig struct {
	// AccessKeyId AWS access key ID. Supports keystore syntax for secret lookup. Falls back to AWS_ACCESS_KEY_ID environment variable if not set.
	AccessKeyId string `json:"access_key_id,omitempty,omitzero"`
	// Buckets Glob patterns for bucket names this credential handles. When a URL matches a pattern, this credential is auto-selected.
	Buckets []string `json:"buckets,omitempty,omitzero"`
	// Endpoint S3-compatible endpoint (e.g., 's3.amazonaws.com' or 'localhost:9000' for MinIO)
	Endpoint string `json:"endpoint,omitempty,omitzero"`
	// SecretAccessKey AWS secret access key. Supports keystore syntax for secret lookup. Falls back to AWS_SECRET_ACCESS_KEY environment variable if not set.
	SecretAccessKey string                `json:"secret_access_key,omitempty,omitzero"`
	Security        ContentSecurityConfig `json:"security,omitempty,omitzero"`
	// SessionToken Optional AWS session token for temporary credentials. Supports keystore syntax for secret lookup.
	SessionToken string `json:"session_token,omitempty,omitzero"`
	// UseSsl Enable SSL/TLS for S3 connections (default: true for AWS, false for local MinIO)
	UseSsl bool `json:"use_ssl,omitempty,omitzero"`
}
S3CredentialConfig defines model for S3CredentialConfig.
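The environment-variable fallback and the Endpoint/UseSsl pairing described in the field comments can be sketched as below. Both resolveSecret and endpointURL are hypothetical helpers, keystore syntax is not expanded here, and the environment lookup is injected as a function so the behavior can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// resolveSecret returns the configured value when present and otherwise
// falls back to the named environment variable. It is a hypothetical
// helper and does not expand keystore syntax.
func resolveSecret(configured, envName string, getenv func(string) string) string {
	if configured != "" {
		return configured
	}
	return getenv(envName)
}

// endpointURL combines an S3-compatible endpoint with the scheme
// implied by the UseSsl flag.
func endpointURL(endpoint string, useSSL bool) string {
	scheme := "http"
	if useSSL {
		scheme = "https"
	}
	return scheme + "://" + endpoint
}

func main() {
	// An empty AccessKeyId falls back to AWS_ACCESS_KEY_ID.
	accessKey := resolveSecret("", "AWS_ACCESS_KEY_ID", os.Getenv)
	fmt.Println("access key set:", accessKey != "")

	// A local MinIO endpoint without TLS versus AWS with TLS.
	fmt.Println(endpointURL("localhost:9000", false))
	fmt.Println(endpointURL("s3.amazonaws.com", true))
}
```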