scraping

package
v0.0.1 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 16, 2026 License: Apache-2.0 Imports: 18 Imported by: 2

Documentation

Overview

Package scraping provides primitives to interact with the OpenAPI HTTP API.

Code generated by github.com/oapi-codegen/oapi-codegen/v2 version v2.5.1 DO NOT EDIT.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DownloadContent

func DownloadContent(
	ctx context.Context,
	uri string,
	securityConfig *ContentSecurityConfig,
	s3Creds *s3.Credentials,
) (string, []byte, error)

DownloadContent downloads content from a URL and returns the content type and data. Supports data:, http://, https://, file://, and s3:// URLs.
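The scheme list above can be illustrated with a small classifier. This is a minimal sketch, not the package's implementation: `supportedScheme` is a hypothetical helper that only checks whether a URI uses one of the documented schemes, using the standard library's `net/url`.

```go
package main

import (
	"fmt"
	"net/url"
)

// supportedScheme is a hypothetical helper (not part of this package).
// It returns the scheme of uri if it is one of the schemes that the
// DownloadContent documentation lists, and an error otherwise.
func supportedScheme(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", uri, err)
	}
	switch u.Scheme {
	case "data", "http", "https", "file", "s3":
		return u.Scheme, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	uris := []string{
		"data:text/plain;base64,aGVsbG8=",
		"https://example.com/report.pdf",
		"file:///Users/data/report.pdf",
		"s3://my-bucket/uploads/a.png",
		"ftp://example.com/file",
	}
	for _, uri := range uris {
		scheme, err := supportedScheme(uri)
		fmt.Println(scheme, err)
	}
}
```

A check like this could run before any network or filesystem access, so that unsupported schemes are rejected early.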

func GetSwagger

func GetSwagger() (swagger *openapi3.T, err error)

GetSwagger returns the Swagger specification corresponding to the generated code in this file, with its external references resolved. Reference resolution is tightly coupled to the "import-mapping" feature: externally referenced files must be embedded in the corresponding Go packages. Resolving references by URL is not implemented; it was out of scope.

func ParseDataURI

func ParseDataURI(uri string) (string, []byte, error)

ParseDataURI returns the content type and data bytes of a data URI.
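For readers unfamiliar with the format, the following sketch shows how a data URI of the form `data:[<mediatype>][;base64],<data>` might be split into a content type and bytes. `parseDataURI` is a hypothetical stand-in with the same return shape as ParseDataURI; the package's actual parsing rules, defaults, and error messages are not documented here and may differ.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseDataURI is a hypothetical stand-in for ParseDataURI.
// It assumes the RFC 2397 layout: data:[<mediatype>][;base64],<data>.
func parseDataURI(uri string) (string, []byte, error) {
	rest, ok := strings.CutPrefix(uri, "data:")
	if !ok {
		return "", nil, errors.New("not a data URI")
	}
	meta, payload, ok := strings.Cut(rest, ",")
	if !ok {
		return "", nil, errors.New("data URI has no comma separator")
	}
	isBase64 := strings.HasSuffix(meta, ";base64")
	contentType := strings.TrimSuffix(meta, ";base64")
	if contentType == "" {
		// Default media type given by RFC 2397.
		contentType = "text/plain;charset=US-ASCII"
	}
	if isBase64 {
		data, err := base64.StdEncoding.DecodeString(payload)
		if err != nil {
			return "", nil, fmt.Errorf("decoding base64 payload: %w", err)
		}
		return contentType, data, nil
	}
	// Non-base64 payloads are percent-encoded text.
	text, err := url.PathUnescape(payload)
	if err != nil {
		return "", nil, fmt.Errorf("unescaping payload: %w", err)
	}
	return contentType, []byte(text), nil
}

func main() {
	contentType, data, err := parseDataURI("data:text/plain;base64,aGVsbG8=")
	fmt.Println(contentType, string(data), err)
}
```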

func PathToRawSpec

func PathToRawSpec(pathToFile string) map[string]func() ([]byte, error)

PathToRawSpec constructs a synthetic filesystem for resolving external references when loading OpenAPI specifications.

Types

type ContentSecurityConfig

type ContentSecurityConfig struct {
	// AllowedHosts Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
	AllowedHosts []string `json:"allowed_hosts,omitempty,omitzero"`

	// AllowedPaths Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
	AllowedPaths []string `json:"allowed_paths,omitempty,omitzero"`

	// BlockPrivateIps Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
	BlockPrivateIps bool `json:"block_private_ips,omitempty,omitzero"`

	// DownloadTimeoutSeconds Timeout for individual download operations in seconds
	DownloadTimeoutSeconds int `json:"download_timeout_seconds,omitempty,omitzero"`

	// MaxDownloadSizeBytes Maximum size of downloaded content in bytes
	MaxDownloadSizeBytes int64 `json:"max_download_size_bytes,omitempty,omitzero"`

	// MaxImageDimension Maximum image width/height in pixels (images will be resized)
	MaxImageDimension int `json:"max_image_dimension,omitempty,omitzero"`

	// UserAgent User-Agent header for HTTP downloads. Defaults to 'AntflyDB/1.0' if not set. Some servers (e.g., Wikipedia) reject requests without a User-Agent.
	UserAgent string `json:"user_agent,omitempty,omitzero"`
}

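ContentSecurityConfig holds the security limits applied to content downloads: host and path allowlists, private IP blocking, a per-download timeout, a maximum download size, a maximum image dimension, and the User-Agent header.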
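The private ranges named in the BlockPrivateIps comment can be checked with the standard library's `net/netip`. The sketch below is an assumption about how such a check might look, not the package's code: `isPrivateIP` and `privateRanges` are hypothetical names, and only the five IPv4 ranges listed in the field comment are covered.

```go
package main

import (
	"fmt"
	"net/netip"
)

// privateRanges mirrors the ranges listed in the BlockPrivateIps comment.
var privateRanges = []netip.Prefix{
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("169.254.0.0/16"),
}

// isPrivateIP is a hypothetical helper. It reports whether s is an IP
// address inside one of privateRanges.
func isPrivateIP(s string) (bool, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false, fmt.Errorf("parsing IP %q: %w", s, err)
	}
	// Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as IPv4.
	addr = addr.Unmap()
	for _, prefix := range privateRanges {
		if prefix.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, ip := range []string{"10.1.2.3", "169.254.169.254", "8.8.8.8"} {
		private, err := isPrivateIP(ip)
		fmt.Println(ip, private, err)
	}
}
```

A hostname would have to be resolved to its IP addresses before a check like this applies; the sketch does not cover that step.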

type HTTPCredentialConfig

type HTTPCredentialConfig struct {
	// BaseUrl Base URL prefix this credential applies to.
	BaseUrl string `json:"base_url,omitempty,omitzero"`

	// Headers HTTP headers to include. Supports keystore syntax (e.g., "${secret:token}").
	Headers  map[string]string     `json:"headers,omitempty,omitzero"`
	Security ContentSecurityConfig `json:"security,omitempty,omitzero"`
}

HTTPCredentialConfig is an HTTP credential for authenticated endpoints.
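The BaseUrl field is described as a URL prefix, so selecting a credential for a request could be a prefix match. The sketch below is one possible reading, with assumptions stated plainly: `httpCredential` is a local mirror of two fields, `matchCredential` is a hypothetical helper, and preferring the longest matching prefix is a design choice of this example, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// httpCredential is a local mirror of two HTTPCredentialConfig fields.
type httpCredential struct {
	BaseURL string
	Headers map[string]string
}

// matchCredential is a hypothetical helper. It returns the name of the
// credential whose BaseURL is the longest prefix of rawURL. Ties are broken
// by name so the result does not depend on map iteration order.
func matchCredential(creds map[string]httpCredential, rawURL string) (string, bool) {
	best, bestLen := "", -1
	for name, c := range creds {
		if c.BaseURL == "" || !strings.HasPrefix(rawURL, c.BaseURL) {
			continue
		}
		n := len(c.BaseURL)
		if n > bestLen || (n == bestLen && name < best) {
			best, bestLen = name, n
		}
	}
	return best, bestLen >= 0
}

func main() {
	creds := map[string]httpCredential{
		"api":    {BaseURL: "https://api.example.com/"},
		"api-v2": {BaseURL: "https://api.example.com/v2/"},
	}
	name, ok := matchCredential(creds, "https://api.example.com/v2/items")
	fmt.Println(name, ok)
}
```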

type HTTPError

type HTTPError struct {
	StatusCode int
	Status     string
}

HTTPError represents an HTTP response with a non-200 status code. Callers can inspect StatusCode to classify the error (e.g., 404 vs 503).

func (*HTTPError) Error

func (e *HTTPError) Error() string
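Because HTTPError is a pointer-receiver error type, callers can recover it from a wrapped error with `errors.As` and branch on StatusCode. The sketch below redeclares the type locally so it compiles on its own; the message returned by `Error` and the `classify` helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// HTTPError is a local copy of the documented type, so the sketch is
// self-contained.
type HTTPError struct {
	StatusCode int
	Status     string
}

// Error returns an illustrative message; the package's real text may differ.
func (e *HTTPError) Error() string {
	return "unexpected HTTP status: " + e.Status
}

// classify is a hypothetical helper that sorts errors into coarse buckets
// based on the HTTP status code, if one is present in the error chain.
func classify(err error) string {
	var httpErr *HTTPError
	if !errors.As(err, &httpErr) {
		return "other"
	}
	switch {
	case httpErr.StatusCode == 404:
		return "not found"
	case httpErr.StatusCode >= 500:
		return "retryable"
	default:
		return "client error"
	}
}

func main() {
	err := fmt.Errorf("download failed: %w", &HTTPError{
		StatusCode: 503,
		Status:     "503 Service Unavailable",
	})
	fmt.Println(classify(err))
}
```

Using `errors.As` rather than a direct type assertion keeps the check working when the error has been wrapped with `%w` along the way.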

type RemoteContentConfig

type RemoteContentConfig struct {
	// DefaultS3 Default S3 credential name when no bucket pattern matches.
	DefaultS3 string `json:"default_s3,omitempty,omitzero"`

	// Http Named HTTP credentials for authenticated endpoints.
	Http map[string]HTTPCredentialConfig `json:"http,omitempty,omitzero"`

	// S3 Named S3 credentials for remote content fetching.
	S3       map[string]S3CredentialConfig `json:"s3,omitempty,omitzero"`
	Security ContentSecurityConfig         `json:"security,omitempty,omitzero"`
}

RemoteContentConfig is the configuration for remote content fetching (the remotePDF, remoteMedia, and remoteText templates). It consolidates S3 credentials and security settings, keeping them separate from backup storage.

**Credential Resolution Order:**

1. Explicit `credentials="name"` parameter in template
2. First credential where `buckets` glob pattern matches URL's bucket
3. `default_s3` credential
4. Legacy fallback: `storage.s3` credentials (backward compatibility)
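The resolution order can be sketched as a single function. Everything here is illustrative: `remoteConfig` and `s3Credential` are local mirrors of the relevant fields, `resolveCredentialName` is a hypothetical helper, `path.Match` is assumed as the glob implementation, and credential names are visited in sorted order so that "first" is well defined (Go maps have no stable iteration order). The package may make different choices on each point.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// s3Credential and remoteConfig mirror only the fields used for resolution.
type s3Credential struct {
	Buckets []string
}

type remoteConfig struct {
	DefaultS3 string
	S3        map[string]s3Credential
}

// resolveCredentialName is a hypothetical helper that follows the documented
// order. It returns ok=false when nothing matched, in which case the caller
// would fall back to the legacy storage.s3 credentials (step 4).
func resolveCredentialName(cfg remoteConfig, explicit, bucket string) (string, bool) {
	// 1. An explicit credentials="name" parameter wins.
	if explicit != "" {
		return explicit, true
	}
	// 2. First credential with a bucket glob that matches.
	names := make([]string, 0, len(cfg.S3))
	for name := range cfg.S3 {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		for _, pattern := range cfg.S3[name].Buckets {
			if ok, err := path.Match(pattern, bucket); err == nil && ok {
				return name, true
			}
		}
	}
	// 3. The default_s3 credential, if configured.
	if cfg.DefaultS3 != "" {
		return cfg.DefaultS3, true
	}
	// 4. Nothing matched: signal the legacy fallback.
	return "", false
}

func main() {
	cfg := remoteConfig{
		DefaultS3: "fallback",
		S3: map[string]s3Credential{
			"media": {Buckets: []string{"media-*"}},
			"logs":  {Buckets: []string{"logs"}},
		},
	}
	fmt.Println(resolveCredentialName(cfg, "", "media-prod"))
	fmt.Println(resolveCredentialName(cfg, "", "unknown-bucket"))
}
```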

type S3CredentialConfig

type S3CredentialConfig struct {
	// AccessKeyId AWS access key ID. Supports keystore syntax for secret lookup. Falls back to AWS_ACCESS_KEY_ID environment variable if not set.
	AccessKeyId string `json:"access_key_id,omitempty,omitzero"`

	// Buckets Glob patterns for bucket names this credential handles. When a URL matches a pattern, this credential is auto-selected.
	Buckets []string `json:"buckets,omitempty,omitzero"`

	// Endpoint S3-compatible endpoint (e.g., 's3.amazonaws.com' or 'localhost:9000' for MinIO)
	Endpoint string `json:"endpoint,omitempty,omitzero"`

	// SecretAccessKey AWS secret access key. Supports keystore syntax for secret lookup. Falls back to AWS_SECRET_ACCESS_KEY environment variable if not set.
	SecretAccessKey string                `json:"secret_access_key,omitempty,omitzero"`
	Security        ContentSecurityConfig `json:"security,omitempty,omitzero"`

	// SessionToken Optional AWS session token for temporary credentials. Supports keystore syntax for secret lookup.
	SessionToken string `json:"session_token,omitempty,omitzero"`

	// UseSsl Enable SSL/TLS for S3 connections (default: true for AWS, false for local MinIO)
	UseSsl bool `json:"use_ssl,omitempty,omitzero"`
}

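S3CredentialConfig holds the credentials and connection settings for one S3-compatible endpoint: access keys, an optional session token, the endpoint address, the bucket patterns it handles, and per-credential security settings.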
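The field comments say that AccessKeyId and SecretAccessKey fall back to the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables when unset. A minimal sketch of that fallback follows; `s3Keys` and `resolveKeys` are hypothetical names, and keystore syntax (for example `${secret:token}`) is deliberately not handled.

```go
package main

import (
	"fmt"
	"os"
)

// s3Keys mirrors the two key fields of S3CredentialConfig.
type s3Keys struct {
	AccessKeyID     string
	SecretAccessKey string
}

// resolveKeys is a hypothetical helper. It fills empty fields from the
// documented environment variables. The getenv parameter is injected so the
// fallback can be exercised without touching the real environment.
func resolveKeys(configured s3Keys, getenv func(string) string) s3Keys {
	out := configured
	if out.AccessKeyID == "" {
		out.AccessKeyID = getenv("AWS_ACCESS_KEY_ID")
	}
	if out.SecretAccessKey == "" {
		out.SecretAccessKey = getenv("AWS_SECRET_ACCESS_KEY")
	}
	return out
}

func main() {
	keys := resolveKeys(s3Keys{AccessKeyID: "configured-id"}, os.Getenv)
	// Print only whether a secret was found, never the secret itself.
	fmt.Println(keys.AccessKeyID, keys.SecretAccessKey != "")
}
```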
