Documentation
Overview ¶
Package sources provides configuration and validation for SFGA data sources.
This package defines the schema for sources.yaml, the file in which users list the SFGA (Species File Group Archive) data sources to import. It also validates and filters source configurations and extracts metadata from SFGA filenames.
See sources-yaml-spec.md for the complete sources.yaml specification.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ExtractOutlinkID ¶
ExtractOutlinkID extracts the outlink ID from a column value. If the column name ends with "col__alternative_id", it extracts the value after "gnoutlink:" from a comma-separated list of namespace:value pairs. Otherwise, it returns the value as-is.
Examples:
- ExtractOutlinkID("taxon.col__id", "12345") → "12345"
- ExtractOutlinkID("taxon.col__alternative_id", "wikidata:Q123,gnoutlink:Homo_sapiens") → "Homo_sapiens"
- ExtractOutlinkID("name.col__alternative_id", "gnoutlink:url-encoded-name") → "url-encoded-name"
- ExtractOutlinkID("taxon.col__alternative_id", "wikidata:Q123") → "" (no gnoutlink namespace)
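The behavior described above can be sketched in a few lines. This is a minimal re-implementation for illustration only, not the package's source: the lowercase name extractOutlinkID, the parameter names column and value, and the trimming of whitespace around pairs are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// extractOutlinkID is a hypothetical stand-in that follows the documented
// rules of ExtractOutlinkID. Parameter names are assumptions.
func extractOutlinkID(column, value string) string {
	// Any column other than *col__alternative_id passes the value through.
	if !strings.HasSuffix(column, "col__alternative_id") {
		return value
	}
	// Scan the comma-separated namespace:value pairs for "gnoutlink:".
	for _, pair := range strings.Split(value, ",") {
		pair = strings.TrimSpace(pair) // assumption: tolerate spaces after commas
		if strings.HasPrefix(pair, "gnoutlink:") {
			return strings.TrimPrefix(pair, "gnoutlink:")
		}
	}
	// No gnoutlink namespace present.
	return ""
}

func main() {
	fmt.Printf("%q\n", extractOutlinkID("taxon.col__id", "12345"))
	fmt.Printf("%q\n", extractOutlinkID("taxon.col__alternative_id", "wikidata:Q123,gnoutlink:Homo_sapiens"))
	fmt.Printf("%q\n", extractOutlinkID("taxon.col__alternative_id", "wikidata:Q123"))
}
```

The three calls in main repeat inputs from the examples listed above, so the sketch can be checked against the documented results.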
Types ¶
type DataSourceConfig ¶
type DataSourceConfig struct {
// Core identification (required)
// ID identifies the data source. Convention: < 1000 = official, >= 1000 = custom
ID int `yaml:"id"`
// Parent is the directory or URL containing SFGA files for this source.
// Auto-detected: starts with http:// or https:// = URL, otherwise = directory
// SFGA files are matched by pattern: {4-digit-ID}*.zip or {ID}*.zip
// Examples:
// - http://opendata.globalnames.org/sfga/latest/
// - /home/user/data/sfga/
// - ~/data/sfga/
Parent string `yaml:"parent"`
// Titles and description (override SFGA if needed)
Title string `yaml:"title,omitempty"` // Override SFGA col__title
TitleShort string `yaml:"title_short,omitempty"` // Fallback: col__alias → truncate col__title
Description string `yaml:"description,omitempty"` // Override SFGA col__description
// URLs (override SFGA if needed)
HomeURL string `yaml:"home_url,omitempty"` // Override SFGA col__url
DataURL string `yaml:"data_url,omitempty"` // Download URL (not in SFGA)
// Curation level (quality indicators)
IsCurated bool `yaml:"is_curated,omitempty"` // Manually curated by experts
IsAutoCurated bool `yaml:"is_auto_curated,omitempty"` // Automatically validated
HasClassification bool `yaml:"has_classification,omitempty"` // Has hierarchical taxonomy
// Outlink configuration (for generating links to original records)
IsOutlinkReady bool `yaml:"-"` // Can generate outlinks
OutlinkURL string `yaml:"outlink_url,omitempty"` // URL template with {} placeholder
OutlinkIDColumn string `yaml:"outlink_id_column,omitempty"` // table.column format (e.g., "taxon.col__id", "name.col__alternative_id")
// PreferFlatClassification indicates that this source should use
// flat classification (no hierarchy) when imported.
PreferFlatClassification bool `yaml:"prefer_flat_classification"`
}
DataSourceConfig represents configuration for a single data source.
SFGA provides these fields (only override if needed):
- col__id (id)
- col__title (title)
- col__description (description)
- col__version (version) - NEVER in YAML, always from SFGA or filename
- col__issued (release_date) - NEVER in YAML, always from SFGA or filename
- col__url (home_url)
- col__doi, col__license, col__citation, etc.
NOT in SFGA (can be provided here):
- title_short (optional, falls back to col__alias or truncated col__title)
- data_url (optional download link)
- is_curated, is_auto_curated, has_classification (optional quality flags)
- outlink configuration (optional)
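The Parent rules in the struct comments (URL versus directory detection, and the {4-digit-ID}*.zip or {ID}*.zip file patterns) could be read as in the sketch below. The helper names isURL and matchesSource and the sample file names are hypothetical, and interpreting the patterns as shell-style globs via path.Match is an assumption about how the package matches files.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isURL applies the documented auto-detection rule for Parent:
// an http:// or https:// prefix means URL, anything else is a directory.
func isURL(parent string) bool {
	return strings.HasPrefix(parent, "http://") || strings.HasPrefix(parent, "https://")
}

// matchesSource reports whether a file name fits {4-digit-ID}*.zip or
// {ID}*.zip for the given source ID (hypothetical helper).
func matchesSource(id int, name string) bool {
	patterns := []string{
		fmt.Sprintf("%04d*.zip", id), // zero-padded to four digits
		fmt.Sprintf("%d*.zip", id),   // unpadded
	}
	for _, pattern := range patterns {
		if ok, err := path.Match(pattern, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isURL("http://opendata.globalnames.org/sfga/latest/"))
	fmt.Println(isURL("~/data/sfga/"))
	// File names below are made up for illustration.
	fmt.Println(matchesSource(1, "0001_example.zip"))
	fmt.Println(matchesSource(1, "0002_example.zip"))
}
```

Under this literal glob reading, the unpadded pattern for ID 1 ("1*.zip") would also match a file that starts with 1000, so a real implementation may need a stricter check on the character that follows the ID.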
func (*DataSourceConfig) Validate ¶
func (d *DataSourceConfig) Validate(index int) ([]ValidationWarning, error)
Validate checks the structural validity of a single data source configuration. File system checks, such as whether a directory exists, are deferred to runtime in the I/O layer. It returns a slice of warnings for non-fatal issues and an error for fatal ones.
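The split between warnings and a fatal error can be illustrated with a small stand-in. Everything in this sketch is hypothetical: the source and warning types are local substitutes for DataSourceConfig and ValidationWarning (whose fields are not shown here), the two rules are invented for illustration, and using index to label messages is an assumption about its purpose.

```go
package main

import (
	"fmt"
	"strings"
)

// warning is a hypothetical stand-in for ValidationWarning.
type warning struct {
	Field   string
	Message string
}

// source is a hypothetical, reduced stand-in for DataSourceConfig.
type source struct {
	ID         int
	Parent     string
	OutlinkURL string
}

// validate returns non-fatal warnings and a fatal error, mirroring the
// shape of Validate. The specific rules are assumptions.
func (s *source) validate(index int) ([]warning, error) {
	// Assumed fatal rule: parent must be set.
	if s.Parent == "" {
		return nil, fmt.Errorf("data_sources[%d]: parent is required", index)
	}
	var warnings []warning
	// Assumed non-fatal rule: an outlink template without {} is suspicious.
	if s.OutlinkURL != "" && !strings.Contains(s.OutlinkURL, "{}") {
		warnings = append(warnings, warning{
			Field:   "outlink_url",
			Message: fmt.Sprintf("data_sources[%d]: outlink_url has no {} placeholder", index),
		})
	}
	return warnings, nil
}

func main() {
	sources := []source{
		{ID: 1, Parent: "/home/user/data/sfga/", OutlinkURL: "https://example.org/taxon/"},
		{ID: 1001}, // missing parent
	}
	for i := range sources {
		warnings, err := sources[i].validate(i)
		if err != nil {
			fmt.Println("fatal:", err)
			continue
		}
		for _, w := range warnings {
			fmt.Println("warning:", w.Message)
		}
	}
}
```

A caller following this pattern would typically stop on the error and log or collect the warnings, which matches the Warnings field that SourcesConfig keeps out of the serialized YAML.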
type FileMetadata ¶
type FileMetadata struct {
ID int // Extracted from filename
Version string // Extracted from filename (if present)
ReleaseDate string // Extracted from filename in YYYY-MM-DD format (if present)
IsURL bool // True if file is a URL
}
FileMetadata contains metadata extracted from an SFGA filename.
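As a rough illustration of how such metadata might be pulled from a filename, the sketch below reads the leading digits as the ID, takes the first YYYY-MM-DD sequence as the release date, and applies the http:// or https:// prefix rule for IsURL. The parseFilename function, the local fileMetadata type, and the sample file name are hypothetical. Version is omitted because this page does not say how it is encoded in the filename.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strconv"
	"strings"
)

// fileMetadata is a hypothetical, reduced stand-in for FileMetadata.
type fileMetadata struct {
	ID          int
	ReleaseDate string
	IsURL       bool
}

var (
	leadingID = regexp.MustCompile(`^\d+`)             // digits at the start of the name
	isoDate   = regexp.MustCompile(`\d{4}-\d{2}-\d{2}`) // first YYYY-MM-DD sequence
)

// parseFilename extracts the ID and release date from the last path element
// of location. The parsing rules are assumptions, not the package's logic.
func parseFilename(location string) (fileMetadata, error) {
	meta := fileMetadata{
		IsURL: strings.HasPrefix(location, "http://") || strings.HasPrefix(location, "https://"),
	}
	name := path.Base(location)
	digits := leadingID.FindString(name)
	if digits == "" {
		return meta, fmt.Errorf("no leading ID in %q", name)
	}
	id, err := strconv.Atoi(digits)
	if err != nil {
		return meta, err
	}
	meta.ID = id
	meta.ReleaseDate = isoDate.FindString(name) // empty if no date is present
	return meta, nil
}

func main() {
	// The file name is made up for illustration.
	meta, err := parseFilename("http://opendata.globalnames.org/sfga/latest/0001_example_2024-01-15.zip")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", meta)
}
```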
type Sources ¶
type Sources interface {
Load() (*SourcesConfig, error)
}
type SourcesConfig ¶
type SourcesConfig struct {
// DataSources is the list of data sources to import.
DataSources []DataSourceConfig `yaml:"data_sources"`
// Warnings holds non-fatal validation warnings (not serialized)
Warnings []ValidationWarning `yaml:"-"`
}
SourcesConfig represents the complete sources.yaml configuration file.
func (*SourcesConfig) Validate ¶
func (c *SourcesConfig) Validate() error
Validate checks the configuration for errors and applies defaults.