Documentation
Index ¶
- Constants
- func BuildENALookup(dir string, f ENAFilter, keep map[string]struct{}) (map[string]ENARecord, error)
- func BuildENASampleSet(dir string, f ENAFilter) (map[string]struct{}, error)
- func SortResults(rows []ResultRow, sortBy string, desc bool)
- type ENAFilter
- type ENARecord
- type FilterFile
- func LoadFilterFile(path string) (*FilterFile, error)
- type Filters
- func (f *Filters) HasSampleFilter() bool
- func (f *Filters) LoadSampleFile() error
- func (f *Filters) MatchesSpecies(species string) bool
- func (f *Filters) MatchesSpeciesLike(species string) bool
- func (f *Filters) NeedsAssemblyStats() bool
- func (f *Filters) NeedsCheckM2() bool
- func (f *Filters) NeedsENA() bool
- func (f *Filters) NeedsSylph() bool
- func (f *Filters) SampleSet() map[string]struct{}
- type OutputConfig
- type QueryPlan
- type ResultRow
Constants ¶
const ENAFileName = "ena_20250506.parquet"
ENAFileName is the name of the ENA metadata parquet file on disk.
Variables ¶
This section is empty.
Functions ¶
func BuildENALookup ¶ added in v0.12.0
func BuildENALookup(dir string, f ENAFilter, keep map[string]struct{}) (map[string]ENARecord, error)
BuildENALookup streams ena_20250506.parquet and returns a map of sample_accession -> ENARecord for every row matching the filter.
When keep is non-nil, rows whose sample_accession is not in keep are skipped, so callers that already have a result set only materialise records they will use. First match wins when a sample has multiple ENA rows.
Returns (nil, nil) when the filter is inactive and keep is nil, so callers can skip the scan entirely.
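The keep and first-match rules above can be sketched on an in-memory slice. This is an illustrative model only, not the package's implementation: enaRow, buildLookup, and the match callback are hypothetical names, and the real function reads parquet rows rather than a slice.

```go
package main

import "fmt"

// enaRow is a hypothetical stand-in for one decoded ENA row; the real
// ENARecord fields are not shown in this documentation.
type enaRow struct {
	SampleAccession string
	Country         string
}

// buildLookup models the documented rules: return nil when there is nothing
// to do, skip rows outside keep when keep is non-nil, drop rows that fail
// the filter, and let the first row seen for a sample win.
func buildLookup(rows []enaRow, filterActive bool, match func(enaRow) bool, keep map[string]struct{}) map[string]enaRow {
	if !filterActive && keep == nil {
		return nil // callers can skip the scan entirely
	}
	out := make(map[string]enaRow)
	for _, r := range rows {
		if keep != nil {
			if _, ok := keep[r.SampleAccession]; !ok {
				continue
			}
		}
		if filterActive && !match(r) {
			continue
		}
		if _, seen := out[r.SampleAccession]; seen {
			continue // first match wins
		}
		out[r.SampleAccession] = r
	}
	return out
}

func main() {
	rows := []enaRow{
		{"SAMEA1", "Kenya"},
		{"SAMEA1", "Uganda"}, // later duplicate, ignored
		{"SAMEA2", "Kenya"},  // not in keep, skipped
		{"SAMEA3", "Peru"},   // fails the filter
	}
	keep := map[string]struct{}{"SAMEA1": {}, "SAMEA3": {}}
	isKenya := func(r enaRow) bool { return r.Country == "Kenya" }

	got := buildLookup(rows, true, isKenya, keep)
	fmt.Println(len(got), got["SAMEA1"].Country)
}
```

Because a nil map supports lookups and len in Go, a caller can treat the (nil, nil) result as an empty lookup without a separate branch.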
func BuildENASampleSet ¶ added in v0.11.0
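func BuildENASampleSet(dir string, f ENAFilter) (map[string]struct{}, error)
BuildENASampleSet is the set-only variant of BuildENALookup. It is retained for callers that need only the set of matching sample accessions (the keys used for intersection), not the records themselves.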
func SortResults ¶
func SortResults(rows []ResultRow, sortBy string, desc bool)
SortResults sorts rows in place by the given column. If sortBy is empty, it is a no-op. Numeric values are compared numerically; everything else falls back to lexicographic string comparison.
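A minimal sketch of the numeric-then-lexicographic comparison, assuming cell values arrive as strings. The row type, less, and sortRows are hypothetical; the fields of ResultRow are not shown on this page, and the package may detect numeric columns differently.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// row is a hypothetical stand-in for ResultRow: column name -> cell value.
type row map[string]string

// less compares numerically when both values parse as floats and falls back
// to lexicographic string comparison otherwise.
func less(a, b string) bool {
	fa, errA := strconv.ParseFloat(a, 64)
	fb, errB := strconv.ParseFloat(b, 64)
	if errA == nil && errB == nil {
		return fa < fb
	}
	return a < b
}

// sortRows sorts in place by sortBy; an empty sortBy is a no-op.
func sortRows(rows []row, sortBy string, desc bool) {
	if sortBy == "" {
		return
	}
	sort.SliceStable(rows, func(i, j int) bool {
		if desc {
			return less(rows[j][sortBy], rows[i][sortBy])
		}
		return less(rows[i][sortBy], rows[j][sortBy])
	})
}

func main() {
	rows := []row{{"n50": "100"}, {"n50": "9"}, {"n50": "25"}}
	sortRows(rows, "n50", false)
	for _, r := range rows {
		fmt.Println(r["n50"])
	}
}
```

A plain string sort would place "100" before "25" and "9"; the numeric branch is what yields 9, 25, 100. In this sketch a column that mixes numeric and non-numeric values may not order consistently, since the two comparison modes can disagree.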
Types ¶
type ENAFilter ¶ added in v0.11.0
type ENAFilter struct {
Country string
Platform string
CollectionDateFrom string
CollectionDateTo string
}
ENAFilter holds optional filters that require joining against the ENA metadata table. It is the subset of Filters that any command can reuse without pulling in the full query Filters struct.
type ENARecord ¶ added in v0.12.0
ENARecord is the subset of ENA columns exposed in output rows. It is deliberately narrow so that enrichment maps stay small even when the ENA table has millions of entries.
type FilterFile ¶
type FilterFile struct {
Filter Filters `toml:"filter"`
Output OutputConfig `toml:"output"`
}
FilterFile is the top-level structure for a TOML filter file.
func LoadFilterFile ¶
func LoadFilterFile(path string) (*FilterFile, error)
LoadFilterFile parses a TOML filter file at the given path.
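For orientation, a filter file might look like the fragment below. The table names and keys come from the toml struct tags on FilterFile, Filters, and OutputConfig; every value is illustrative. In particular, the column names, the "tsv" format value, and the ISO date format are assumptions not confirmed by this page.

```toml
[filter]
species = "Escherichia coli"
hq_only = true
min_n50 = 50000
country = "Kenya"
collection_date_from = "2015-01-01"   # date format is an assumption

[output]
columns = ["sample_accession", "species", "n50"]   # hypothetical column names
sort_by = "n50"
sort_desc = true
limit = 100
format = "tsv"   # hypothetical format value
```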
type Filters ¶
type Filters struct {
Species string `toml:"species"`
SpeciesLike string `toml:"species_like"`
Genus string `toml:"genus"`
Samples []string `toml:"samples"`
SampleFile string `toml:"sample_file"`
HQOnly bool `toml:"hq_only"`
MinCompleteness float64 `toml:"min_completeness"`
MaxContamination float64 `toml:"max_contamination"`
MinN50 int64 `toml:"min_n50"`
Dataset string `toml:"dataset"`
HasAssembly bool `toml:"has_assembly"`
Country string `toml:"country"`
Platform string `toml:"platform"`
CollectionDateFrom string `toml:"collection_date_from"`
CollectionDateTo string `toml:"collection_date_to"`
// contains filtered or unexported fields
}
Filters holds all query filter criteria.
func (*Filters) HasSampleFilter ¶
HasSampleFilter reports whether any sample filter is active.
func (*Filters) LoadSampleFile ¶
LoadSampleFile reads sample accessions from the file referenced by SampleFile. Lines starting with # and blank lines are skipped.
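The skipping rules can be illustrated with a small parser over a string. parseAccessions is a hypothetical helper, and trimming surrounding whitespace before the checks is an assumption; the method itself reads from the path in SampleFile.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAccessions returns one accession per non-blank, non-comment line.
// Whitespace trimming is an assumption, not documented behaviour.
func parseAccessions(text string) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		out = append(out, line)
	}
	return out, sc.Err()
}

func main() {
	ids, err := parseAccessions("# cohort A\nSAMEA1\n\nSAMEA2\n")
	fmt.Println(ids, err)
}
```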
func (*Filters) MatchesSpecies ¶
MatchesSpecies performs a case-insensitive exact match against the Species filter. An empty filter matches everything.
func (*Filters) MatchesSpeciesLike ¶
MatchesSpeciesLike matches species against the SpeciesLike filter, where % acts as a wildcard supporting prefix, suffix, and contains patterns. An empty filter matches everything.
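The two species matchers can be sketched as free functions. The names matchesSpecies and matchesLike are hypothetical. The sketch assumes SQL LIKE-style placement, where a trailing % means prefix, a leading % means suffix, and % on both ends means contains. It also assumes the wildcard match is case-insensitive, which this page states only for the exact match.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSpecies is a case-insensitive exact match; an empty filter matches
// everything.
func matchesSpecies(filter, species string) bool {
	if filter == "" {
		return true
	}
	return strings.EqualFold(filter, species)
}

// matchesLike handles a leading and/or trailing % only. Case folding here is
// an assumption.
func matchesLike(pattern, species string) bool {
	if pattern == "" {
		return true
	}
	p := strings.ToLower(pattern)
	v := strings.ToLower(species)
	lead := strings.HasPrefix(p, "%")
	trail := strings.HasSuffix(p, "%")
	core := strings.TrimSuffix(strings.TrimPrefix(p, "%"), "%")
	switch {
	case lead && trail:
		return strings.Contains(v, core)
	case trail:
		return strings.HasPrefix(v, core)
	case lead:
		return strings.HasSuffix(v, core)
	default:
		return v == core
	}
}

func main() {
	fmt.Println(matchesSpecies("escherichia COLI", "Escherichia coli"))
	fmt.Println(matchesLike("Escherichia%", "Escherichia coli"))
	fmt.Println(matchesLike("%coli", "Escherichia coli"))
	fmt.Println(matchesLike("Salmonella%", "Escherichia coli"))
}
```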
func (*Filters) NeedsAssemblyStats ¶
NeedsAssemblyStats reports whether assembly statistics are required.
func (*Filters) NeedsCheckM2 ¶
NeedsCheckM2 reports whether CheckM2 quality metrics are required.
func (*Filters) NeedsENA ¶
NeedsENA reports whether ENA metadata (country, platform, dates) is required.
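One plausible reading is that ENA metadata is needed whenever any of the four ENA-backed fields is set. The sketch below assumes that an empty string means "unset"; enaFilter and active are hypothetical names that mirror the fields of ENAFilter.

```go
package main

import "fmt"

// enaFilter copies the four fields listed on ENAFilter.
type enaFilter struct {
	Country            string
	Platform           string
	CollectionDateFrom string
	CollectionDateTo   string
}

// active reports whether any field is set. Treating "" as unset is an
// assumption about the package's behaviour.
func (f enaFilter) active() bool {
	return f.Country != "" || f.Platform != "" ||
		f.CollectionDateFrom != "" || f.CollectionDateTo != ""
}

func main() {
	fmt.Println(enaFilter{}.active())
	fmt.Println(enaFilter{Platform: "ILLUMINA"}.active())
}
```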
func (*Filters) NeedsSylph ¶
NeedsSylph is reserved for future use and always returns false.
type OutputConfig ¶
type OutputConfig struct {
Columns []string `toml:"columns"`
SortBy string `toml:"sort_by"`
SortDesc bool `toml:"sort_desc"`
Limit int `toml:"limit"`
Offset int `toml:"offset"`
Format string `toml:"format"`
Output string `toml:"output"`
}
OutputConfig controls how query results are formatted and written.
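How Limit and Offset interact is not spelled out here. A common convention, assumed in this sketch, is to skip Offset rows first and then keep at most Limit rows, with a Limit of 0 meaning "no limit". The page helper is hypothetical.

```go
package main

import "fmt"

// page applies offset, then limit. limit <= 0 is treated as unlimited, which
// is an assumption about OutputConfig semantics.
func page[T any](rows []T, offset, limit int) []T {
	if offset < 0 {
		offset = 0
	}
	if offset >= len(rows) {
		return nil
	}
	rows = rows[offset:]
	if limit > 0 && limit < len(rows) {
		rows = rows[:limit]
	}
	return rows
}

func main() {
	rows := []string{"a", "b", "c", "d"}
	fmt.Println(page(rows, 1, 2))
}
```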
type QueryPlan ¶
type QueryPlan struct {
Tables []string
}
QueryPlan describes which parquet tables to read for a query.