query

package
v0.12.4 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 22, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Constants

const ENAFileName = "ena_20250506.parquet"

ENAFileName is the name of the ENA metadata parquet file on disk.

Variables

This section is empty.

Functions

func BuildENALookup added in v0.12.0

func BuildENALookup(dir string, f ENAFilter, keep map[string]struct{}) (map[string]ENARecord, error)

BuildENALookup streams ena_20250506.parquet and returns a map of sample_accession -> ENARecord for every row matching the filter.

When keep is non-nil, rows whose sample_accession is not in keep are skipped, so callers that already have a result set only materialise records they will use. First match wins when a sample has multiple ENA rows.

Returns (nil, nil) when the filter is inactive and keep is nil, so callers can skip the scan entirely.
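The keep and first-match-wins rules can be illustrated without touching parquet at all. The following is a minimal sketch, not the package's implementation: enaRow, enaRecord and buildLookup are hypothetical stand-ins that operate on an in-memory slice, the accessions are placeholders, and the filter check is left out.

```go
package main

import "fmt"

// enaRecord mirrors the three fields documented for ENARecord.
type enaRecord struct {
	Country, CollectionDate, InstrumentPlatform string
}

// enaRow is a hypothetical in-memory stand-in for one row of the ENA table.
type enaRow struct {
	SampleAccession string
	Record          enaRecord
}

// buildLookup keeps the first row seen for each sample accession. When keep
// is non-nil, accessions that are not in keep are skipped entirely.
func buildLookup(rows []enaRow, keep map[string]struct{}) map[string]enaRecord {
	out := make(map[string]enaRecord)
	for _, r := range rows {
		if keep != nil {
			if _, ok := keep[r.SampleAccession]; !ok {
				continue
			}
		}
		if _, seen := out[r.SampleAccession]; seen {
			continue // first match wins
		}
		out[r.SampleAccession] = r.Record
	}
	return out
}

func main() {
	rows := []enaRow{
		{"SAMEA1", enaRecord{Country: "United Kingdom"}},
		{"SAMEA1", enaRecord{Country: "France"}}, // later duplicate, ignored
		{"SAMEA2", enaRecord{Country: "Germany"}},
	}
	keep := map[string]struct{}{"SAMEA1": {}}
	lookup := buildLookup(rows, keep)
	fmt.Println(len(lookup), lookup["SAMEA1"].Country)
}
```

Passing the accessions of an existing result set as keep keeps the returned map proportional to that result set rather than to the size of the ENA table.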

func BuildENASampleSet added in v0.11.0

func BuildENASampleSet(dir string, f ENAFilter) (map[string]struct{}, error)

BuildENASampleSet is the set-only variant of BuildENALookup: it returns only the sample_accession keys of the rows matching the filter. It is retained for callers that only need the key set for intersection.

func SortResults

func SortResults(rows []ResultRow, sortBy string, desc bool)

SortResults sorts rows in place by the given column. If sortBy is empty, it is a no-op. Numeric values are compared numerically; everything else falls back to lexicographic string comparison.
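The comparison rule can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the package's code: resultRow, less and sortRows are hypothetical names, "numeric" is taken to mean that both values parse with strconv.ParseFloat, and the column name n50 is a placeholder. How the real function treats a mix of numeric and non-numeric values is not documented here.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// resultRow mirrors the documented ResultRow type (column name -> value).
type resultRow map[string]string

// less compares numerically when both values parse as floats and falls back
// to lexicographic string comparison otherwise.
func less(a, b string) bool {
	fa, errA := strconv.ParseFloat(a, 64)
	fb, errB := strconv.ParseFloat(b, 64)
	if errA == nil && errB == nil {
		return fa < fb
	}
	return a < b
}

// sortRows sorts rows in place by the sortBy column; an empty sortBy is a no-op.
func sortRows(rows []resultRow, sortBy string, desc bool) {
	if sortBy == "" {
		return
	}
	sort.SliceStable(rows, func(i, j int) bool {
		if desc {
			return less(rows[j][sortBy], rows[i][sortBy])
		}
		return less(rows[i][sortBy], rows[j][sortBy])
	})
}

func main() {
	rows := []resultRow{{"n50": "100"}, {"n50": "9"}, {"n50": "25"}}
	sortRows(rows, "n50", false)
	for _, r := range rows {
		fmt.Println(r["n50"])
	}
}
```

The numeric branch is what keeps "9" ahead of "100"; a purely lexicographic sort would order them the other way round.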

Types

type ENAFilter added in v0.11.0

type ENAFilter struct {
	Country            string
	Platform           string
	CollectionDateFrom string
	CollectionDateTo   string
}

ENAFilter holds optional filters that require joining against the ENA metadata table. It is the subset of Filters that any command can reuse without pulling in the full query Filters struct.

func (ENAFilter) Active added in v0.11.0

func (f ENAFilter) Active() bool

Active reports whether any ENA filter is set.
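A minimal sketch of such a check, assuming that "set" means a non-empty string. The lowercase enaFilter type and active method are hypothetical copies made for illustration, and the platform value is a placeholder.

```go
package main

import "fmt"

// enaFilter copies the four documented ENAFilter fields.
type enaFilter struct {
	Country            string
	Platform           string
	CollectionDateFrom string
	CollectionDateTo   string
}

// active reports whether at least one field is non-empty (assumed meaning
// of "set").
func (f enaFilter) active() bool {
	return f.Country != "" || f.Platform != "" ||
		f.CollectionDateFrom != "" || f.CollectionDateTo != ""
}

func main() {
	var none enaFilter
	byPlatform := enaFilter{Platform: "ILLUMINA"}
	fmt.Println(none.active(), byPlatform.active())
}
```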

type ENARecord added in v0.12.0

type ENARecord struct {
	Country            string
	CollectionDate     string
	InstrumentPlatform string
}

ENARecord is the subset of ENA columns exposed in output rows. It is deliberately narrow so that enrichment maps stay small even when the ENA table has millions of entries.

type FilterFile

type FilterFile struct {
	Filter Filters      `toml:"filter"`
	Output OutputConfig `toml:"output"`
}

FilterFile is the top-level structure for a TOML filter file.

func LoadFilterFile

func LoadFilterFile(path string) (*FilterFile, error)

LoadFilterFile parses a TOML filter file at the given path.
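Based on the struct tags of FilterFile, Filters and OutputConfig, a filter file would be laid out roughly as below. Only the table and key names come from the documented tags; every value, including the column names and the format string, is a placeholder assumption and may not match what the package accepts.

```toml
# Hypothetical filter file; keys follow the documented toml tags,
# values are illustrative only.
[filter]
species = "Escherichia coli"
hq_only = true
min_n50 = 50000
has_assembly = true

[output]
columns = ["sample_accession", "species"]  # column names are assumed
sort_by = "species"
sort_desc = false
limit = 100
format = "tsv"                             # accepted formats are not documented here
```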

type Filters

type Filters struct {
	Species            string   `toml:"species"`
	SpeciesLike        string   `toml:"species_like"`
	Genus              string   `toml:"genus"`
	Samples            []string `toml:"samples"`
	SampleFile         string   `toml:"sample_file"`
	HQOnly             bool     `toml:"hq_only"`
	MinCompleteness    float64  `toml:"min_completeness"`
	MaxContamination   float64  `toml:"max_contamination"`
	MinN50             int64    `toml:"min_n50"`
	Dataset            string   `toml:"dataset"`
	HasAssembly        bool     `toml:"has_assembly"`
	Country            string   `toml:"country"`
	Platform           string   `toml:"platform"`
	CollectionDateFrom string   `toml:"collection_date_from"`
	CollectionDateTo   string   `toml:"collection_date_to"`
	// contains filtered or unexported fields
}

Filters holds all query filter criteria.

func (*Filters) HasSampleFilter

func (f *Filters) HasSampleFilter() bool

HasSampleFilter reports whether any sample filter is active.

func (*Filters) LoadSampleFile

func (f *Filters) LoadSampleFile() error

LoadSampleFile reads sample accessions from the file referenced by SampleFile. Lines starting with # and blank lines are skipped.
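The skipping rule can be sketched as below. parseAccessions is a hypothetical helper that works on a string instead of a file path, and trimming surrounding whitespace before the checks is an assumption rather than documented behaviour.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAccessions returns one accession per non-blank line, ignoring lines
// that start with '#'. Whitespace around each line is trimmed (assumption).
func parseAccessions(text string) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		out = append(out, line)
	}
	return out, sc.Err()
}

func main() {
	input := "# accessions of interest\nSAMEA1\n\nSAMEA2\n"
	accs, err := parseAccessions(input)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(accs)
}
```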

func (*Filters) MatchesSpecies

func (f *Filters) MatchesSpecies(species string) bool

MatchesSpecies performs a case-insensitive exact match against the Species filter. An empty filter matches everything.

func (*Filters) MatchesSpeciesLike

func (f *Filters) MatchesSpeciesLike(species string) bool

MatchesSpeciesLike performs a wildcard match using % (prefix, suffix, contains) against the SpeciesLike filter. An empty filter matches everything.
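The three wildcard forms can be sketched as follows. matchLike is a hypothetical helper, not the package's method. It assumes case-insensitive comparison (documented for MatchesSpecies, but only assumed here for the wildcard variant), treats a pattern without % as an exact match, and treats any % in the middle of a pattern as a literal character.

```go
package main

import (
	"fmt"
	"strings"
)

// matchLike interprets a leading and/or trailing % as a wildcard:
//   "abc%"  -> prefix match
//   "%abc"  -> suffix match
//   "%abc%" -> contains match
// An empty pattern matches everything.
func matchLike(pattern, s string) bool {
	if pattern == "" {
		return true
	}
	p := strings.ToLower(pattern)
	v := strings.ToLower(s)
	leading := strings.HasPrefix(p, "%")
	trailing := strings.HasSuffix(p, "%")
	core := strings.Trim(p, "%")
	switch {
	case leading && trailing:
		return strings.Contains(v, core)
	case trailing:
		return strings.HasPrefix(v, core)
	case leading:
		return strings.HasSuffix(v, core)
	default:
		return v == core
	}
}

func main() {
	species := "Escherichia coli"
	for _, p := range []string{"Escherichia%", "%coli", "%cher%", "%albertii"} {
		fmt.Println(p, matchLike(p, species))
	}
}
```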

func (*Filters) NeedsAssemblyStats

func (f *Filters) NeedsAssemblyStats() bool

NeedsAssemblyStats reports whether assembly statistics are required.

func (*Filters) NeedsCheckM2

func (f *Filters) NeedsCheckM2() bool

NeedsCheckM2 reports whether CheckM2 quality metrics are required.

func (*Filters) NeedsENA

func (f *Filters) NeedsENA() bool

NeedsENA reports whether ENA metadata (country, platform, dates) is required.

func (*Filters) NeedsSylph

func (f *Filters) NeedsSylph() bool

NeedsSylph is reserved for future use and always returns false.

func (*Filters) SampleSet

func (f *Filters) SampleSet() map[string]struct{}

SampleSet returns a deduplicated set of all sample accessions from both the Samples slice and any loaded sample file.
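The deduplication amounts to merging both sources into one map keyed by accession. A minimal sketch, where sampleSet is a hypothetical free function taking the two sources as plain slices and the accessions are placeholders:

```go
package main

import "fmt"

// sampleSet merges inline accessions and accessions loaded from a file into
// a single set; duplicates collapse because map keys are unique.
func sampleSet(inline, fromFile []string) map[string]struct{} {
	set := make(map[string]struct{}, len(inline)+len(fromFile))
	for _, s := range inline {
		set[s] = struct{}{}
	}
	for _, s := range fromFile {
		set[s] = struct{}{}
	}
	return set
}

func main() {
	set := sampleSet([]string{"SAMEA1", "SAMEA2"}, []string{"SAMEA2", "SAMEA3"})
	fmt.Println(len(set))
}
```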

type OutputConfig

type OutputConfig struct {
	Columns  []string `toml:"columns"`
	SortBy   string   `toml:"sort_by"`
	SortDesc bool     `toml:"sort_desc"`
	Limit    int      `toml:"limit"`
	Offset   int      `toml:"offset"`
	Format   string   `toml:"format"`
	Output   string   `toml:"output"`
}

OutputConfig controls how query results are formatted and written.

type QueryPlan

type QueryPlan struct {
	Tables []string
}

QueryPlan describes which parquet tables to read for a query.

func Plan

func Plan(filters Filters, columns []string) QueryPlan

Plan determines which parquet tables are needed based on filters and output columns. Tables are returned in canonical order: assembly, checkm2, assembly_stats, sylph, run, ena_20250506.
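Returning tables in canonical order can be sketched by walking the fixed list and keeping the entries that are needed. orderTables and the needed map are hypothetical; how the real Plan derives the set of needed tables from filters and columns is not shown here.

```go
package main

import "fmt"

// canonicalOrder is the table order stated in the Plan documentation.
var canonicalOrder = []string{
	"assembly", "checkm2", "assembly_stats", "sylph", "run", "ena_20250506",
}

// orderTables returns the needed tables in canonical order, regardless of
// the order in which they were requested.
func orderTables(needed map[string]bool) []string {
	var out []string
	for _, t := range canonicalOrder {
		if needed[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	needed := map[string]bool{"ena_20250506": true, "checkm2": true, "assembly": true}
	fmt.Println(orderTables(needed))
}
```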

type ResultRow

type ResultRow map[string]string

ResultRow is a single query result as a map of column name to string value.

func Execute

func Execute(dataDir string, filters Filters, columns []string) ([]ResultRow, error)

Execute runs a full query against the parquet data directory and returns matching rows.
