sflib

v0.5.11
Published: Apr 2, 2026 License: MIT Imports: 11 Imported by: 10

README

SFlib

SFlib is a Go library containing shared functionality for Species File Group (SFG) projects. It focuses primarily on working with Species File Group Archives (SFGAs).

Overview

SFlib provides a unified interface for working with biodiversity data across several archive formats. Any supported format can be converted into any other, with SFGA as the intermediary.
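Routing every conversion through one intermediary means each format needs only a reader into it and a writer out of it, rather than a dedicated converter for every pair of formats. The sketch below illustrates the idea with hypothetical types and functions (`Record`, `Reader`, `Writer`, `convert`, `readLines`, `writeTable`). None of them are part of SFlib's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for the intermediary representation.
type Record struct {
	Name string
}

// Reader parses one format into intermediary records.
type Reader func(src string) []Record

// Writer renders intermediary records in one format.
type Writer func(recs []Record) string

// readLines treats every non-empty line as one name.
func readLines(src string) []Record {
	var recs []Record
	for _, line := range strings.Split(src, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			recs = append(recs, Record{Name: line})
		}
	}
	return recs
}

// writeTable renders the records as a single column under a header row.
// It does no quoting or escaping.
func writeTable(recs []Record) string {
	var sb strings.Builder
	sb.WriteString("scientificName\n")
	for _, r := range recs {
		sb.WriteString(r.Name + "\n")
	}
	return sb.String()
}

// convert chains any reader with any writer through the intermediary.
func convert(src string, r Reader, w Writer) string {
	return w(r(src))
}

func main() {
	fmt.Print(convert("Aus bus\nCus dus\n", readLines, writeTable))
}
```

Adding a new format under this design means writing one more `Reader` and one more `Writer`; existing formats are untouched.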

Supported Archive Formats

Package     Format           Description
pkg/sfga    SFGA             SQLite-based lossless interchange format
pkg/coldp   CoLDP            Catalogue of Life Data Package
pkg/dwca    DwCA             Darwin Core Archive
pkg/xsv     CSV / TSV / PSV  Delimited values files with DwC headers
pkg/text    Plain text       One scientific name per line (UTF-8)

Each format exposes a consistent Archive interface with Fetch, Create, and Export methods. Archives are created via factory functions in the root package:

sflib.NewText(opts...)
sflib.NewXsv(opts...)
sflib.NewColdp(opts...)
sflib.NewDwca(opts...)
sflib.NewSfga(opts...)

Species File Group Archive (pkg/sfga)

SFGA is an SQLite-based archive format modeled closely on the CoLDP standard. It is designed as a lossless interchange format that preserves all information from any format it converts to or from.

Packager API

SFGA Interface API

CoLDP Data Model (pkg/coldp)

The CoLDP package provides base models for both SFGA and CoLDP.

  • NameUsage — consolidated record combining Name, Taxon, and Synonym data
  • Name — scientific name with parsed components and nomenclatural metadata
  • Taxon — full taxonomic classification (Kingdom through Realm)
  • Reference — bibliographic references
  • Author — person/organisation metadata
  • Distribution — geographic distribution records
  • Vernacular — common/vernacular names
  • Synonym, NameRelation, TaxonConceptRelation — relationship types
  • Treatment, TypeMaterial, SpeciesEstimate, SpeciesInteraction, TaxonProperty, Media — extended data types

Comprehensive enumerated types are provided for taxonomic rank, status, nomenclatural status, habitat, sex, and more.

Packager API

CoLDP Interface API

Darwin Core Archive (pkg/dwca)

The DwCA package converts data to and from Darwin Core Archives.

XSV Format (pkg/xsv)

The XSV package reads and writes comma-separated, tab-separated, and pipe-separated files (CSV, TSV, PSV). It automatically detects field delimiters and maps Darwin Core and CoLDP column headers to SFGA.
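As an illustration of what delimiter detection can look like, here is a minimal sketch that picks the candidate occurring most often in the header line. The function `detectDelimiter` and its heuristic are assumptions made for this example; the XSV package's actual detection logic may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// detectDelimiter returns whichever of comma, tab, or pipe occurs most
// often in the header line. Ties resolve in that order, and a header
// with no candidates falls back to a comma.
func detectDelimiter(header string) rune {
	best, bestCount := ',', 0
	for _, d := range []rune{',', '\t', '|'} {
		if n := strings.Count(header, string(d)); n > bestCount {
			best, bestCount = d, n
		}
	}
	return best
}

func main() {
	for _, h := range []string{
		"taxonID,scientificName,taxonRank",
		"taxonID\tscientificName\ttaxonRank",
		"taxonID|scientificName|taxonRank",
	} {
		fmt.Printf("%q\n", detectDelimiter(h))
	}
}
```

A header-only count is cheap but can be fooled by delimiters inside quoted fields, so a production detector would likely sample several rows.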

Packager API

XSV Interface API

Text Format (pkg/text)

The Text package reads and writes plain text files in which every line contains one scientific name.

Packager API

Text Interface API

Configuration

  • OptNomCode — nomenclatural code (Zoological, Botanical, Bacterial, …)
  • OptJobsNum — number of concurrent workers (default: 5)
  • OptBatchSize — records per batch (default: 50,000)
  • OptWithParents — reconstruct parent/child hierarchy from flat data
  • OptLocalSchemaPath — use a local SFGA schema file instead of fetching from GitHub
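The factory signatures accept variadic options, which suggests the functional-options idiom. The sketch below shows that idiom in isolation. `Config`, `Option`, `New`, `WithJobs`, `WithBatchSize`, and `WithParents` are hypothetical names defined only for this example; they are not SFlib's types, and the real option signatures may differ. The defaults mirror the values listed above.

```go
package main

import "fmt"

// Config is a hypothetical settings struct; the field names are
// illustrative only.
type Config struct {
	JobsNum     int
	BatchSize   int
	WithParents bool
}

// Option mutates a Config.
type Option func(*Config)

// WithJobs sets the number of concurrent workers.
func WithJobs(n int) Option { return func(c *Config) { c.JobsNum = n } }

// WithBatchSize sets the number of records per batch.
func WithBatchSize(n int) Option { return func(c *Config) { c.BatchSize = n } }

// WithParents toggles hierarchy reconstruction.
func WithParents(b bool) Option { return func(c *Config) { c.WithParents = b } }

// New starts from the defaults and applies each option in order.
func New(opts ...Option) Config {
	c := Config{JobsNum: 5, BatchSize: 50_000}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", New())
	fmt.Printf("%+v\n", New(WithJobs(2), WithParents(true)))
}
```

Callers that pass no options get the defaults, and later options override earlier ones because they are applied in order.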

Installation

Requires Go 1.25 or later. No CGO or system libraries are required — SQLite support is provided by a pure-Go driver.

go get github.com/sfborg/sflib

Usage Examples

All formats use the same pattern: create an archive with a New* factory, call Fetch to load the source into a cache directory, then read records through a channel.

For more complete real-world usage, see sf — the primary tool built on top of SFlib.

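Load names from a text file

import (
  "context"
  "sync"

  "github.com/gnames/gnlib/ent/nomcode"
  "github.com/sfborg/sflib"
  "github.com/sfborg/sflib/pkg/coldp"
)

// loadNames is an illustrative wrapper so that errors can be returned.
func loadNames() error {
  // Fetch the source file into a cache directory.
  a := sflib.NewText()
  if err := a.Fetch("names.txt", "/tmp/text-cache"); err != nil {
    return err
  }

  // Consume records in a goroutine while Load sends them on the channel.
  ch := make(chan coldp.NameUsage)
  var wg sync.WaitGroup
  wg.Add(1)
  go func() {
    defer wg.Done()
    for nu := range ch {
      _ = nu // process nu
    }
  }()

  err := a.Load(context.Background(), ch, 5, nomcode.Zoological)
  close(ch) // lets the consumer loop end
  wg.Wait()
  return err
}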
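
Convert a DwCA archive to SFGA

import (
  "context"
  "sync"

  "github.com/sfborg/sflib"
  "github.com/sfborg/sflib/pkg/coldp"
)

// dwcaToSfga is an illustrative wrapper so that errors can be returned.
func dwcaToSfga() error {
  // Fetch the source DwCA archive into a cache directory.
  dwca := sflib.NewDwca()
  if err := dwca.Fetch("archive.zip", "/tmp/dwca-cache"); err != nil {
    return err
  }

  // Create an empty SFGA archive and connect to its database.
  sfga := sflib.NewSfga()
  if err := sfga.Create("/tmp/sfga-cache"); err != nil {
    return err
  }
  if _, err := sfga.Connect(); err != nil {
    return err
  }
  defer sfga.Close()

  // Insert the name usages carried by each item received on the channel.
  ch := make(chan coldp.Data)
  var wg sync.WaitGroup
  wg.Add(1)
  go func() {
    defer wg.Done()
    for d := range ch {
      sfga.InsertNameUsages(d.NameUsages)
    }
  }()

  err := dwca.LoadCore(context.Background(), ch)
  close(ch) // lets the consumer loop end
  wg.Wait()
  return err
}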

Convert an SFGA archive to CoLDP

An SFGA archive, such as the one produced in the previous example, can be converted to any other supported format. Together the two examples form the conversion chain DwCA → SFGA → CoLDP.

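import (
  "context"
  "path/filepath"
  "sync"

  "github.com/sfborg/sflib"
  "github.com/sfborg/sflib/pkg/coldp"
)

// sfgaToColdp is an illustrative wrapper so that errors can be returned.
func sfgaToColdp() error {
  // Fetch an existing SFGA archive and connect to its database.
  sfga := sflib.NewSfga()
  if err := sfga.Fetch("archive.sqlite.zip", "/tmp/sfga-cache"); err != nil {
    return err
  }
  if _, err := sfga.Connect(); err != nil {
    return err
  }
  defer sfga.Close()

  // Create the CoLDP working directory.
  coldpDir := "/tmp/coldp-cache"
  cl := sflib.NewColdp()
  if err := cl.Create(coldpDir); err != nil {
    return err
  }

  // Write name usages to NameUsage.tsv as they arrive on the channel.
  ch := make(chan coldp.NameUsage)
  ctx := context.Background()
  var wg sync.WaitGroup
  wg.Add(1)
  go func() {
    defer wg.Done()
    coldp.Write(ctx, ch, filepath.Join(coldpDir, "NameUsage.tsv"))
  }()

  err := sfga.LoadNameUsages(ctx, ch)
  close(ch) // lets the writer finish
  wg.Wait()
  if err != nil {
    return err
  }

  // Export the CoLDP archive to output.zip.
  return cl.Export("output.zip", true)
}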

Testing

Because the library modifies the file system, running tests in parallel can create race conditions that break some tests. To run the tests serially, either use

just test

or run the tests with the -p 1 option:

go test ./... -p 1

Authors

License

Released under the MIT license.

Documentation

Overview

Package sflib is a library that provides functionality to create and manage various types of archives for biodiversity data, including text, CSV/TSV/PSV, CoLDP, DwCA and SFGA formats.

sflib provides a unified interface for working with different archive formats, abstracting away the underlying implementation details. It allows users to create new archives of specific types and manage them through a consistent set of methods.

The package supports the following archive types:

  • Text: Archives containing plain text files, typically with one scientific name per line.
  • Xsv: Archives containing delimited files (CSV, TSV, PSV).
  • Coldp: Archives following the Catalogue of Life Data Package (CoLDP) standard.
  • DwCA: Archives following the Darwin Core Archive standard.
  • SFGA: Archives following the Species File Group Archive standard, an SQLite-based format that is close to CoLDP.

Usage:

To create a new archive, use one of the `New...` functions:

// Create a new text archive.
textArchive := sflib.NewText()

// Create a new Xsv archive.
xsvArchive := sflib.NewXsv()

// Create a new Dwca archive.
dwcaArchive := sflib.NewDwca()

// Create a new Coldp archive.
coldpArchive := sflib.NewColdp()

// Create a new Sfga archive.
sfgaArchive := sflib.NewSfga()

Each of these functions returns an interface type (e.g., `text.Archive`, `xsv.Archive`, `dwca.Archive`, `coldp.Archive`, `sfga.Archive`) that can be used to interact with the archive.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewColdp

func NewColdp(opts ...config.Option) coldp.Archive

NewColdp creates a new coldp.Archive instance, which allows managing archives following the Catalogue of Life Data Package (CoLDP) standard.

Example:

coldpArchive := sflib.NewColdp()
// ... use coldpArchive to manage the archive ...

func NewDwca

func NewDwca(opts ...config.Option) dwca.Archive

func NewSfga

func NewSfga(opts ...config.Option) sfga.Archive

NewSfga creates a new sfga.Archive instance, which allows managing archives following the Species File Group Archive (SFGA) standard. SFGA is based on an SQLite database and is close to the CoLDP format.

Example:

sfgaArchive := sflib.NewSfga()
// ... use sfgaArchive to manage the archive ...

func NewText

func NewText(opts ...config.Option) text.Archive

NewText creates a new text.Archive instance, which allows managing archives containing textual scientific names data. It assumes that a text file has UTF-8 encoding and contains one scientific name per line and no other information.

Example:

textArchive := sflib.NewText()
// ... use textArchive to manage the archive ...

func NewXsv

func NewXsv(opts ...config.Option) xsv.Archive

NewXsv creates a new xsv.Archive instance, which allows managing archives containing xsv (e.g., CSV, TSV or PSV) files.

Example:

xsvArchive := sflib.NewXsv()
// ... use xsvArchive to manage the archive ...

Types

This section is empty.

Directories

Path Synopsis
internal
pkg
xsv
