kv

package
v0.9.0
Published: Mar 13, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Constants

const MASK_REVERSE = 1

MASK_REVERSE is the bit mask of the reversed flag.

const MaskUse3BytesForSeedPos uint8 = 1

Variables

var ErrBrokenFile = errors.New("k-mer-value data: broken file")

ErrBrokenFile means the file is not complete.

var ErrInvalidFileFormat = errors.New("k-mer-value data: invalid binary format")

ErrInvalidFileFormat means invalid file format.

var ErrKOverflow = errors.New("k-mer-value data: k-mer size [1, 32] overflow")

ErrKOverflow means K < 1 or K > 32.

var ErrVersionMismatch = errors.New("k-mer-value data: version mismatch")

ErrVersionMismatch means a version mismatch between the files and the program.

var KVIndexFileExt = ".idx"

KVIndexFileExt is the file extension of k-mer data index file.

var Magic = [8]byte{'.', 'k', 'v', '-', 'd', 'a', 't', 'a'}

Magic is the magic number for checking the kv-data file format.

var MagicIdx = [8]byte{'.', 'k', 'v', 'i', 'n', 'd', 'e', 'x'}

MagicIdx is the magic number for the index file.

var MainVersion uint8 = 1

MainVersion is used for checking compatibility.

var MinorVersion uint8 = 1

MinorVersion is less important than MainVersion.

var PoolKmerData = &sync.Pool{New: func() interface{} {
	m := make(map[uint64]*[]uint64, 1024)
	return &m
}}

Functions

func AnchorExtracter added in v0.4.0

func AnchorExtracter(k uint8, maskPrefix uint8, anchorPrefix uint8) func(uint64) uint64

AnchorExtracter returns the function for extracting anchors, i.e., CCCCC below

maskPrefix
-------
AAAAAAA CCCCC NNNNNNNN
        -----
        anchorPrefix
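
The diagram above can be sketched in code. This is a minimal illustration, not the package's implementation: it assumes k-mers are packed 2 bits per base (A=0, C=1, G=2, T=3) with the first base in the highest-order bits, and `encode` and `anchorExtractor` are hypothetical names introduced here.

```go
package main

import "fmt"

// encode packs a DNA string into a uint64, 2 bits per base (A=0, C=1, G=2, T=3),
// with the first base in the highest-order bits. Hypothetical helper; the
// encoding is an assumption of this sketch.
func encode(s string) uint64 {
	var v uint64
	for i := 0; i < len(s); i++ {
		v <<= 2
		switch s[i] {
		case 'C':
			v |= 1
		case 'G':
			v |= 2
		case 'T':
			v |= 3
		}
	}
	return v
}

// anchorExtractor has the same shape as AnchorExtracter: it returns a closure
// that drops the bases after the anchor, then keeps anchorPrefix bases.
func anchorExtractor(k, maskPrefix, anchorPrefix uint8) func(uint64) uint64 {
	shift := uint(k-maskPrefix-anchorPrefix) * 2
	mask := uint64(1)<<(uint(anchorPrefix)*2) - 1
	return func(kmer uint64) uint64 {
		return (kmer >> shift) & mask
	}
}

func main() {
	extract := anchorExtractor(20, 7, 5)
	kmer := encode("AAAAAAACCCCCGGGGGGGG") // 7-base mask prefix, 5-base anchor, 8 remaining bases
	fmt.Println(extract(kmer) == encode("CCCCC"))
}
```

Returning a closure lets the shift and mask be computed once and reused for every k-mer of a mask.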

func CreateKVIndex added in v0.3.0

func CreateKVIndex(file string, nAnchors int) error

CreateKVIndex recreates the kv index file for a kv-data file.

func PutUint64ThreeBytes added in v0.6.0

func PutUint64ThreeBytes(b []byte, v uint64)

PutUint64ThreeBytes puts uint64 to 7 low bytes.

func ReadKVIndex

func ReadKVIndex(file string) (uint8, int, [][]uint64, uint8, uint8, uint8, error)

ReadKVIndex parses the k-mer-value index file.

Returned:

k-mer size
Index (0-based) of the first Mask in current chunk.
Index data of masks saved in a list, the list size equals to the number of masks.
error

For each mask, k-mer and offset pairs are stored interleaved in a []uint64, e.g., [k1, o1, k2, o2].
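
A small sketch of how such an interleaved list might be queried. It assumes the pairs are sorted by k-mer, which the text above does not state; `findOffset` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// findOffset looks up kmer in an interleaved [k1, o1, k2, o2, ...] list and
// returns its offset. The list is assumed to be sorted by k-mer.
func findOffset(pairs []uint64, kmer uint64) (uint64, bool) {
	n := len(pairs) / 2 // number of (k-mer, offset) pairs
	i := sort.Search(n, func(i int) bool { return pairs[2*i] >= kmer })
	if i < n && pairs[2*i] == kmer {
		return pairs[2*i+1], true
	}
	return 0, false
}

func main() {
	pairs := []uint64{3, 100, 7, 250, 9, 400}
	off, ok := findOffset(pairs, 7)
	fmt.Println(off, ok)
}
```

Keeping keys and offsets in one flat slice avoids a per-pair allocation; even indexes hold k-mers and odd indexes hold the matching offsets.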

func ReadKVIndexInfo added in v0.4.0

func ReadKVIndexInfo(file string) (uint8, int, int, uint8, uint8, error)

ReadKVIndexInfo reads the header information of the k-mer-value index file.

func RecycleKmerData

func RecycleKmerData(m *map[uint64]*[]uint64)

RecycleKmerData recycles a k-mer data object.
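
The pooling pattern behind PoolKmerData can be sketched as follows. The `pool` variable copies the declaration shown above; that recycling empties the map before returning it to the pool is an assumption of this sketch, and `recycle` is a hypothetical stand-in for RecycleKmerData.

```go
package main

import (
	"fmt"
	"sync"
)

// pool mirrors the PoolKmerData declaration from the documentation.
var pool = &sync.Pool{New: func() interface{} {
	m := make(map[uint64]*[]uint64, 1024)
	return &m
}}

// recycle empties the map and hands it back to the pool so that the next
// Get can reuse its buckets. The clearing step is assumed, not confirmed.
func recycle(m *map[uint64]*[]uint64) {
	for k := range *m {
		delete(*m, k)
	}
	pool.Put(m)
}

func main() {
	m := pool.Get().(*map[uint64]*[]uint64)
	(*m)[42] = &[]uint64{1, 2}
	fmt.Println(len(*m))
	recycle(m)
	fmt.Println(len(*m)) // shown for demonstration only; do not use m after recycling
}
```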

func RecycleSearchResults

func RecycleSearchResults(sr *[]*SearchResult)

RecycleSearchResults recycles search results objects.

func Uint64ThreeBytes added in v0.6.0

func Uint64ThreeBytes(b []byte) uint64

Uint64ThreeBytes returns a uint64 from 7 bytes.
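
A round trip through a 7-byte encoding might look like the sketch below. The byte order (most significant byte first) is an assumption, and `put7` and `get7` are hypothetical names standing in for PutUint64ThreeBytes and Uint64ThreeBytes.

```go
package main

import "fmt"

// put7 writes the low 7 bytes of v into b[0:7], most significant byte first.
// The top byte of v is discarded. Byte order is an assumption of this sketch.
func put7(b []byte, v uint64) {
	_ = b[6] // early bounds check
	b[0] = byte(v >> 48)
	b[1] = byte(v >> 40)
	b[2] = byte(v >> 32)
	b[3] = byte(v >> 24)
	b[4] = byte(v >> 16)
	b[5] = byte(v >> 8)
	b[6] = byte(v)
}

// get7 reads 7 bytes written by put7 back into a uint64.
func get7(b []byte) uint64 {
	_ = b[6] // early bounds check
	return uint64(b[0])<<48 | uint64(b[1])<<40 | uint64(b[2])<<32 |
		uint64(b[3])<<24 | uint64(b[4])<<16 | uint64(b[5])<<8 | uint64(b[6])
}

func main() {
	buf := make([]byte, 7)
	v := uint64(0x112233445566)
	put7(buf, v)
	fmt.Println(get7(buf) == v)
}
```

Dropping one byte per value saves 12.5% of the space whenever every value fits in 56 bits.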

func WriteKVData

func WriteKVData(k uint8, MaskOffset int, data []*map[uint64]*[]uint64, file string, maskPrefix uint8, anchorPrefix uint8, nbatches int, clearData bool) (int, error)

WriteKVData writes k-mer-value data of a chunk of masks to a file. At the same time, the index file is also created with the number of anchors `nAnchors` (default: sqrt(#kmers)).

Header (32 bytes):

Magic number, 8 bytes, ".kv-data".
Main and minor versions, 2 bytes.
K size, 1 byte.
Config1, 1 byte, including one bit for use3BytesForSeedPos
Blank, 5 bytes.
Mask start index, 8 bytes. The index of the first mask.
Mask chunk size, 8 bytes. The number of masks in this file.
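
Reading the leading header fields could be sketched as below. It covers only the first 12 bytes (magic number, versions, K size, Config1) and assumes they are stored in the listed order without padding; that MaskUse3BytesForSeedPos is the bit tested in Config1 is also an assumption. `parseHeaderPrefix` and `headerPrefix` are hypothetical names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var magic = [8]byte{'.', 'k', 'v', '-', 'd', 'a', 't', 'a'}

const maskUse3BytesForSeedPos uint8 = 1

var errInvalidFormat = errors.New("k-mer-value data: invalid binary format")

// headerPrefix holds the header fields that precede the blank padding.
type headerPrefix struct {
	Main, Minor uint8
	K           uint8
	Config1     uint8
}

// parseHeaderPrefix checks the magic number and decodes the fields after it.
func parseHeaderPrefix(b []byte) (headerPrefix, error) {
	var h headerPrefix
	if len(b) < 12 || !bytes.Equal(b[:8], magic[:]) {
		return h, errInvalidFormat
	}
	h.Main, h.Minor, h.K, h.Config1 = b[8], b[9], b[10], b[11]
	return h, nil
}

func main() {
	buf := append([]byte(".kv-data"), 1, 1, 31, 1)
	h, err := parseHeaderPrefix(buf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h.K, h.Config1&maskUse3BytesForSeedPos != 0)
}
```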

For each mask:

Number of k-mers, 8 bytes.
k-mer-values pairs, sizes vary.
	Control byte for 2 k-mers, 1 byte
	Delta values of the 2 k-mers, 2-16 bytes
	Control byte for numbers of values, 1 byte
	Numbers of values of the 2 k-mers, 2-16 bytes, 2 bytes for most cases.
	Values of the 2 k-mers, 8*n bytes for batches>512, 7*n for batches <=512, 14 or 16 bytes for most cases.

The index file stores 4^p' k-mers (anchors) and their offsets in the kv-data file for fast access, so locating an anchor takes O(1) time instead of the previous O(log2 N).

AAAAAAA CCCCC NNNNNNNN
------- -----
p       p'

Locations of these anchors vary, and some of them might not exist.

kkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvv
AA        AC       AG   AT   CA
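
The idea of a fixed-size anchor table can be sketched as follows. This is an illustration under simplifying assumptions, not the on-disk format: it records positions in an in-memory sorted slice rather than file offsets, and `buildAnchorTable` is a hypothetical helper.

```go
package main

import "fmt"

// buildAnchorTable returns a table with one slot per possible anchor value.
// table[a] is the position in kmers of the first k-mer whose anchor is a,
// or -1 if no k-mer has that anchor. kmers must be sorted, and anchorOf
// must map every k-mer to a value in [0, nSlots).
func buildAnchorTable(kmers []uint64, nSlots int, anchorOf func(uint64) uint64) []int {
	table := make([]int, nSlots)
	for i := range table {
		table[i] = -1
	}
	for i, km := range kmers {
		a := anchorOf(km)
		if table[a] < 0 {
			table[a] = i
		}
	}
	return table
}

func main() {
	// Toy 4-base k-mers, 2 bits per base, with a 2-base anchor at the start
	// (p = 0, p' = 2), so the anchor is the top 4 of 8 bits and 4^2 = 16 slots exist.
	anchorOf := func(km uint64) uint64 { return km >> 4 }
	kmers := []uint64{0x01, 0x0F, 0x12, 0x40} // anchors AA, AA, AC, CA
	table := buildAnchorTable(kmers, 16, anchorOf)
	fmt.Println(table[0], table[1], table[2], table[4]) // prints "0 2 -1 3"
}
```

A lookup is then a single slice access by anchor value, and a slot of -1 corresponds to an anchor that does not exist, as with AG in the toy data.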

Header (40 bytes):

Magic number, 8 bytes, ".kvindex".
Main and minor versions, 2 bytes.
K size, 1 byte.
Mask prefix length, 1 byte. e.g., 7
Anchor prefix length, 1 byte. e.g., 5
Config1, 1 byte, including one bit for use3BytesForSeedPos
Blank, 2 bytes.
Mask start index, 8 bytes. The index of the first mask.
Mask chunk size, 8 bytes. The number of masks in this file.

For each mask:

Number of anchors, 8 bytes.
kmer-offset data:

	k-mer: 8 bytes
	offset: 8 bytes

Types

type InMemorySearcher added in v0.3.0

type InMemorySearcher struct {
	K          uint8 // kmer size
	ChunkIndex int   // index of the first mask in this chunk
	ChunkSize  int   // the number of masks in this chunk

	// kv data of the ChunkSize masks.
	// A list of k-mer and value pairs are intermittently saved in a []uint64
	KVdata  [][]uint64
	Indexes [][]int
	// contains filtered or unexported fields
}

InMemorySearcher provides a search service for querying k-mer values in a k-mer-value file that is loaded into memory.

func NewInMemomrySearcher added in v0.3.0

func NewInMemomrySearcher(file string) (*InMemorySearcher, error)

NewInMemomrySearcher creates a new InMemorySearcher for the given kv-data file.

func (*InMemorySearcher) Close added in v0.3.0

func (scr *InMemorySearcher) Close() error

Close closes the searcher.

func (*InMemorySearcher) Search added in v0.3.0

func (scr *InMemorySearcher) Search(kmers []uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)

Search queries a k-mer and returns k-mers with a minimum prefix of p, and maximum m mismatches. For m <0 or m >= k-p, mismatch will not be checked.

Please remember to recycle the results object with RecycleSearchResults().

func (*InMemorySearcher) Search2 added in v0.4.0

func (scr *InMemorySearcher) Search2(kmers []*[]uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)

Search2 is very similar to Search; only the data structure of the input k-mers differs.

type IndexReader

type IndexReader struct {
	K          uint8 // kmer size
	ChunkIndex int   // index of the first mask in this chunk
	ChunkSize  int   // the number of masks in this chunk
	NAnchors   int

	Use3BytesForSeedPos bool
	// contains filtered or unexported fields
}

IndexReader provides methods for reading kv-data index data.

func NewIndexReader

func NewIndexReader(file string) (*IndexReader, error)

NewIndexReader creates an index reader.

func (*IndexReader) Close

func (rdr *IndexReader) Close() error

Close closes the reader

type Reader

type Reader struct {
	K          uint8 // kmer size
	ChunkIndex int   // index of the first mask in this chunk
	ChunkSize  int   // the number of masks in this chunk

	Use3BytesForSeedPos bool
	// contains filtered or unexported fields
}

Reader provides methods for reading kv data of a mask, used in kv-data merging.

func NewReader

func NewReader(file string) (*Reader, error)

NewReader creates a reader.

func (*Reader) Close

func (rdr *Reader) Close() error

Close closes the reader

func (*Reader) ReadDataOfAMaskAndAppendToMap added in v0.8.0

func (rdr *Reader) ReadDataOfAMaskAndAppendToMap(m *map[uint64]*[]uint64) error

ReadDataOfAMaskAndAppendToMap reads data of a mask and appends it to the given map. Please remember to recycle the map.

func (*Reader) ReadDataOfAMaskAsList added in v0.3.0

func (rdr *Reader) ReadDataOfAMaskAsList() ([]uint64, error)

ReadDataOfAMaskAsList reads data of a mask. Returned: k-mer and value pairs stored interleaved in a []uint64.

func (*Reader) ReadDataOfAMaskAsListAndCreateIndex added in v0.3.0

func (rdr *Reader) ReadDataOfAMaskAsListAndCreateIndex() ([]uint64, []int, uint8, uint8, error)

ReadDataOfAMaskAsListAndCreateIndex reads data of a mask and creates a new index with n anchors. Returned: k-mer and value pairs stored interleaved in a []uint64.

func (*Reader) ReadDataOfAMaskAsMap added in v0.3.0

func (rdr *Reader) ReadDataOfAMaskAsMap() (*map[uint64]*[]uint64, error)

ReadDataOfAMaskAsMap reads data of a mask. Please remember to recycle the result.

type SearchResult

type SearchResult struct {
	IQuery int // index of the query kmer, i.e., index of mask
	// Kmer      uint64 // matched kmer
	IQuery2  int   // index of the query within the mask, because a mask may have multiple k-mers when matched by suffix
	Len      uint8 // length of common prefix/suffix between the query and this k-mer
	IsSuffix bool  // if matched by suffix
	// Mismatch  uint8    // number of mismatches; it has meaning only when checking mismatches
	Values []uint64 // value of this key
}

SearchResult represents a search result.

func (*SearchResult) Reset

func (r *SearchResult) Reset()

Reset resets the stats of a SearchResult.

type Searcher

type Searcher struct {
	K          uint8 // kmer size
	ChunkIndex int   // index of the first mask in this chunk
	ChunkSize  int   // the number of masks in this chunk

	// indexes of the ChunkSize masks.
	// A list of k-mer and offset pairs are intermittently saved in a []uint64
	Indexes [][]uint64

	Use3BytesForSeedPos bool
	// contains filtered or unexported fields
}

Searcher provides a search service for querying k-mer values in a k-mer-value file.

func NewSearcher

func NewSearcher(file string) (*Searcher, error)

NewSearcher creates a new Searcher for the given kv-data file.

func (*Searcher) Close

func (scr *Searcher) Close() error

Close closes the searcher.

func (*Searcher) Search

func (scr *Searcher) Search(kmers []uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)

Search queries a k-mer and returns k-mers with a minimum prefix of p, and maximum m mismatches. For m <0 or m >= k-p, mismatch will not be checked.

Please remember to recycle the results object with RecycleSearchResults().

func (*Searcher) Search2 added in v0.4.0

func (scr *Searcher) Search2(kmers []*[]uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)

Search2 is very similar to Search; only the data structure of the input k-mers differs.

type Writer

type Writer struct {
	K          uint8 // kmer size
	ChunkIndex int   // index of the first mask in this chunk
	ChunkSize  int   // the number of masks in this chunk

	// for kv data
	N int // the number of bytes.
	// contains filtered or unexported fields
}

Writer writes k-mer-value data for multiple masks.

func NewWriter

func NewWriter(k uint8, MaskOffset int, chunkSize int, file string, maskPrefix uint8, anchorPrefix uint8, use3BytesForSeedPos bool) (*Writer, error)

NewWriter returns a new writer.

func (*Writer) Close

func (wtr *Writer) Close() (err error)

Close closes the writer. It is very important to call it when writing is finished.

func (*Writer) WriteDataOfAMask

func (wtr *Writer) WriteDataOfAMask(m map[uint64]*[]uint64) (err error)

WriteDataOfAMask writes data of one mask.
