Documentation ¶
Index ¶
- Variables
- func ReadKVIndex(file string) (uint8, int, [][]uint64, error)
- func RecycleKmerData(m *map[uint64]*[]uint64)
- func RecycleSearchResults(sr *[]*SearchResult)
- func WriteKVData(k uint8, MaskOffset int, data []*map[uint64]*[]uint64, file string, ...) (int, error)
- type IndexReader
- type Reader
- type SearchResult
- type Searcher
- type Writer
Constants ¶
This section is empty.
Variables ¶
var ErrBrokenFile = errors.New("k-mer-value data: broken file")
ErrBrokenFile means the file is not complete.
var ErrInvalidFileFormat = errors.New("k-mer-value data: invalid binary format")
ErrInvalidFileFormat means invalid file format.
var ErrKOverflow = errors.New("k-mer-value data: k-mer size [1, 32] overflow")
ErrKOverflow means K < 1 or K > 32.
var ErrVersionMismatch = errors.New("k-mer-value data: version mismatch")
ErrVersionMismatch means a version mismatch between the files and the program.
var KVIndexFileExt = ".idx"
KVIndexFileExt is the file extension of k-mer data index file.
var Magic = [8]byte{'.', 'k', 'v', '-', 'd', 'a', 't', 'a'}
Magic number for checking file format
var MagicIdx = [8]byte{'.', 'k', 'v', 'i', 'n', 'd', 'e', 'x'}
Magic number for the index file
var MainVersion uint8 = 0
MainVersion is used for checking compatibility.
var MinorVersion uint8 = 1
MinorVersion is less important
Functions ¶
func ReadKVIndex ¶
func ReadKVIndex(file string) (uint8, int, [][]uint64, error)
ReadKVIndex parses the k-mer-value index file.
Returned:
- k-mer size.
- Index (0-based) of the first mask in the current chunk.
- Index data of masks, saved in a list whose size equals the number of masks.
- error.
For each mask, the k-mer and offset pairs are stored interleaved in a []uint64, e.g., [k1, o1, k2, o2].
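The interleaved layout described above can be unpacked with a short loop. The following is a minimal sketch, not part of the package: the `anchor` type and the `decodeAnchors` helper are hypothetical names introduced only for illustration, and the input slice stands in for one element of the `[][]uint64` returned by ReadKVIndex.

```go
package main

import "fmt"

// anchor is a hypothetical helper type, not part of the package.
type anchor struct {
	Kmer   uint64
	Offset uint64
}

// decodeAnchors splits an interleaved [k1, o1, k2, o2, ...] slice
// into k-mer/offset pairs. It rejects slices with an odd length.
func decodeAnchors(flat []uint64) ([]anchor, error) {
	if len(flat)%2 != 0 {
		return nil, fmt.Errorf("odd number of elements: %d", len(flat))
	}
	out := make([]anchor, 0, len(flat)/2)
	for i := 0; i < len(flat); i += 2 {
		out = append(out, anchor{Kmer: flat[i], Offset: flat[i+1]})
	}
	return out, nil
}

func main() {
	// Made-up index data for a single mask.
	flat := []uint64{3, 40, 17, 96}
	anchors, err := decodeAnchors(flat)
	if err != nil {
		panic(err)
	}
	for _, a := range anchors {
		fmt.Printf("k-mer %d is at offset %d\n", a.Kmer, a.Offset)
	}
}
```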
func RecycleKmerData ¶
func RecycleKmerData(m *map[uint64]*[]uint64)
RecycleKmerData recycles a k-mer data object.
func RecycleSearchResults ¶
func RecycleSearchResults(sr *[]*SearchResult)
RecycleSearchResults recycles search results objects.
func WriteKVData ¶
func WriteKVData(k uint8, MaskOffset int, data []*map[uint64]*[]uint64, file string, nAnchors int) (int, error)
WriteKVData writes k-mer-value data of a chunk of masks to a file. At the same time, the index file is also created with the number of anchors `nAnchors` (default: sqrt(#kmers)).
Header (32 bytes):
- Magic number, 8 bytes, ".kv-data".
- Main and minor versions, 2 bytes.
- K size, 1 byte.
- Blank, 5 bytes.
- Mask start index, 8 bytes. The index of the first mask.
- Mask chunk size, 8 bytes. The number of masks in this file.
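The 32-byte header layout can be read back field by field. The sketch below is an illustration under stated assumptions, not the package's own reader: the documentation does not state the byte order or the order of the two version bytes, so big-endian integers and "main before minor" are assumed here, and the `header` type and `parseHeader` function are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// magic mirrors the documented Magic value for kv-data files.
var magic = [8]byte{'.', 'k', 'v', '-', 'd', 'a', 't', 'a'}

// header is a hypothetical struct holding the documented header fields.
type header struct {
	Main, Minor uint8
	K           uint8
	MaskStart   uint64
	ChunkSize   uint64
}

// parseHeader decodes the first 32 bytes of a kv-data file.
// Assumptions: big-endian integers, main version byte before minor.
func parseHeader(buf []byte) (header, error) {
	var h header
	if len(buf) < 32 {
		return h, errors.New("header: fewer than 32 bytes")
	}
	if !bytes.Equal(buf[:8], magic[:]) {
		return h, errors.New("header: bad magic number")
	}
	h.Main, h.Minor, h.K = buf[8], buf[9], buf[10]
	if h.K < 1 || h.K > 32 {
		return h, errors.New("header: k-mer size outside [1, 32]")
	}
	// buf[11:16] is the 5-byte blank region.
	h.MaskStart = binary.BigEndian.Uint64(buf[16:24])
	h.ChunkSize = binary.BigEndian.Uint64(buf[24:32])
	return h, nil
}

func main() {
	// Build a made-up header, then parse it back.
	buf := make([]byte, 32)
	copy(buf, magic[:])
	buf[8], buf[9], buf[10] = 0, 1, 31
	binary.BigEndian.PutUint64(buf[16:24], 0)
	binary.BigEndian.PutUint64(buf[24:32], 4)
	h, err := parseHeader(buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", h)
}
```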
For each mask:
- Number of k-mers, 8 bytes.
- k-mer-values pairs, sizes vary.
  - Control byte for 2 k-mers, 1 byte.
  - Delta values of the 2 k-mers, 2-16 bytes.
  - Control byte for numbers of values, 1 byte.
  - Numbers of values of the 2 k-mers, 2-16 bytes, 2 bytes for most cases.
  - Values of the 2 k-mers, 8*n bytes, 16 bytes for most cases.
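To illustrate how one control byte can describe two variable-length integers (1-8 bytes each, hence 2-16 bytes per pair), here is a minimal sketch. The exact bit layout used by the package is not given in this documentation, so the layout below is an assumption: bits 3-5 hold the byte length of the first value minus one, bits 0-2 hold that of the second, and the values are written big-endian. Treating the second delta as relative to the first k-mer is also an assumption. `byteLen`, `appendPair`, and `readPair` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// byteLen returns the number of bytes (1-8) needed to store v.
func byteLen(v uint64) int {
	n := (bits.Len64(v) + 7) / 8
	if n == 0 {
		return 1 // zero still occupies one byte
	}
	return n
}

// appendPair appends a and b to dst using as few bytes as possible
// and returns the control byte describing both lengths.
// Assumed layout: (len(a)-1)<<3 | (len(b)-1).
func appendPair(dst []byte, a, b uint64) (byte, []byte) {
	na, nb := byteLen(a), byteLen(b)
	ctrl := byte(na-1)<<3 | byte(nb-1)
	for i := na - 1; i >= 0; i-- {
		dst = append(dst, byte(a>>(8*uint(i))))
	}
	for i := nb - 1; i >= 0; i-- {
		dst = append(dst, byte(b>>(8*uint(i))))
	}
	return ctrl, dst
}

// readPair decodes two values from src according to ctrl and
// returns them together with the number of bytes consumed.
func readPair(ctrl byte, src []byte) (a, b uint64, n int) {
	na := int(ctrl>>3&7) + 1
	nb := int(ctrl&7) + 1
	for _, c := range src[:na] {
		a = a<<8 | uint64(c)
	}
	for _, c := range src[na : na+nb] {
		b = b<<8 | uint64(c)
	}
	return a, b, na + nb
}

func main() {
	// Two sorted k-mers stored as deltas from their predecessor.
	prev, k1, k2 := uint64(0), uint64(1000), uint64(1003)
	ctrl, buf := appendPair(nil, k1-prev, k2-k1)
	d1, d2, n := readPair(ctrl, buf)
	fmt.Println(prev+d1, prev+d1+d2, n)
}
```

Storing deltas of sorted k-mers keeps most values small, which is what makes the short encodings common.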
The index file stores a fixed number of k-mers (anchors) and their offsets in the kv-data file for fast access.
Locations of these anchors, e.g., 5 anchors.
kkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvv A A A A A
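One way anchors can speed up access is to binary-search them for the last anchor whose k-mer is not greater than the query, then start reading the kv-data file from that anchor's offset. The sketch below illustrates this idea only; it assumes the anchors of a mask are sorted by k-mer in ascending order, which this documentation does not state explicitly, and `seekOffset` is a hypothetical helper rather than the package's search routine.

```go
package main

import (
	"fmt"
	"sort"
)

// seekOffset takes the interleaved [k1, o1, k2, o2, ...] index of one
// mask and returns the offset of the last anchor whose k-mer is <= query.
// The boolean is false when the query is smaller than every anchor.
// Assumption: anchors are sorted by k-mer in ascending order.
func seekOffset(index []uint64, query uint64) (uint64, bool) {
	n := len(index) / 2
	// Find the first anchor whose k-mer is greater than the query.
	i := sort.Search(n, func(i int) bool { return index[2*i] > query })
	if i == 0 {
		return 0, false
	}
	// The anchor before it is the one to start scanning from.
	return index[2*i-1], true
}

func main() {
	// Made-up anchors: k-mers 10, 20, 30 at offsets 100, 200, 300.
	index := []uint64{10, 100, 20, 200, 30, 300}
	if off, ok := seekOffset(index, 25); ok {
		fmt.Println("start scanning at offset", off)
	}
}
```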
Header (40 bytes):
- Magic number, 8 bytes, ".kvindex".
- Main and minor versions, 2 bytes.
- K size, 1 byte.
- Blank, 5 bytes.
- Mask start index, 8 bytes. The index of the first mask.
- Mask chunk size, 8 bytes. The number of masks in this file.
- Number of anchors, 8 bytes, default: square root of ref genomes.
For each mask:
- kmer-offset data:
  - k-mer: 8 bytes.
  - offset: 8 bytes.
Types ¶
type IndexReader ¶
type IndexReader struct {
	K uint8 // kmer size
	ChunkIndex int // index of the first mask in this chunk
	ChunkSize int // the number of masks in this chunk
	NAnchors int
	// contains filtered or unexported fields
}
func NewIndexReader ¶
func NewIndexReader(file string) (*IndexReader, error)
NewIndexReader creates an index reader.
type Reader ¶
type Reader struct {
	K uint8 // kmer size
	ChunkIndex int // index of the first mask in this chunk
	ChunkSize int // the number of masks in this chunk
	// contains filtered or unexported fields
}
Reader provides
type SearchResult ¶
type SearchResult struct {
	IQuery int // index of the query kmer
	Kmer uint64 // matched kmer
	LenPrefix uint8 // length of common prefix between the query and this k-mer
	Mismatch uint8 // number of mismatches; it is meaningful only when checking mismatches
	Values []uint64 // values of this key
}
SearchResult represents a search result.
func (*SearchResult) Reset ¶
func (r *SearchResult) Reset()
Reset just resets the stats of a SearchResult.
type Searcher ¶
type Searcher struct {
	K uint8 // kmer size
	ChunkIndex int // index of the first mask in this chunk
	ChunkSize int // the number of masks in this chunk
	Indexes [][]uint64 // indexes of the ChunkSize masks
	// contains filtered or unexported fields
}
Searcher provides a service for querying k-mer values in a k-mer-value file.
func NewSearcher ¶
NewSearcher creates a new Searcher for the given kv-data file.
type Writer ¶
type Writer struct {
	K uint8 // kmer size
	ChunkIndex int // index of the first mask in this chunk
	ChunkSize int // the number of masks in this chunk
	// for kv data
	N int // the number of bytes.
	// contains filtered or unexported fields
}
Writer is used for writing k-mer-value data for multiple masks.