Documentation ¶
Index ¶
- Constants
- Variables
- func AnchorExtracter(k uint8, maskPrefix uint8, anchorPrefix uint8) func(uint64) uint64
- func CreateKVIndex(file string, nAnchors int) error
- func PutUint64ThreeBytes(b []byte, v uint64)
- func ReadKVIndex(file string) (uint8, int, [][]uint64, uint8, uint8, uint8, error)
- func ReadKVIndexInfo(file string) (uint8, int, int, uint8, uint8, error)
- func RecycleKmerData(m *map[uint64]*[]uint64)
- func RecycleSearchResults(sr *[]*SearchResult)
- func Uint64ThreeBytes(b []byte) uint64
- func WriteKVData(k uint8, MaskOffset int, data []*map[uint64]*[]uint64, file string, ...) (int, error)
- type InMemorySearcher
- type IndexReader
- type Reader
- func (rdr *Reader) Close() error
- func (rdr *Reader) ReadDataOfAMaskAndAppendToMap(m *map[uint64]*[]uint64) error
- func (rdr *Reader) ReadDataOfAMaskAsList() ([]uint64, error)
- func (rdr *Reader) ReadDataOfAMaskAsListAndCreateIndex() ([]uint64, []int, uint8, uint8, error)
- func (rdr *Reader) ReadDataOfAMaskAsMap() (*map[uint64]*[]uint64, error)
- type SearchResult
- type Searcher
- type Writer
Constants ¶
const MASK_REVERSE = 1
MASK_REVERSE is the mask of the reversed flag.
const MaskUse3BytesForSeedPos uint8 = 1
Variables ¶
var ErrBrokenFile = errors.New("k-mer-value data: broken file")
ErrBrokenFile means the file is not complete.
var ErrInvalidFileFormat = errors.New("k-mer-value data: invalid binary format")
ErrInvalidFileFormat means invalid file format.
var ErrKOverflow = errors.New("k-mer-value data: k-mer size [1, 32] overflow")
ErrKOverflow means K < 1 or K > 32.
var ErrVersionMismatch = errors.New("k-mer-value data: version mismatch")
ErrVersionMismatch means a version mismatch between the files and the program.
var KVIndexFileExt = ".idx"
KVIndexFileExt is the file extension of k-mer data index file.
var Magic = [8]byte{'.', 'k', 'v', '-', 'd', 'a', 't', 'a'}
Magic number for checking file format
var MagicIdx = [8]byte{'.', 'k', 'v', 'i', 'n', 'd', 'e', 'x'}
Magic number for the index file
var MainVersion uint8 = 1
MainVersion is used for checking compatibility
var MinorVersion uint8 = 1
MinorVersion is less important
Functions ¶
func AnchorExtracter ¶ added in v0.4.0
AnchorExtracter returns the function for extracting anchors, i.e., CCCCC below
maskPrefix
-------
AAAAAAA CCCCC NNNNNNNN
-----
anchorPrefix
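The diagram can be illustrated with a small bit-manipulation sketch. It assumes k-mers are packed 2 bits per base (A=0, C=1, G=2, T=3) with the first base in the most significant of the low 2*k bits; the names `anchorExtractor` and `encode` are hypothetical and this is not the package's implementation.

```go
package main

import "fmt"

// anchorExtractor returns a function that extracts the anchor, i.e., the
// anchorPrefix bases that follow the first maskPrefix bases of a k-mer.
// The 2-bit packing described above is an assumption of this sketch.
func anchorExtractor(k, maskPrefix, anchorPrefix uint8) func(uint64) uint64 {
	// Number of bits occupied by the bases after the anchor (the N's).
	shift := 2 * uint(k-maskPrefix-anchorPrefix)
	// Bit mask with the low 2*anchorPrefix bits set.
	mask := uint64(1)<<(2*uint(anchorPrefix)) - 1
	return func(kmer uint64) uint64 {
		return (kmer >> shift) & mask
	}
}

// encode packs a DNA string into a uint64 using the assumed encoding.
func encode(s string) uint64 {
	var v uint64
	for i := 0; i < len(s); i++ {
		v <<= 2
		switch s[i] {
		case 'C':
			v |= 1
		case 'G':
			v |= 2
		case 'T':
			v |= 3
		}
	}
	return v
}

func main() {
	extract := anchorExtractor(20, 7, 5)
	kmer := encode("AAAAAAA" + "CCCCC" + "GGGGGGGG")
	fmt.Printf("%010b\n", extract(kmer)) // CCCCC -> 0101010101
}
```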
func CreateKVIndex ¶ added in v0.3.0
CreateKVIndex recreates the kv index file for the kv-data file.
func PutUint64ThreeBytes ¶ added in v0.6.0
PutUint64ThreeBytes writes the 7 low bytes of v to b.
func ReadKVIndex ¶
ReadKVIndex parses the k-mer-value index file.
Returned:
- k-mer size.
- Index (0-based) of the first mask in the current chunk.
- Index data of masks saved in a list; the list size equals the number of masks.
- error.
For each mask, k-mer and offset pairs are interleaved in a []uint64, e.g., [k1, o1, k2, o2].
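To make the interleaved layout concrete, the following sketch walks a flat []uint64 two elements at a time and rebuilds explicit pairs. The `pair` type and the `pairs` helper are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// pair is a hypothetical helper type for this sketch.
type pair struct {
	kmer, offset uint64
}

// pairs converts an interleaved list [k1, o1, k2, o2, ...] into pairs.
// A trailing element without a partner is ignored.
func pairs(flat []uint64) []pair {
	out := make([]pair, 0, len(flat)/2)
	for i := 0; i+1 < len(flat); i += 2 {
		out = append(out, pair{kmer: flat[i], offset: flat[i+1]})
	}
	return out
}

func main() {
	index := []uint64{10, 0, 25, 48} // [k1, o1, k2, o2]
	fmt.Println(pairs(index))        // [{10 0} {25 48}]
}
```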
func ReadKVIndexInfo ¶ added in v0.4.0
ReadKVIndexInfo reads the information of a k-mer-value index file.
func RecycleKmerData ¶
RecycleKmerData recycles a k-mer data object.
func RecycleSearchResults ¶
func RecycleSearchResults(sr *[]*SearchResult)
RecycleSearchResults recycles search results objects.
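Recycling functions like this are commonly backed by an object pool. The sketch below shows that general pattern with `sync.Pool`; whether the package uses `sync.Pool` internally is an assumption, and the names `result`, `getResults`, and `recycle` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical stand-in for SearchResult.
type result struct {
	Values []uint64
}

// pool hands out reusable result slices to reduce allocations.
var pool = &sync.Pool{New: func() interface{} {
	s := make([]*result, 0, 16)
	return &s
}}

// getResults fetches a slice from the pool (or allocates a new one).
func getResults() *[]*result {
	return pool.Get().(*[]*result)
}

// recycle truncates the slice and returns it to the pool.
func recycle(sr *[]*result) {
	*sr = (*sr)[:0]
	pool.Put(sr)
}

func main() {
	sr := getResults()
	*sr = append(*sr, &result{Values: []uint64{1, 2}})
	fmt.Println(len(*sr)) // 1
	recycle(sr)           // sr must not be used after this call
}
```

The caller-side rule is the same as in the package documentation: once a results object has been recycled, it should no longer be read or modified.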
func Uint64ThreeBytes ¶ added in v0.6.0
Uint64ThreeBytes returns a uint64 from 7 bytes.
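A 7-byte round trip can be sketched as follows. Big-endian byte order is an assumption made here, and `put7` and `get7` are hypothetical names rather than the package's functions; the point is only that a value fitting in 56 bits survives the round trip while the top byte is dropped.

```go
package main

import "fmt"

// put7 stores the 7 low bytes of v in b[0:7].
// Big-endian order is an assumption of this sketch.
func put7(b []byte, v uint64) {
	_ = b[6] // early bounds check
	for i := 0; i < 7; i++ {
		b[i] = byte(v >> (8 * uint(6-i)))
	}
}

// get7 rebuilds a uint64 from the 7 bytes in b[0:7].
func get7(b []byte) uint64 {
	_ = b[6] // early bounds check
	var v uint64
	for i := 0; i < 7; i++ {
		v = v<<8 | uint64(b[i])
	}
	return v
}

func main() {
	buf := make([]byte, 7)
	put7(buf, 0x00123456789ABCDE)
	fmt.Printf("%x %x\n", buf, get7(buf)) // 123456789abcde 123456789abcde
}
```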
func WriteKVData ¶
func WriteKVData(k uint8, MaskOffset int, data []*map[uint64]*[]uint64, file string, maskPrefix uint8, anchorPrefix uint8, nbatches int, clearData bool) (int, error)
WriteKVData writes k-mer-value data of a chunk of masks to a file. At the same time, the index file is also created with the number of anchors `nAnchors` (default: sqrt(#kmers)).
Header (32 bytes):
- Magic number, 8 bytes, ".kv-data".
- Main and minor versions, 2 bytes.
- K size, 1 byte.
- Config1, 1 byte, including one bit for use3BytesForSeedPos.
- Blank, 5 bytes.
- Mask start index, 8 bytes. The index of the first mask.
- Mask chunk size, 8 bytes. The number of masks in this file.
For each mask:
- Number of k-mers, 8 bytes.
- k-mer-values pairs, sizes vary.
  - Control byte for 2 k-mers, 1 byte.
  - Delta values of the 2 k-mers, 2-16 bytes.
  - Control byte for numbers of values, 1 byte.
  - Numbers of values of the 2 k-mers, 2-16 bytes, 2 bytes for most cases.
  - Values of the 2 k-mers, 8*n bytes for batches > 512, 7*n for batches <= 512, 14 or 16 bytes for most cases.
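The idea of a control byte followed by two variable-length deltas can be sketched as below. The exact bit layout of the control byte is not given in this documentation, so the layout used here (bits 3-5 hold the byte length of the first value minus one, bits 0-2 that of the second) is purely an assumption, as are the names `byteLen`, `putPair`, and `getPair`.

```go
package main

import "fmt"

// byteLen returns the fewest whole bytes (1-8) that can hold v.
func byteLen(v uint64) int {
	n := 1
	for v >>= 8; v > 0; v >>= 8 {
		n++
	}
	return n
}

// putPair appends one control byte and then v1 and v2, each in big-endian
// order using byteLen bytes. The control byte layout is assumed:
// bits 3-5 = len(v1)-1, bits 0-2 = len(v2)-1.
func putPair(buf []byte, v1, v2 uint64) []byte {
	n1, n2 := byteLen(v1), byteLen(v2)
	buf = append(buf, byte((n1-1)<<3|(n2-1)))
	for i := n1 - 1; i >= 0; i-- {
		buf = append(buf, byte(v1>>(8*uint(i))))
	}
	for i := n2 - 1; i >= 0; i-- {
		buf = append(buf, byte(v2>>(8*uint(i))))
	}
	return buf
}

// getPair decodes what putPair wrote and reports the bytes consumed.
func getPair(buf []byte) (v1, v2 uint64, n int) {
	n1 := int(buf[0]>>3&7) + 1
	n2 := int(buf[0]&7) + 1
	for _, b := range buf[1 : 1+n1] {
		v1 = v1<<8 | uint64(b)
	}
	for _, b := range buf[1+n1 : 1+n1+n2] {
		v2 = v2<<8 | uint64(b)
	}
	return v1, v2, 1 + n1 + n2
}

func main() {
	// Sorted k-mers are stored as deltas from the previous k-mer.
	prev, k1, k2 := uint64(0), uint64(100), uint64(70000)
	buf := putPair(nil, k1-prev, k2-k1)
	d1, d2, n := getPair(buf)
	fmt.Println(len(buf), n, prev+d1, prev+d1+d2) // 5 5 100 70000
}
```

Because the k-mers are sorted, the deltas are usually much smaller than the k-mers themselves, which is what makes the variable-length form compact.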
The index file stores 4^p' k-mers (anchors) and their offsets in the kv-data file for fast access, so the time complexity is O(1) instead of the previous O(log2 N).
AAAAAAA CCCCC NNNNNNNN
------- -----
p       p'
Locations of these anchors vary, and some of them might not exist.
kkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvvkkvv
AA AC AG AT CA
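The constant-time lookup described above can be sketched with a table of 4^p' slots, one per possible anchor, where each slot records the position of the first k-mer carrying that anchor and -1 marks an anchor that does not occur. The 2-bit packing (A=0, C=1, G=2, T=3, first base in the highest bits), the in-memory table, and the name `buildAnchorTable` are assumptions for illustration only.

```go
package main

import "fmt"

// buildAnchorTable maps every possible anchor (the pAnchor bases following
// the first p bases) to the position of the first k-mer with that anchor in
// the sorted slice, or -1 if no k-mer has it.
func buildAnchorTable(kmers []uint64, k, p, pAnchor uint8) []int {
	table := make([]int, 1<<(2*uint(pAnchor))) // 4^pAnchor slots
	for i := range table {
		table[i] = -1
	}
	shift := 2 * uint(k-p-pAnchor)
	mask := uint64(len(table) - 1)
	for i, km := range kmers {
		a := (km >> shift) & mask
		if table[a] < 0 {
			table[a] = i
		}
	}
	return table
}

func main() {
	// Sorted 3-mers sharing the prefix "A": AAC, AAT, AGA, ATC.
	kmers := []uint64{1, 3, 8, 13}
	table := buildAnchorTable(kmers, 3, 1, 1)
	fmt.Println(table) // [0 -1 2 3]: no k-mer has anchor C
}
```

A query then reads `table[anchor]` directly instead of binary-searching the whole list.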
Header (40 bytes):
- Magic number, 8 bytes, ".kvindex".
- Main and minor versions, 2 bytes.
- K size, 1 byte.
- Mask prefix length, 1 byte. e.g., 7
- Anchor prefix length, 1 byte. e.g., 5
- Config1, 1 byte, including one bit for use3BytesForSeedPos.
- Blank, 2 bytes.
- Mask start index, 8 bytes. The index of the first mask.
- Mask chunk size, 8 bytes. The number of masks in this file.
For each mask:
- Number of anchors, 8 bytes.
- kmer-offset data:
  - k-mer: 8 bytes
  - offset: 8 bytes
Types ¶
type InMemorySearcher ¶ added in v0.3.0
type InMemorySearcher struct {
K uint8 // kmer size
ChunkIndex int // index of the first mask in this chunk
ChunkSize int // the number of masks in this chunk
// kv data of the ChunkSize masks.
// k-mer and value pairs are interleaved in a []uint64
KVdata [][]uint64
Indexes [][]int
// contains filtered or unexported fields
}
InMemorySearcher provides a service for querying k-mer values in a k-mer-value file.
func NewInMemomrySearcher ¶ added in v0.3.0
func NewInMemomrySearcher(file string) (*InMemorySearcher, error)
NewInMemomrySearcher creates a new InMemorySearcher for the given kv-data file.
func (*InMemorySearcher) Close ¶ added in v0.3.0
func (scr *InMemorySearcher) Close() error
Close closes the searcher.
func (*InMemorySearcher) Search ¶ added in v0.3.0
func (scr *InMemorySearcher) Search(kmers []uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)
Search queries k-mers and returns k-mers sharing a prefix of at least p bases, with at most m mismatches. For m < 0 or m >= k-p, mismatches are not checked.
Please remember to recycle the results object with RecycleSearchResults().
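The prefix criterion can be illustrated on a sorted in-memory list: all k-mers sharing the first p bases with a query form one contiguous range, which two binary searches can locate. This is a simplified sketch under the assumption of 2-bit packed k-mers (A=0, C=1, G=2, T=3, first base in the highest bits); it ignores suffix matching, flags, and the on-disk index, and `prefixRange` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixRange returns the half-open index range [lo, hi) of k-mers in the
// sorted slice that share their first p bases with query.
func prefixRange(sorted []uint64, query uint64, k, p uint8) (int, int) {
	shift := 2 * uint(k-p)
	lower := query >> shift << shift      // prefix followed by all A's
	upper := lower | (uint64(1)<<shift - 1) // prefix followed by all T's
	lo := sort.Search(len(sorted), func(i int) bool { return sorted[i] >= lower })
	hi := sort.Search(len(sorted), func(i int) bool { return sorted[i] > upper })
	return lo, hi
}

func main() {
	// Sorted 3-mers: AAC, AAT, AGA, ATC.
	sorted := []uint64{1, 3, 8, 13}
	query := uint64(2) // AAG
	lo, hi := prefixRange(sorted, query, 3, 2)
	fmt.Println(lo, hi) // 0 2: AAC and AAT share the prefix AA
}
```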
func (*InMemorySearcher) Search2 ¶ added in v0.4.0
func (scr *InMemorySearcher) Search2(kmers []*[]uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)
Search2 is very similar to Search, only the data structure of input kmers is different.
type IndexReader ¶
type IndexReader struct {
K uint8 // kmer size
ChunkIndex int // index of the first mask in this chunk
ChunkSize int // the number of masks in this chunk
NAnchors int
Use3BytesForSeedPos bool
// contains filtered or unexported fields
}
IndexReader provides methods for reading kv-data index data.
func NewIndexReader ¶
func NewIndexReader(file string) (*IndexReader, error)
NewIndexReader creates an index reader.
type Reader ¶
type Reader struct {
K uint8 // kmer size
ChunkIndex int // index of the first mask in this chunk
ChunkSize int // the number of masks in this chunk
Use3BytesForSeedPos bool
// contains filtered or unexported fields
}
Reader provides methods for reading kv data of a mask, used in kv-data merging.
func (*Reader) ReadDataOfAMaskAndAppendToMap ¶ added in v0.8.0
ReadDataOfAMaskAndAppendToMap reads data of a mask and appends it to the given map. Please remember to recycle the result.
func (*Reader) ReadDataOfAMaskAsList ¶ added in v0.3.0
ReadDataOfAMaskAsList reads data of a mask.
Returned: a list in which k-mer and value pairs are interleaved in a []uint64.
func (*Reader) ReadDataOfAMaskAsListAndCreateIndex ¶ added in v0.3.0
ReadDataOfAMaskAsListAndCreateIndex reads data of a mask and creates a new index with n anchors.
Returned: a list in which k-mer and value pairs are interleaved in a []uint64.
type SearchResult ¶
type SearchResult struct {
IQuery int // index of the query kmer, i.e., index of mask
// Kmer uint64 // matched kmer
IQuery2 int // index of the query of the mask, because a mask can have multiple k-mers when matched by suffix
Len uint8 // length of common prefix/suffix between the query and this k-mer
IsSuffix bool // if matched by suffix
// Mismatch uint8 // number of mismatches, it has meaning only when checking mismatches!
Values []uint64 // value of this key
}
SearchResult represents a search result.
func (*SearchResult) Reset ¶
func (r *SearchResult) Reset()
Reset just resets the stats of a SearchResult
type Searcher ¶
type Searcher struct {
K uint8 // kmer size
ChunkIndex int // index of the first mask in this chunk
ChunkSize int // the number of masks in this chunk
// indexes of the ChunkSize masks.
// k-mer and offset pairs are interleaved in a []uint64
Indexes [][]uint64
Use3BytesForSeedPos bool
// contains filtered or unexported fields
}
Searcher provides a service for querying k-mer values in a k-mer-value file.
func NewSearcher ¶
NewSearcher creates a new Searcher for the given kv-data file.
func (*Searcher) Search ¶
func (scr *Searcher) Search(kmers []uint64, p uint8, checkFlag bool, reversedKmer bool) (*[]*SearchResult, error)
Search queries k-mers and returns k-mers sharing a prefix of at least p bases, with at most m mismatches. For m < 0 or m >= k-p, mismatches are not checked.
Please remember to recycle the results object with RecycleSearchResults().
type Writer ¶
type Writer struct {
K uint8 // kmer size
ChunkIndex int // index of the first mask in this chunk
ChunkSize int // the number of masks in this chunk
// for kv data
N int // the number of bytes.
// contains filtered or unexported fields
}
Writer is used for writing k-mer-value data of multiple masks.