Documentation ¶
Overview ¶
Package fai implements FASTA sequence file index handling, including creating and reading the index and random access to sequences.
The code for the fai data structure was copied and adapted from [1].
The code for creating and reading the index, and the test code, are my own.
The code for random access to subsequences was copied from [2] and extended considerably.
References:
[1]. https://github.com/biogo/biogo/blob/master/io/seqio/fai/fai.go
[2]. https://github.com/brentp/faidx/blob/master/faidx.go
## General Usage
import "github.com/shenwei356/bio/seqio/fai"
file := "seq.fa"
faidx, err := fai.New(file)
checkErr(err)
defer func() {
	checkErr(faidx.Close())
}()
// whole sequence
seq, err := faidx.Seq("cel-mir-2")
checkErr(err)
// single base
s, err := faidx.Base("cel-let-7", 1)
checkErr(err)
// subsequence; start and end are both 1-based
// (seq and err were declared above, so assign with = rather than :=)
seq, err = faidx.SubSeq("cel-mir-2", 15, 19)
checkErr(err)
## Extended SubSeq
For extended SubSeq, negative positions are allowed.
This is a custom locating strategy. Start and end are both 1-based, and a negative position counts from the end of the sequence. The examples below show how positions are resolved:
 1-based index    1 2 3 4 5 6 7 8 9 10
negative index    0-9-8-7-6-5-4-3-2-1
           seq    A C G T N a c g t n
           1:1    A
           2:4      C G T
         -4:-2                c g t
         -4:-1                c g t n
         -1:-1                      n
          2:-2      C G T N a c g t
          1:-1    A C G T N a c g t n
          1:12    A C G T N a c g t n
        -12:-1    A C G T N a c g t n
Examples:
// last 12 bases
seq, err := faidx.SubSeq("cel-mir-2", -12, -1)
checkErr(err)
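The rows of the table above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the package's own `SubLocation`: `resolveRange` is a hypothetical helper, and it assumes that a negative position `p` maps to `length + p + 1`, that out-of-range ends are clamped to the sequence, and that a position of 0 is rejected (the text does not spell out how 0 is treated).

```go
package main

import "fmt"

// resolveRange is a hypothetical helper (not part of package fai).
// It converts a 1-based, possibly negative start/end pair into a
// 1-based inclusive range within a sequence of the given length.
func resolveRange(length, start, end int) (int, int, bool) {
	// Assumption: 0 is treated as invalid in this sketch.
	if length <= 0 || start == 0 || end == 0 {
		return 0, 0, false
	}
	// Negative positions count back from the end: -1 is the last base.
	if start < 0 {
		start = length + start + 1
	}
	if end < 0 {
		end = length + end + 1
	}
	// Clamp to the sequence, as in the rows 1:12 and -12:-1.
	if start < 1 {
		start = 1
	}
	if end > length {
		end = length
	}
	if start > end {
		return 0, 0, false
	}
	return start, end, true
}

func main() {
	seq := "ACGTNacgtn"
	queries := [][2]int{{1, 1}, {2, 4}, {-4, -2}, {2, -2}, {1, 12}, {-12, -1}}
	for _, q := range queries {
		s, e, ok := resolveRange(len(seq), q[0], q[1])
		if !ok {
			fmt.Printf("%d:%d invalid\n", q[0], q[1])
			continue
		}
		fmt.Printf("%d:%d -> %s\n", q[0], q[1], seq[s-1:e])
	}
}
```

The real `SubLocation` may handle edge cases differently, so treat this only as an aid for reading the table.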
## Advanced Usage
Function `fai.New(file string)` is a wrapper that simplifies creating and reading a FASTA index. Here is what happens inside:
func New(file string) (*Faidx, error) {
	fileFai := file + ".fai"
	var index Index
	if _, err := os.Stat(fileFai); os.IsNotExist(err) {
		// no index file yet: build one from the FASTA file
		index, err = Create(file)
		if err != nil {
			return nil, err
		}
	} else {
		// otherwise read the existing index file
		index, err = Read(fileFai)
		if err != nil {
			return nil, err
		}
	}
	return NewWithIndex(file, index)
}
By default, the sequence ID is used as the key in the FASTA index file. Inside the package, a regular expression extracts the sequence ID from the full head. The default value is `^([^\s]+)\s?`, i.e. the leading run of non-space characters in the head. With this default you can simply call `fai.Create(file string)` to create the .fai file.
If you want to use the full head instead of the sequence ID, use `fai.CreateWithIDRegexp(file string, idRegexp string)` to create the index. In that case `idRegexp` should be `^(.+)$`. For convenience, the function `CreateWithFullHead` does the same.
## More Advanced Usages
Note that ***by default, the whole file is mapped into shared memory***, which is fine for small files (smaller than your RAM). For very big files you should disable this; file seeking is then used instead.
// change the global variable
fai.MapWholeFile = false
// then do other things
Index ¶
- Variables
- func SubLocation(length, start, end int) (int, int, bool)
- type Faidx
- func (f *Faidx) Base(chr string, pos int) (byte, error)
- func (f *Faidx) Close() error
- func (f *Faidx) Seq(chr string) ([]byte, error)
- func (f *Faidx) SeqNotCleaned(chr string) ([]byte, error)
- func (f *Faidx) SubSeq(chr string, start int, end int) ([]byte, error)
- func (f *Faidx) SubSeqNotCleaned(chr string, start int, end int) ([]byte, error)
- type Index
- type Record
Variables ¶
var ErrSeqNotExists = fmt.Errorf("sequence not exists")
ErrSeqNotExists means that the sequence does not exist
var IDRegexp = regexp.MustCompile(defaultIDRegexp)
IDRegexp is the regular expression used to parse the record ID
var MapWholeFile = true
MapWholeFile is a global flag that decides whether the whole file is mapped into memory
Functions ¶
func SubLocation ¶
SubLocation implements the custom sublocation strategy. The start and end arguments and the returned start and end are all 1-based.
 1-based index    1 2 3 4 5 6 7 8 9 10
negative index    0-9-8-7-6-5-4-3-2-1
           seq    A C G T N a c g t n
           1:1    A
           2:4      C G T
         -4:-2                c g t
         -4:-1                c g t n
         -1:-1                      n
          2:-2      C G T N a c g t
          1:-1    A C G T N a c g t n
          1:12    A C G T N a c g t n
        -12:-1    A C G T N a c g t n
Types ¶
type Faidx ¶
type Faidx struct {
Index Index
// contains filtered or unexported fields
}
Faidx provides random access to the sequences of a FASTA file through its Index
func NewWithCustomExt ¶
NewWithCustomExt tries to get a Faidx from a FASTA file, with the extension of the index file specified by the caller
func NewWithIndex ¶
NewWithIndex returns a Faidx from the file and an already-read Index. This is useful when using a custom IDRegexp
func (*Faidx) SeqNotCleaned ¶
SeqNotCleaned returns the sequence without removing "\r" and "\n"
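For reference, the cleaning step that the NotCleaned variants skip can be sketched as follows. `cleanSeq` is a hypothetical helper shown only to make the difference concrete; the package's own cleaning code may differ.

```go
package main

import "fmt"

// cleanSeq is a hypothetical helper that drops "\r" and "\n" bytes
// from raw FASTA sequence data, leaving only the bases.
func cleanSeq(raw []byte) []byte {
	out := make([]byte, 0, len(raw))
	for _, b := range raw {
		if b != '\r' && b != '\n' {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	raw := []byte("ACGTN\r\nacgtn\n") // sequence bytes as stored on disk
	fmt.Printf("raw:     %q\n", raw)
	fmt.Printf("cleaned: %q\n", cleanSeq(raw)) // "ACGTNacgtn"
}
```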
type Index ¶
Index is the FASTA index
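A .fai file in the widely used samtools layout is tab-separated text with one line per sequence: name, length, offset, bases per line and bytes per line. The sketch below parses one such line with the standard library. `faiLine` and `parseFaiLine` are hypothetical names for this illustration; whether the package's Record type uses the same field names is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// faiLine is a hypothetical type for one line of a .fai file.
type faiLine struct {
	Name      string
	Length    int
	Offset    int64
	LineBases int
	LineWidth int
}

// parseFaiLine splits a tab-separated .fai line into its five columns.
func parseFaiLine(line string) (faiLine, error) {
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) < 5 {
		return faiLine{}, fmt.Errorf("expected 5 columns, got %d", len(fields))
	}
	nums := make([]int64, 4)
	for i := range nums {
		n, err := strconv.ParseInt(fields[i+1], 10, 64)
		if err != nil {
			return faiLine{}, fmt.Errorf("column %d: %w", i+2, err)
		}
		nums[i] = n
	}
	return faiLine{
		Name:      fields[0],
		Length:    int(nums[0]),
		Offset:    nums[1],
		LineBases: int(nums[2]),
		LineWidth: int(nums[3]),
	}, nil
}

func main() {
	rec, err := parseFaiLine("seq1\t10\t6\t5\t6\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}
```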
func CreateWithFullHead ¶
CreateWithFullHead uses the full head instead of just the sequence ID
func CreateWithIDRegexp ¶
CreateWithIDRegexp uses a custom regular expression to get the sequence ID