Documentation
Index ¶
- Constants
- Variables
- func CheckText(content []byte, maxTrigramCount int) error
- func ReadMetadata(inf IndexFile) (*Repository, *IndexMetadata, error)
- func SortFilesByScore(ms []FileMatch)
- type Document
- type DocumentSection
- type FileMatch
- type IndexBuilder
- type IndexFile
- type IndexMetadata
- type LineFragmentMatch
- type LineMatch
- type RepoList
- type RepoListEntry
- type RepoStats
- type Repository
- type RepositoryBranch
- type SearchOptions
- type SearchResult
- type Searcher
- type Stats
Constants ¶
const FeatureVersion = 10
FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version.
- 2: Rank field for shards.
- 3: Rank documents within shards.
- 4: Dedup file bugfix.
- 5: Remove max line size limit.
- 6: Include '#' into the LineFragment template.
- 7: Record skip reasons in the index.
- 8: Record source path in the index.
- 9: Bump default max file size.
- 10: Switch to a more flexible TOC format.
const IndexFormatVersion = 15
IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed.
- 5: subrepositories.
- 6: remove size prefix for posting varint list.
- 7: move subrepos into Repository struct.
- 8: move repoMetaData out of indexMetadata.
- 9: use bigendian uint64 for trigrams.
- 10: sections for rune offsets.
- 11: file ends in rune offsets.
- 12: 64-bit branchmasks.
- 13: content checksums.
- 14: languages.
- 15: rune based symbol sections.
- 16 (TBA): TODO: remove fallback parsing in readTOC.
const ReadMinFeatureVersion = 8
ReadMinFeatureVersion constrains backwards compatibility by refusing to load a file with a FeatureVersion below it.
const WriteMinFeatureVersion = 10
WriteMinFeatureVersion constrains forwards compatibility by emitting files that won't load in zoekt with a FeatureVersion below it.
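The interaction between these constants can be sketched as a pair of checks. This is a minimal illustration, assuming a reader compares its own FeatureVersion and ReadMinFeatureVersion against the IndexFeatureVersion and IndexMinReaderVersion fields stored in a shard. The `shardVersions` type and `canLoad` function are hypothetical names introduced here, not part of the package, and the real loader may perform further checks (for example on the format version).

```go
package main

import "fmt"

// Values copied from the constants documented above.
const (
	featureVersion        = 10
	readMinFeatureVersion = 8
)

// shardVersions is a hypothetical stand-in for the version fields
// that a shard's IndexMetadata carries.
type shardVersions struct {
	IndexFeatureVersion   int
	IndexMinReaderVersion int
}

// canLoad reports whether a reader at featureVersion would accept the
// shard under the two rules described in the constant docs.
// It is an illustrative sketch, not the library's loader.
func canLoad(s shardVersions) error {
	// Backwards compatibility: refuse shards that are too old.
	if s.IndexFeatureVersion < readMinFeatureVersion {
		return fmt.Errorf("shard feature version %d is below the reader minimum %d",
			s.IndexFeatureVersion, readMinFeatureVersion)
	}
	// Forwards compatibility: refuse shards that demand a newer reader.
	if s.IndexMinReaderVersion > featureVersion {
		return fmt.Errorf("shard requires reader feature version %d, reader has %d",
			s.IndexMinReaderVersion, featureVersion)
	}
	return nil
}

func main() {
	shards := []shardVersions{
		{IndexFeatureVersion: 10, IndexMinReaderVersion: 10},
		{IndexFeatureVersion: 7, IndexMinReaderVersion: 7},
		{IndexFeatureVersion: 11, IndexMinReaderVersion: 11},
	}
	for _, s := range shards {
		fmt.Printf("%+v: %v\n", s, canLoad(s))
	}
}
```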
Variables ¶
var DebugScore = false
DebugScore controls whether we collect data on how match scores are constructed. Intended for use in tests.
var Version string
Version is filled in by the linker (see build-deploy.sh).
Functions ¶
func ReadMetadata ¶
func ReadMetadata(inf IndexFile) (*Repository, *IndexMetadata, error)
ReadMetadata returns the metadata of an index shard without reading the index data. The IndexFile is not closed.
Types ¶
type Document ¶
type Document struct {
Name string
Content []byte
Branches []string
SubRepositoryPath string
Language string
// If set, something is wrong with the file contents, and this
// is the reason it wasn't indexed.
SkipReason string
// Document sections for symbols. Offsets should use bytes.
Symbols []DocumentSection
}
Document holds a document (file) to index.
type DocumentSection ¶
type DocumentSection struct {
Start, End uint32
}
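Because Document.Symbols expects byte offsets, a symbol that follows non-ASCII text lands at a different position than its rune index would suggest. The sketch below shows one way to compute such a section. `sectionFor` is a hypothetical helper, the local `DocumentSection` mirrors the struct above, and treating End as exclusive is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// DocumentSection mirrors the struct documented above.
type DocumentSection struct {
	Start, End uint32
}

// sectionFor returns the byte range of the first occurrence of name in
// content. It is a hypothetical helper, not part of the package, and it
// assumes End is exclusive.
func sectionFor(content []byte, name string) (DocumentSection, bool) {
	i := bytes.Index(content, []byte(name))
	if i < 0 {
		return DocumentSection{}, false
	}
	return DocumentSection{Start: uint32(i), End: uint32(i + len(name))}, true
}

func main() {
	// The comment contains a two-byte rune, so byte and rune offsets differ.
	content := []byte("// é\nfunc main() {}\n")
	sec, ok := sectionFor(content, "main")
	if !ok {
		fmt.Println("symbol not found")
		return
	}
	fmt.Printf("bytes [%d,%d): %q\n", sec.Start, sec.End, content[sec.Start:sec.End])
}
```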
type FileMatch ¶
type FileMatch struct {
// Ranking; the higher, the better.
Score float64 // TODO - hide this field?
// For debugging. Needs DebugScore set, but public so tests in
// other packages can print some diagnostics.
Debug string
FileName string
// Repository is the globally unique name of the repo of the
// match
Repository string
Branches []string
LineMatches []LineMatch
// Only set if requested
Content []byte
// Checksum of the content.
Checksum []byte
// Detected language of the result.
Language string
// SubRepositoryName is the globally unique name of the repo,
// if it came from a subrepository
SubRepositoryName string
// SubRepositoryPath holds the prefix where the subrepository
// was mounted.
SubRepositoryPath string
// Commit SHA1 (hex) of the (sub)repo holding the file.
Version string
}
FileMatch contains all the matches within a file.
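Since a higher Score is better, ordering results means sorting in descending score order. The following is a sketch of that idea on a trimmed-down local `FileMatch`; `sortByScore` is a hypothetical name, and the package's own SortFilesByScore may break ties or weigh other fields differently.

```go
package main

import (
	"fmt"
	"sort"
)

// FileMatch is a reduced local copy of the struct above, keeping only
// the fields this sketch needs.
type FileMatch struct {
	FileName string
	Score    float64
}

// sortByScore orders matches from highest to lowest score. The stable
// sort keeps the input order for equal scores (an assumption of this
// sketch, not documented behavior).
func sortByScore(ms []FileMatch) {
	sort.SliceStable(ms, func(i, j int) bool {
		return ms[i].Score > ms[j].Score
	})
}

func main() {
	ms := []FileMatch{
		{FileName: "a.go", Score: 1},
		{FileName: "b.go", Score: 3},
		{FileName: "c.go", Score: 2},
	}
	sortByScore(ms)
	for _, m := range ms {
		fmt.Println(m.FileName, m.Score)
	}
}
```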
type IndexBuilder ¶
type IndexBuilder struct {
// contains filtered or unexported fields
}
IndexBuilder builds a single index shard.
func NewIndexBuilder ¶
func NewIndexBuilder(r *Repository) (*IndexBuilder, error)
NewIndexBuilder creates a fresh IndexBuilder. The passed in Repository contains repo metadata, and may be set to nil.
func (*IndexBuilder) Add ¶
func (b *IndexBuilder) Add(doc Document) error
Add a file which only occurs in certain branches.
func (*IndexBuilder) AddFile ¶
func (b *IndexBuilder) AddFile(name string, content []byte) error
AddFile is a convenience wrapper for Add
func (*IndexBuilder) ContentSize ¶
func (b *IndexBuilder) ContentSize() uint32
ContentSize returns the number of content bytes so far ingested.
type IndexFile ¶
type IndexFile interface {
Read(off uint32, sz uint32) ([]byte, error)
Size() (uint32, error)
Close()
Name() string
}
IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
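To make the interface concrete, here is a minimal in-memory implementation, useful for instance in tests. `memFile` is a hypothetical type and the interface is redeclared locally so the sketch compiles on its own. Like a mmap'd file, Read returns a view into the backing data rather than a copy, which is safe for concurrent readers as long as nobody writes to the buffer.

```go
package main

import "fmt"

// IndexFile is redeclared from the documentation above.
type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

// memFile is a hypothetical in-memory IndexFile backed by a byte slice.
type memFile struct {
	name string
	data []byte
}

// Read returns the sz bytes starting at off without copying them.
func (f *memFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz) // widen to avoid uint32 overflow
	if end > uint64(len(f.data)) {
		return nil, fmt.Errorf("%s: read [%d,%d) out of range, size is %d",
			f.name, off, end, len(f.data))
	}
	return f.data[off:end], nil
}

// Size reports the length of the backing buffer.
func (f *memFile) Size() (uint32, error) { return uint32(len(f.data)), nil }

// Close is a no-op because there is no underlying resource.
func (f *memFile) Close() {}

// Name identifies the file in messages.
func (f *memFile) Name() string { return f.name }

// Compile-time check that memFile satisfies the interface.
var _ IndexFile = (*memFile)(nil)

func main() {
	f := &memFile{name: "mem.zoekt", data: []byte("hello world")}
	defer f.Close()

	b, err := f.Read(6, 5)
	fmt.Printf("%q %v\n", b, err)

	_, err = f.Read(8, 10)
	fmt.Println(err)
}
```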
type IndexMetadata ¶
type IndexMetadata struct {
IndexFormatVersion int
IndexFeatureVersion int
IndexMinReaderVersion int
IndexTime time.Time
PlainASCII bool
LanguageMap map[string]byte
ZoektVersion string
}
IndexMetadata holds metadata stored in the index file. It contains data generated by the core indexing library.
type LineFragmentMatch ¶
type LineFragmentMatch struct {
// Offset within the line, in bytes.
LineOffset int
// Offset from file start, in bytes.
Offset uint32
// Number of bytes that match.
MatchLength int
}
LineFragmentMatch is a segment of matching text within a line.
type LineMatch ¶
type LineMatch struct {
// The line in which a match was found.
Line []byte
LineStart int
LineEnd int
LineNumber int
// If set, this was a match on the filename.
FileName bool
// The higher the better. Only ranks the quality of the match
// within the file, does not take rank of file into account
Score float64
LineFragments []LineFragmentMatch
}
LineMatch holds the matches within a single line in a file.
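The relationship between LineMatch.Line and its LineFragments can be illustrated by slicing the matched text back out of the line. This is a sketch on reduced local copies of the two structs, and `fragmentTexts` is a hypothetical helper. It relies only on LineOffset and MatchLength being byte counts relative to the start of Line, as the field comments state.

```go
package main

import "fmt"

// Reduced local copies of the structs documented above.
type LineFragmentMatch struct {
	LineOffset  int    // offset within the line, in bytes
	Offset      uint32 // offset from file start, in bytes
	MatchLength int    // number of bytes that match
}

type LineMatch struct {
	Line          []byte
	LineNumber    int
	LineFragments []LineFragmentMatch
}

// fragmentTexts returns the matched substrings of a line. Fragments
// whose range falls outside the line are skipped defensively.
func fragmentTexts(m LineMatch) []string {
	var out []string
	for _, f := range m.LineFragments {
		end := f.LineOffset + f.MatchLength
		if f.LineOffset < 0 || f.MatchLength < 0 || end > len(m.Line) {
			continue
		}
		out = append(out, string(m.Line[f.LineOffset:end]))
	}
	return out
}

func main() {
	// Made-up data: a line that starts 100 bytes into its file.
	m := LineMatch{
		Line:       []byte("foo bar foo"),
		LineNumber: 7,
		LineFragments: []LineFragmentMatch{
			{LineOffset: 0, Offset: 100, MatchLength: 3},
			{LineOffset: 8, Offset: 108, MatchLength: 3},
		},
	}
	fmt.Println(fragmentTexts(m))
}
```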
type RepoList ¶
type RepoList struct {
Repos []*RepoListEntry
Crashes int
}
RepoList holds a set of Repository metadata.
type RepoListEntry ¶
type RepoListEntry struct {
Repository Repository
IndexMetadata IndexMetadata
Stats RepoStats
}
type RepoStats ¶
type RepoStats struct {
// Repos is used for aggregating the number of repositories.
Repos int
// Shards is the total number of search shards.
Shards int
// Documents holds the number of documents or files.
Documents int
// IndexBytes is the amount of RAM used for index overhead.
IndexBytes int64
// ContentBytes is the amount of RAM used for raw content.
ContentBytes int64
}
Statistics of a (collection of) repositories.
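Because RepoStats describes either one repository or a collection, combining entries amounts to summing each field. The sketch below shows that aggregation on a local copy of the struct. The `add` method is a hypothetical name used for illustration, and counting each entry as one repository is an assumption of the example data.

```go
package main

import "fmt"

// RepoStats is a local copy of the struct documented above.
type RepoStats struct {
	Repos        int
	Shards       int
	Documents    int
	IndexBytes   int64
	ContentBytes int64
}

// add folds o into s field by field. It is an illustrative helper,
// not the package's API.
func (s *RepoStats) add(o RepoStats) {
	s.Repos += o.Repos
	s.Shards += o.Shards
	s.Documents += o.Documents
	s.IndexBytes += o.IndexBytes
	s.ContentBytes += o.ContentBytes
}

func main() {
	// Made-up per-repository numbers.
	perRepo := []RepoStats{
		{Repos: 1, Shards: 2, Documents: 100, IndexBytes: 4096, ContentBytes: 65536},
		{Repos: 1, Shards: 1, Documents: 40, IndexBytes: 1024, ContentBytes: 8192},
	}
	var total RepoStats
	for _, r := range perRepo {
		total.add(r)
	}
	fmt.Printf("%+v\n", total)
}
```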
type Repository ¶
type Repository struct {
// The repository name
Name string
// The repository URL.
URL string
// The physical source where this repo came from, eg. full
// path to the zip filename or git repository directory. This
// will not be exposed in the UI, but can be used to detect
// orphaned index shards.
Source string
// The branches indexed in this repo.
Branches []RepositoryBranch
// Nil if this is not the super project.
SubRepoMap map[string]*Repository
// URL template to link to the commit of a branch
CommitURLTemplate string
// The repository URL for getting to a file. Has access to
// {{Branch}}, {{Path}}
FileURLTemplate string
// The URL fragment to add to a file URL for line numbers. Has
// access to {{LineNumber}}. The fragment should include the
// separator, generally '#' or ';'.
LineFragmentTemplate string
// All zoekt.* configuration settings.
RawConfig map[string]string
// Importance of the repository, bigger is more important
Rank uint16
// IndexOptions is a hash of the options used to create the index for the
// repo.
IndexOptions string
}
Repository holds repository metadata.
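As an illustration of how FileURLTemplate and LineFragmentTemplate could combine into a link, the sketch below expands two templates and concatenates the results. It assumes the templates are Go text/template strings evaluated against a value exposing Branch, Path, and LineNumber, which is why the placeholders are written with a leading dot here. The `linkData` type, the `expand` and `fileLink` helpers, and the example.com URL are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// linkData is a hypothetical value the templates are evaluated against.
type linkData struct {
	Branch     string
	Path       string
	LineNumber int
}

// expand parses tpl as a Go text/template and executes it with d.
func expand(tpl string, d linkData) (string, error) {
	t, err := template.New("url").Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// fileLink joins the expanded file URL and line fragment. The fragment
// template is expected to carry its own separator, such as '#'.
func fileLink(fileTpl, fragTpl string, d linkData) (string, error) {
	u, err := expand(fileTpl, d)
	if err != nil {
		return "", err
	}
	f, err := expand(fragTpl, d)
	if err != nil {
		return "", err
	}
	return u + f, nil
}

func main() {
	link, err := fileLink(
		"https://example.com/repo/blob/{{.Branch}}/{{.Path}}",
		"#L{{.LineNumber}}",
		linkData{Branch: "main", Path: "cmd/main.go", LineNumber: 42},
	)
	fmt.Println(link, err)
}
```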
type RepositoryBranch ¶
RepositoryBranch describes an indexed branch, which is a name combined with a version.
type SearchOptions ¶
type SearchOptions struct {
// Return an upper-bound estimate of eligible documents in
// stats.ShardFilesConsidered.
EstimateDocCount bool
// Return the whole file.
Whole bool
// Maximum number of matches: skip all further processing of an
// index shard once we have found this many non-overlapping matches.
ShardMaxMatchCount int
// Maximum number of matches: stop looking for more matches
// once we have this many matches across shards.
TotalMaxMatchCount int
// Maximum number of important matches: skip processing
// shard after we found this many important matches.
ShardMaxImportantMatch int
// Maximum number of important matches across shards.
TotalMaxImportantMatch int
// Abort the search after this much time has passed.
MaxWallTime time.Duration
// Trim the number of results after collating and sorting the
// results
MaxDocDisplayCount int
}
func (*SearchOptions) SetDefaults ¶
func (o *SearchOptions) SetDefaults()
func (*SearchOptions) String ¶
func (s *SearchOptions) String() string
type SearchResult ¶
type SearchResult struct {
Stats
Files []FileMatch
// RepoURLs holds a repo => template string map.
RepoURLs map[string]string
// LineFragments holds a repo => template string map, for
// the line number fragment.
LineFragments map[string]string
}
SearchResult contains search matches and extra data
type Searcher ¶
type Searcher interface {
Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)
// List lists repositories. The query `q` can only contain
// query.Repo atoms.
List(ctx context.Context, q query.Q) (*RepoList, error)
Close()
// Describe the searcher for debug messages.
String() string
}
func NewSearcher ¶
NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e. []byte members should be copied into fresh buffers if the result is to survive closing the shard.
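The lifetime rule above suggests deep-copying any result that must outlive the Searcher. The following sketch does that for reduced local copies of FileMatch and LineMatch. `cloneBytes` and `cloneFileMatch` are hypothetical helpers, and the byte slice in `main` merely simulates shard memory that becomes invalid after Close.

```go
package main

import "fmt"

// Reduced local copies of the result types documented above.
type LineMatch struct {
	Line       []byte
	LineNumber int
}

type FileMatch struct {
	FileName    string
	Content     []byte
	Checksum    []byte
	LineMatches []LineMatch
}

// cloneBytes copies b into a freshly allocated buffer.
func cloneBytes(b []byte) []byte {
	if b == nil {
		return nil
	}
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

// cloneFileMatch returns a copy of m whose []byte members no longer
// alias the memory they were read from.
func cloneFileMatch(m FileMatch) FileMatch {
	out := m // strings and scalars are safe to copy by value
	out.Content = cloneBytes(m.Content)
	out.Checksum = cloneBytes(m.Checksum)
	out.LineMatches = make([]LineMatch, len(m.LineMatches))
	for i, lm := range m.LineMatches {
		lm.Line = cloneBytes(lm.Line)
		out.LineMatches[i] = lm
	}
	return out
}

func main() {
	shard := []byte("package main\n") // stands in for shard-backed memory
	m := FileMatch{
		FileName:    "main.go",
		Content:     shard,
		LineMatches: []LineMatch{{Line: shard[:12], LineNumber: 1}},
	}
	kept := cloneFileMatch(m)

	// Simulate the shard going away by scribbling over its memory.
	for i := range shard {
		shard[i] = 'x'
	}
	fmt.Printf("%q %q\n", kept.Content, kept.LineMatches[0].Line)
}
```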
type Stats ¶
type Stats struct {
// Amount of I/O for reading contents.
ContentBytesLoaded int64
// Amount of I/O for reading from index.
IndexBytesLoaded int64
// Number of search shards that had a crash.
Crashes int
// Wall clock time for this search
Duration time.Duration
// Number of files containing a match.
FileCount int
// Number of files in shards that we considered.
ShardFilesConsidered int
// Files that we evaluated. Equivalent to files for which all
// atom matches (including negations) evaluated to true.
FilesConsidered int
// Files for which we loaded file content to verify substring matches
FilesLoaded int
// Candidate files whose contents weren't examined because we
// gathered enough matches.
FilesSkipped int
// Shards that we did not process because a query was canceled.
ShardsSkipped int
// Number of non-overlapping matches
MatchCount int
// Number of candidate matches as a result of searching ngrams.
NgramMatches int
// Wall clock time for queued search.
Wait time.Duration
// Number of times regexp was called on files that we evaluated.
RegexpsConsidered int
}
Stats contains interesting numbers on the search
Directories
| Path | Synopsis |
|---|---|
| build | package build implements a more convenient interface for building zoekt indices. |
| zoekt (command) | |
| zoekt-archive-index (command) | Command zoekt-archive-index indexes an archive. |
| zoekt-git-clone (command) | This binary fetches all repos of a user or organization and clones them. |
| zoekt-git-index (command) | |
| zoekt-hg-index (command) | zoekt-hg-index provides bare-bones Mercurial indexing |
| zoekt-index (command) | |
| zoekt-indexserver (command) | |
| zoekt-mirror-bitbucket-server (command) | This binary fetches all repos of a project, and of a specific type, in case these are specified, and clones them. |
| zoekt-mirror-gerrit (command) | |
| zoekt-mirror-github (command) | This binary fetches all repos of a user or organization and clones them. |
| zoekt-mirror-gitiles (command) | This binary fetches all repos of a Gitiles host. |
| zoekt-mirror-gitlab (command) | This binary fetches all repos for a user from gitlab. |
| zoekt-repo-index (command) | zoekt-repo-index indexes a repo-based repository. |
| zoekt-test (command) | zoekt-test compares the search engine results with raw substring search |
| zoekt-webserver (command) | |
| gitindex | Package gitindex provides functions for indexing Git repositories. |