index_analyzer

command
v0.62.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 21, 2025 License: Apache-2.0 Imports: 16 Imported by: 0

README

seq-db Index Analyzer

To run|for example:

> go run ./cmd/index_analyzer/... ./data/*.index | tee ~/report.txt

if you have a set of index filed in ./data/ directory

As result you will get a set of statistic data in CSV format

Uniq Tokens Stats

N docs docsSum fields tokens tokensSum uniqTokensSum tokenSize tokenSizeSum uniqTokenSizeSum
1 1021281 1021281 455 6370602 6370602 6370602 100973051 100973051 100973051
2 1002724 2024005 469 6378849 12749451 11806947 100243436 201216487 189662176

Each row contains data of next processed fraction index.

  • docs - Number of documents in fraction
  • docsSum - Cumulative total number of documents
  • fields - Cumulative total number of fields
  • tokens - Number of tokens in fraction
  • tokensSum - Cumulative total number of tokens
  • uniqTokensSum - Number of unique tokens in fractions for this table row and above
  • tokenSize - Total size of tokens in fraction (bytes)
  • tokenSizeSum - Cumulative total size of tokens
  • uniqTokenSizeSum - Total size of all uniq tokens in fractions for this table row and above

LIDs Histogram

N 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216
1 5066713 665965 319099 139046 72002 33557 24001 15789 10758 11128 5067 3065 1684 1384 798 293 147 70 11 5 20 0 0 0 0
2 9413836 1234627 612087 254150 118411 66178 33443 25276 14643 10928 10917 5066 3051 1614 1380 794 297 147 66 11 5 20 0 0 0

The table represents the distribution|where

  • in the table header -- the bucket size (number of documents)
  • the first column -- the number of fractions
  • in each cell -- the number of tokens that occur in the corresponding number of documents (shown in the header)

Tokens Histogram

N 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216
1 78 33 26 49 45 36 28 27 29 36 14 11 16 11 4 1 4 1 1 0 2 3 0 0 0
2 72 26 36 32 52 38 31 30 25 37 29 8 15 15 6 5 1 4 1 1 0 3 2 0 0

The table represents the distribution|where

  • in the table header -- the bucket size (number of tokens)
  • the first column -- the number of fractions
  • in each cell -- the number of fields with corresponding number of tokens (shown in the header)

Uniq LIDs Stats

Uniq LIDs Stats

N docs lidsCount lidsUniqCount
1 1021281 90613582 69135909
2 1002724 88643738 67335634
3 1049143 91475838 69721189
4 1018191 89420909 68159272

Table shows difference of size full LIDs section and size only uniq LIDs lists

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL