README
UseShortHashLen
UseShortHashLen can NOT be changed after the hashdb has been created.
Choose this value carefully before creating the hashdb.
Changing it later requires a complete rebuild of the history database, which has not been thoroughly tested yet.
Hash routing breakdown (see the sketch after this list):
- 1st char: Database selection (0-f) (16 hash databases)
- 2nd+3rd chars: Table selection (s00-sff) (256 tables per database)
- 4th-6th chars: short hash (default UseShortHashLen=3)
- Total: default 3 (short hash) + 3 (db[0-f]:table[s00-sff]) = 6 characters = 16^6 = 16,777,216 combinations
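As a rough illustration, here is how such routing could look in Go. This is a minimal sketch: the function name, and the use of a SHA-256 hex digest of the message-id, are assumptions for the example, not the package's actual API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// routeHash splits a hex-encoded message-id hash into the routing parts
// described above: the 1st char selects the database, the 2nd+3rd the
// table, and the next UseShortHashLen chars form the stored short hash.
// SHA-256 is an assumption here; the real package may hash differently.
func routeHash(messageID string, useShortHashLen int) (db, table, short string) {
	sum := sha256.Sum256([]byte(messageID))
	h := hex.EncodeToString(sum[:])
	db = h[0:1]                      // one of 16 databases (0-f)
	table = "s" + h[1:3]             // one of 256 tables (s00-sff)
	short = h[3 : 3+useShortHashLen] // short hash stored in that table
	return db, table, short
}

func main() {
	db, table, short := routeHash("<example@message.id>", 3)
	fmt.Printf("db=%s table=%s short=%s\n", db, table, short)
}
```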
UseShortHashLen (a short program reproducing these totals follows this list):
- "2" (+ 3 for db[0-f]:table[s00-sff] selection): total of 5 chars = 16^5 = 1,048,576 combinations
- "3" (+ 3 for db[0-f]:table[s00-sff] selection): total of 6 chars = 16^6 = 16,777,216 combinations
- "4" (+ 3 for db[0-f]:table[s00-sff] selection): total of 7 chars = 16^7 = 268,435,456 combinations
- "5" (+ 3 for db[0-f]:table[s00-sff] selection): total of 8 chars = 16^8 = 4,294,967,296 combinations
- "6" (+ 3 for db[0-f]:table[s00-sff] selection): total of 9 chars = 16^9 = 68,719,476,736 combinations
- "7" (+ 3 for db[0-f]:table[s00-sff] selection): total of 10 chars = 16^10 = 1,099,511,627,776 combinations
Collision Example: UseShortHashLen=2 with 1M articles
The math:
- UseShortHashLen=2 = 1,048,576 possible hash slots
- 1M articles in ~1.05M slots ≈ 95% load factor
- Result: many collisions (estimated below), but the system handles them gracefully
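A back-of-the-envelope estimate of how crowded those slots get, assuming the hash distributes keys uniformly (standard balls-into-bins math, not a measurement of the actual system):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const m = 1 << 20   // 1,048,576 slots at UseShortHashLen=2
	const n = 1_000_000 // articles inserted

	// With n keys hashed uniformly into m slots, the expected number of
	// occupied slots is m*(1-(1-1/m)^n) ~= m*(1-e^(-n/m)); every insert
	// beyond that lands in an already-occupied slot, i.e. collides.
	occupied := float64(m) * (1 - math.Exp(-float64(n)/float64(m)))
	fmt.Printf("expected occupied slots:    ~%.0f\n", occupied)            // ~644,000
	fmt.Printf("expected colliding inserts: ~%.0f\n", float64(n)-occupied) // ~356,000
}
```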
What happens in practice:
- Many hash entries store multiple comma-separated file offsets
- Example: A single hash might store "12345,23456,34567,45678" (4 colliding articles)
- During lookups, the system reads all 4 positions from history.dat and verifies the full hash of each to find the correct article (see the sketch after this list)
- Performance impact: Instead of 1 disk read, many lookups require 3-5+ disk reads
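The lookup path could be sketched like this. Everything here, including the names and the assumed "&lt;fullhash&gt;\t..." line layout in history.dat, is illustrative rather than taken from the actual implementation:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// lookupOffset resolves a short-hash entry that may contain several
// comma-separated byte offsets into history.dat. Each candidate offset
// is read and its stored full hash compared before the offset is
// trusted; only the matching article's offset is returned.
func lookupOffset(history *os.File, stored, fullHash string) (int64, error) {
	for _, field := range strings.Split(stored, ",") {
		offset, err := strconv.ParseInt(field, 10, 64)
		if err != nil {
			return -1, err
		}
		if _, err := history.Seek(offset, io.SeekStart); err != nil {
			return -1, err
		}
		line, err := bufio.NewReader(history).ReadString('\n')
		if err != nil && err != io.EOF {
			return -1, err
		}
		// Verify the full hash: colliding candidates will not match,
		// and each one costs an extra disk read.
		if strings.HasPrefix(line, fullHash) {
			return offset, nil
		}
	}
	return -1, fmt.Errorf("no entry for hash %s", fullHash)
}

func main() {
	f, err := os.Open("history.dat")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	// Example stored value with 4 colliding articles, as above.
	off, err := lookupOffset(f, "12345,23456,34567,45678", "abcdef0123")
	fmt.Println(off, err)
}
```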
Why performance degrades:
- With 1M articles in 1M slots, frequent collisions occur
- Each collision increases lookup time proportionally
- The system works correctly, just more slowly, due to the additional I/O operations
The collision handling is robust and reliable - it's a performance trade-off, not a failure mode.
Implementation Background
This hash-based history system follows design principles similar to traditional Usenet news servers, particularly inspired by approaches used in systems like INN2 (InterNetNews). The core concept involves:
Traditional Usenet History Approach:
- Hash-based duplicate detection using message-id hashes
- File offset storage for quick article retrieval
- Sharded databases to distribute load and improve concurrency
- Collision handling through chaining (multiple offsets per hash)
Our Implementation:
- Hash Routing: Uses the first 3 hex characters for database/table selection (16 DBs × 256 tables = 4,096 shards)
- Storage Hash: Configurable UseShortHashLen for collision resistance vs. storage efficiency
- File Offsets: Stores byte positions in history.dat for direct article access
- Collision Management: Comma-separated offsets with full hash verification during lookups (the chaining step is sketched after this list)
- Graceful Degradation: System remains functional under high collision scenarios
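A minimal sketch of the chaining step, using an in-memory map in place of the real sharded tables (the function name and map layout are assumptions):

```go
package main

import (
	"fmt"
	"strconv"
)

// storeOffset records a new history.dat byte offset under a short hash.
// On collision the offset is chained, comma-separated, onto whatever is
// already stored; a plain map stands in for the real database table.
func storeOffset(table map[string]string, shortHash string, offset int64) {
	off := strconv.FormatInt(offset, 10)
	if existing, ok := table[shortHash]; ok {
		table[shortHash] = existing + "," + off // collision: chain it
		return
	}
	table[shortHash] = off
}

func main() {
	table := map[string]string{}
	for _, off := range []int64{12345, 23456, 34567, 45678} {
		storeOffset(table, "4a7", off) // "4a7": example 3-char short hash
	}
	fmt.Println(table["4a7"]) // "12345,23456,34567,45678"
}
```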
The sharding strategy (16 databases × 256 tables each) spreads entries evenly across 4,096 shards and allows for high-concurrency operations while maintaining the simplicity and reliability that traditional Usenet history systems are known for.