toobig

command module
v0.5.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 13, 2025 License: Apache-2.0 Imports: 7 Imported by: 0

README

TooBig

go.dev reference Go Report Card Releases LICENSE

TooBig was created from a need to manage large files like photos, videos, and other files. Over the last number of years I have found it has worked wonderfully for managing my decades of family photos, but also for managing photo and video assets for development projects, and other uses here and there.

TooBig converts a set of files into a set of refs and blobs. A blob is the original file, but the name has been changed to the 64 character SHA-256 checksum of the contents of the file. A ref is a small file that has the same name and directory structure of the original file, but the file itself is tiny, containing just the SHA-256 checksum of the original file and a timestamp.

  • toobig update converts files to refs/blobs.
  • toobig restore converts refs/blobs to files.

Why Use TooBig?

TooBig's architecture of separating file metadata (refs) from file content (blobs) provides several advantages:

Efficient Syncing and Backups

Have you ever reorganized a photo library and then had to wait for rsync to re-upload gigabytes of data, even though the files themselves didn't change? With TooBig, only the tiny ref files would need to be re-uploaded. The large blobs (your actual data) wasn't renamed or moved, making synchronization and backups dramatically faster and more efficient.

Verifiable Data Integrity

Because each blob's filename is its SHA-256 checksum, verifying your entire collection is trivial. toobig fsck can easily confirm that none of your files have suffered from bit rot or corruption during transfers between hard drives, cloud storage, or backup media over the years.

Version Control Your Files with git

git is powerful, but struggles with large binary files. TooBig allows you to commit your directory structure—the lightweight ref files—to a git repository. This lets you track every change, rename, and reorganization of your files without bloating the repository. You get a complete version history for your large files without the performance penalty of storing the actual data in git.

Data Deduplication

TODO: Need words here... If you have the same file multiple times this will store only one blob to represent all of those original files.

Warnings

Please realize that TooBig leverages hard links for much of its functionality. Make sure you realize what this means. Specifically you shouldn't edit files in place, this will make the linked blob file no longer match its checksum (filename). Files should break the hard link when they are edited. If you are unsure what the software you use does, make sure you run toobig fsck regularly to help you validate everything is working correctly.

Install

go install github.com/shortmoose/toobig@latest

Usage

Basic Usage:

# Dump photos from SD card to my photo directory.
# Delete/Edit photos as needed.
toobig status photo-toobig.cfg  # Not actually necessary
toobig update photo-toobig.cfg
rsync [flags] {refs,blobs} cloud-backup-dir

An example set up. Sort of git-esque:

cd <photos-dir>
mkdir .toobig
mkdir .toobig/{refs,blobs,dups}
toobig config >.toobig/config
# Edit config to look like:
# {
#  "file-path": "..",
#  "blob-path": "blobs",
#  "ref-path": "refs",
#  "dup-path": "dups"
# }
#
# You can verify your config by running:
toobig status .toobig/config

TODO: Give more examples here...

Exit Codes

Some effort has been made to make error codes consistent and useful. Here is a basic list of the error codes we currently use.

  • 0 - Success.
  • 1 - General error.
  • 2 - Panic, application crashed - golang defined exit code.
  • 3 - Invalid arguments, invalid command, etc.
  • 10 - Normal operational updates - for example a file has been added, etc. These normally exhibit as success (exit 0) but exit code 10 is used with the flag --update-is-error.
  • 11 - Data inconsistencies, generally will need manual intervention to repair (corrupted blob, invalid ref, etc).
  • 12 - Error with the config file.
  • 125+ - Above this range is usually used for signal handling. For example a ctrl-c will exit with the code 130.

Future

  • I have thought about breaking up the blobs directory into subdirectories, sort of like git does with its objects directory. No immediate plans for this I would need to see users with enough blobs to make it worth while.
  • A config option to NOT use hardlinks? It would double the space usage but allow people to us work flows where they actually modify files, instead of replace them.
  • Other ideas??

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
internal
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL