bbb

command module
v0.1.24
Published: Jun 6, 2026 License: MIT Imports: 29 Imported by: 0

A Go fork of boostedblob — a fast, concurrent CLI for working with local files, Azure Blob Storage (az://), and Hugging Face (hf://).

Why a fork of boostedblob

  1. Ship a single binary that runs on multiple platforms
  2. Add debug logs for the logged-in user and for network activity

Installation

Download the latest release from the Releases page, or build from source:

go install github.com/tg123/bbb@latest

Supported Path Types

| Prefix | Description | Example |
| --- | --- | --- |
| (none) | Local filesystem | /tmp/data/, ./file.txt |
| az:// | Azure Blob Storage | az://myaccount/mycontainer/path/to/blob |
| hf:// | Hugging Face Hub | hf://meta-llama/Llama-2-7b/weights.bin, hf://datasets/org/repo/data.csv |

Global Flags

| Flag | Default | Description |
| --- | --- | --- |
| --loglevel | info | Log level: debug, info, warn, error (env: BBB_LOG_LEVEL) |

Debug logging example — use --loglevel debug to inspect DNS resolution and the Azure AD token issuer (iss), which is useful for diagnosing connectivity or authentication problems:

bbb --loglevel debug ls az://myaccount/mycontainer/

Example debug output (sensitive fields redacted):

time=... level=DEBUG msg="DNS lookup" host=myaccount.blob.core.windows.net addrs=["198.51.100.1"]
time=... level=DEBUG msg="Decoded JWT payload" payload="{\"aud\":\"https://storage.azure.com\",\"iss\":\"https://sts.windows.net/<tenant-id>/\",…}"

The DNS lookup line shows the resolved IP addresses for the storage account, and the Decoded JWT payload line contains the full token claims including iss (the token issuer) and aud (audience), letting you verify the correct identity and tenant are being used.

Warning: Debug output may include personally identifiable information such as tenant IDs, object IDs, and other token claims. Do not share debug logs publicly or paste them into tickets without redacting sensitive fields.
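
To check a token by hand, the payload segment can be decoded outside bbb. The following is a minimal sketch, not bbb's implementation: decodeClaims and makeDemoToken are hypothetical helpers, the signature is not verified, and aud is assumed to be a single string as in the sample output above.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// tokenClaims holds only the two claims discussed above; real tokens carry more.
type tokenClaims struct {
	Issuer   string `json:"iss"`
	Audience string `json:"aud"` // assumed to be a single string, not an array
}

// decodeClaims extracts iss and aud from a compact JWT.
// It does NOT verify the signature; it is for inspection only.
func decodeClaims(token string) (tokenClaims, error) {
	var c tokenClaims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("expected header.payload.signature")
	}
	// JWT segments are base64url-encoded without padding.
	raw, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, fmt.Errorf("decode payload: %w", err)
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, fmt.Errorf("parse payload: %w", err)
	}
	return c, nil
}

// makeDemoToken wraps a JSON payload in an unsigned, throwaway token.
func makeDemoToken(payload string) string {
	return "e30." + base64.RawURLEncoding.EncodeToString([]byte(payload)) + ".sig"
}

func main() {
	token := makeDemoToken(`{"aud":"https://storage.azure.com","iss":"https://sts.windows.net/<tenant-id>/"}`)
	c, err := decodeClaims(token)
	if err != nil {
		panic(err)
	}
	fmt.Println("iss:", c.Issuer)
	fmt.Println("aud:", c.Audience)
}
```

The same redaction advice applies to anything this prints.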

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| BBB_LOG_LEVEL | info | Same as --loglevel flag |
| BBB_DNS_CACHE | (off) | Set to 1 or true to enable process-local DNS caching |
| BBB_DNS_PIN | (off) | Set to 1 or true to pin DNS to a single IP (implies BBB_DNS_CACHE=1) |
| BBB_AZBLOB_ACCOUNTKEY | | Azure Storage shared key for all accounts |
| SRC_BBB_AZBLOB_ACCOUNTKEY | | Shared key for source storage accounts only |
| DST_BBB_AZBLOB_ACCOUNTKEY | | Shared key for destination storage accounts only |
| BBB_PARALLEL_DOWNLOAD | 1 (true) | Set to 0 or false to disable parallel ranged Azure→local single-file downloads and fall back to a single streaming connection |
| BBB_PARALLEL_UPLOAD | 1 (true) | Set to 0 or false to disable parallel ranged local→Azure single-file uploads and fall back to the streaming UploadStream path |
| BBB_AZBLOB_DOWNLOAD_BLOCK_MIB | 16 | Chunk size in MiB used by the parallel Azure→local download path |
| BBB_AZBLOB_UPLOAD_BLOCK_MIB | 64 | Chunk size in MiB used by the parallel local→Azure upload path (clamped so the total block count stays within Azure's per-blob limit) |
| BBB_AZBLOB_DOWNLOAD_CONCURRENCY_MAX | (auto) | Hard upper bound on in-flight download ranges for the adaptive concurrency controller (default cap 512) |
| BBB_AZBLOB_UPLOAD_CONCURRENCY_MAX | (auto) | Hard upper bound on in-flight upload blocks for the adaptive concurrency controller (default cap 512) |
| BBB_AZBLOB_COPY_CONCURRENCY_MAX | (auto) | Hard upper bound on in-flight blocks for Azure→Azure server-side block copies (default cap 256) |
| BBB_AZBLOB_FORCE_S2S | (off) | Set to 1 or true to force Azure→Azure server-side (S2S) copy. On a server-side copy failure, bbb does not fall back to client-side streaming; instead the error is retried (honouring --retry-count) |
| BBB_RETRY_JITTER | (off) | Go duration (e.g. 500ms, 2s) to wait a random amount of time in [0, value) before each retry attempt. Applies to all operations that honour --retry-count |
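
The BBB_RETRY_JITTER semantics (a Go duration, with a random wait in [0, value)) can be sketched as below. This is an illustration under stated assumptions, not bbb's code: jitterDelay is a hypothetical helper, and treating an empty or non-positive value as "no jitter" is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitterDelay parses a Go duration string and returns a random delay in [0, limit).
// An empty value, or a zero or negative duration, is treated as "no jitter" (assumption).
func jitterDelay(value string) (time.Duration, error) {
	if value == "" {
		return 0, nil
	}
	limit, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("invalid jitter %q: %w", value, err)
	}
	if limit <= 0 {
		return 0, nil
	}
	// Int63n returns a value in [0, n), matching the half-open interval.
	return time.Duration(rand.Int63n(int64(limit))), nil
}

func main() {
	d, err := jitterDelay("500ms")
	if err != nil {
		panic(err)
	}
	fmt.Println("waiting", d, "before the next retry")
	time.Sleep(d)
}
```

Randomising the wait spreads retries from many concurrent workers so they do not all hit the service at the same instant.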

Non-Interactive Authentication (AZURE_* Env Vars)

By default, when bbb cannot find shared keys or role-specific credentials, it discovers the storage account's tenant and authenticates via the Azure CLI, falling back to an interactive browser login. For CI/CD and other headless environments, set the standard AZURE_* environment variables to authenticate non-interactively:

| Variable | Purpose |
| --- | --- |
| AZURE_TENANT_ID | Service principal tenant |
| AZURE_CLIENT_ID | Service principal client ID |
| AZURE_CLIENT_SECRET | Service principal secret |

  • When AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET are all set, bbb uses a service principal credential.
  • AZURE_SUBSCRIPTION_ID is not required for Blob Storage data-plane authentication and is ignored.

These take effect before the interactive CLI/browser flow, so no browser popup is opened when they are configured. Single-endpoint commands (ls, cat, rm, etc.) reuse the SRC role internally, so the unprefixed AZURE_* vars (and SRC_AZURE_*) are honored for them too. For per-account scoping across tenants, use the SRC_ / DST_ prefixed variables described below.

Multi-Tenant / Multi-Account Authentication (SRC_ / DST_ Env Vars)

When copying or syncing between Azure Storage accounts in different tenants (or using different credentials), prefix any standard Azure identity environment variable with SRC_ or DST_ to scope it to source or destination accounts respectively.

The unprefixed AZURE_* variables act as shared defaults: a plain AZURE_xxx is interpreted as if it were set for both SRC_AZURE_xxx and DST_AZURE_xxx, and the role-prefixed variant overrides it when present. This lets a single set of AZURE_* vars authenticate both sides while still allowing per-role overrides.
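
The override rule can be pictured as a two-step lookup. The sketch below is illustrative only: lookupRoleEnv is a hypothetical helper, and whether bbb treats a variable that is set but empty as present is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// lookupRoleEnv returns the value of name scoped to role ("SRC" or "DST").
// The role-prefixed variable wins; the unprefixed one acts as the shared default.
func lookupRoleEnv(getenv func(string) (string, bool), role, name string) (string, bool) {
	if v, ok := getenv(role + "_" + name); ok {
		return v, true
	}
	return getenv(name)
}

func main() {
	// Demo environment: one shared default plus a destination-only override.
	os.Unsetenv("SRC_AZURE_TENANT_ID")
	os.Setenv("AZURE_TENANT_ID", "<shared-tenant>")
	os.Setenv("DST_AZURE_TENANT_ID", "<tenant-b>")

	src, _ := lookupRoleEnv(os.LookupEnv, "SRC", "AZURE_TENANT_ID")
	dst, _ := lookupRoleEnv(os.LookupEnv, "DST", "AZURE_TENANT_ID")
	fmt.Println("SRC tenant:", src)
	fmt.Println("DST tenant:", dst)
}
```

Here the source side falls back to the shared value while the destination side uses its own override.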

bbb uses DefaultAzureCredential under the hood, so all credential types are supported: service principal (secret or certificate), workload identity (OIDC / AKS), managed identity, and Azure CLI.

Supported env vars — prefix with SRC_ or DST_:

| Variable | Category |
| --- | --- |
| AZURE_CLIENT_ID | Core identity |
| AZURE_TENANT_ID | Core identity |
| AZURE_CLIENT_SECRET | Service principal (secret) |
| AZURE_CLIENT_CERTIFICATE_PATH | Service principal (certificate) |
| AZURE_CLIENT_CERTIFICATE_PASSWORD | Service principal (certificate) |
| AZURE_CLIENT_SEND_CERTIFICATE_CHAIN | Service principal (certificate) |
| AZURE_FEDERATED_TOKEN_FILE | Workload identity (OIDC / AKS) |
| IDENTITY_ENDPOINT | Managed identity |
| IDENTITY_HEADER | Managed identity |
| MSI_ENDPOINT | Managed identity |
| MSI_SECRET | Managed identity |
| IMDS_ENDPOINT | Managed identity |
| AZURE_AUTHORITY_HOST | Cloud / authority |
| AZURE_USERNAME | Developer cache hint |
| AZURE_CONFIG_DIR | Azure CLI integration |
| BBB_AZBLOB_ACCOUNTKEY | Shared key (bbb-specific) |

Example — service principal per tenant:

# Source tenant credentials
export SRC_AZURE_TENANT_ID=<tenant-a>
export SRC_AZURE_CLIENT_ID=<sp-a-id>
export SRC_AZURE_CLIENT_SECRET=<sp-a-secret>

# Destination tenant credentials
export DST_AZURE_TENANT_ID=<tenant-b>
export DST_AZURE_CLIENT_ID=<sp-b-id>
export DST_AZURE_CLIENT_SECRET=<sp-b-secret>

bbb cp az://src-account/container/ az://dst-account/container/

Example — shared key per account:

export SRC_BBB_AZBLOB_ACCOUNTKEY=<key-for-source>
export DST_BBB_AZBLOB_ACCOUNTKEY=<key-for-destination>

bbb sync az://src-account/data/ az://dst-account/data/

Credential resolution order (first match wins):

  1. Shared key (SRC_BBB_AZBLOB_ACCOUNTKEY / DST_BBB_AZBLOB_ACCOUNTKEY, or BBB_AZBLOB_ACCOUNTKEY)
  2. Role env credential via DefaultAzureCredential: SRC_AZURE_* / DST_AZURE_*, falling back to unprefixed AZURE_* defaults. Accounts without an explicit role (single-endpoint commands like ls/cat/rm) reuse the SRC role, so unprefixed AZURE_* (and SRC_AZURE_*) are honored for them.
  3. Tenant-specific AzureCLI credential (auto-discovered from storage endpoint)
  4. Interactive browser login (fallback)
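
The "first match wins" order above can be modelled as an ordered chain of checks. This is a simplified sketch: the resolver type and resolve function are hypothetical, and the per-step conditions (for example, detecting a role credential by AZURE_CLIENT_ID alone) are stand-ins rather than bbb's actual logic.

```go
package main

import "fmt"

// resolver is a hypothetical credential source: a label plus a check for whether it applies.
type resolver struct {
	name  string
	match func(account string) bool
}

// resolve walks the chain in order and returns the first source that applies,
// or an empty string when none does.
func resolve(chain []resolver, account string) string {
	for _, r := range chain {
		if r.match(account) {
			return r.name
		}
	}
	return ""
}

func main() {
	// Stand-in for the process environment.
	env := map[string]string{"SRC_AZURE_CLIENT_ID": "<sp-a-id>"}
	anySet := func(keys ...string) func(string) bool {
		return func(string) bool {
			for _, k := range keys {
				if env[k] != "" {
					return true
				}
			}
			return false
		}
	}
	chain := []resolver{
		{"shared key", anySet("SRC_BBB_AZBLOB_ACCOUNTKEY", "BBB_AZBLOB_ACCOUNTKEY")},
		{"role env credential", anySet("SRC_AZURE_CLIENT_ID", "AZURE_CLIENT_ID")},
		{"Azure CLI", func(string) bool { return false }}, // pretend az is not logged in
		{"interactive browser", func(string) bool { return true }},
	}
	fmt.Println(resolve(chain, "src-account"))
}
```

With only SRC_AZURE_CLIENT_ID set, the shared-key step is skipped and the role env credential is selected before the CLI or browser steps are considered.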

BBB_DNS_CACHE

When enabled, bbb caches DNS resolution results in memory so that repeated connections to the same hostname (e.g. an Azure Storage endpoint) skip the DNS lookup. Cached entries expire after 5 minutes.

BBB_DNS_CACHE=1 bbb cp ./data/ az://myaccount/mycontainer/data/

The cache is installed on the shared HTTP transport used by all outbound traffic in the process, including the Azure SDK (data-plane requests, user-delegation-key / UDC acquisition, and OAuth token calls to login.microsoftonline.com) and Hugging Face API calls.

Caveats:

  • DNS records that change during the TTL window (e.g. IP rotations) will not be picked up until the cached entry expires.
  • Because cached addresses are dialled as IP literals, Go's standard Happy Eyeballs (RFC 6555) connection racing is bypassed. For Azure Blob Storage endpoints (typically single-stack) this has no practical impact.
  • az login (when falling back to AzureCLICredential) shells out to the az binary; DNS resolution for that subprocess happens in the child and is not affected by these variables.
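
The caching behaviour described in this section amounts to a per-host map with an expiry time. The following is a minimal sketch rather than bbb's transport code: dnsCache and its fields are hypothetical names, and the resolver is injected so the example runs without network access.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cacheEntry struct {
	addrs   []string
	expires time.Time
}

// dnsCache memoises lookups per hostname for a fixed TTL.
type dnsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	lookup  func(host string) ([]string, error) // the underlying resolver
	entries map[string]cacheEntry
}

func newDNSCache(ttl time.Duration, lookup func(string) ([]string, error)) *dnsCache {
	return &dnsCache{ttl: ttl, lookup: lookup, entries: map[string]cacheEntry{}}
}

// resolve returns cached addresses while they are fresh; otherwise it calls
// the resolver and stores the result. Failures are not cached in this sketch.
func (c *dnsCache) resolve(host string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[host]; ok && time.Now().Before(e.expires) {
		return e.addrs, nil
	}
	addrs, err := c.lookup(host)
	if err != nil {
		return nil, err
	}
	c.entries[host] = cacheEntry{addrs: addrs, expires: time.Now().Add(c.ttl)}
	return addrs, nil
}

func main() {
	calls := 0
	fake := func(host string) ([]string, error) {
		calls++
		return []string{"198.51.100.1"}, nil
	}
	cache := newDNSCache(5*time.Minute, fake)
	cache.resolve("myaccount.blob.core.windows.net")
	cache.resolve("myaccount.blob.core.windows.net")
	fmt.Println("resolver calls:", calls)
}
```

The second resolve is served from memory, which is also why an IP rotation inside the TTL window goes unnoticed.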

BBB_DNS_PIN

When enabled, bbb pins every hostname to a single IP address. If DNS returns multiple addresses, only the first reachable address is used for all connections to that host. This implicitly enables BBB_DNS_CACHE with unlimited TTL (BBB_DNS_CACHE_TTL is ignored). Pinning can help avoid 403 errors from services that tie authentication tokens to a specific endpoint IP.

BBB_DNS_PIN=1 bbb cp ./data/ az://myaccount/mycontainer/data/

The pin applies to all outbound HTTP traffic in the process, including Azure SDK data-plane requests, UDC acquisition, and OAuth token calls, as well as Hugging Face API calls. This matches the semantics of an /etc/hosts override, but limited to the bbb process.

Caveats:

  • Pinned entries never refresh. Long-lived processes will not pick up DNS or IP rotations and may require a restart (or disabling pinning) to recover if the pinned IP becomes unreachable.
  • If the first resolved address is unreachable (e.g. an IPv6 address in an IPv4-only environment), bbb will try the remaining addresses and pin to the first one that successfully connects.
  • AzureCLICredential shells out to the az binary; its DNS resolution runs in the child process and is not covered by the pin. MSAL-cached tokens on disk are also unaffected — occasional re-authentication may still occur.
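
The pinning rule (use the first reachable address, then never refresh) can be sketched as follows. This is an illustration, not bbb's dialer: pinner, pick, and errUnreachable are hypothetical names, and the reachability probe is injected in place of a real TCP dial.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnreachable is a stand-in for a failed connection attempt.
var errUnreachable = errors.New("network unreachable")

// pinner remembers one address per host for the lifetime of the process.
type pinner struct {
	mu     sync.Mutex
	pinned map[string]string
	probe  func(addr string) error // hypothetical reachability check
}

// pick returns the pinned address for host. On first use it tries the
// candidates in order and pins the first one that the probe accepts.
func (p *pinner) pick(host string, candidates []string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if addr, ok := p.pinned[host]; ok {
		return addr, nil // pinned entries are never refreshed
	}
	for _, addr := range candidates {
		if p.probe(addr) == nil {
			p.pinned[host] = addr
			return addr, nil
		}
	}
	return "", fmt.Errorf("no reachable address for %s", host)
}

func main() {
	p := &pinner{
		pinned: map[string]string{},
		probe: func(addr string) error {
			if addr == "2001:db8::1" {
				return errUnreachable // simulate an IPv4-only environment
			}
			return nil
		},
	}
	addr, err := p.pick("myaccount.blob.core.windows.net",
		[]string{"2001:db8::1", "198.51.100.1", "198.51.100.2"})
	if err != nil {
		panic(err)
	}
	fmt.Println("pinned to", addr)
}
```

Later calls return the same address even if DNS would now answer differently, which is the trade-off listed in the caveats.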

Taskfile

A taskfile is a plain-text file with one src dst pair per line, separated by whitespace. Empty lines are ignored.

Note: Paths containing spaces are not supported in the taskfile format because fields are split on whitespace. Use cp or sync with positional arguments instead for such paths.

Example tasks.txt:

./data/model.bin   az://myaccount/mycontainer/models/model.bin
./data/config.json az://myaccount/mycontainer/models/config.json
./data/vocab.txt   az://myaccount/mycontainer/models/vocab.txt
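
For tooling that generates or validates taskfiles, the format can be parsed in a few lines. The sketch below follows the rules stated above (whitespace-separated pairs, empty lines ignored); parseTaskfile and parseTaskString are hypothetical helpers, and rejecting lines that do not have exactly two fields is an assumption about error handling.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

type task struct{ src, dst string }

// parseTaskfile reads whitespace-separated src dst pairs and skips empty lines.
// Lines without exactly two fields are rejected (assumption).
func parseTaskfile(r io.Reader) ([]task, error) {
	var tasks []task
	sc := bufio.NewScanner(r)
	for n := 1; sc.Scan(); n++ {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("line %d: want 2 fields, got %d", n, len(fields))
		}
		tasks = append(tasks, task{fields[0], fields[1]})
	}
	return tasks, sc.Err()
}

// parseTaskString is a convenience wrapper for in-memory input.
func parseTaskString(s string) ([]task, error) {
	return parseTaskfile(strings.NewReader(s))
}

func main() {
	input := "./data/model.bin   az://myaccount/mycontainer/models/model.bin\n\n" +
		"./data/vocab.txt   az://myaccount/mycontainer/models/vocab.txt\n"
	tasks, err := parseTaskString(input)
	if err != nil {
		panic(err)
	}
	for _, t := range tasks {
		fmt.Println(t.src, "=>", t.dst)
	}
}
```

Splitting on whitespace is what makes paths with spaces unrepresentable: such a line would yield more than two fields.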

Use --taskfile to pass the file to cp or sync. Use - to read from stdin:

# Copy all pairs listed in the taskfile
bbb cp --taskfile tasks.txt

# Sync all pairs listed in the taskfile
bbb sync --taskfile tasks.txt

# Pipe pairs from another command
find models -name '*.bin' | awk '{print $0, "az://myaccount/mycontainer/" $0}' | bbb cp --taskfile -

State file

A state file tracks completed work so interrupted operations can be resumed. Pass --state to cp or sync and re-run the same command after a failure — already-finished items are skipped automatically.

# Start a large copy with crash recovery
bbb cp --taskfile tasks.txt --state copy.state

# If the process is interrupted, re-run the exact same command.
# Completed files are skipped; only remaining work is executed.
bbb cp --taskfile tasks.txt --state copy.state

--state also works without --taskfile:

bbb cp --state copy.state ./huge-dataset/ az://myaccount/mycontainer/dataset/

The state file is a plain-text append-only log. Each successfully copied file is recorded as src -> dst, and when all files in a taskfile pair are finished the pair is marked complete with a TASK\t prefix so the entire pair can be skipped on resume:

./data/model.bin -> az://myaccount/mycontainer/models/model.bin
./data/config.json -> az://myaccount/mycontainer/models/config.json
TASK	./data/model.bin -> az://myaccount/mycontainer/models/model.bin
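
A resume step only needs to know which entries are already present in the log. The following sketch reads the format shown above; loadState and key are hypothetical helpers, and how bbb itself handles malformed lines is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// state holds what a finished or interrupted run has recorded.
type state struct {
	files map[string]bool // "src -> dst" entries for individual files
	tasks map[string]bool // taskfile pairs marked complete with the TASK prefix
}

// key builds the "src -> dst" form used in the log.
func key(src, dst string) string { return src + " -> " + dst }

// loadState parses the append-only log line by line.
func loadState(log string) state {
	st := state{files: map[string]bool{}, tasks: map[string]bool{}}
	for _, line := range strings.Split(log, "\n") {
		line = strings.TrimRight(line, "\r")
		switch {
		case line == "":
			// ignore blank lines
		case strings.HasPrefix(line, "TASK\t"):
			st.tasks[strings.TrimPrefix(line, "TASK\t")] = true
		default:
			st.files[line] = true
		}
	}
	return st
}

func main() {
	log := "./data/model.bin -> az://myaccount/mycontainer/models/model.bin\n" +
		"TASK\t./data/model.bin -> az://myaccount/mycontainer/models/model.bin\n"
	st := loadState(log)
	k := key("./data/model.bin", "az://myaccount/mycontainer/models/model.bin")
	fmt.Println("file done:", st.files[k], "task done:", st.tasks[k])
}
```

Because the log is append-only, a crash in the middle of a run leaves every earlier entry intact.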

Commands

ls — List directory contents

List files and directories at a given path.

bbb ls [flags] [path]

| Flag | Description |
| --- | --- |
| -l, --long | Show file type, size, and modification time |
| -a | Include hidden files (entries starting with .) |
| -s, --relative | Show relative paths instead of full paths |
| --machine | Machine-readable tab-separated output |

Examples:

# List local directory
bbb ls /tmp/data/

# Long listing of an Azure Blob container
bbb ls -l az://myaccount/mycontainer/

# List Hugging Face repo files with relative paths
bbb ls -s hf://meta-llama/Llama-2-7b/

ll — Long listing (alias for ls -l)

Aliases: du

bbb ll [flags] [path]

| Flag | Description |
| --- | --- |
| -s, --relative | Show relative paths |
| --machine | Machine-readable tab-separated output |

Example:

# Show sizes and timestamps of blobs in a container
bbb ll az://myaccount/mycontainer/models/

lstree — Recursively list all files

Aliases: lsr

Recursively lists all files (not directories) under a path, with a summary of total count and size.

bbb lstree [flags] [path]

| Flag | Description |
| --- | --- |
| -l, --long | Show file type, size, and modification time |
| -s, --relative | Show relative paths |
| --machine | Machine-readable tab-separated output |

Examples:

# Recursively list all files under a local directory
bbb lstree /home/user/project/

# Machine-readable recursive listing of a blob container
bbb lstree --machine az://myaccount/mycontainer/data/

llr — Long recursive file list

Equivalent to lstree -l.

bbb llr [flags] [path]

| Flag | Description |
| --- | --- |
| -s, --relative | Show relative paths |
| --machine | Machine-readable tab-separated output |

Example:

bbb llr az://myaccount/mycontainer/

cat — Print file contents to stdout

bbb cat path [path ...]

Examples:

# Print a local file
bbb cat /tmp/config.yaml

# Print a blob from Azure
bbb cat az://myaccount/mycontainer/config.json

# Print multiple files
bbb cat file1.txt file2.txt

touch — Create or ensure file exists

Creates empty files if they don't exist. For local files, also updates the modification timestamp. For Azure blobs, creates an empty blob only when it doesn't already exist — it does not update the timestamp on existing blobs.

bbb touch path [path ...]

Examples:

# Create an empty local file
bbb touch /tmp/newfile.txt

# Touch a blob in Azure
bbb touch az://myaccount/mycontainer/marker.txt

cp — Copy files or directories

Aliases: cpr, cptree

Copy one or more source files/directories to a destination. Supports local and Azure Blob paths in any combination. Hugging Face (hf://) paths are supported as a source only (the hf:// backend is read-only).

bbb cp [flags] src [src ...] dst

| Flag | Description |
| --- | --- |
| --taskfile FILE | Batch task file with one src dst pair per line; use - for stdin |
| --state FILE | State file for crash recovery / resuming interrupted operations |
| -f | Force overwrite existing files |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent transfers (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |

Examples:

# Copy a local file to Azure Blob Storage
bbb cp ./model.bin az://myaccount/mycontainer/models/model.bin

# Copy an entire directory to Azure
bbb cp ./data/ az://myaccount/mycontainer/data/

# Download from Azure to local
bbb cp az://myaccount/mycontainer/results/ ./results/

# Server-side copy between Azure containers
bbb cp az://myaccount/src-container/data/ az://myaccount/dst-container/data/

# Download from Hugging Face (hf:// is source-only)
bbb cp hf://meta-llama/Llama-2-7b/ ./llama-model/

# Copy multiple sources to one destination
bbb cp file1.txt file2.txt az://myaccount/mycontainer/uploads/

# Copy with higher concurrency and retries
bbb cp --concurrency 16 --retry-count 3 ./big-dataset/ az://myaccount/mycontainer/dataset/

Taskfile Mode

Use --taskfile to provide a file of src dst pairs (one per line). See Taskfile for the file format.

# From a file
bbb cp --taskfile tasks.txt

# From stdin (pipe)
echo "local.txt az://myaccount/c/remote.txt" | bbb cp --taskfile -

Crash Recovery with State File

Use --state to resume interrupted copies. See State file for details.

# First run — starts copying and records progress
bbb cp --taskfile tasks.txt --state copy.state

# If interrupted, re-run the same command — already-copied files are skipped
bbb cp --taskfile tasks.txt --state copy.state

rm — Remove files

bbb rm [flags] path [path ...]

| Flag | Description |
| --- | --- |
| -f | Ignore nonexistent files |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent deletions (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |

Examples:

# Remove a local file
bbb rm /tmp/old-file.txt

# Remove a blob from Azure
bbb rm az://myaccount/mycontainer/old-model.bin

# Force-remove (no error if missing)
bbb rm -f az://myaccount/mycontainer/maybe-exists.txt

rmtree — Remove a directory tree

Aliases: rmr

Recursively deletes an entire directory and all of its contents.

bbb rmtree [flags] path

| Flag | Description |
| --- | --- |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent deletions (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |

Examples:

# Remove a local directory tree
bbb rmtree /tmp/scratch/

# Remove an Azure Blob virtual directory
bbb rmtree az://myaccount/mycontainer/old-experiment/

sync — Synchronise two directory trees

Unidirectional sync: copies new and updated files from source to destination.

bbb sync [flags] src dst

| Flag | Description |
| --- | --- |
| --taskfile FILE | Batch task file with one src dst pair per line; use - for stdin |
| --state FILE | State file for crash recovery / resuming interrupted operations |
| --dry-run | Show what would be done without making changes |
| --delete | Delete destination files that don't exist in source |
| -x, --exclude PATTERN | Exclude files matching this regex pattern |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent transfers (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |

Examples:

# Sync a local directory to Azure
bbb sync ./data/ az://myaccount/mycontainer/data/

# Sync from Azure to local
bbb sync az://myaccount/mycontainer/data/ ./local-data/

# Mirror (delete extra files at destination)
bbb sync --delete ./source/ az://myaccount/mycontainer/dest/

# Preview changes without applying
bbb sync --dry-run ./data/ az://myaccount/mycontainer/data/

# Exclude certain file patterns
bbb sync --exclude '\.tmp$' ./project/ az://myaccount/mycontainer/project/

# Sync with taskfile and crash recovery (see Taskfile and State file sections above)
bbb sync --taskfile tasks.txt --state sync.state
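
Before relying on an --exclude pattern, it can help to try it against a few sample paths. The sketch below assumes the pattern uses Go regexp (RE2) syntax and is matched unanchored against each relative path; both points are assumptions, and filterExcluded is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterExcluded returns the paths that do NOT match the exclude pattern.
// The match is unanchored, so `\.tmp$` must carry its own anchor.
func filterExcluded(paths []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid exclude pattern: %w", err)
	}
	var kept []string
	for _, p := range paths {
		if !re.MatchString(p) {
			kept = append(kept, p)
		}
	}
	return kept, nil
}

func main() {
	paths := []string{"notes.tmp", "src/main.go", "tmp/data.bin"}
	kept, err := filterExcluded(paths, `\.tmp$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept)
}
```

Note that `\.tmp$` excludes notes.tmp but keeps tmp/data.bin, because the pattern is a regular expression rather than a glob.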

md5sum — Compute MD5 checksums

bbb md5sum path [path ...]

Examples:

# Checksum a local file
bbb md5sum ./model.bin

# Checksum an Azure blob
bbb md5sum az://myaccount/mycontainer/model.bin

# Checksum multiple files
bbb md5sum file1.txt file2.txt file3.txt
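
To cross-check a checksum independently, an MD5 digest can be computed by streaming the data through Go's standard library. This is a generic sketch, not bbb's code path; md5Hex and md5String are hypothetical helpers, and bbb's exact output format is not reproduced here.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// md5Hex streams r through MD5 and returns the lowercase hex digest.
// Streaming keeps memory use constant regardless of input size.
func md5Hex(r io.Reader) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// md5String is a convenience wrapper for in-memory input.
func md5String(s string) string {
	sum, _ := md5Hex(strings.NewReader(s)) // a strings.Reader never fails
	return sum
}

func main() {
	fmt.Println(md5String("hello world"))
}
```

For a local file, pass the *os.File returned by os.Open to md5Hex instead of a string reader.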

share — Print a shareable link for a path

bbb share path

For Azure Blob paths, prints an Azure Portal link and a direct blob URL. For local files, prints a file:// URL.

Examples:

# Get a shareable link for an Azure blob
bbb share az://myaccount/mycontainer/report.pdf
# Output:
#   Azure Portal: https://portal.azure.com/#blade/...
#   Direct Blob (if public): https://myaccount.blob.core.windows.net/mycontainer/report.pdf

# Get a file:// link for a local file
bbb share ./report.pdf
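
The direct blob URL follows the https://{account}.blob.core.windows.net/{container}/{blob} pattern visible in the sample output. A rough sketch of that mapping, assuming the public Azure cloud endpoint and using a hypothetical directBlobURL helper:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// directBlobURL converts az://account/container/blob into the public-cloud
// HTTPS URL for that blob. Other clouds or custom endpoints are not handled.
func directBlobURL(p string) (string, error) {
	if !strings.HasPrefix(p, "az://") {
		return "", fmt.Errorf("not an az:// path: %q", p)
	}
	parts := strings.SplitN(strings.TrimPrefix(p, "az://"), "/", 3)
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", fmt.Errorf("want az://account/container/blob, got %q", p)
	}
	u := url.URL{
		Scheme: "https",
		Host:   parts[0] + ".blob.core.windows.net",
		Path:   "/" + parts[1] + "/" + parts[2],
	}
	return u.String(), nil // String percent-encodes the path where needed
}

func main() {
	u, err := directBlobURL("az://myaccount/mycontainer/report.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

As the sample output notes, such a URL is only directly usable when the blob allows public access.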

edit — Open a file in your editor (local only)

Opens a local file in the editor specified by the $EDITOR environment variable (defaults to vi). Creates the file and parent directories if they don't exist. Remote paths (az://, hf://) are not supported.

bbb edit path

Example:

# Edit a local config file
bbb edit /etc/myapp/config.yaml

az mkcontainer — Create an Azure Blob container

bbb az mkcontainer az://account/container

Example:

# Create a new Azure Blob container
bbb az mkcontainer az://myaccount/newcontainer

Benchmark

The Benchmark workflow compares bbb's single-file upload and download throughput against azcopy and boostedblob (the upstream Python bbb, referred to as py-bbb). It guards the parallel transfer paths introduced in #87 and #89 against regressions.

Because boostedblob hardcodes the https://{account}.blob.core.windows.net endpoint (no port or host override), the benchmark serves the Azurite emulator at that exact host. The whole benchmark runs inside Docker Compose (internal/benchmark), mirroring the E2E suite: Azurite runs as its own service (the official image), and the benchmark container generates a local CA (trusted inside the container), points {account}.blob.core.windows.net at the emulator over TLS on port 443, builds bbb, and runs all three tools against the same emulator. Because everything is containerised, the benchmark needs no host privileges, no secrets and no real Azure account, so it runs on every pull request.

The workflow runs on every pull request, on pushes to main, and on demand via Run workflow (workflow_dispatch). A manual run lets you set the test-file size, the number of runs, and a fail_factor that fails the job when bbb is slower than the fastest other tool by more than that factor (default 1.05, i.e. a 5% regression gate; leave it blank for report-only). Results are written to the job summary as a table.

The emulator is CPU/loopback-bound, so the numbers reflect client-side overhead rather than real network throughput.
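
The fail_factor gate reduces to one comparison against the fastest competing tool. The sketch below shows that rule as described; passesGate is a hypothetical function, and treating a factor of zero as the "blank" report-only setting is an assumption.

```go
package main

import "fmt"

// passesGate reports whether bbb stays within factor of the fastest other tool.
// Times are in seconds. A factor <= 0 stands in for the blank, report-only setting.
func passesGate(bbb float64, others []float64, factor float64) bool {
	if factor <= 0 || len(others) == 0 {
		return true
	}
	fastest := others[0]
	for _, t := range others[1:] {
		if t < fastest {
			fastest = t
		}
	}
	return bbb <= fastest*factor
}

func main() {
	// Illustrative timings in seconds: bbb, then the other tools.
	others := []float64{3.30, 2.13}
	fmt.Println("bbb at 1.89s passes:", passesGate(1.89, others, 1.05))
	fmt.Println("bbb at 2.30s passes:", passesGate(2.30, others, 1.05))
}
```

With a factor of 1.05 and a fastest competitor at 2.13 s, the threshold is about 2.24 s, so 1.89 s passes and 2.30 s fails.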

To run it locally you only need Docker:

cd internal/benchmark
docker compose up --build --abort-on-container-exit --exit-code-from benchmark
# Optional overrides: BENCH_SIZE_MB, BENCH_RUNS, BENCH_FAIL_FACTOR

Real Azure results

The same harness can be pointed at two real Azure Storage accounts:

export BENCH_SRC_ACCOUNT=<src-account>
export BENCH_DST_ACCOUNT=<dst-account>
export BENCH_CONTAINER=bench
export BENCH_SIZES_MB=10,100,500,1000,5000,10000   # comma-separated
export BENCH_RUNS=3
export BENCH_FAIL_FACTOR=99                         # disable the regression gate

go test -count=1 -timeout 0 -v -run TestBenchmark ./internal/benchmark/

Sample numbers (best of 3 runs, concurrency 32, seconds — lower is better).

Test environment:

  • Client node: Standard_E32ads_v5 (32 vCPU AMD EPYC 7763, 256 GiB RAM), AKS in southcentralus.
  • Source account: southcentralus (same region as the client, used for upload and download).
  • Destination account: uksouth (S2S destination — cross-region copy).
  • bbb commit at time of run: 55f4e44 (PR #99 head).
  • Tool versions: bbb (this repo), boostedblob 10.0.0, azcopy 10.32.2.
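
Upload (s)

| Size | bbb | py-bbb | azcopy |
| --- | --- | --- | --- |
| 10 MiB | 0.44 | 0.70 | 2.23 |
| 100 MiB | 0.76 | 1.43 | 2.34 |
| 500 MiB | 1.89 | 3.30 | 2.13 |
| 1000 MiB | 1.26 | 5.27 | 2.25 |
| 5000 MiB | 3.87 | 17.29 | 4.15 |
| 10000 MiB | 6.43 | 31.69 | 8.15 |

Download (s)

| Size | bbb | py-bbb | azcopy |
| --- | --- | --- | --- |
| 10 MiB | 0.29 | 0.60 | 2.29 |
| 100 MiB | 0.64 | 1.42 | 2.24 |
| 500 MiB | 1.41 | 2.34 | 2.24 |
| 1000 MiB | 1.57 | 3.69 | 2.26 |
| 5000 MiB | 5.34 | 10.28 | 6.52 |
| 10000 MiB | 9.57 | 17.98 | 13.06 |

Server-to-server copy (s)

| Size | bbb | py-bbb | azcopy |
| --- | --- | --- | --- |
| 10 MiB | 3.44 | 3.53 | 4.75 |
| 100 MiB | 4.07 | 5.02 | 4.71 |
| 500 MiB | 4.29 | 5.50 | 4.85 |
| 1000 MiB | 3.91 | 5.64 | 4.87 |
| 5000 MiB | 4.62 | 11.95 | 5.90 |
| 10000 MiB | 6.58 | 18.58 | 6.88 |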

Network conditions vary, so treat these as representative rather than absolute; the Azurite-based CI run is the regression gate.

License

See LICENSE for details.
