bbb
A Go fork of boostedblob — a fast, concurrent CLI for working with local files, Azure Blob Storage (az://), and Hugging Face (hf://).
Why a fork of boostedblob
- Need a single binary for multi-platform support
- Add login user and network debug logs
Installation
Download the latest release from the Releases page, or build from source:
go install github.com/tg123/bbb@latest
Supported Path Types
| Prefix | Description | Example |
|---|---|---|
| (none) | Local filesystem | /tmp/data/, ./file.txt |
| az:// | Azure Blob Storage | az://myaccount/mycontainer/path/to/blob |
| hf:// | Hugging Face Hub | hf://meta-llama/Llama-2-7b/weights.bin, hf://datasets/org/repo/data.csv |
Global Flags
| Flag | Default | Description |
|---|---|---|
| --loglevel | info | Log level: debug, info, warn, error (env: BBB_LOG_LEVEL) |
Debug logging example — use --loglevel debug to inspect DNS resolution and the Azure AD token issuer (iss), which is useful for diagnosing connectivity or authentication problems:
bbb --loglevel debug ls az://myaccount/mycontainer/
Example debug output (sensitive fields redacted):
time=... level=DEBUG msg="DNS lookup" host=myaccount.blob.core.windows.net addrs=["198.51.100.1"]
time=... level=DEBUG msg="Decoded JWT payload" payload="{\"aud\":\"https://storage.azure.com\",\"iss\":\"https://sts.windows.net/<tenant-id>/\",…}"
The DNS lookup line shows the resolved IP addresses for the storage account, and the Decoded JWT payload line contains the full token claims including iss (the token issuer) and aud (audience), letting you verify the correct identity and tenant are being used.
Warning: Debug output may include personally identifiable information such as tenant IDs, object IDs, and other token claims. Do not share debug logs publicly or paste them into tickets without redacting sensitive fields.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| BBB_LOG_LEVEL | info | Same as --loglevel flag |
| BBB_DNS_CACHE | (off) | Set to 1 or true to enable process-local DNS caching |
| BBB_DNS_PIN | (off) | Set to 1 or true to pin DNS to a single IP (implies BBB_DNS_CACHE=1) |
| BBB_AZBLOB_ACCOUNTKEY | | Azure Storage shared key for all accounts |
| SRC_BBB_AZBLOB_ACCOUNTKEY | | Shared key for source storage accounts only |
| DST_BBB_AZBLOB_ACCOUNTKEY | | Shared key for destination storage accounts only |
| BBB_PARALLEL_DOWNLOAD | 1 (true) | Set to 0 or false to disable parallel ranged Azure→local single-file downloads and fall back to a single streaming connection |
| BBB_PARALLEL_UPLOAD | 1 (true) | Set to 0 or false to disable parallel ranged local→Azure single-file uploads and fall back to the streaming UploadStream path |
| BBB_AZBLOB_DOWNLOAD_BLOCK_MIB | 16 | Chunk size in MiB used by the parallel Azure→local download path |
| BBB_AZBLOB_UPLOAD_BLOCK_MIB | 64 | Chunk size in MiB used by the parallel local→Azure upload path (clamped so the total block count stays within Azure's per-blob limit) |
| BBB_AZBLOB_DOWNLOAD_CONCURRENCY_MAX | (auto) | Hard upper bound on in-flight download ranges for the adaptive concurrency controller (default cap 512) |
| BBB_AZBLOB_UPLOAD_CONCURRENCY_MAX | (auto) | Hard upper bound on in-flight upload blocks for the adaptive concurrency controller (default cap 512) |
| BBB_AZBLOB_COPY_CONCURRENCY_MAX | (auto) | Hard upper bound on in-flight blocks for Azure→Azure server-side block copies (default cap 256) |
| BBB_AZBLOB_FORCE_S2S | (off) | Set to 1 or true to force Azure→Azure server-side (S2S) copy. On a server-side copy failure, bbb does not fall back to client-side streaming; instead the error is retried (honouring --retry-count) |
| BBB_RETRY_JITTER | (off) | Go duration (e.g. 500ms, 2s) to wait a random amount of time in [0, value) before each retry attempt. Applies to all operations that honour --retry-count |
Non-Interactive Authentication (AZURE_* Env Vars)
By default, when bbb cannot find shared keys or role-specific credentials it discovers the storage account's tenant and authenticates via the Azure CLI, falling back to an interactive browser login. For CI/CD and other headless environments, set the standard AZURE_* environment variables to authenticate non-interactively:
| Variable | Purpose |
|---|---|
| AZURE_TENANT_ID | Service principal tenant |
| AZURE_CLIENT_ID | Service principal client ID |
| AZURE_CLIENT_SECRET | Service principal secret |
- When AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET are all set, bbb uses a service principal credential.
- AZURE_SUBSCRIPTION_ID is not required for Blob Storage data-plane authentication and is ignored.
These take effect before the interactive CLI/browser flow, so no browser popup is opened when they are configured. Single-endpoint commands (ls, cat, rm, etc.) reuse the SRC role internally, so the unprefixed AZURE_* vars (and SRC_AZURE_*) are honored for them too. For per-account scoping across tenants, use the SRC_ / DST_ prefixed variables described below.
Multi-Tenant / Multi-Account Authentication (SRC_ / DST_ Env Vars)
When copying or syncing between Azure Storage accounts in different tenants (or using different credentials), prefix any standard Azure identity environment variable with SRC_ or DST_ to scope it to source or destination accounts respectively.
The unprefixed AZURE_* variables act as shared defaults: a plain AZURE_xxx is interpreted as if it were set for both SRC_AZURE_xxx and DST_AZURE_xxx, and the role-prefixed variant overrides it when present. This lets a single set of AZURE_* vars authenticate both sides while still allowing per-role overrides.
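The override rule can be sketched as a two-step lookup. This is an illustration under assumptions rather than bbb's code: lookupRoleEnv is a hypothetical helper, and an empty value is treated the same as an unset variable.

```go
package main

import "fmt"

// lookupRoleEnv resolves name for a role ("SRC" or "DST"). The role-prefixed
// variable wins; the unprefixed variable acts as the shared default.
func lookupRoleEnv(role, name string, getenv func(string) string) string {
	if v := getenv(role + "_" + name); v != "" {
		return v
	}
	return getenv(name)
}

func main() {
	// A map stands in for the process environment; os.Getenv has the same
	// signature and could be passed instead.
	env := map[string]string{
		"AZURE_TENANT_ID":     "shared-tenant",
		"DST_AZURE_TENANT_ID": "tenant-b",
	}
	getenv := func(k string) string { return env[k] }

	fmt.Println("SRC:", lookupRoleEnv("SRC", "AZURE_TENANT_ID", getenv))
	fmt.Println("DST:", lookupRoleEnv("DST", "AZURE_TENANT_ID", getenv))
}
```

Here the source side falls back to the shared value while the destination side picks up its own override.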
bbb uses DefaultAzureCredential under the hood, so all credential types are supported: service principal (secret or certificate), workload identity (OIDC / AKS), managed identity, and Azure CLI.
Supported env vars — prefix with SRC_ or DST_:
| Variable | Category |
|---|---|
| AZURE_CLIENT_ID | Core identity |
| AZURE_TENANT_ID | Core identity |
| AZURE_CLIENT_SECRET | Service principal (secret) |
| AZURE_CLIENT_CERTIFICATE_PATH | Service principal (certificate) |
| AZURE_CLIENT_CERTIFICATE_PASSWORD | Service principal (certificate) |
| AZURE_CLIENT_SEND_CERTIFICATE_CHAIN | Service principal (certificate) |
| AZURE_FEDERATED_TOKEN_FILE | Workload identity (OIDC / AKS) |
| IDENTITY_ENDPOINT | Managed identity |
| IDENTITY_HEADER | Managed identity |
| MSI_ENDPOINT | Managed identity |
| MSI_SECRET | Managed identity |
| IMDS_ENDPOINT | Managed identity |
| AZURE_AUTHORITY_HOST | Cloud / authority |
| AZURE_USERNAME | Developer cache hint |
| AZURE_CONFIG_DIR | Azure CLI integration |
| BBB_AZBLOB_ACCOUNTKEY | Shared key (bbb-specific) |
Example — service principal per tenant:
# Source tenant credentials
export SRC_AZURE_TENANT_ID=<tenant-a>
export SRC_AZURE_CLIENT_ID=<sp-a-id>
export SRC_AZURE_CLIENT_SECRET=<sp-a-secret>
# Destination tenant credentials
export DST_AZURE_TENANT_ID=<tenant-b>
export DST_AZURE_CLIENT_ID=<sp-b-id>
export DST_AZURE_CLIENT_SECRET=<sp-b-secret>
bbb cp az://src-account/container/ az://dst-account/container/
Example — shared key per account:
export SRC_BBB_AZBLOB_ACCOUNTKEY=<key-for-source>
export DST_BBB_AZBLOB_ACCOUNTKEY=<key-for-destination>
bbb sync az://src-account/data/ az://dst-account/data/
Credential resolution order (first match wins):
- Shared key (SRC_BBB_AZBLOB_ACCOUNTKEY / DST_BBB_AZBLOB_ACCOUNTKEY, or BBB_AZBLOB_ACCOUNTKEY)
- Role env credential via DefaultAzureCredential: SRC_AZURE_* / DST_AZURE_*, falling back to unprefixed AZURE_* defaults. Accounts without an explicit role (single-endpoint commands like ls / cat / rm) reuse the SRC role, so unprefixed AZURE_* (and SRC_AZURE_*) are honored for them.
- Tenant-specific AzureCLI credential (auto-discovered from storage endpoint)
- Interactive browser login (fallback)
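The "first match wins" rule above amounts to walking an ordered chain and stopping at the first source that applies. The sketch below shows only that control flow. The provider type, firstMatch, and the availability checks are hypothetical placeholders; how bbb actually decides that a given credential source applies is not described here and is likely more involved.

```go
package main

import "fmt"

// provider is a hypothetical stand-in for one credential source.
type provider struct {
	name      string
	available func() bool
}

// firstMatch walks the chain in order and returns the first available source.
func firstMatch(chain []provider) (string, bool) {
	for _, p := range chain {
		if p.available() {
			return p.name, true
		}
	}
	return "", false
}

func main() {
	env := map[string]string{"SRC_AZURE_CLIENT_ID": "sp-a-id"}
	// anySet is a simplified check: true if any listed variable is non-empty.
	anySet := func(keys ...string) func() bool {
		return func() bool {
			for _, k := range keys {
				if env[k] != "" {
					return true
				}
			}
			return false
		}
	}
	chain := []provider{
		{"shared key", anySet("SRC_BBB_AZBLOB_ACCOUNTKEY", "BBB_AZBLOB_ACCOUNTKEY")},
		{"role env credential", anySet("SRC_AZURE_CLIENT_ID", "AZURE_CLIENT_ID")},
		{"Azure CLI", func() bool { return false }},          // placeholder
		{"interactive browser", func() bool { return true }}, // last resort
	}
	name, _ := firstMatch(chain)
	fmt.Println("selected:", name)
}
```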
BBB_DNS_CACHE
When enabled, bbb caches DNS resolution results in memory so that repeated connections to the same hostname (e.g. an Azure Storage endpoint) skip the DNS lookup. Cached entries expire after 5 minutes.
BBB_DNS_CACHE=1 bbb cp ./data/ az://myaccount/mycontainer/data/
The cache is installed on the shared HTTP transport used by all outbound traffic in the process, including the Azure SDK (data-plane requests, user-delegation-key / UDC acquisition, and OAuth token calls to login.microsoftonline.com) and Hugging Face API calls.
Caveats:
- DNS records that change during the TTL window (e.g. IP rotations) will not be picked up until the cached entry expires.
- Because cached addresses are dialled as IP literals, Go's standard Happy Eyeballs (RFC 6555) connection racing is bypassed. For Azure Blob Storage endpoints (typically single-stack) this has no practical impact.
- az login (when falling back to AzureCLICredential) shells out to the az binary; DNS resolution for that subprocess happens in the child and is not affected by these variables.
BBB_DNS_PIN
When enabled, bbb pins every hostname to a single IP address. If DNS returns multiple addresses, only the first reachable address is used for all connections to that host. This implicitly enables BBB_DNS_CACHE with unlimited TTL (BBB_DNS_CACHE_TTL is ignored). Pinning can help avoid 403 errors from services that tie authentication tokens to a specific endpoint IP.
BBB_DNS_PIN=1 bbb cp ./data/ az://myaccount/mycontainer/data/
The pin applies to all outbound HTTP traffic in the process, including Azure SDK data-plane requests, UDC acquisition, and OAuth token calls, as well as Hugging Face API calls. This matches the semantics of a /etc/hosts override but limited to the bbb process.
Caveats:
- Pinned entries never refresh. Long-lived processes will not pick up DNS or IP rotations and may require a restart (or disabling pinning) to recover if the pinned IP becomes unreachable.
- If the first resolved address is unreachable (e.g. an IPv6 address in an IPv4-only environment), bbb will try the remaining addresses and pin to the first one that successfully connects.
- AzureCLICredential shells out to the az binary; its DNS resolution runs in the child process and is not covered by the pin. MSAL-cached tokens on disk are also unaffected, so occasional re-authentication may still occur.
Taskfile
A taskfile is a plain-text file with one src dst pair per line, separated by whitespace. Empty lines are ignored.
Note: Paths containing spaces are not supported in the taskfile format because fields are split on whitespace. For such paths, use cp or sync with positional arguments instead.
Example tasks.txt:
./data/model.bin az://myaccount/mycontainer/models/model.bin
./data/config.json az://myaccount/mycontainer/models/config.json
./data/vocab.txt az://myaccount/mycontainer/models/vocab.txt
Use --taskfile to pass the file to cp or sync. Use - to read from stdin:
# Copy all pairs listed in the taskfile
bbb cp --taskfile tasks.txt
# Sync all pairs listed in the taskfile
bbb sync --taskfile tasks.txt
# Pipe pairs from another command
find models -name '*.bin' | awk '{print $0, "az://myaccount/mycontainer/" $0}' | bbb cp --taskfile -
State file
A state file tracks completed work so interrupted operations can be resumed. Pass --state to cp or sync and re-run the same command after a failure — already-finished items are skipped automatically.
# Start a large copy with crash recovery
bbb cp --taskfile tasks.txt --state copy.state
# If the process is interrupted, re-run the exact same command.
# Completed files are skipped; only remaining work is executed.
bbb cp --taskfile tasks.txt --state copy.state
--state also works without --taskfile:
bbb cp --state copy.state ./huge-dataset/ az://myaccount/mycontainer/dataset/
The state file is a plain-text append-only log. Each successfully copied file is recorded as src -> dst, and when all files in a taskfile pair are finished the pair is marked complete with a TASK\t prefix so the entire pair can be skipped on resume:
./data/model.bin -> az://myaccount/mycontainer/models/model.bin
./data/config.json -> az://myaccount/mycontainer/models/config.json
TASK ./data/model.bin -> az://myaccount/mycontainer/models/model.bin
Commands
ls — List directory contents
List files and directories at a given path.
bbb ls [flags] [path]
| Flag | Description |
|---|---|
| -l, --long | Show file type, size, and modification time |
| -a | Include hidden files (entries starting with .) |
| -s, --relative | Show relative paths instead of full paths |
| --machine | Machine-readable tab-separated output |
Examples:
# List local directory
bbb ls /tmp/data/
# Long listing of an Azure Blob container
bbb ls -l az://myaccount/mycontainer/
# List Hugging Face repo files with relative paths
bbb ls -s hf://meta-llama/Llama-2-7b/
ll — Long listing (alias for ls -l)
Aliases: du
bbb ll [flags] [path]
| Flag | Description |
|---|---|
| -s, --relative | Show relative paths |
| --machine | Machine-readable tab-separated output |
Example:
# Show sizes and timestamps of blobs in a container
bbb ll az://myaccount/mycontainer/models/
lstree — Recursively list all files
Aliases: lsr
Recursively lists all files (not directories) under a path, with a summary of total count and size.
bbb lstree [flags] [path]
| Flag | Description |
|---|---|
| -l, --long | Show file type, size, and modification time |
| -s, --relative | Show relative paths |
| --machine | Machine-readable tab-separated output |
Examples:
# Recursively list all files under a local directory
bbb lstree /home/user/project/
# Machine-readable recursive listing of a blob container
bbb lstree --machine az://myaccount/mycontainer/data/
llr — Long recursive file list
Equivalent to lstree -l.
bbb llr [flags] [path]
| Flag | Description |
|---|---|
| -s, --relative | Show relative paths |
| --machine | Machine-readable tab-separated output |
Example:
bbb llr az://myaccount/mycontainer/
cat — Print file contents to stdout
bbb cat path [path ...]
Examples:
# Print a local file
bbb cat /tmp/config.yaml
# Print a blob from Azure
bbb cat az://myaccount/mycontainer/config.json
# Print multiple files
bbb cat file1.txt file2.txt
touch — Create or ensure file exists
Creates empty files if they don't exist. For local files, also updates the modification timestamp. For Azure blobs, creates an empty blob only when it doesn't already exist — it does not update the timestamp on existing blobs.
bbb touch path [path ...]
Examples:
# Create an empty local file
bbb touch /tmp/newfile.txt
# Touch a blob in Azure
bbb touch az://myaccount/mycontainer/marker.txt
cp — Copy files or directories
Aliases: cpr, cptree
Copy one or more source files/directories to a destination. Supports local and Azure Blob paths in any combination. Hugging Face (hf://) paths are supported as a source only (the hf:// backend is read-only).
bbb cp [flags] src [src ...] dst
| Flag | Description |
|---|---|
| --taskfile FILE | Batch task file with one src dst pair per line; use - for stdin |
| --state FILE | State file for crash recovery / resuming interrupted operations |
| -f | Force overwrite existing files |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent transfers (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |
Examples:
# Copy a local file to Azure Blob Storage
bbb cp ./model.bin az://myaccount/mycontainer/models/model.bin
# Copy an entire directory to Azure
bbb cp ./data/ az://myaccount/mycontainer/data/
# Download from Azure to local
bbb cp az://myaccount/mycontainer/results/ ./results/
# Server-side copy between Azure containers
bbb cp az://myaccount/src-container/data/ az://myaccount/dst-container/data/
# Download from Hugging Face (hf:// is source-only)
bbb cp hf://meta-llama/Llama-2-7b/ ./llama-model/
# Copy multiple sources to one destination
bbb cp file1.txt file2.txt az://myaccount/mycontainer/uploads/
# Copy with higher concurrency and retries
bbb cp --concurrency 16 --retry-count 3 ./big-dataset/ az://myaccount/mycontainer/dataset/
Taskfile Mode
Use --taskfile to provide a file of src dst pairs (one per line). See Taskfile for the file format.
# From a file
bbb cp --taskfile tasks.txt
# From stdin (pipe)
echo "local.txt az://myaccount/c/remote.txt" | bbb cp --taskfile -
Crash Recovery with State File
Use --state to resume interrupted copies. See State file for details.
# First run — starts copying and records progress
bbb cp --taskfile tasks.txt --state copy.state
# If interrupted, re-run the same command — already-copied files are skipped
bbb cp --taskfile tasks.txt --state copy.state
rm — Remove files
bbb rm [flags] path [path ...]
| Flag | Description |
|---|---|
| -f | Ignore nonexistent files |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent deletions (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |
Examples:
# Remove a local file
bbb rm /tmp/old-file.txt
# Remove a blob from Azure
bbb rm az://myaccount/mycontainer/old-model.bin
# Force-remove (no error if missing)
bbb rm -f az://myaccount/mycontainer/maybe-exists.txt
rmtree — Remove a directory tree
Aliases: rmr
Recursively deletes an entire directory and all of its contents.
bbb rmtree [flags] path
| Flag | Description |
|---|---|
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent deletions (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |
Examples:
# Remove a local directory tree
bbb rmtree /tmp/scratch/
# Remove an Azure Blob virtual directory
bbb rmtree az://myaccount/mycontainer/old-experiment/
sync — Synchronise two directory trees
Unidirectional sync: copies new and updated files from source to destination.
bbb sync [flags] src dst
| Flag | Description |
|---|---|
| --taskfile FILE | Batch task file with one src dst pair per line; use - for stdin |
| --state FILE | State file for crash recovery / resuming interrupted operations |
| --dry-run | Show what would be done without making changes |
| --delete | Delete destination files that don't exist in source |
| -x, --exclude PATTERN | Exclude files matching this regex pattern |
| -q, --quiet | Suppress output |
| --concurrency N | Number of concurrent transfers (default: CPU cores) |
| --retry-count N | Number of retries on failure (default: 0) |
Examples:
# Sync a local directory to Azure
bbb sync ./data/ az://myaccount/mycontainer/data/
# Sync from Azure to local
bbb sync az://myaccount/mycontainer/data/ ./local-data/
# Mirror (delete extra files at destination)
bbb sync --delete ./source/ az://myaccount/mycontainer/dest/
# Preview changes without applying
bbb sync --dry-run ./data/ az://myaccount/mycontainer/data/
# Exclude certain file patterns
bbb sync --exclude '\.tmp$' ./project/ az://myaccount/mycontainer/project/
# Sync with taskfile and crash recovery (see Taskfile and State file sections above)
bbb sync --taskfile tasks.txt --state sync.state
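Because --exclude takes a regular expression rather than a glob, it can be worth testing a pattern before a large sync. The sketch below uses Go's regexp package on a list of paths. It assumes an unanchored match against each path string; exactly which string bbb matches the pattern against (relative path, full path, or file name) is not specified here, and filterExcluded is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterExcluded returns the paths that do NOT match the exclude pattern.
func filterExcluded(paths []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var kept []string
	for _, p := range paths {
		if !re.MatchString(p) {
			kept = append(kept, p)
		}
	}
	return kept, nil
}

func main() {
	paths := []string{"src/main.go", "build/out.tmp", "notes.tmp.txt"}
	// `\.tmp$` matches only paths that end in ".tmp"; a glob such as "*.tmp"
	// is not a valid way to express this as a regular expression.
	kept, err := filterExcluded(paths, `\.tmp$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept)
}
```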
md5sum — Compute MD5 checksums
bbb md5sum path [path ...]
Examples:
# Checksum a local file
bbb md5sum ./model.bin
# Checksum an Azure blob
bbb md5sum az://myaccount/mycontainer/model.bin
# Checksum multiple files
bbb md5sum file1.txt file2.txt file3.txt
share — Print browser-accessible link for a file
bbb share path
For Azure Blob paths, prints an Azure Portal link and a direct blob URL. For local files, prints a file:// URL.
Examples:
# Get a shareable link for an Azure blob
bbb share az://myaccount/mycontainer/report.pdf
# Output:
# Azure Portal: https://portal.azure.com/#blade/...
# Direct Blob (if public): https://myaccount.blob.core.windows.net/mycontainer/report.pdf
# Get a file:// link for a local file
bbb share ./report.pdf
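The direct blob URL in the output above follows from the az://account/container/path layout. As a rough sketch of that mapping (directBlobURL is a hypothetical helper, the public-cloud blob.core.windows.net suffix is taken from the example output, and no URL escaping is applied):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// directBlobURL maps az://account/container/path to the blob's HTTPS URL.
func directBlobURL(p string) (string, error) {
	rest, ok := strings.CutPrefix(p, "az://")
	if !ok {
		return "", errors.New("not an az:// path")
	}
	// Split into account, container, and the remaining blob path.
	parts := strings.SplitN(rest, "/", 3)
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", errors.New("expected az://account/container/path")
	}
	return fmt.Sprintf("https://%s.blob.core.windows.net/%s/%s", parts[0], parts[1], parts[2]), nil
}

func main() {
	u, err := directBlobURL("az://myaccount/mycontainer/report.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```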
edit — Open a file in your editor (local only)
Opens a local file in the editor specified by the $EDITOR environment variable (defaults to vi). Creates the file and parent directories if they don't exist. Remote paths (az://, hf://) are not supported.
bbb edit path
Example:
# Edit a local config file
bbb edit /etc/myapp/config.yaml
az mkcontainer — Create an Azure Blob container
bbb az mkcontainer az://account/container
Example:
# Create a new Azure Blob container
bbb az mkcontainer az://myaccount/newcontainer
Benchmark
The Benchmark workflow compares bbb's
single-file upload and download throughput against
azcopy
and boostedblob (the upstream
Python bbb, referred to as py-bbb). It guards the parallel transfer paths
introduced in #87 and
#89 against regressions.
Because boostedblob hardcodes the https://{account}.blob.core.windows.net
endpoint (no port or host override), the benchmark serves the
Azurite emulator at that exact host. The
whole benchmark runs inside Docker Compose
(internal/benchmark), mirroring the
E2E suite: Azurite runs as its own service (the
official image), and the benchmark container generates a local CA (trusted inside
the container), points {account}.blob.core.windows.net at the emulator over TLS
on port 443, builds bbb, and runs all three tools against the same emulator.
Because everything is containerised, the benchmark needs no host privileges, no
secrets and no real Azure account, so it runs on every pull request.
The workflow runs on every pull request, on pushes to main, and on demand via
Run workflow (workflow_dispatch), where you can set the test-file size, the
number of runs, and a fail_factor that fails the job when bbb is slower than
the fastest other tool by more than that factor (default 1.05, i.e. a 5%
regression gate; set it blank for report-only). Results are written to the job
summary as a table.
The emulator is CPU/loopback-bound, so the numbers reflect client-side overhead rather than real network throughput; fail_factor defaults to 1.05 (a 5% gate) and can be set blank to report only.
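The gate condition can be written down as a one-line comparison against the fastest competing tool. The sketch below is an interpretation of the rule as described, not the harness code: exceedsGate is a hypothetical name, and a factor of zero stands in for the blank, report-only setting.

```go
package main

import "fmt"

// exceedsGate reports whether bbb's time is more than factor times the
// fastest other tool's time. A non-positive factor means report-only.
func exceedsGate(bbb float64, others []float64, factor float64) bool {
	if factor <= 0 || len(others) == 0 {
		return false
	}
	fastest := others[0]
	for _, t := range others[1:] {
		if t < fastest {
			fastest = t
		}
	}
	return bbb > factor*fastest
}

func main() {
	// Times in seconds; lower is better. Values are illustrative.
	others := []float64{3.30, 2.13}
	fmt.Println(exceedsGate(1.89, others, 1.05)) // bbb is the fastest tool
	fmt.Println(exceedsGate(2.50, others, 1.05)) // more than 5% slower than 2.13
}
```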
To run it locally you only need Docker:
cd internal/benchmark
docker compose up --build --abort-on-container-exit --exit-code-from benchmark
# Optional overrides: BENCH_SIZE_MB, BENCH_RUNS, BENCH_FAIL_FACTOR
Real Azure results
The same harness can be pointed at two real Azure Storage accounts:
export BENCH_SRC_ACCOUNT=<src-account>
export BENCH_DST_ACCOUNT=<dst-account>
export BENCH_CONTAINER=bench
export BENCH_SIZES_MB=10,100,500,1000,5000,10000 # comma-separated
export BENCH_RUNS=3
export BENCH_FAIL_FACTOR=99 # disable the regression gate
go test -count=1 -timeout 0 -v -run TestBenchmark ./internal/benchmark/
Sample numbers (best of 3 runs, concurrency 32, seconds — lower is better).
Test environment:
- Client node: Standard_E32ads_v5 (32 vCPU AMD EPYC 7763, 256 GiB RAM), AKS in southcentralus.
- Source account: southcentralus (same region as the client, used for upload and download).
- Destination account: uksouth (S2S destination, cross-region copy).
- bbb commit at time of run: 55f4e44 (PR #99 head).
- Tool versions: bbb (this repo), boostedblob 10.0.0, azcopy 10.32.2.
Upload (s)
| Size | bbb | py-bbb | azcopy |
|---|---|---|---|
| 10 MiB | 0.44 | 0.70 | 2.23 |
| 100 MiB | 0.76 | 1.43 | 2.34 |
| 500 MiB | 1.89 | 3.30 | 2.13 |
| 1000 MiB | 1.26 | 5.27 | 2.25 |
| 5000 MiB | 3.87 | 17.29 | 4.15 |
| 10000 MiB | 6.43 | 31.69 | 8.15 |
Download (s)
| Size | bbb | py-bbb | azcopy |
|---|---|---|---|
| 10 MiB | 0.29 | 0.60 | 2.29 |
| 100 MiB | 0.64 | 1.42 | 2.24 |
| 500 MiB | 1.41 | 2.34 | 2.24 |
| 1000 MiB | 1.57 | 3.69 | 2.26 |
| 5000 MiB | 5.34 | 10.28 | 6.52 |
| 10000 MiB | 9.57 | 17.98 | 13.06 |
Server-to-server copy (s)
| Size | bbb | py-bbb | azcopy |
|---|---|---|---|
| 10 MiB | 3.44 | 3.53 | 4.75 |
| 100 MiB | 4.07 | 5.02 | 4.71 |
| 500 MiB | 4.29 | 5.50 | 4.85 |
| 1000 MiB | 3.91 | 5.64 | 4.87 |
| 5000 MiB | 4.62 | 11.95 | 5.90 |
| 10000 MiB | 6.58 | 18.58 | 6.88 |
Network conditions vary, so treat these as representative rather than absolute; the Azurite-based CI run is the regression gate.
License
See LICENSE for details.