glassdb

Published: Mar 24, 2026 License: Apache-2.0

README

Glass DB


Glass DB is a pure Go key/value store built on top of object storage (e.g. Google Cloud Storage or Azure Blob Storage). It is stateless and supports ACID transactions. Clients import Glass DB as a library and don't need to deploy or depend on any additional services. Everything is built on top of object storage.

The interface is inspired by BoltDB and Apple's FoundationDB.

Project status

[!WARNING] This is still alpha software.

Transactions should work correctly, but performance can definitely improve. Interfaces and file formats are not stable and may still change at any point. Documentation is almost nonexistent, but I'm planning to work on it very soon.

Note also that only Google Cloud Storage is currently supported, but adding Azure Blob Storage and Amazon S3 should be very easy.[^1]

Usage example

This short example demonstrates how a transaction can read and modify a single key atomically. The function also returns the value read before the change.

import (
	"context"
	"errors"

	"cloud.google.com/go/storage"
	"github.com/mbrt/glassdb"
	"github.com/mbrt/glassdb/backend"
	"github.com/mbrt/glassdb/backend/gcs"
)

func openDB(ctx context.Context, bucket, dbName string) (*glassdb.DB, error) {
	// See https://pkg.go.dev/cloud.google.com/go/storage for how to initialize
	// a Google Cloud Storage client.
	client, err := storage.NewClient(ctx)
	if err != nil {
		return nil, err
	}
	backend := gcs.New(client.Bucket(bucket))
	return glassdb.Open(ctx, dbName, backend)
}

func example(db *glassdb.DB) (string, error) {
	ctx := context.Background()
	coll := db.Collection([]byte("my-collection"))
	key := []byte("key")
	var res string

	err := db.Tx(ctx, func(tx *glassdb.Tx) error {
		b, err := tx.Read(coll, key)
		// The first time around there's no key, so here we would get an error.
		// In that case we continue below and just write the first 'Hello'.
		if err != nil && !errors.Is(err, backend.ErrNotFound) {
			return err
		}
		res = string(b)
		if string(b) == "Hello" {
			return tx.Write(coll, key, []byte("world!"))
		}
		return tx.Write(coll, key, []byte("Hello"))
	})

	return res, err
}

Why?

A blog post with more details will come soon, but see below for a preview.

This project makes the following specific tradeoffs:

  • Optimizes for rare conflicts between transactions (optimistic locking).
  • Readers are rarely blocked.
  • Clients are completely stateless and ephemeral. For example, they can be scaled down to zero. We avoid explicit coordination between clients (e.g. there's no need for consensus messages).
  • Requires access to object storage (the lower the latency, the better) with request preconditions (both Google GCS and AWS S3 meet the requirements).
  • Assumes that, when transactions race each other, it's better to be slow than to be incorrect.
  • High throughput is better than low latency.
  • Allows stale reads if explicitly requested, but defaults to strong consistency.
  • Values are expected to be in the range of 1 KB to 1 MB.

Glass DB makes sense in contexts where many writers rarely write to the same keys, or where reads are more frequent than writes.

Example 1: User settings

One example could be storing user settings. Every key is dedicated to one user and the value contains all of that user's settings. This way we can update each user independently (and scale horizontally). In the rare case where two updates for the same user arrive concurrently, we don't produce an inconsistent result; we retry the transaction instead.
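A minimal sketch of this pattern, using Collection.Update with a DB opened as in the usage example above (the collection name, key layout, and Settings type are illustrative, not part of any real schema):

```go
package settings

import (
	"context"
	"encoding/json"

	"github.com/mbrt/glassdb"
)

// Settings is a hypothetical per-user settings blob.
type Settings struct {
	Theme string `json:"theme"`
}

// setTheme atomically updates a single user's settings. Concurrent updates
// to the same user don't corrupt the value: one of them is retried.
func setTheme(ctx context.Context, db *glassdb.DB, userID, theme string) error {
	coll := db.Collection([]byte("user-settings"))
	_, err := coll.Update(ctx, []byte(userID), func(old []byte) ([]byte, error) {
		var s Settings
		// Assume `old` may be empty when the user has no settings yet.
		if len(old) > 0 {
			if err := json.Unmarshal(old, &s); err != nil {
				return nil, err
			}
		}
		s.Theme = theme
		return json.Marshal(&s)
	})
	return err
}
```

Because each user lives under their own key, updates to different users never conflict and scale out horizontally.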

Example 2: Low frequency updates

The application serves low traffic (e.g. one query per minute). What are the choices today?

  • Single machine / VM mostly idle.
  • "Serverless" function with a managed database (for example Google Cloud Run + Cloud SQL, or fly.io).

Neither seems cost-effective in this scenario. We are talking about $10 a month, which is not huge, but can we do better?

Yes. With Glass DB you only pay for each query and for long-term storage. In the case of GCS (as of 2023) we are talking about:

  • $0.020 per GB per month
  • $0.05 per 10k write / list ops
  • $0.004 per 10k read ops

At a rate of one write per minute this would be around $2 a month. Less usage? Even less money.
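A back-of-the-envelope check of that figure. The per-transaction operation counts below are assumptions for illustration (a transaction involves several backend calls, not just one), not measured values:

```go
package main

import "fmt"

func main() {
	// GCS pricing as of 2023 (see the list above).
	const (
		writePer10k = 0.05  // $ per 10k write/list ops
		readPer10k  = 0.004 // $ per 10k read ops
	)
	// One write transaction per minute for a month.
	txPerMonth := 60.0 * 24 * 30 // 43200
	// Assume each transaction costs roughly 8 backend writes and
	// 10 backend reads. These counts are illustrative only.
	writes := txPerMonth * 8
	reads := txPerMonth * 10
	cost := writes/10000*writePer10k + reads/10000*readPer10k
	fmt.Printf("~$%.2f per month\n", cost) // in the ballpark of $2
}
```

The dominant term is the write operations; fewer transactions shrink the bill proportionally.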

Example 3: Analytics

Data ingestion can usually be done in parallel and designed in such a way that different processes write independently.

A compaction process can run in parallel to the ingestion, bringing the data in a shape better suited for the query layer.

Compaction and ingestion are mostly independent, but we must make sure to be robust to crashes and restarts (avoiding, for example, double counting or duplicate events). This can be ensured with the transactions provided by Glass DB. If most transactions don't conflict with each other, throughput scales mostly linearly (see Performance).
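One way to get this robustness is to record a per-batch marker and the running total in the same transaction, so a crashed-and-retried batch becomes a no-op. This is a sketch under assumed conventions (collection names, batch IDs, and the uint64 counter encoding are all made up for the example):

```go
package ingest

import (
	"context"
	"encoding/binary"
	"errors"

	"github.com/mbrt/glassdb"
	"github.com/mbrt/glassdb/backend"
)

// ingestBatch adds count events to a running total, exactly once per batchID.
// If the process crashes and retries the same batch, the marker key makes the
// transaction a no-op, avoiding double counting.
func ingestBatch(ctx context.Context, db *glassdb.DB, batchID string, count uint64) error {
	counters := db.Collection([]byte("counters"))
	batches := db.Collection([]byte("ingested-batches"))
	return db.Tx(ctx, func(tx *glassdb.Tx) error {
		// Already ingested? Then there is nothing to do.
		if _, err := tx.Read(batches, []byte(batchID)); err == nil {
			return nil
		} else if !errors.Is(err, backend.ErrNotFound) {
			return err
		}
		total := uint64(0)
		if b, err := tx.Read(counters, []byte("total")); err == nil {
			total = binary.BigEndian.Uint64(b)
		} else if !errors.Is(err, backend.ErrNotFound) {
			return err
		}
		buf := make([]byte, 8)
		binary.BigEndian.PutUint64(buf, total+count)
		if err := tx.Write(counters, []byte("total"), buf); err != nil {
			return err
		}
		// Mark the batch as done in the same transaction.
		return tx.Write(batches, []byte(batchID), []byte{1})
	})
}
```

Because the marker and the counter commit atomically, a crash between them cannot leave the total half-updated.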

Performance

We are obviously bound by object storage latencies, which are typically:

Operation  Size      Mean (ms)  Std Dev (ms)  Median (ms)  90th % (ms)
Download   1 KiB     57.4       6.6           56.8         64.8
Download   100 KiB   55.4       6.7           53.3         63.1
Download   1 MiB     56.7       3.8           57.7         59.9
Metadata   1 KiB     31.5       8.0           28.1         41.3
Upload     1 KiB     70.4       17.3          64.7         88.8
Upload     100 KiB   88.9       14.6          83.1         105.0
Upload     1 MiB     117.5      12.6          115.9        131.0

This is a lot slower than most databases, but object storage still has a few advantages:

  1. Throughput: we can leverage object storage's scalability by reading and writing many objects in parallel. This way we can perform many transactions per second (scaling linearly). We would only be limited by bandwidth (see GCS quotas).

  2. Size scalability: object storage scales to petabytes and probably more, as cloud providers keep working on making them faster and more scalable.

See how this translates with a dataset of 50k keys, where we vary the number of concurrent clients. Each client DB performs 10 transactions in parallel, split as follows:

  • 10% updates (i.e. read + write) of two separate random keys.
  • 60% strong reads of two separate random keys.
  • 30% weak reads of one random key (max staleness of 10s).

For example, with 5 concurrent DBs we would have 50 parallel transactions at every moment.
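The workload mix above can be sketched like this (a rough reconstruction, not the actual benchmark code; the collection name and key encoding are made up, and keys are assumed to be pre-populated):

```go
package workload

import (
	"context"
	"fmt"
	"math/rand"
	"time"

	"github.com/mbrt/glassdb"
)

// runOne executes a single operation, picked with the 10/60/30 split used in
// the benchmark.
func runOne(ctx context.Context, db *glassdb.DB, numKeys int) error {
	coll := db.Collection([]byte("bench"))
	k1 := []byte(fmt.Sprintf("key-%d", rand.Intn(numKeys)))
	k2 := []byte(fmt.Sprintf("key-%d", rand.Intn(numKeys)))
	switch p := rand.Float64(); {
	case p < 0.1:
		// Update: read + write two random keys in one transaction.
		return db.Tx(ctx, func(tx *glassdb.Tx) error {
			for _, k := range [][]byte{k1, k2} {
				v, err := tx.Read(coll, k)
				if err != nil {
					return err
				}
				if err := tx.Write(coll, k, append(v, '.')); err != nil {
					return err
				}
			}
			return nil
		})
	case p < 0.7:
		// Strong reads of two random keys in one transaction.
		return db.Tx(ctx, func(tx *glassdb.Tx) error {
			for _, k := range [][]byte{k1, k2} {
				if _, err := tx.Read(coll, k); err != nil {
					return err
				}
			}
			return nil
		})
	default:
		// Weak read of one random key, allowing up to 10s of staleness.
		_, err := coll.ReadWeak(ctx, k1, 10*time.Second)
		return err
	}
}
```

Each benchmark client would keep 10 such calls in flight at all times.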

We ran all the tests below using Google Cloud Storage as the backend.

Throughput

Glass DB's throughput scales mostly linearly with the number of concurrent clients and transactions:

The graph shows the three transaction types separately: the bold line is the median number of transactions per second, and the error band covers the 10th to 90th percentiles.

As you can see, the median throughput increases linearly (more so for reads than for writes), reaching 7k transactions per second with 500 concurrent clients.

Note also the slight performance degradation at the 10th percentile with more than 30 concurrent DBs. This is due to the increased probability of conflicts between transactions (e.g. when writers race each other).

See the transaction retries below, which hurt performance at the higher percentiles:

Since each transaction operates on multiple keys, for completeness here is the graph of individual operations, where each operation is a key being read or written:

It's interesting to see that weak reads lose against strong reads in this benchmark, for several reasons:

  • Given the uniform distribution of reads, it's unlikely that a weak read will hit the same key twice within the 10 seconds allowed staleness time frame.
  • Weak reads are currently translated into strong reads when the value is not present in cache.
  • Strong reads operate on two keys in the same transaction; weak reads are a "single shot".

Taken together, this means that weak reads in this case mostly translate into strong reads on a single key. These tend to perform worse (in terms of throughput) than reading two keys per transaction, because the two reads can be done in parallel. The advantage of weak reads here is less sensitivity to retries, as you can see from the lower variability at higher percentiles.

Latency

Latency is not Glass DB's forte, but you can see below that it stays mostly flat as we increase the number of concurrent clients and transactions:

Here the effect of retries is more noticeable at higher percentiles, as expected. Some transactions take longer, as they have to drop their work and restart after a conflict.

Deadlocks

Glass DB currently uses a naive approach to deadlocks. The only detection mechanism is long timeouts: when a transaction cannot make progress for several seconds, it releases all its locks and retries by taking them one by one in a defined order. This is very slow, but it ensures that no further deadlocks can occur.

You can see in the graph below how slow transactions can get when these situations occur:

In this example, 5 parallel workers keep competing for the same set of keys (between 1 and 6 keys), with varying degrees of overlap (up to 100%).

As you can see, it's very easy for transactions to deadlock in this situation, resulting in delays of tens of seconds in the worst case.

This is mostly due to a few factors:

  • Glass DB is not optimized for deadlocks, nor does it ensure that deadlocked transactions can always make progress.
  • GCS throttles writes to the same object when they happen multiple times per second (some bursting is allowed, but in this case we consistently try to overwrite the same object).

License

See LICENSE for details.

Disclaimer

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

[^1]: S3 added conditional writes only very recently.

Documentation

Overview

Package glassdb implements a key-value database on top of cloud object storage with serializable transactions.

Index

Constants

This section is empty.

Variables

var ErrAborted = errors.New("aborted transaction")

ErrAborted is returned when a transaction is explicitly aborted.

Functions

This section is empty.

Types

type Collection

type Collection struct {
	// contains filtered or unexported fields
}

Collection represents a named group of key-value pairs within a database.

func (Collection) Collection

func (c Collection) Collection(name []byte) Collection

Collection returns a sub-collection with the given name.

func (Collection) Collections

func (c Collection) Collections(ctx context.Context) (*CollectionsIter, error)

Collections returns an iterator over the sub-collections in this collection.

func (Collection) Create

func (c Collection) Create(ctx context.Context) error

Create ensures the collection exists in the backend, creating it if necessary.

func (Collection) Delete

func (c Collection) Delete(ctx context.Context, key []byte) error

Delete removes the value associated with key within a transaction.

func (Collection) Keys

func (c Collection) Keys(ctx context.Context) (*KeysIter, error)

Keys returns an iterator over the keys in the collection.

func (Collection) ReadStrong

func (c Collection) ReadStrong(ctx context.Context, key []byte) ([]byte, error)

ReadStrong reads the value for key with strong consistency guarantees.

func (Collection) ReadWeak

func (c Collection) ReadWeak(
	ctx context.Context,
	key []byte,
	maxStaleness time.Duration,
) ([]byte, error)

ReadWeak reads the value for key allowing stale results up to maxStaleness.

func (Collection) Update

func (c Collection) Update(
	ctx context.Context,
	key []byte,
	f func(old []byte) ([]byte, error),
) ([]byte, error)

Update atomically reads the value for key, applies f, and writes the result.

func (Collection) Write

func (c Collection) Write(ctx context.Context, key, value []byte) error

Write stores value under key in the collection.

type CollectionsIter

type CollectionsIter struct {
	// contains filtered or unexported fields
}

CollectionsIter iterates over sub-collections within a collection.

func (*CollectionsIter) Err

func (it *CollectionsIter) Err() error

Err returns the first error encountered during iteration.

func (*CollectionsIter) Next

func (it *CollectionsIter) Next() (name []byte, ok bool)

Next advances the iterator and returns the next collection name.

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB represents an open glassdb database instance.

func Open

func Open(ctx context.Context, name string, b backend.Backend) (*DB, error)

Open opens a database with the given name using default options.

func OpenWith

func OpenWith(ctx context.Context, name string, b backend.Backend, opts Options) (*DB, error)

OpenWith opens a database with the given name and custom options.

func (*DB) Close

func (d *DB) Close(context.Context) error

Close releases resources associated with the database.

func (*DB) Collection

func (d *DB) Collection(name []byte) Collection

Collection returns a top-level collection with the given name.

func (*DB) Stats

func (d *DB) Stats() Stats

Stats retrieves ongoing performance stats for the database. This is only updated when a transaction closes.

Counters only increase over time and are never reset. If you need to measure a specific interval only, please see Stats.Sub.
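For example, interval measurement can be sketched like this (the helper name and the idea of wrapping the workload in a closure are this example's, not part of the API):

```go
package stats

import "github.com/mbrt/glassdb"

// measure runs work and returns the stats counters accumulated while it ran.
// Since counters only increase and are never reset, subtracting a snapshot
// taken before the workload isolates the activity within the interval.
func measure(db *glassdb.DB, work func() error) (glassdb.Stats, error) {
	before := db.Stats()
	err := work() // e.g. a batch of db.Tx calls.
	return db.Stats().Sub(before), err
}
```

The resulting Stats can then be inspected, e.g. comparing TxRetries against TxN to estimate the conflict rate.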

func (*DB) Tx

func (d *DB) Tx(ctx context.Context, f func(tx *Tx) error) error

Tx executes f within a serializable transaction, retrying on conflicts.

type FQKey

type FQKey struct {
	Collection Collection
	Key        []byte
}

FQKey is a fully qualified key: collection + key name.

type KeysIter

type KeysIter struct {
	// contains filtered or unexported fields
}

KeysIter iterates over keys in a collection.

func (*KeysIter) Err

func (it *KeysIter) Err() error

Err returns the first error encountered during iteration.

func (*KeysIter) Next

func (it *KeysIter) Next() (key []byte, ok bool)

Next advances the iterator and returns the next key.

type Options

type Options struct {
	Clock  clockwork.Clock
	Logger *slog.Logger
	// CacheSize is the number of bytes dedicated to caching objects
	// and metadata. Setting this too small may impact performance, as
	// more calls to the backend would be necessary.
	CacheSize int
}

Options makes it possible to tweak a client DB.

TODO: Add retry timing options.

func DefaultOptions

func DefaultOptions() Options

DefaultOptions provides the options used by `Open`. They should be a good middle ground for a production deployment.

type ReadResult

type ReadResult struct {
	Value []byte
	Err   error
}

ReadResult holds the value or error from a single read in ReadMulti.

type Stats

type Stats struct {
	// Transactions statistics.
	TxN       int           // number of completed transactions.
	TxTime    time.Duration // time spent within transactions.
	TxReads   int           // number of reads.
	TxWrites  int           // number of writes.
	TxRetries int           // number of retried transactions.

	// Backend statistics.
	MetaReads  int // number of read metadata.
	MetaWrites int // number of write metadata.
	ObjReads   int // number of read objects.
	ObjWrites  int // number of written objects.
	ObjLists   int // number of list calls.
}

Stats holds cumulative performance counters for a database.

func (Stats) Sub

func (s Stats) Sub(other Stats) Stats

Sub calculates and returns the difference between two sets of transaction stats. This is useful when obtaining stats at two different points in time and you need the performance counters accumulated within that time span.

type Tx

type Tx struct {
	// contains filtered or unexported fields
}

Tx represents an active database transaction.

func (*Tx) Abort

func (t *Tx) Abort() error

Abort explicitly aborts the transaction, preventing it from committing.

func (*Tx) Delete

func (t *Tx) Delete(c Collection, key []byte) error

Delete marks the key for deletion within the transaction.

func (*Tx) Read

func (t *Tx) Read(c Collection, key []byte) ([]byte, error)

Read returns the value stored at key within the transaction.

func (*Tx) ReadMulti

func (t *Tx) ReadMulti(ks []FQKey) []ReadResult

ReadMulti reads multiple keys concurrently within the transaction.
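A short usage sketch, assuming an open db (the collection and key names are illustrative):

```go
package multiread

import (
	"context"

	"github.com/mbrt/glassdb"
)

// readPair fetches two keys concurrently within one transaction.
func readPair(ctx context.Context, db *glassdb.DB) (a, b []byte, err error) {
	coll := db.Collection([]byte("my-collection"))
	err = db.Tx(ctx, func(tx *glassdb.Tx) error {
		// The reads are issued in parallel; each result carries its
		// own value or error.
		res := tx.ReadMulti([]glassdb.FQKey{
			{Collection: coll, Key: []byte("a")},
			{Collection: coll, Key: []byte("b")},
		})
		for _, r := range res {
			if r.Err != nil {
				return r.Err
			}
		}
		a, b = res[0].Value, res[1].Value
		return nil
	})
	return a, b, err
}
```

This is preferable to sequential tx.Read calls when latency matters, since each read otherwise pays a full object-storage round trip.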

func (*Tx) Write

func (t *Tx) Write(c Collection, key, value []byte) error

Directories

Path Synopsis
Package backend defines the interfaces for object storage backends used by glassdb.
gcs
Package gcs implements the backend interface using Google Cloud Storage.
memory
Package memory implements an in-memory backend for testing and development.
middleware
Package middleware provides decorators for backend implementations, including logging and latency simulation.
Demo demonstrates basic glassdb functionality against a GCS backend.
hack
backendbench command
Backendbench benchmarks backend storage operation latencies.
debug command
Debug provides commands for analyzing glassdb internals such as decoding paths, parsing logs, and generating call graphs.
rtbench command
Rtbench measures database transaction performance under various concurrency scenarios.
internal
cache
Package cache implements a thread-safe LRU cache with size-based eviction.
concurr
Package concurr provides concurrency utilities including goroutine management, fan-out execution, and retry with backoff.
data
Package data defines common data types for transaction identifiers and related operations.
data/paths
Package paths encodes and decodes storage paths for database objects.
errors
Package errors provides utilities for annotating and combining errors.
proto
Package proto contains protocol buffer definitions and generated code for internal serialization.
storage
Package storage manages global and local storage layers with caching and version tracking.
stringset
Package stringset helps with common set operations on strings.
testkit
Package testkit provides testing utilities including fake clocks, in-memory GCS clients, and controllable backends.
testkit/bench
Package bench provides utilities for collecting and analyzing performance measurements.
trace
Package trace provides conditional runtime tracing support controlled by build tags.
trans
Package trans implements the transaction processing algorithm with serializable isolation.
