README

GCP Design

This document describes how the storage implementation for running Tessera on Google Cloud is intended to work.

Overview

This design takes advantage of GCS for long-term storage and low-cost, low-complexity serving of read traffic, and leverages a transactional database for coordinating the cluster.

New entries flow from the binary built with Tessera into transactional storage, where they are held briefly to batch them up, and then assigned sequence numbers as each batch is flushed. This allows the Add API call to return quickly with durably assigned sequence numbers.

From there, an async process derives the entry bundles and Merkle tree structure from the sequenced batches, writes these to GCS for serving, and finally removes the integrated batches from the transactional storage.

Since all entries are sequenced by the time they are stored, and sequencing is done in contiguous chunks, all tree derivations are idempotent: re-running an integration over the same sequenced batches yields identical tiles.

Transactional storage

The transactional storage is implemented with Cloud Spanner, and uses a schema with the following tables:

Tessera

This table is used to identify the current schema version.

SeqCoord

A table with a single row which is used to keep track of the next assignable sequence number.

Seq

This holds batches of entries keyed by the sequence number assigned to the first entry in the batch.

IntCoord

This table is used to coordinate integration of sequenced batches in the Seq table.

PubCoord

This table is used to coordinate publication of new checkpoints, ensuring that checkpoints are not published more frequently than configured.
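
To make the schema concrete, the following illustrative DDL (expressed as a Go constant) is consistent with the table descriptions above. Only compatibilityVersion, next, seq, and rootHash are named elsewhere in this document; the remaining column names and types are assumptions for illustration, not the library's actual schema.

// illustrativeSchema sketches Spanner DDL matching the table descriptions
// above. Column names and types are assumptions for illustration only.
const illustrativeSchema = `
CREATE TABLE Tessera  (id INT64 NOT NULL, compatibilityVersion INT64 NOT NULL) PRIMARY KEY (id);
CREATE TABLE SeqCoord (id INT64 NOT NULL, next INT64 NOT NULL) PRIMARY KEY (id);
CREATE TABLE Seq      (id INT64 NOT NULL, seq INT64 NOT NULL, v BYTES(MAX)) PRIMARY KEY (id, seq);
CREATE TABLE IntCoord (id INT64 NOT NULL, seq INT64 NOT NULL, rootHash BYTES(32)) PRIMARY KEY (id);
CREATE TABLE PubCoord (id INT64 NOT NULL, publishedAt TIMESTAMP NOT NULL) PRIMARY KEY (id);
`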

Life of a leaf

  1. Leaves are submitted by the binary built using Tessera via a call to the storage's Add func.
  2. The storage library batches these entries up and, once a configurable period of time has elapsed or the batch reaches a configurable size threshold, writes the batch to the Seq table. This assigns contiguous sequence numbers to the entries using the following algorithm (see the sketch after this list): In a transaction:
    1. Select next from SeqCoord with for update ← this blocks other frontends from writing their pools, but only for a short duration.
    2. Insert the batch of entries into Seq with key SeqCoord.next.
    3. Update SeqCoord with next += len(batch).
  3. Newly sequenced entries are periodically appended to the tree: In a transaction:
    1. Select seq from IntCoord with for update ← this blocks other integrators from proceeding.
    2. Select one or more consecutive batches from Seq for update, starting at IntCoord.seq.
    3. Write the leaf bundles to GCS using the batched entries.
    4. Integrate the entries into the Merkle tree and write the tiles to GCS.
    5. Delete the consumed batches from Seq.
    6. Update IntCoord with seq += num_entries_integrated and the latest rootHash.
  4. Checkpoints representing the latest state of the tree are published at the configured interval.
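
To make step 2 concrete, here is a minimal sketch of the sequence-assignment transaction using the cloud.google.com/go/spanner client. The table and column names follow the illustrative schema above; the row layout and batch serialisation are assumptions, not the library's actual code.

import (
	"context"

	"cloud.google.com/go/spanner"
)

// assignSequence durably claims a contiguous range of sequence numbers for a
// serialised batch of n entries, returning the sequence number assigned to
// the first entry. Reading SeqCoord inside the read-write transaction locks
// the row, so concurrent frontends cannot claim overlapping ranges.
func assignSequence(ctx context.Context, client *spanner.Client, batch []byte, n int64) (int64, error) {
	var first int64
	_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
		// 1. Read (and lock) the next assignable sequence number.
		row, err := txn.ReadRow(ctx, "SeqCoord", spanner.Key{0}, []string{"next"})
		if err != nil {
			return err
		}
		if err := row.Columns(&first); err != nil {
			return err
		}
		// 2. Store the batch keyed by the first assigned sequence number, and
		// 3. advance next past the range just claimed.
		return txn.BufferWrite([]*spanner.Mutation{
			spanner.Insert("Seq", []string{"id", "seq", "v"}, []interface{}{0, first, batch}),
			spanner.Update("SeqCoord", []string{"id", "next"}, []interface{}{0, first + n}),
		})
	})
	return first, err
}

If the transaction aborts and retries, the whole function body re-runs, which is why the design keeps it short: the SeqCoord row is the serialisation point for all frontends.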

Dedup

An experimental implementation has been tested which uses Spanner to store the <identity_hash> → sequence mapping. It works well using the "slack" Spanner CPU available in the smallest possible footprint, and is consequently comparatively cheap, incurring only the extra Spanner storage costs.
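
A sketch of the core of such a scheme, assuming a hypothetical IDSeq table keyed by identity hash (the actual table name and layout are not specified here):

import (
	"context"

	"cloud.google.com/go/spanner"
	"google.golang.org/grpc/codes"
)

// dedupGet returns the sequence number previously assigned to an entry with
// the given identity hash, if one exists.
func dedupGet(ctx context.Context, client *spanner.Client, idHash []byte) (int64, bool, error) {
	row, err := client.Single().ReadRow(ctx, "IDSeq", spanner.Key{idHash}, []string{"seq"})
	if spanner.ErrCode(err) == codes.NotFound {
		return 0, false, nil
	}
	if err != nil {
		return 0, false, err
	}
	var seq int64
	if err := row.Columns(&seq); err != nil {
		return 0, false, err
	}
	return seq, true, nil
}

// dedupPut records the identity hash → sequence mapping once the entry has
// been durably assigned a sequence number.
func dedupPut(ctx context.Context, client *spanner.Client, idHash []byte, seq int64) error {
	_, err := client.Apply(ctx, []*spanner.Mutation{
		spanner.InsertOrUpdate("IDSeq", []string{"h", "seq"}, []interface{}{idHash, seq}),
	})
	return err
}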

Alternatives considered

Other transactional storage systems are available on GCP, e.g. CloudSQL or AlloyDB. Experiments were run using CloudSQL (MySQL), AlloyDB, and Spanner.

Spanner worked out to be the cheapest while also removing much of the administrative overhead which would come from even a managed MySQL instance, and so was selected.

The experimental implementation was tested to around 1B entries of 1KB each at a sustained write rate of 1,500 entries/s. This was done using the smallest possible Spanner allocation of 100 Processing Units.

Documentation

Overview

Package gcp contains a GCP-based storage implementation for Tessera.

TODO: decide whether to rename this package.

This storage implementation uses GCS for long-term storage and serving of entry bundles and log tiles, and Spanner for coordinating updates to GCS when multiple instances of a personality binary are running.

A single GCS bucket is used to hold entry bundles and log internal tiles. The object keys for the bucket are selected so as to conform to the expected layout of a tile-based log.

A Spanner database provides a transactional mechanism to allow multiple frontends to safely update the contents of the log.

Index

Constants

const (
	DefaultIntegrationSizeLimit = 5 * 4096

	// SchemaCompatibilityVersion represents the expected version (e.g. layout & serialisation) of stored data.
	//
	// A binary built with a given version of the Tessera library is compatible with stored data created by a different version
	// of the library if and only if this value is the same as the compatibilityVersion stored in the Tessera table.
	//
	// NOTE: if changing this version, you need to consider whether end-users are going to update their schema instances to be
	// compatible with the new format, and provide a means to do it if so.
	SchemaCompatibilityVersion = 1
)
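
A personality could verify compatibility at startup with a check along these lines; the Tessera table's key and column names here are assumptions based on the comment above, not the library's actual code.

import (
	"context"
	"fmt"

	"cloud.google.com/go/spanner"
)

// checkSchema reads the compatibilityVersion stored in the Tessera table and
// compares it against the version this binary was built for.
func checkSchema(ctx context.Context, client *spanner.Client) error {
	row, err := client.Single().ReadRow(ctx, "Tessera", spanner.Key{0}, []string{"compatibilityVersion"})
	if err != nil {
		return err
	}
	var v int64
	if err := row.Columns(&v); err != nil {
		return err
	}
	if v != SchemaCompatibilityVersion {
		return fmt.Errorf("stored schema version %d, binary expects %d", v, SchemaCompatibilityVersion)
	}
	return nil
}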

Variables

This section is empty.

Functions

func New

func New(ctx context.Context, cfg Config) (tessera.Driver, error)

New creates a new instance of the GCP based Storage.
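
A minimal construction sketch; the bucket and Spanner resource names are placeholders, and the import path is assumed from this module:

import (
	"context"
	"log"

	"github.com/transparency-dev/tessera/storage/gcp"
)

func example(ctx context.Context) {
	driver, err := gcp.New(ctx, gcp.Config{
		Bucket:  "example-log-bucket",
		Spanner: "projects/example-project/instances/example-instance/databases/example-db",
	})
	if err != nil {
		log.Fatalf("gcp.New: %v", err)
	}
	// The returned tessera.Driver is then handed to the Tessera appender
	// lifecycle; that wiring is elided here.
	_ = driver
}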

Types

type Appender

type Appender struct {
	// contains filtered or unexported fields
}

Appender is an implementation of the Tessera appender lifecycle contract.

func (*Appender) Add

Add is the entrypoint for adding entries to a sequencing log.

type Config

type Config struct {
	// Bucket is the name of the GCS bucket to use for storing log state.
	Bucket string
	// BucketPrefix is an optional prefix to prepend to all log resource paths.
	// This can be used e.g. to store multiple logs in the same bucket.
	BucketPrefix string
	// Spanner is the GCP resource URI of the spanner database instance to use.
	Spanner string
}

Config holds GCP project and resource configuration for a storage instance.

type LogReader

type LogReader struct {
	// contains filtered or unexported fields
}

func (*LogReader) IntegratedSize

func (lr *LogReader) IntegratedSize(ctx context.Context) (uint64, error)

func (*LogReader) NextIndex

func (lr *LogReader) NextIndex(ctx context.Context) (uint64, error)

func (*LogReader) ReadCheckpoint

func (lr *LogReader) ReadCheckpoint(ctx context.Context) ([]byte, error)

func (*LogReader) ReadEntryBundle

func (lr *LogReader) ReadEntryBundle(ctx context.Context, i uint64, p uint8) ([]byte, error)

func (*LogReader) ReadTile

func (lr *LogReader) ReadTile(ctx context.Context, l, i uint64, p uint8) ([]byte, error)

func (*LogReader) StreamEntries

func (lr *LogReader) StreamEntries(ctx context.Context, startEntry, N uint64) iter.Seq2[stream.Bundle, error]
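
Given a *LogReader (obtained via the appender lifecycle), reads compose naturally. A sketch, with error handling abbreviated:

import (
	"context"
	"fmt"
)

// dumpLog prints the latest checkpoint and walks every integrated entry
// bundle via the StreamEntries iterator (Go 1.23 range-over-func).
func dumpLog(ctx context.Context, lr *LogReader) error {
	cp, err := lr.ReadCheckpoint(ctx)
	if err != nil {
		return err
	}
	fmt.Printf("checkpoint:\n%s\n", cp)

	size, err := lr.IntegratedSize(ctx)
	if err != nil {
		return err
	}
	for bundle, err := range lr.StreamEntries(ctx, 0, size) {
		if err != nil {
			return err
		}
		_ = bundle // process the entry bundle here
	}
	return nil
}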

type MigrationStorage

type MigrationStorage struct {
	// contains filtered or unexported fields
}

MigrationStorage implements the tessera.MigrationTarget lifecycle contract.

func (*MigrationStorage) AwaitIntegration

func (m *MigrationStorage) AwaitIntegration(ctx context.Context, sourceSize uint64) ([]byte, error)

func (*MigrationStorage) IntegratedSize

func (m *MigrationStorage) IntegratedSize(ctx context.Context) (uint64, error)

func (*MigrationStorage) SetEntryBundle

func (m *MigrationStorage) SetEntryBundle(ctx context.Context, index uint64, partial uint8, bundle []byte) error
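
Putting the three methods together, a migration loop might look like the sketch below. bundleFor is a hypothetical callback supplying the source log's entry bundles; the 256-entry bundle width and the meaning of partial=0 as "full bundle" are assumptions from the tile-based layout.

import "context"

// migrate copies every full entry bundle from a source log into the target
// storage, then blocks until local integration catches up, returning the
// resulting root hash for comparison against the source log's root.
// bundleFor is a hypothetical callback; the final partial bundle (if any)
// is ignored here for brevity.
func migrate(ctx context.Context, m *MigrationStorage, sourceSize uint64, bundleFor func(context.Context, uint64) ([]byte, error)) ([]byte, error) {
	for i := uint64(0); i < sourceSize/256; i++ {
		b, err := bundleFor(ctx, i)
		if err != nil {
			return nil, err
		}
		// partial=0 is assumed to denote a full bundle.
		if err := m.SetEntryBundle(ctx, i, 0, b); err != nil {
			return nil, err
		}
	}
	return m.AwaitIntegration(ctx, sourceSize)
}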

type Storage

type Storage struct {
	// contains filtered or unexported fields
}

Storage is a GCP based storage implementation for Tessera.

func (*Storage) Appender

func (*Storage) MigrationWriter

MigrationWriter creates a new GCP storage for the MigrationTarget lifecycle mode.

Directories

Path	Synopsis
antispam	Package gcp contains a GCP-based antispam implementation for Tessera.
