agro

package module
v0.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 27, 2016 License: Apache-2.0 Imports: 15 Imported by: 0

README

agro

A Go Distributed Storage Engine

See the wiki for more details

Overview

Agro is a distributed block storage engine that provides a resource pool and basic file primitives from daemons running atop a cluster. These primitives are made consistent by being append-only and coordinated by etcd. From these primitives, an agro server can support multiple types of volumes, the semantics of which can be broken into subprojects. It ships with a simple block-device volume plugin.

The goal from the start is simplicity; running agro should take at most 5 minutes for a developer to set up and understand, while being as robust as possible.

Sharding is done via a consistent hash function, controlled in the simple case by a hash ring algorithm, but fully extensible to arbitrary maps, rack-awareness, and other nice features.

Getting Started

0) Build agro
go get github.com/coreos/agro
go get -d github.com/coreos/agro

Then one of:

go install -v github.com/coreos/agro/cmd/agro
go install -v github.com/coreos/agro/cmd/agroctl
go install -v github.com/coreos/agro/cmd/agromount

or

cd $GOPATH/src/github.com/coreos/agro
make

Either way you'll find the binaries agro, agromount and agroctl.

1) Get etcd

You need a recent etcd, as agro uses the v3 API natively and depends on some fixes therein. etcd v2.3.0-alpha1 or above is required.

Make sure to run etcd with the v3 API turned on:

etcd --experimental-v3demo --experimental-gRPC-addr 127.0.0.1:2378 --data-dir /tmp/etcd

Clustering etcd v2.3 is left as an exercise to the reader but it's a pretty common thing to do if you're running on CoreOS.

2) mkfs

We need to initialize the storage keys in etcd. This sets the fixed, global settings for the storage cluster, much like formatting a block device. Fortunately, the default settings should suffice for most cases.

agroctl mkfs

And you're ready!

If agroctl can't connect to etcd, it takes the -C flag, just like etcdctl

agroctl -C $ETCD_IP:2378 mkfs

(This remains true for all uses of agro binaries)

If you're curious about the other settings,

agroctl mkfs --help

will tell you more.

3) Run some storage nodes
Running manually
./agro --etcd 127.0.0.1:2378 --peer-address $MY_IP:40000 --data-dir /path/to/data --size 20GiB

This runs a storage node without HTTP. Add --host and --port to open the HTTP endpoint

(TODO: When gRPC on the same port is stable, default to peer-address for HTTP as well)

Multiple instances can be run, so long as the ports don't conflict and you keep separate data dirs.

Running with Docker
With Host Networking
docker run \
--net=host \
-v /path/to/data1:/data \
-e STORAGE_SIZE=20GiB \
-e LISTEN_HOST=$MY_PUBLIC_IP \
-e LISTEN_HTTP_PORT=4321 \
-e LISTEN_PEER_PORT=40000 \
-e ETCD_HOST=127.0.0.1 \
quay.io/coreos/agro

If you want to run more than one storage node on the host, you can do so by offsetting the ports.

Non-host networking

You'll need to figure out non-host networking where all storage nodes are on the same subnet. Flannel, et al, are recommended here. But if you're okay with your docker networking...

docker run \
-v /path/to/data1:/data \
-e STORAGE_SIZE=20GiB \
-e ETCD_HOST=127.0.0.1 \
quay.io/coreos/agro
Running on Kubernetes

In the folder you'll find agro-daemon-set.yaml. This example daemonset is almost all you need.

4) Check that everything is reporting in
agroctl list-peers

Should show your data nodes and their reporting status. Eg:

+-----------------+--------------------------------------+---------+------+---------------+--------------+
|     ADDRESS     |                 UUID                 |  SIZE   | USED |    UPDATED    | REB/REP DATA |
+-----------------+--------------------------------------+---------+------+---------------+--------------+
| 127.0.0.1:40000 | babecd8e-d4fc-11e5-a91f-5ce0c5527cf4 | 2.0 GiB | 0 B  | 2 seconds ago | 0 B/sec      |
| 127.0.0.1:40001 | babee2dd-d4fc-11e5-b486-5ce0c5527cf4 | 2.0 GiB | 0 B  | 2 seconds ago | 0 B/sec      |
| 127.0.0.1:40002 | babee99a-d4fc-11e5-a3e3-5ce0c5527cf4 | 1.0 GiB | 0 B  | 2 seconds ago | 0 B/sec      |
| 127.0.0.1:40003 | cb6ee7cb-d4fc-11e5-aff4-5ce0c5527cf4 | 1.0 GiB | 0 B  | 4 seconds ago | 0 B/sec      |
+-----------------+--------------------------------------+---------+------+---------------+--------------+
Balanced: true
5) Activate storage on the peers
agroctl peer add --all-peers

Will immediately impress the peers shown in list-peers into service, storing data. Peers can be added one (or a couple) at a time via:

agroctl peer add $PEER_IP:$PEER_PORT [$PEER_UUID...]

To see which peers are in service (and other sharding details):

agroctl ring get

To remove a node from service:

agroctl peer remove $PEER_IP:$PEER_PORT

Draining of peers will happen automatically. If this is a hard removal (ie, the node is gone forever) just remove it, and data will rereplicate automatically. Doing multiple hard removals above the replication threshold may result in data loss. However, this is common practice to anyone that's ever worked with the fault tolerance in RAID levels..

Even better fault tolerance with erasure codes and parity is an advanced topic TBD.

6) Create a volume
agroctl volume create-block myVolume 10GiB

This creates a 10GiB virtual blockfile for use. It will be safely replicated and CRC checked, by default.

7) Mount that volume via NBD
sudo modprobe nbd
sudo agromount --etcd 127.0.0.1:2378 nbd myVolume /dev/nbd0

Specifying /dev/nbd0 is optional -- it will pick the first available.

The mount process is similar to FUSE for a block device; it will disconnect when killed, so make sure it's synced and unmounted.

At this point, you have a replicated, highly-available block device connected to your machine. You can format it and mount it as you'd expect:

sudo mkfs.ext4 /dev/nbd0
sudo mount /dev/nbd0 -o discard,noatime /mnt/agro

It supports the TRIM SSD command for garbage collecting; -o discard enables this.

It is recommended (though not required) to use a log-structured filesystem on these devices, to minimize the chance of corruption. F2FS is a good choice, and included in the kernel.

Documentation

Index

Constants

View Source
const (
	CtxWriteLevel int = iota
	CtxReadLevel
)
View Source
const BlockRefByteSize = 8 * 3
View Source
const INodeRefByteSize = 8 * 2
View Source
const VolumeIDByteSize = 5
View Source
const (
	VolumeMax = 0x000000FFFFFFFFFF
)

Variables

View Source
var (
	// ErrBlockUnavailable is returned when a function fails to retrieve a known
	// block.
	ErrBlockUnavailable = errors.New("agro: block cannot be retrieved")

	// ErrINodeUnavailable is returned when a function fails to retrieve a known
	// INode.
	ErrINodeUnavailable = errors.New("agro: inode cannot be retrieved")

	// ErrBlockNotExist is returned when a function attempts to manipulate a
	// non-existant block.
	ErrBlockNotExist = errors.New("agro: block doesn't exist")

	// ErrClosed is returned when a function attempts to manipulate a Store
	// that is not currently open.
	ErrClosed = errors.New("agro: store is closed")

	// ErrInvalid is a locally invalid operation (such as Close()ing a nil file pointer)
	ErrInvalid = errors.New("agro: invalid operation")

	// ErrOutOfSpace is returned when the block storage is out of space.
	ErrOutOfSpace = errors.New("agro: out of space on block store")

	// ErrExists is returned if the entity already exists
	ErrExists = errors.New("agro: already exists")

	// ErrNotExist is returned if the entity doesn't already exist
	ErrNotExist = errors.New("agro: doesn't exist")

	// ErrAgain is returned if the operation was interrupted. The call was valid, and
	// may be tried again.
	ErrAgain = errors.New("agro: interrupted, try again")

	// ErrNoGlobalMetadata is returned if the metadata service hasn't been formatted.
	ErrNoGlobalMetadata = errors.New("agro: no global metadata available at mds")

	// ErrNonSequentialRing is returned if the ring's internal version number appears to jump.
	ErrNonSequentialRing = errors.New("agro: non-sequential ring")

	// ErrNoPeer is returned if the peer can't be found.
	ErrNoPeer = errors.New("agro: no such peer")

	// ErrCompareFailed is returned if the CAS operation failed to compare.
	ErrCompareFailed = errors.New("agro: compare failed")

	// ErrIsSymlink is returned if we're trying to modify a symlink incorrectly.
	ErrIsSymlink = errors.New("agro: is symlink")

	// ErrNotDir is returned if we're trying a directory operation on a non-directory path.
	ErrNotDir = errors.New("agro: not a directory")

	// ErrWrongVolumeType is returned if the operation cannot be performed on this type of volume.
	ErrWrongVolumeType = errors.New("agro: wrong volume type")

	// ErrNotSupported is returned if the interface doesn't implement the
	// requested subfuntionality.
	ErrNotSupported = errors.New("agro: not supported")

	// ErrLocked is returned if the resource is locked.
	ErrLocked = errors.New("agro: locked")
)

Functions

func MarshalBlocksetToProto

func MarshalBlocksetToProto(bs Blockset) ([]*models.BlockLayer, error)

func Mkfs

func Mkfs(name string, cfg Config, gmd GlobalMetadata, ringType RingType) error

Mkfs calls the specific Mkfs function provided by a metadata package.

func RegisterBlockStore

func RegisterBlockStore(name string, newFunc NewBlockStoreFunc)

func RegisterMetadataService

func RegisterMetadataService(name string, newFunc CreateMetadataServiceFunc)

RegisterMetadataService is the hook used for implementions of MetadataServices to register themselves to the system. This is usually called in the init() of the package that implements the MetadataService. A similar pattern is used in database/sql of the standard library.

func RegisterMkfs

func RegisterMkfs(name string, newFunc MkfsFunc)

RegisterMkfs is the hook used for implementions of MetadataServices to register their ways of creating base metadata to the system.

func RegisterSetRing

func RegisterSetRing(name string, newFunc SetRingFunc)

RegisterSetRing is the hook used for implementions of MetadataServices to register their ways of creating base metadata to the system.

func SetRing

func SetRing(name string, cfg Config, r Ring) error

SetRing calls the specific SetRing function provided by a metadata package.

Types

type BlockIterator

type BlockIterator interface {
	Err() error
	Next() bool
	BlockRef() BlockRef
	Close() error
}

type BlockLayer

type BlockLayer struct {
	Kind    BlockLayerKind
	Options string
}

type BlockLayerKind

type BlockLayerKind int

type BlockLayerSpec

type BlockLayerSpec []BlockLayer

type BlockRef

type BlockRef struct {
	INodeRef
	Index IndexID
}

BlockRef is the identifier for a unique block in the filesystem.

func BlockFromProto

func BlockFromProto(p *models.BlockRef) BlockRef

func BlockRefFromBytes

func BlockRefFromBytes(b []byte) BlockRef

func ZeroBlock

func ZeroBlock() BlockRef

func (BlockRef) BlockType

func (b BlockRef) BlockType() BlockType

func (BlockRef) HasINode

func (b BlockRef) HasINode(i INodeRef, t BlockType) bool

func (BlockRef) IsZero

func (b BlockRef) IsZero() bool

func (*BlockRef) SetBlockType

func (b *BlockRef) SetBlockType(t BlockType)

func (BlockRef) String

func (b BlockRef) String() string

func (BlockRef) ToBytes

func (b BlockRef) ToBytes() []byte

func (BlockRef) ToProto

func (b BlockRef) ToProto() *models.BlockRef

type BlockStore

type BlockStore interface {
	Store
	HasBlock(ctx context.Context, b BlockRef) (bool, error)
	GetBlock(ctx context.Context, b BlockRef) ([]byte, error)
	WriteBlock(ctx context.Context, b BlockRef, data []byte) error
	DeleteBlock(ctx context.Context, b BlockRef) error
	NumBlocks() uint64
	UsedBlocks() uint64
	BlockIterator() BlockIterator
	BlockSize() uint64
}

BlockStore is the interface representing the standardized methods to interact with something storing blocks.

func CreateBlockStore

func CreateBlockStore(kind string, name string, cfg Config, gmd GlobalMetadata) (BlockStore, error)

type BlockType

type BlockType uint16
const (
	TypeBlock BlockType = iota
	TypeINode
)

type Blockset

type Blockset interface {
	Length() int
	Kind() uint32
	GetBlock(ctx context.Context, i int) ([]byte, error)
	PutBlock(ctx context.Context, inode INodeRef, i int, b []byte) error
	GetLiveINodes() *roaring.Bitmap
	GetAllBlockRefs() []BlockRef

	Marshal() ([]byte, error)
	Unmarshal(data []byte) error
	GetSubBlockset() Blockset
	Truncate(lastIndex int, blocksize uint64) error
	Trim(from, to int) error
}

Blockset is the interface representing the standardized methods to interact with a set of blocks.

type Config

type Config struct {
	DataDir         string
	StorageSize     uint64
	MetadataAddress string
	ReadCacheSize   uint64
	ReadLevel       ReadLevel
	WriteLevel      WriteLevel
}

type CreateMetadataServiceFunc

type CreateMetadataServiceFunc func(cfg Config) (MetadataService, error)

CreateMetadataServiceFunc is the signature of a constructor used to create a registered MetadataService.

type DebugMetadataService

type DebugMetadataService interface {
	DumpMetadata(io.Writer) error
}

type File

type File struct {
	// contains filtered or unexported fields
}

func (*File) Close

func (f *File) Close() error

func (*File) Read

func (f *File) Read(b []byte) (n int, err error)

func (*File) ReadAt

func (f *File) ReadAt(b []byte, off int64) (n int, ferr error)

func (*File) Replaces

func (f *File) Replaces() uint64

func (*File) Seek

func (f *File) Seek(offset int64, whence int) (int64, error)

func (*File) Size

func (f *File) Size() uint64

func (*File) SyncAllWrites

func (f *File) SyncAllWrites() (INodeRef, error)

func (*File) Trim

func (f *File) Trim(offset, length int64) error

Trim zeroes data in the middle of a file.

func (*File) Truncate

func (f *File) Truncate(size int64) error

func (*File) Write

func (f *File) Write(b []byte) (n int, err error)

func (*File) WriteAt

func (f *File) WriteAt(b []byte, off int64) (n int, err error)

func (*File) WriteOpen

func (f *File) WriteOpen() bool

type GlobalMetadata

type GlobalMetadata struct {
	BlockSize        uint64
	DefaultBlockSpec BlockLayerSpec
	INodeReplication int
}

type INodeID

type INodeID uint64

INodeID represents a unique identifier for an INode.

type INodeIterator

type INodeIterator struct {
	// contains filtered or unexported fields
}

func (*INodeIterator) Close

func (i *INodeIterator) Close() error

func (*INodeIterator) Err

func (i *INodeIterator) Err() error

func (*INodeIterator) INodeRef

func (i *INodeIterator) INodeRef() INodeRef

func (*INodeIterator) Next

func (i *INodeIterator) Next() bool

type INodeRef

type INodeRef struct {
	INode INodeID
	// contains filtered or unexported fields
}

INodeRef is a reference to a unique INode in the filesystem.

func INodeFromProto

func INodeFromProto(p *models.INodeRef) INodeRef

func INodeRefFromBytes

func INodeRefFromBytes(b []byte) INodeRef

func NewINodeRef

func NewINodeRef(vol VolumeID, i INodeID) INodeRef

func ZeroINode

func ZeroINode() INodeRef

func (INodeRef) String

func (i INodeRef) String() string

func (INodeRef) ToBytes

func (i INodeRef) ToBytes() []byte

func (INodeRef) ToProto

func (i INodeRef) ToProto() *models.INodeRef

func (INodeRef) Volume

func (i INodeRef) Volume() VolumeID

type INodeStore

type INodeStore struct {
	// contains filtered or unexported fields
}

func NewINodeStore

func NewINodeStore(bs BlockStore) *INodeStore

func (*INodeStore) Close

func (b *INodeStore) Close() error

func (*INodeStore) DeleteINode

func (b *INodeStore) DeleteINode(ctx context.Context, i INodeRef) error

func (*INodeStore) Flush

func (b *INodeStore) Flush() error

func (*INodeStore) GetINode

func (b *INodeStore) GetINode(ctx context.Context, i INodeRef) (*models.INode, error)

func (*INodeStore) INodeIterator

func (b *INodeStore) INodeIterator() *INodeIterator

func (*INodeStore) WriteINode

func (b *INodeStore) WriteINode(ctx context.Context, i INodeRef, inode *models.INode) error

type IndexID

type IndexID uint64

IndexID represents a unique identifier for an Index.

type MetadataKind

type MetadataKind int
const (
	EtcdMetadata MetadataKind = iota
	TempMetadata
)

type MetadataService

type MetadataService interface {
	GetVolumes() ([]*models.Volume, error)
	GetVolume(volume string) (*models.Volume, error)
	NewVolumeID() (VolumeID, error)
	Kind() MetadataKind

	GlobalMetadata() (GlobalMetadata, error)

	// Returns a UUID based on the underlying datadir. Should be
	// unique for every created datadir.
	UUID() string

	GetRing() (Ring, error)
	SubscribeNewRings(chan Ring)
	UnsubscribeNewRings(chan Ring)
	SetRing(ring Ring) error

	WithContext(ctx context.Context) MetadataService

	GetLease() (int64, error)
	RegisterPeer(lease int64, pi *models.PeerInfo) error
	GetPeers() (PeerInfoList, error)

	Close() error

	CommitINodeIndex(VolumeID) (INodeID, error)
	GetINodeIndex(VolumeID) (INodeID, error)
}

MetadataService is the interface representing the basic ways to manipulate consistently stored fileystem metadata.

func CreateMetadataService

func CreateMetadataService(name string, cfg Config) (MetadataService, error)

CreateMetadataService calls the constructor of the specified MetadataService with the provided address.

type MkfsFunc

type MkfsFunc func(cfg Config, gmd GlobalMetadata, ringType RingType) error

MkfsFunc is the signature of a function which preformats a metadata service.

type ModifyableRing

type ModifyableRing interface {
	ChangeReplication(r int)
}

type NewBlockStoreFunc

type NewBlockStoreFunc func(string, Config, GlobalMetadata) (BlockStore, error)

type PeerInfoList

type PeerInfoList []*models.PeerInfo

func (PeerInfoList) AndNot

func (pi PeerInfoList) AndNot(b PeerList) PeerInfoList

func (PeerInfoList) GetWeights

func (pi PeerInfoList) GetWeights() map[string]int

func (PeerInfoList) HasUUID

func (pi PeerInfoList) HasUUID(uuid string) bool

func (PeerInfoList) Intersect

func (pi PeerInfoList) Intersect(b PeerInfoList) PeerInfoList

func (PeerInfoList) PeerList

func (pi PeerInfoList) PeerList() PeerList

func (PeerInfoList) UUIDAt

func (pi PeerInfoList) UUIDAt(uuid string) int

func (PeerInfoList) Union

type PeerList

type PeerList []string

func (PeerList) AndNot

func (pl PeerList) AndNot(b PeerList) PeerList

func (PeerList) Has

func (pl PeerList) Has(uuid string) bool

func (PeerList) IndexAt

func (pl PeerList) IndexAt(uuid string) int

func (PeerList) Intersect

func (pl PeerList) Intersect(b PeerList) PeerList

func (PeerList) Union

func (pl PeerList) Union(b PeerList) PeerList

type PeerPermutation

type PeerPermutation struct {
	Replication int
	Peers       PeerList
}

type ReadLevel

type ReadLevel int
const (
	ReadBlock ReadLevel = iota
	ReadSequential
	ReadSpread
)

type Ring

type Ring interface {
	GetPeers(key BlockRef) (PeerPermutation, error)
	Members() PeerList

	Describe() string
	Type() RingType
	Version() int

	Marshal() ([]byte, error)
}

type RingAdder

type RingAdder interface {
	ModifyableRing
	AddPeers(PeerInfoList, ...RingModification) (Ring, error)
}

type RingModification

type RingModification interface {
	ModifyRing(ModifyableRing)
}

type RingRemover

type RingRemover interface {
	ModifyableRing
	RemovePeers(PeerList, ...RingModification) (Ring, error)
}

type RingType

type RingType int

type Server

type Server struct {
	Blocks BlockStore
	MDS    MetadataService
	INodes *INodeStore

	Cfg Config

	ReplicationOpen bool
	// contains filtered or unexported fields
}

Server is the type representing the generic distributed block store.

func NewMemoryServer

func NewMemoryServer() *Server

func NewServer

func NewServer(cfg Config, metadataServiceKind, blockStoreKind string) (*Server, error)

func NewServerByImpl

func NewServerByImpl(cfg Config, mds MetadataService, blocks BlockStore) (*Server, error)

func (*Server) AddTimeoutCallback

func (s *Server) AddTimeoutCallback(f func(uuid string))

func (*Server) BeginHeartbeat

func (s *Server) BeginHeartbeat(addr string, listen bool) error

BeginHeartbeat spawns a goroutine for heartbeats. Non-blocking.

func (*Server) Close

func (s *Server) Close() error

func (*Server) CreateFile

func (s *Server) CreateFile(volume *models.Volume, inode *models.INode, blocks Blockset) (*File, error)

func (*Server) Debug

func (s *Server) Debug(w io.Writer) error

Debug writes a bunch of debug output to the io.Writer.

func (*Server) ExtendContext

func (s *Server) ExtendContext(ctx context.Context) context.Context

func (*Server) GetPeerMap

func (s *Server) GetPeerMap() map[string]*models.PeerInfo

func (*Server) Lease

func (s *Server) Lease() int64

func (*Server) UpdatePeerMap

func (s *Server) UpdatePeerMap() map[string]*models.PeerInfo

func (*Server) UpdateRebalanceInfo

func (s *Server) UpdateRebalanceInfo(ri *models.RebalanceInfo)

type SetRingFunc

type SetRingFunc func(cfg Config, r Ring) error

type Store

type Store interface {
	Kind() string
	Flush() error
	Close() error
}

Store is the interface that represents methods that should be common across all types of storage providers.

type VolumeID

type VolumeID uint64

VolumeID represents a unique identifier for a Volume.

func (VolumeID) ToBytes

func (v VolumeID) ToBytes() []byte

type WriteLevel

type WriteLevel int
const (
	WriteAll WriteLevel = iota
	WriteOne
	WriteLocal
)

Directories

Path Synopsis
aoe
cmd
agro command
agroctl command
agromount command
ringtool command
internal
etcdproto/etcdserverpb
Package etcdserverpb is a generated protocol buffer package.
Package etcdserverpb is a generated protocol buffer package.
etcdproto/storagepb
Package storagepb is a generated protocol buffer package.
Package storagepb is a generated protocol buffer package.
nbd
Package nbd uses the Linux NBD layer to emulate a block device in user space
Package nbd uses the Linux NBD layer to emulate a block device in user space
Package models is a generated protocol buffer package.
Package models is a generated protocol buffer package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL