graphsplit

package module
v0.3.0
Published: May 28, 2021 License: MIT Imports: 32 Imported by: 4

README

Go-graphsplit

A tool for splitting a large dataset into graph slices suitable for making deals in the Filecoin Network.

When storing a large dataset in the Filecoin Network, we have to split it into smaller pieces that fit the sector size, which can be 32GiB or 64GiB.

At first, we packed the dataset into one large tarball, chunked the tarball into small pieces, and then made deals with storage miners for these pieces. We worked this way for a while until we realized that it made data retrieval difficult: even if we only needed one small file from the dataset, we first had to retrieve every piece of the tarball.

Graphsplit solves this problem. It takes advantage of IPLD concepts and follows the unixfs format data structures. It treats the dataset, or one of its sub-directories, as a big graph and then cuts it into small graphs. Each small graph keeps its file system structure as close to the original as possible. After that, we only need to organize these small graphs into car files according to unixfs.
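The size accounting behind this cutting step can be sketched as follows. This is a minimal illustration, not Graphsplit's actual implementation: the `file` and `piece` types and the `splitIntoSlices` function are hypothetical, and the sketch assumes that a file which does not fit in the current slice is cut at a byte offset and continued in the next one. It only decides which byte ranges go into which slice; building the unixfs graph for each slice is out of scope here.

```go
package main

import "fmt"

// file is a hypothetical stand-in for one entry of the dataset.
type file struct {
	Path string
	Size int64
}

// piece is a byte range of one file assigned to a slice.
type piece struct {
	Path       string
	Start, End int64 // half-open range [Start, End)
}

// splitIntoSlices walks the files in order and fills each slice up to
// sliceSize bytes, cutting a file across slices when it does not fit.
// Empty files contribute no bytes and are skipped by this sketch.
func splitIntoSlices(files []file, sliceSize int64) [][]piece {
	if sliceSize <= 0 {
		return nil
	}
	var slices [][]piece
	var cur []piece
	var used int64
	for _, f := range files {
		var off int64
		for off < f.Size {
			n := f.Size - off
			if room := sliceSize - used; n > room {
				n = room
			}
			cur = append(cur, piece{f.Path, off, off + n})
			off += n
			used += n
			if used == sliceSize {
				// The current slice is full; start a new one.
				slices = append(slices, cur)
				cur, used = nil, 0
			}
		}
	}
	if len(cur) > 0 {
		slices = append(slices, cur)
	}
	return slices
}

func main() {
	files := []file{{"a/1.bin", 6}, {"a/2.bin", 7}, {"b/3.bin", 3}}
	for i, s := range splitIntoSlices(files, 8) {
		fmt.Println(i, s)
	}
}
```

Because each piece keeps its original path, every slice can rebuild the directory layout of the files it holds, which is what allows a single file to be retrieved without fetching the whole dataset.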

Build

git clone https://github.com/filedrive-team/go-graphsplit.git
cd go-graphsplit
make

Usage

Splitting dataset:

# car-dir: folder for the split smaller pieces, stored as .car files
# slice-size: size of each piece, in bytes (17179869184 is 16GiB)
# parallel: number of goroutines to run when building IPLD nodes
# graph-name: used as the prefix for the names of the smaller pieces
# parent-path: usually the same as /path/to/dataset; it is only used to work out relative paths when building the IPLD graph
./graphsplit chunk \
--car-dir=path/to/car-dir \
--slice-size=17179869184 \
--parallel=2 \
--graph-name=gs-test \
--parent-path=/path/to/dataset \
/path/to/dataset

Note: a manifest.csv is created to record the mapping between each graph slice name and its payload cid, as follows:

cat /path/to/car-dir/manifest.csv
payload_cid,filename
Qm...,graph-slice-name.car

Import car file to IPFS:

ipfs dag import /path/to/car-dir/car-file

Restore files:

# car-path: a directory or a single file, in .car format
# output-dir: directory the restored files are written to
# parallel: number of goroutines to run when restoring
./graphsplit restore \
--car-path=/path/to/car-path \
--output-dir=/path/to/output-dir \
--parallel=2

Contribute

PRs are welcome!

License

MIT

Documentation

Index

Constants

const UnixfsChunkSize uint64 = 1 << 20
const UnixfsLinksPerLevel = 1 << 10
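To get a feel for these two values, the arithmetic below estimates how many 1 MiB leaf chunks a file needs and how many levels of links sit above them. It is a rough sketch that assumes a balanced layout in which every intermediate node holds at most `UnixfsLinksPerLevel` links; the lowercase constants and the `leafChunks` and `treeDepth` helpers are illustrative names, not part of the package.

```go
package main

import "fmt"

const (
	unixfsChunkSize     uint64 = 1 << 20 // 1 MiB, mirrors UnixfsChunkSize
	unixfsLinksPerLevel uint64 = 1 << 10 // 1024, mirrors UnixfsLinksPerLevel
)

// leafChunks returns how many fixed-size chunks a file of size bytes needs.
func leafChunks(size uint64) uint64 {
	return (size + unixfsChunkSize - 1) / unixfsChunkSize
}

// treeDepth returns the number of link levels needed above the leaves,
// assuming every intermediate node holds at most unixfsLinksPerLevel links.
func treeDepth(size uint64) int {
	depth := 0
	for n := leafChunks(size); n > 1; n = (n + unixfsLinksPerLevel - 1) / unixfsLinksPerLevel {
		depth++
	}
	return depth
}

func main() {
	for _, size := range []uint64{1 << 20, 1 << 30, 17179869184} {
		fmt.Println(size, leafChunks(size), treeDepth(size))
	}
}
```

Under this assumption a single level of links covers up to 1 GiB (1024 chunks of 1 MiB), so a 16GiB slice needs two levels.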

Variables

This section is empty.

Functions

func BuildFileNode

func BuildFileNode(item Finfo, bufDs ipld.DAGService, cidBuilder cid.Builder) (node ipld.Node, err error)

func BuildIpldGraph

func BuildIpldGraph(ctx context.Context, fileList []Finfo, graphName, parentPath, carDir string, parallel int, cb GraphBuildCallback)

func CarTo added in v0.2.0

func CarTo(carPath, outputDir string, parallel int)

func Chunk added in v0.3.0

func Chunk(ctx context.Context, sliceSize int64, parentPath, targetPath, carDir, graphName string, parallel int, cb GraphBuildCallback) error

func ExistDir added in v0.2.0

func ExistDir(path string) bool

func GenGraphName

func GenGraphName(graphName string, sliceCount, sliceTotal int) string

func GetFileList

func GetFileList(args []string) (fileList []string, err error)

func GetFileListAsync

func GetFileListAsync(args []string) chan Finfo

func GetGraphCount

func GetGraphCount(args []string, sliceSize int64) int

func Import added in v0.2.0

func Import(path string, st car.Store) (cid.Cid, error)

func Merge added in v0.2.0

func Merge(dir string, parallel int)

func NodeWriteTo added in v0.2.0

func NodeWriteTo(nd files.Node, fpath string) error

Types

type Finfo

type Finfo struct {
	Path      string
	Name      string
	Info      os.FileInfo
	SeekStart int64
	SeekEnd   int64
}

type GraphBuildCallback added in v0.3.0

type GraphBuildCallback interface {
	OnSuccess(node ipld.Node, graphName string)
	OnError(error)
}

func CSVCallback added in v0.3.0

func CSVCallback(carDir string) GraphBuildCallback

func ErrCallback added in v0.3.0

func ErrCallback() GraphBuildCallback

Directories

Path Synopsis
cmd
graphsplit command
