lakefs

module
v0.8.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 3, 2020 License: Apache-2.0

README

Go Node

What is lakeFS

lakeFS is an open source layer that delivers resilience and manageability to object-storage based data lakes.

With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics.

lakeFS is API compatible with AWS S3 and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc.

For more information see the Official Documentation.

Capabilities

Atomic Operations - lakeFS allows data producers to manipulate multiple objects as a single, atomic operation. If something fails half-way, all changes can be instantly rolled back.

Consistency - lakeFS enables object-level and cross-collection consistency:

  • object-level consistency ensures all operations within a branch are strongly consistent (read-after-write, list-after-write, read-after-delete, etc).
  • cross-collection consistency is achieved by providing snapshot isolation. Using branches, writers can provide consistency guarantees across different logical collections - merging to “main” is only done after several datasets have been created successfully.

History - Commits are retained for a configurable duration, so readers can query data from the latest commit, or from any other point in time. Writers can atomically and safely rollback changes to previous versions.

Cross-Lake Isolation - Creating a lakeFS branch provides you with a snapshot of the entire lake at a given point in time. All reads from that branch are guaranteed to always return the same results.

Data CI/CD - The ability to define automated rules and tests that are required to pass before committing or merging changes to data.

Getting Started

Docker
  1. Ensure you have Docker installed on your computer. The MacOS and Windows installations include Docker Compose by default.

  2. Create a docker-compose.yaml file, containing the following configuration:

    ---
    version: '3'
    services:
      lakefs:
        image: "treeverse/lakefs:latest"
        ports: ["8000:8000"]
        links: ["postgres"]
        environment:
          LAKEFS_AUTH_ENCRYPT_SECRET_KEY: "some random secret string"
          LAKEFS_DATABASE_CONNECTION_STRING: postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
          LAKEFS_BLOCKSTORE_TYPE: local
          LAKEFS_BLOCKSTORE_LOCAL_PATH: /home/lakefs
        entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
      postgres:
        image: "postgres:11"
        environment:
          POSTGRES_USER: lakefs
          POSTGRES_PASSWORD: lakefs
    
  3. From the directory that contains our new docker-compose.yaml file, run the following command:

    $ docker-compose up
    
  4. Open http://localhost:8000/setup in your web browser to set up an initial admin user, used to login and send API requests.

Download the Binary

Alternatively, you can download the lakeFS binaries and run them directly.

Binaries are available at https://github.com/treeverse/lakeFS/releases.

Setting up a repository

Please follow the Guide to Get Started to set up your local lakeFS installation.

For more detailed inforamation on how to setup lakeFS, please visit the documentation

Community

Keep up to date and get lakeFS support via:

More information

Licensing

lakeFS is completely free and open source and licensed under the Apache 2.0 License.

Directories

Path Synopsis
mem
s3
cmd
lakectl command
lakefs command
lakefs-loadtest command
sig
Package cmd This file implements helper functions to validate Streaming AWS Signature Version '4' authorization header.
Package cmd This file implements helper functions to validate Streaming AWS Signature Version '4' authorization header.
modules
api/factory module
auth/factory module
block/factory module
cache module
webui module

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL