dataset

package module
v2.3.2
This package is not in the latest version of its module.
Published: Jul 11, 2025 License: BSD-3-Clause Imports: 37 Imported by: 2

README

Dataset Project


The Dataset Project provides tools for working with collections of JSON documents. It uses a simple key and object pair to organize JSON documents into a collection. It supports SQL querying of the objects stored in a collection.

It is suitable for temporary storage of JSON objects in data processing pipelines as well as a persistent storage mechanism for collections of JSON objects.

The Dataset Project provides a command line program and a web service for working with JSON objects as a collection or individual objects. As such it is well suited for data science projects as well as building web applications that work with metadata.

dataset, a command line tool

dataset is a command line tool for working with collections of JSON documents. Collections can be stored on the file system in a pairtree or stored in a SQL database that supports JSON columns like SQLite3, PostgreSQL or MySQL.

The dataset command line tool supports common data management operations such as

  • initializing a collection
  • dumping and loading JSON lines files into a collection
  • CRUD operations on a collection
  • querying a collection using SQL

See Getting started with dataset for a tour and tutorial.
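The dump and load operations above work with JSON lines files, i.e. one JSON document per line. As an illustration of the format only (dataset's actual dump layout may differ), a round trip can be sketched in Python:

```python
import json

# Illustrative key/object pairs; the {"key": ..., "object": ...} line shape
# below is an assumption for demonstration, not dataset's documented format.
records = {
    "doiel-r": {"family": "Doiel", "given": "R. S."},
    "morrell-t": {"family": "Morrell", "given": "Tommy"},
}

# "dump": write each object as a single JSON-encoded line
with open("dump.jsonl", "w") as f:
    for key, obj in records.items():
        f.write(json.dumps({"key": key, "object": obj}) + "\n")

# "load": read the lines back into key/object pairs
loaded = {}
with open("dump.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        loaded[rec["key"]] = rec["object"]

print(loaded == records)  # True
```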

datasetd is dataset implemented as a web service

datasetd is a JSON REST web service and static file host. It provides a JSON API supporting the main operations found in the dataset command line program. This allows dataset collections to be integrated safely into web applications or used concurrently by multiple processes.

The Dataset Web Service can host multiple collections each with their own custom query API defined in a simple YAML configuration file.
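A per-collection configuration might look roughly like the sketch below. The field names (host, collections, dataset, query) are hypothetical illustrations, not the documented schema; consult the datasetd documentation for the actual fields.

```yaml
# Hypothetical sketch of a datasetd YAML configuration; field names are
# illustrative only, not the documented schema.
host: localhost:8485
collections:
  - dataset: /usr/local/data/people.ds
    keys: true
    read: true
    # a custom SQL query exposed through the collection's query API
    query:
      full_names: |
        SELECT src
        FROM people
        ORDER BY src->>'family_name'
```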

Design choices

dataset and datasetd are intended to be simple tools for managing collections of JSON object documents in a predictable, structured way. The dataset web service allows multi-process or multi-user access to a dataset collection via HTTP.

dataset is guided by the idea that you should be able to work with JSON documents as easily as you can any plain text document on the Unix command line. dataset is intended to be simple to use with minimal setup (e.g. dataset init mycollection.ds creates a new collection called 'mycollection.ds').

  • dataset and datasetd store JSON object documents in collections
    • Storage of the JSON documents may be either in a pairtree on disk or in a SQL database using JSON columns (e.g. SQLite3 or MySQL 8)
    • a dataset collection is a directory containing collection.json and codemeta.json files
    • collection.json is a metadata file describing the collection, e.g. storage type, name, description, whether versioning is enabled
    • codemeta.json is a codemeta file describing the nature of the collection, e.g. authors, description, funding
    • collection objects are accessed by their key, a unique identifier made up of lowercase alphanumeric characters
    • collection names are usually lowercase and usually have a .ds extension for easy identification
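The key rule above (lowercase alphanumeric) can be expressed as a small check. This is a hypothetical helper for illustration, not dataset's actual validation code:

```python
import re

# Hypothetical validator illustrating the key rule described above:
# keys are unique identifiers made up of lowercase alphanumeric characters.
# This is NOT dataset's actual validation code.
KEY_RE = re.compile(r"^[a-z0-9]+$")

def valid_key(key: str) -> bool:
    """Return True if key is non-empty and lowercase alphanumeric."""
    return bool(KEY_RE.match(key))

print(valid_key("article001"))   # True
print(valid_key("Article-001"))  # False: uppercase letter and hyphen
```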

dataset collection storage options

  • SQL store stores JSON documents in a JSON column
    • SQLite3 (default), PostgreSQL >= 12 and MySQL 8 are the currently supported SQL databases
    • a "DSN URI" is used to identify and gain access to the SQL database
    • the DSN URI may be passed through the environment
  • pairtree (deprecated, will be removed in v3)
    • the pairtree path is always lowercase
    • non-JSON attachments can be associated with a JSON document and found in directories organized by semver (semantic version number)
    • versioned JSON documents are created alongside the current JSON document but are named using both their key and semver
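The pairtree idea is to split a key into two-character pairs that become nested directory names, spreading objects across the file system. A sketch of that mapping, illustrative only (dataset's exact on-disk layout may differ):

```python
# Sketch of the pairtree key-to-path mapping: the key is lowercased
# (pairtree paths are always lowercase) and split into two-character
# pairs used as nested directory names. Illustrative only; dataset's
# exact on-disk layout may differ.
def pairtree_path(key: str) -> str:
    key = key.lower()
    pairs = [key[i:i + 2] for i in range(0, len(key), 2)]
    return "/".join(pairs)

print(pairtree_path("Article001"))  # ar/ti/cl/e0/01
print(pairtree_path("abc"))         # ab/c
```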

datasetd is a web service

  • it is intended as a back end web service run on localhost
    • it runs on localhost and a designated port (port 8485 is the default)
    • it supports multiple collections, each with its own configuration for global object permissions and supported SQL queries

The choice of plain UTF-8 is intended to help future-proof reading dataset collections. Care has been taken to keep dataset simple enough and lightweight enough that it will run on a machine as small as a Raspberry Pi Zero while being equally comfortable on a more resource-rich server or desktop environment. dataset can be re-implemented in any programming language supporting file input and output, common string operations, and JSON encoding and decoding functions. The current implementation is in the Go language.
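The re-implementation claim above can be illustrated with a toy key/object store built from nothing but file I/O, string operations and JSON. This is a sketch of the idea only; it is NOT compatible with real dataset collections, and the class and method names are invented for illustration:

```python
import json
import os
import tempfile

# Toy illustration of the claim above: a key/object store needs only
# file I/O, string operations and JSON encoding/decoding. NOT compatible
# with real dataset collections; names here are hypothetical.
class ToyCollection:
    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # store keys lowercase, one JSON file per object
        return os.path.join(self.root, key.lower() + ".json")

    def create(self, key: str, obj: dict) -> None:
        with open(self._path(key), "w") as f:
            json.dump(obj, f)

    def read(self, key: str) -> dict:
        with open(self._path(key)) as f:
            return json.load(f)

    def keys(self) -> list:
        return sorted(name[:-5] for name in os.listdir(self.root)
                      if name.endswith(".json"))

c = ToyCollection(tempfile.mkdtemp(suffix=".ds"))
c.create("doc1", {"title": "Hello"})
print(c.read("doc1"))  # {'title': 'Hello'}
print(c.keys())        # ['doc1']
```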

Features

dataset supports

  • initializing a new dataset collection, including its codemeta.json metadata and, for pairtree storage, a pairtree for object storage
  • listing keys in a collection
  • object level actions: create, read, update and delete
  • managing non-JSON documents as attachments: attachments (list), attach (create/update), retrieve (read) and prune (delete)

datasetd supports

  • listing collections available from the web service
  • listing or updating a collection's metadata
  • listing a collection's keys
  • object level actions: create, read, update and delete
  • managing non-JSON documents as attachments

Both dataset and datasetd may be useful for general data science applications needing JSON object management or in implementing repository systems in research libraries and archives.

Limitations of dataset and datasetd

dataset has many limitations, some of which are listed below

  • the pairtree implementation is not a multi-process, multi-user data store
  • it is not a general purpose database system
  • it stores all keys in lower case to deal with file systems that are not case sensitive
  • it stores collection names in lower case for the same reason
  • it should NOT be used for sensitive, confidential or secret information because it lacks access controls and data encryption

datasetd is a simple web service intended to run on "localhost:8485".

  • it does not include support for authentication
  • it does not support access control for users or roles
  • it does not encrypt the data it stores
  • it does not support HTTPS
  • it does not provide auto key generation
  • it limits the size of JSON documents stored to the size supported by the host SQL database's JSON columns
  • it limits the size of attached files to less than 250 MiB
  • it does not support partial JSON record updates or retrieval
  • it does not provide an interactive Web UI for working with dataset collections
  • it should NOT be used for sensitive, confidential or secret information because it lacks access controls and data encryption

Authors and history

  • R. S. Doiel
  • Tommy Morrell

Releases

Compiled versions are provided for Linux (x86, aarch64), Mac OS X (x86 and M1), Windows 11 (x86, aarch64) and Raspberry Pi OS.

github.com/caltechlibrary/dataset/releases

You can use dataset from Python via the py_dataset package.

You can use dataset from Deno+TypeScript by running datasetd and access it with ts_dataset.

Documentation

Overview

api is a part of dataset

Authors R. S. Doiel, <rsdoiel@library.caltech.edu> and Tom Morrell, <tmorrell@library.caltech.edu>

Copyright (c) 2022, Caltech All rights not granted herein are expressly reserved by Caltech.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Package dataset includes the operations needed for processing collections of JSON documents and their attachments.


cli is part of dataset


This is part of the dataset package.



  • compatibility.go provides some wrapper methods for backward compatibility with v1 of dataset. These are likely to go away at some point.

config is a part of dataset



Dataset Project
===============

The Dataset Project provides tools for working with collections of JSON Object documents stored on the local file system or via a dataset web service. Two tools are provided, a command line interface (dataset) and a web service (datasetd).

dataset command line tool
-------------------------

_dataset_ is a command line tool for working with collections of JSON objects. Collections are stored on the file system in a pairtree directory structure or can be accessed via dataset's web service. For collections storing data in a pairtree, JSON objects are stored as plain UTF-8 text files. This means the objects can be accessed with common Unix text processing tools as well as most programming languages.

The _dataset_ command line tool supports common data management operations such as initialization of collections; document creation, reading, updating and deleting; listing keys of JSON objects in the collection; and associating non-JSON documents (attachments) with specific JSON documents in the collection.

datasetd, dataset as a web service
----------------------------------

_datasetd_ is a web service implementation of the _dataset_ command line program. It features a subset of the capabilities found in the command line tool. This allows dataset collections to be integrated safely into web applications or used concurrently by multiple processes. It achieves this by storing the dataset collection in a SQL database using JSON columns.

Design choices
--------------

_dataset_ and _datasetd_ are intended to be simple tools for managing collections of JSON object documents in a predictable, structured way.

_dataset_ is guided by the idea that you should be able to work with JSON documents as easily as you can any plain text document on the Unix command line. _dataset_ is intended to be simple to use with minimal setup (e.g. `dataset init mycollection.ds` creates a new collection called 'mycollection.ds').

  • _dataset_ and _datasetd_ store JSON object documents in collections. The storage of the JSON documents differs.
  • dataset collections are defined in a directory containing a collection.json file
  • collection.json is a metadata file describing the collection, e.g. storage type, name, description, whether versioning is enabled
  • collection objects are accessed by their key, which is case insensitive
  • collection names are lowercase and usually have a `.ds` extension for easy identification; the collection directory name must be lowercase

_dataset_ stores JSON object documents in a SQL data store or pairtree

  • the pairtree path is always lowercase
  • a pairtree of JSON object documents
  • non-JSON attachments can be associated with a JSON document and found in directories organized by semver (semantic version number)
  • versioned JSON documents are created in a subdirectory incorporating a semver

_datasetd_ stores JSON object documents in a table named for the collection

  • objects are versioned into a collection history table by semver and key
  • attachments are not supported
  • can be exported to a collection using pairtree storage (e.g. a zip file will be generated holding a pairtree representation of the collection)
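The semver-based version labels mentioned above can be sketched with a tiny helper. This is a hypothetical illustration of incrementing a patch level, not dataset's actual versioning code:

```python
# Hypothetical sketch of semver-style version labels as used for object
# histories; dataset's own version-increment rules may differ.
def next_patch(semver: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in semver.split("."))
    return f"{major}.{minor}.{patch + 1}"

print(next_patch("0.0.3"))  # 0.0.4
print(next_patch("1.2.9"))  # 1.2.10
```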

The choice of plain UTF-8 is intended to help future-proof reading dataset collections. Care has been taken to keep _dataset_ simple enough and lightweight enough that it will run on a machine as small as a Raspberry Pi Zero while being equally comfortable on a more resource-rich server or desktop environment. _dataset_ can be re-implemented in any programming language supporting file input and output, common string operations, and JSON encoding and decoding functions. The current implementation is in the Go language.

Features
--------

_dataset_ supports

  • Initialize a new dataset collection
    • Define metadata about the collection using a codemeta.json file
    • Define a keys file holding a list of allocated keys in the collection
    • Creates a pairtree for object storage
  • Listing _keys_ in a collection
  • Object level actions

  • create
  • read
  • update
  • delete
  • Documents as attachments
  • attachments (list)
  • attach (create/update)
  • retrieve (read)
  • prune (delete)
  • The ability to create data frames from whole collections or based on key lists

_datasetd_ supports

  • List collections available from the web service
  • List or update a collection's metadata
  • List a collection's keys
  • Object level actions

  • create
  • read
  • update
  • delete
  • Documents as attachments
  • attachments (list)
  • attach (create/update)
  • retrieve (read)
  • prune (delete)
  • A means of importing to or exporting from pairtree based dataset collections
  • The ability to create data frames from whole collections or based on key lists

Both _dataset_ and _datasetd_ may be useful for general data science applications needing JSON object management or in implementing repository systems in research libraries and archives.

Limitations of _dataset_ and _datasetd_
---------------------------------------

_dataset_ has many limitations, some of which are listed below

  • the pairtree implementation is not a multi-process, multi-user data store
  • it is not a general purpose database system
  • it stores all keys in lower case to deal with file systems that are not case sensitive, a compatibility requirement of pairtrees
  • it stores collection names in lower case for the same reason
  • it does not have a built-in query language, search or sorting
  • it should NOT be used for sensitive or secret information

_datasetd_ is a simple web service intended to run on "localhost:8485".

  • it is a RESTful service
  • it does not include support for authentication
  • it does not support a query language, search or sorting
  • it does not support access control by users or roles
  • it does not provide auto key generation
  • it limits the size of JSON documents stored to the size supported by the host SQL database's JSON columns
  • it limits the size of attached files to less than 250 MiB
  • it does not support partial JSON record updates or retrieval
  • it does not provide an interactive Web UI for working with dataset collections
  • it does not support HTTPS or "at rest" encryption
  • it should NOT be used for sensitive or secret information

Authors and history
-------------------

  • R. S. Doiel
  • Tommy Morrell

  • helptext.go holds the common help documentation shared between the dataset and datasetd commands.


ptstore is a part of the dataset



sqlstore is a part of dataset

table.go provides utility functions for moving string data into and out of one and two dimensional slices.

texts is part of dataset

Index

Constants

const (

	// PTSTORE describes the storage type using a pairtree
	PTSTORE = "pairtree"

	// SQLSTORE describes the SQL storage type
	SQLSTORE = "sqlstore"
)
const (
	DatasetHelpText = `%{app_name}(1) user manual | version {version} {release_hash}
% R. S. Doiel and Tom Morrell
% {release_date}

# NAME

{app_name} 

# SYNOPSIS

{app_name} [GLOBAL_OPTIONS] VERB [OPTIONS] COLLECTION_NAME [PARAMETER ...]

# DESCRIPTION

The {app_name} command line interface supports creating JSON object
collections and managing the JSON object documents in a collection.

When creating or updating documents in a collection, the JSON
source can be read from the command line, a file or standard input.

# SUPPORTED VERBS

help
: displays documentation for a verb, e.g. "help create"

init [STORAGE_TYPE]
: Initialize a new dataset collection

model
: provides an experimental interactive data model generator creating
the "model.yaml" file in the dataset collection's root directory.

create
: creates a new JSON document in the collection

read
: retrieves the "current" version of a JSON document from
  the collection, writing it to standard output

update
: updates a JSON document in the collection

delete
: removes all versions of a JSON document from the collection

keys
: returns a list of keys in the collection

codemeta
: copies metadata from a codemeta file and updates the
  collection's metadata

attach
: attaches a document to a JSON object record

attachments
: lists the attachments associated with a JSON object record

retrieve
: creates a local copy of an attachment from a JSON record

detach
: copies an attachment of a JSON document
  into the current directory

prune
: removes an attachment (including all versions) from a JSON record

set-versioning
: will set the versioning of a collection. The versioning
  value can be "", "none", "major", "minor", or "patch"

get-versioning
: will display the versioning setting for a collection

dump
: This will write out all dataset collection records as a JSONL document.
JSONL stores one JSON object per line, see https://jsonlines.org for details.
Each object rendered has two attributes, "key" and "object". The
key corresponds to the dataset collection key and the object is the JSON
value retrieved from the collection.

load
: This will read JSON objects one per line from standard input. This
format is often called JSONL, see https://jsonlines.org. Each object
has two attributes, "key" and "object".

join [OPTIONS] c_name, key, JSON_SRC
: This will join a new object provided on the command line with an
existing object in the collection.


A word about "keys". {app_name} uses the concept of key/value pairs
for storing JSON documents, where the key is a unique identifier and the
value is the object to be stored.  Keys must be lower case
alphanumeric only.  Depending on the storage engine there are issues
for keys with punctuation or that rely on case sensitivity. E.g.
the pairtree storage engine relies on the host file system. File
systems are notorious for being picky about non-alphanumeric
characters and some are not case sensitive.

A word about "GLOBAL_OPTIONS" in v2 of {app_name}.  Originally
all options came after the command name, now they tend to
come after the verb itself. This is because context counts
in trying to remember options (at least for the authors of
{app_name}).  There are three "GLOBAL_OPTIONS" that are exceptions
and they are ` + "`" + `-version` + "`" + `, ` + "`" + `-help` + "`" + `
and ` + "`" + `-license` + "`" + `. All other options come
after the verb and apply to the specific action the verb
implements.

# STORAGE TYPE

There are currently four supported storage options for JSON documents in a dataset collection.

- SQLite3 database >= 3.40 (default)
- Postgres >= 12
- MySQL 8
- Pairtree (pre-2.1 default)

The storage type is specified as a DSN URI, except for pairtree which is just "pairtree".


# OPTIONS

-help
: display help

-license
: display license

-version
: display version

# EXAMPLES

~~~
   {app_name} help init

   {app_name} init my_objects.ds 

   {app_name} model my_objects.ds

   {app_name} help create

   {app_name} create my_objects.ds "123" '{"one": 1}'

   {app_name} create my_objects.ds "234" mydata.json 
   
   cat <<EOT | {app_name} create my_objects.ds "345"
   {
	   "four": 4,
	   "five": "six"
   }
   EOT

   {app_name} update my_objects.ds "123" '{"one": 1, "two": 2}'

   {app_name} delete my_objects.ds "345"

   {app_name} keys my_objects.ds
~~~

This is an example of initializing a Pairtree JSON document
collection using the environment.

~~~
{app_name} init '${C_NAME}' pairtree
~~~

In this case '${C_NAME}' is the name of your JSON document
collection read from the environment variable C_NAME.

To specify Postgres as the storage for your JSON document collection
you'd use something like --

~~~
{app_name} init '${C_NAME}' \\
  'postgres://${USER}@localhost/${DB_NAME}?sslmode=disable'
~~~


In this case '${C_NAME}' is the name of your JSON document
collection read from the environment variable C_NAME. USER is used
for the Postgres username and DB_NAME is used for the Postgres
database name.  The sslmode option was specified because Postgres
in this example was restricted to localhost on a single user machine.


{app_name} {version}

`

	DatasetdHelpText = `%{app_name}(1) user manual | version {version} {release_hash}
% R. S. Doiel
% {release_date}

# NAME

{app_name}

# SYNOPSIS

{app_name} [OPTIONS] SETTINGS_FILE

# DESCRIPTION

{app_name} provides a web service for one or more dataset collections. It requires the
collections to exist (e.g. created previously with the dataset cli) and a
settings JSON or YAML file that describes the web service configuration and
the permissions per collection that are available via the web service.

# OPTIONS

-help
: display detailed help

-license
: display license

-version
: display version

-debug
: log debug information

# SETTINGS_FILE

The settings file provides {app_name} with the configuration
of the web service and associated dataset collection(s).

It can be written as either a JSON or YAML file. If it is a YAML file
you should use the ".yaml" extension so that {app_name} will correctly
parse the YAML.

The top level YAML attributes are

host
: (required) hostname and port for the web service to listen on, e.g. localhost:8485

htdocs
: (optional) if set, static content will be served based on this path. This is a
good place to implement a browser side UI in HTML, CSS and JavaScript.

collections
: (required) A list of dataset collections that will be supported with this
web service. The dataset collections can be pairtrees or SQL stored. The
latter is preferred for web access to avoid problems of write collisions.

The collections object is a list of configuration objects. The configuration
attributes you should supply are as follows.

dataset
: (required) The path to the dataset collection you are providing a web API to.

query
: (optional) a map of query names to SQL statements. A query is run
via a GET or POST to the path "` + "`" + `/api/<COLLECTION_NAME>/query/<QUERY_NAME>/<FIELD_NAMES>` + "`" + `".
The parameters submitted in the POST are passed to the SQL statement.
NOTE: Only dataset collections using a SQL store are supported. The SQL
needs to conform to the SQL dialect of the store being used (e.g. MySQL, Postgres,
SQLite3). The SQL statement functions with the same constraints as dsquery SQL
statements. The SQL statement is defined as a YAML text block.
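A minimal settings fragment with one named query might be written like
the following (this is a sketch: the collection name "t1.ds", query name
"browse" and table name "t1" are illustrative, and the SQL must match
your store's dialect):

~~~shell
# Sketch: write a settings.yaml that defines a "browse" query.
# All names here are examples, not defaults.
cat > settings.yaml <<'EOT'
host: localhost:8485
collections:
  - dataset: t1.ds
    keys: true
    read: true
    query:
      browse: |
        SELECT src
        FROM t1
        ORDER BY updated
EOT

grep -c 'browse:' settings.yaml
~~~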

## API Permissions

The following are permissioning attributes for the collection. These are
global to the collection and by default are set to false. A read only API 
would normally only include "keys" and "read" attributes set to true.

keys
: (optional, default false) allow object keys to be listed

create
: (optional, default false) allow object creation through a POST to the web API

read
: (optional, default false) allow object to be read through a GET from the web API

update
: (optional, default false) allow object updates through a PUT to the web API.

delete
: (optional, default false) allow object deletion through a DELETE to the web API.

attachments
: (optional, default false) list object attachments through a GET to the web API.

attach
: (optional, default false) Allow adding attachments through a POST to the web API.

retrieve
: (optional, default false) Allow retrieving attachments through a GET to the web API.

prune
: (optional, default false) Allow removing attachments through a DELETE to the web API.

versions
: (optional, default false) Allow setting versioning of attachments via POST to the web API.


# EXAMPLES

Starting up the web service

~~~
   {app_name} settings.yaml
~~~

In this example we cover a short life cycle of a collection
called "t1.ds". We need to create a "settings.yaml" file and
an empty dataset collection. Once ready you can run the {app_name}
service to interact with the collection via cURL.

To create the dataset collection we use the "dataset" command and the
"vi" text editor (you can use your favorite text editor instead of vi).

~~~
    createdb t1
    dataset init t1.ds \
      "postgres://$PGUSER:$PGPASSWORD@/t1?sslmode=disable"
    vi settings.yaml
~~~

You can create the "settings.yaml" with this Bash script.
I've created an htdocs directory to hold the static content
to interact with the dataset web service.

~~~
mkdir htdocs
cat <<EOT >settings.yaml
host: localhost:8485
htdocs: htdocs
collections:
  # Each collection is an object. The path prefix is
  # /api/<dataset_name>/...
  - dataset: t1.ds
    # What follows are object level permissions
    keys: true
    create: true
    read: true
    update: true
    delete: true
    # These are attachment related permissions
    attachments: true
    attach: true
    retrieve: true
    prune: true
    # This sets versioning behavior
    versions: true
EOT
~~~

Now we can run {app_name} and make the dataset collection available
via HTTP.

~~~
    {app_name} settings.yaml
~~~

You should now see the start up message and any log information displayed
to the console. Open a new shell session and try the following.

We can now use cURL to post a document to the "/api/t1.ds/object/one" end
point.

~~~
    curl -X POST http://localhost:8485/api/t1.ds/object/one \
	    -d '{"one": 1}'
~~~

Now we can list the keys available in our collection.

~~~
    curl http://localhost:8485/api/t1.ds/keys
~~~

We should see "one" in the response. If so we can try reading it.

~~~
    curl http://localhost:8485/api/t1.ds/read/one
~~~

That should display our JSON document. Let's try updating (replacing)
it. 

~~~
    curl -X POST http://localhost:8485/api/t1.ds/object/one \
	    -d '{"one": 1, "two": 2}'
~~~

If you read it back you should see the updated record. Now let's try
deleting it.

~~~
	curl -X DELETE http://localhost:8485/api/t1.ds/object/one
~~~

List the keys and you should see that "one" is no longer there.

~~~
    curl http://localhost:8485/api/t1.ds/keys
~~~

You can run a query named 'browse' that is defined in the YAML configuration like this.

~~~
	curl http://localhost:8485/api/t1.ds/query/browse
~~~

or 

~~~
	curl -X POST -H 'Content-type:application/json' -d '{}' http://localhost:8485/api/t1.ds/query/browse
~~~

In the shell session where {app_name} is running press "Ctrl-C"
to terminate the service.


{app_name} {version}
`

	DatasetdApiText = `%{app_name}(1) user manual | version {version} {release_hash}
% R. S. Doiel
% {release_date}


# {app_name} REST API

{app_name} provides a RESTful JSON API for working with a dataset collection. This document describes the path expressions used to interact with the API.  Note some of the methods and paths require permissions as set in the {app_name} YAML or JSON [settings file]({app_name}_yaml.5.md).

## basic path expressions

There are three basic forms of the URL paths supported by the API.

- ` + "`" + `/api/<COLLECTION_NAME>/keys` + "`" + `, get a list of all keys in the collection
- ` + "`" + `/api/<COLLECTION_NAME>/object/<OPTIONS>` + "`" + `, interact with an object in the collection (e.g. create, read, update, delete)
- ` + "`" + `/api/<COLLECTION_NAME>/query/<QUERY_NAME>/<FIELDS>` + "`" + `, query the collection and receive a list of objects in response

The "` + "`" + `<COLLECTION_NAME>` + "`" + `" would be the name of the dataset collection, e.g. "mydata.ds".

The "` + "`" + `<OPTIONS>` + "`" + `" holds any additional parameters related to the verb. Options are separated by the path delimiter (i.e. "/"). The options are optional. They do not require a trailing slash.

The "` + "`" + `<QUERY_NAME>` + "`" + `" is the query name defined in the YAML configuration for the specific collection.

The "` + "`" + `<FIELDS>` + "`" + `" holds the set of fields being passed into the query. These are delimited with the path separator like with options (i.e. "/"). Fields are optional and they do not require a trailing slash.
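Putting the three forms together, hypothetical paths for a collection
named "people.ds" might look like the following (the object key
"doe-jane" and query name "full_name" are the same illustrative names
used elsewhere in this document):

~~~
/api/people.ds/keys
/api/people.ds/object/doe-jane
/api/people.ds/query/full_name/family/lived
~~~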

## HTTP Methods

The {app_name} REST API follows REST practices. Good examples are POST creates, GET reads, PUT updates, and DELETE removes. It is important to remember that the HTTP method and path expression need to match to form the actions you'd take using the command line version of dataset. For example to create a new object you'd use the object path without any options and a POST. You can read an object using the GET method along with the object style path.

## Content Type and the API

The REST API works with JSON data. The service does not support multipart urlencoded content. You MUST use the content type of ` + "`" + `application/json` + "`" + ` when performing a POST, or PUT. This means if you are building a user interface for a collections {app_name} service you need to appropriately use JavaScript to send content into the API and set the content type to ` + "`" + `application/json` + "`" + `.

## Examples

Here's an example of a list, in YAML, of people in a collection called "people.ds". There are fields for the family name, lived name and ORCID. The pid is the "key" used to store the objects in our collection.

~~~yaml
people:
  - pid: doe-jane
    family: Doe
    lived: Jane
    orcid: 9999-9999-9999-9999
~~~

In JSON this would look like

~~~json
{
  "people": [
    {
      "pid": "doe-jane",
      "family": "Doe",
      "lived": "Jane",
      "orcid": "9999-9999-9999-9999"
    }
  ]
}
~~~

### create

The create action is formed with the object URL path, the POST HTTP method and the content type of "application/json". The POST data is expressed as a JSON object.

The object path includes the dataset key you'll assign in the collection. The key must be unique and not currently exist in the collection.

If we're adding an object with the key of "doe-jane" to our collection called "people.ds" then the object URL path would be ` + "`" + `/api/people.ds/object/doe-jane` + "`" + `. NOTE: the object key is included as a single parameter after the "object" path element.

Adding an object to our collection using curl looks like the following.

~~~shell
curl -X POST \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{"pid": "doe-jane", "family": "Doe", "lived": "Jane", "orcid": "9999-9999-9999-9999" }' \
  http://localhost:8485/api/people.ds/object/doe-jane  
~~~

### read

The read action is formed with the object URL path, the GET HTTP method and the content type of "application/json". There is no data
aside from the URL to request the object. Here's what it would look like using curl to access the API.

~~~shell
curl http://localhost:8485/api/people.ds/object/doe-jane  
~~~

### update

Like create, update is formed from the object URL path with the content type "application/json", and the data is expressed as a JSON object.
Unlike create, we use the PUT HTTP method.

Here's how you would use curl to update the object called "doe-jane" in your collection.

~~~shell
curl -X PUT \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{"pid": "doe-jane", "family": "Doe", "lived": "Jane", "orcid": "9999-9999-9999-9999" }' \
  http://localhost:8485/api/people.ds/object/doe-jane  
~~~

This will overwrite the existing "doe-jane" record. NOTE: the record must exist or you will get an error.

### delete

If you want to delete the "doe-jane" record in "people.ds" you perform an HTTP DELETE and form the URL like a read.

~~~shell
curl -X DELETE http://localhost:8485/api/people.ds/object/doe-jane  
~~~

## query

The query path lets you run a predefined query from your settings YAML file. The HTTP method used is POST. This is because we need to send data in order to receive a response. The resulting data is expressed as a JSON array of objects. Like with create, read, update and delete you use the content type of "application/json".

In the settings file the queries are named. The query names are unique. One or many queries may be defined. The SQL expression associated with a name runs as a prepared statement and parameters are mapped in based on the URL path provided. This allows you to use many fields in forming your query.

Let's say we have a query called "full_name". It is defined to run the following SQL.

~~~sql
select src
from people
where src->>'family' like ?
  and src->>'lived' like ?
order by src->>'family', src->>'lived'
~~~

NOTE: The SQL has to retain the constraint of a single object per row; normally this will be "src" for dataset collections.

When you form a query path you need to indicate how the parameters for the family and lived names get mapped to their respective positional references in the SQL. This is done with the following URL path. In this example "full_name" is the name of the query while "family" and "lived" are the values mapped into the parameters.

~~~
/api/people.ds/query/full_name/family/lived
~~~

The web form could look like this.  

~~~
<form id="query_name">
   <label for="family">Family</label> <input id="family" name="family" ><br/>
   <label for="lived">Lived</label> <input id="lived" name="lived" ><br/>
   <button type="submit">Search</button>
</form>
~~~

REMEMBER: the JSON API only supports the content type of "application/json" so you cannot rely on the browser's default form action and method.

You would include JavaScript in your HTML to pull the values out of the form and create a JSON object. If I searched
for someone with the family name "Doe" and the lived name "Jane", the object submitted to the query might look like the following.

~~~json
{
    "family": "Doe",
    "lived": "Jane"
}
~~~

The curl expression simulating the form submission would look like the following.


~~~shell
curl -X POST \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{"family": "Doe", "lived": "Jane" }' \
  http://localhost:8485/api/people.ds/query/full_name/family/lived
~~~


`

	DatasetdServiceText = `` /* 690-byte string literal not displayed */

	DatasetdYAMLText = `%{app_name}(5) user manual | version {version} {release_hash}
% R. S. Doiel and Tom Morrell
% {release_date}


# {app_name} YAML configuration

The dataset RESTful JSON API is configured using either a YAML or JSON file. YAML is preferred as it is more readable but JSON remains supported for backward compatibility. What follows is a description of the YAML configuration. Note that optional boolean elements default to false if missing.

## Top level

host
: (required) this is the hostname and port for the web service, e.g. localhost:8485

htdocs
: (optional) if this is non-empty it will be used as the path to static resources provided with the web service.
These are useful for prototyping user interfaces with HTML, CSS and JavaScript interacting with the RESTful JSON API.


collections
: (required) a list of dataset collections to be managed via the web service.

Each collection object has the following properties. Note: if you are trying to provide a read-only API
then you will want to include permissions for keys, read and probably query (to provide a search feature).

dataset
: (required) this is a path to your dataset collection.

query
: (optional) a map of query names to SQL statements. Each name triggers the execution of a SQL statement.
The query expects a POST. Fields are mapped to the SQL statement parameters. If a pairtree store is used,
indexing is needed first because the SQL statement is executed against a SQLite 3 database.
Otherwise the SQL statement should conform to the SQL dialect of the SQL storage used (e.g. Postgres, MySQL or SQLite3).
The SQL statements need to conform to the same constraints as dsquery's implementation of SQL statements.

## API Permissions

API permissions are global. They are controlled with the following attributes. If an attribute is set to true
then it enables that permission. If you want to create a read only API then set keys and read to true. Query
support can be added via the query parameter. These are independent, so if you didn't want to allow keys or full
objects to be retrieved you could just provide access via defined queries.

keys
: (optional, default false) If true allow keys for the collection to be retrieved with a GET to ` + "`" + `/api/<COLLECTION_NAME>/keys` + "`" + `

read
: (optional, default false) If true allow objects to be read via a GET to ` + "`" + `/api/<COLLECTION_NAME>/object/<KEY>` + "`" + `

create
: (optional, default false) If true allow objects to be created via a POST to ` + "`" + `/api/<COLLECTION_NAME>/object` + "`" + `

update
: (optional, default false) If true allow objects to be updated via a PUT to ` + "`" + `/api/<COLLECTION_NAME>/object/<KEY>` + "`" + `

delete
: (optional, default false) If true allow objects to be deleted via a DELETE to ` + "`" + `/api/<COLLECTION_NAME>/object/<KEY>` + "`" + `

attachments
: (optional, default false) list object attachments through a GET to the web API.

attach
: (optional, default false) Allow adding attachments through a POST to the web API.

retrieve
: (optional, default false) Allow retrieving attachments through a GET to the web API.

prune
: (optional, default false) Allow removing attachments through a DELETE to the web API.

versions
: (optional, default false) Allow setting versioning of attachments via POST to the web API.


`

	DSQueryHelpText = `%{app_name}(1) dataset user manual | version {version} {release_hash}
% R. S. Doiel and Tom Morrell
% {release_date}

# NAME

{app_name}

# SYNOPSIS

{app_name} [OPTIONS] C_NAME SQL_STATEMENT [PARAMS]

# DESCRIPTION

__{app_name}__ is a tool to support SQL queries of dataset collections. 
Pairtree based collections should be indexed before trying to query them
(see '-index' option below). Pairtree collections use the SQLite 3
dialect of SQL for querying.  For collections using a SQL storage
engine (e.g. SQLite3, Postgres and MySQL), the SQL dialect reflects
the SQL of the storage engine.

The schema is the same for all storage engines.  The stored JSON
documents use a four column schema.  The columns are "_key",
"created", "updated" and "src". "_key" is a string (aka VARCHAR),
"created" and "updated" are timestamps while "src" is a JSON column holding
the JSON document. The table name reflects the collection
name without the ".ds" extension (e.g. data.ds is stored in a database called
data having a table also called data).

The output of __{app_name}__ is a JSON array of objects. The order of the
objects is determined by your SQL statement and SQL engine. There
is an option to generate a 2D grid of values in JSON, CSV or YAML formats.
See OPTIONS for details.

# PARAMETERS

C_NAME
: The name of the dataset collection to query.

SQL_STATEMENT
: The SQL statement should conform to the SQL dialect used by the
JSON store (e.g. Postgres, MySQL and SQLite 3).
The SELECT clause should return a single JSON object type per row.
__{app_name}__ returns a JSON array of the JSON objects returned
by the SQL query.

PARAMS
: Optional; any values you want to pass to the SQL_STATEMENT.

# SQL Store Scheme

_key
: The key or id used to identify the stored JSON document.

src
: This is a JSON column holding the JSON document

created
: The date the JSON document was created in the table

updated
: The date the JSON document was updated


# OPTIONS

-help
: display help

-license
: display license

-version
: display version

-pretty
: pretty print the resulting JSON array

-sql SQL_FILENAME
: read SQL from a file. If filename is "-" then read SQL from standard input.

-grid STRING_OF_ATTRIBUTE_NAMES
: Returns the list as a 2D grid of values. This option requires a comma delimited
string of attribute names for the outer object to include in the grid output. It
can be combined with the -pretty option.

-csv STRING_OF_ATTRIBUTE_NAMES
: Like -grid this takes our list of dataset objects and a list of attribute
names but rather than create a 2D JSON array of values it creates a CSV
representation with the first row as the attribute names.

-yaml STRING_OF_ATTRIBUTE_NAMES
: Like -grid this takes the list of dataset objects and a list of attribute
names, but rather than create a 2D JSON array of values it creates a YAML 
representation.

-index
: This will create a SQLite3 index for a collection. This enables {app_name}
to query pairtree collections using the SQLite3 SQL dialect just as it would for
SQL storage collections (i.e. don't use it with postgres, mysql or sqlite based
dataset collections; it is not needed for them). Note the index is always
built before executing the SQL statement.

# EXAMPLES

Generate a list of JSON objects with the ` + "`" + `_key` + "`" + ` value
merged with the object stored as the ` + "`" + `._Key` + "`" + ` attribute.
The collection name is "data.ds", which is implemented using Postgres
as the JSON store. (note: in Postgres the ` + "`" + `||` + "`" + ` operator is very helpful).

~~~
{app_name} data.ds "SELECT jsonb_build_object('_Key', _key)::jsonb || src::jsonb FROM data"
~~~

In this example we're returning the "src" in our collection by querying
for an "id" attribute in the "src" column. The id is passed in as a parameter
using the Postgres positional notation in the statement.

~~~
{app_name} data.ds "SELECT src FROM data WHERE src->>'id' = $1 LIMIT 1" "xx103-3stt9"
~~~
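
To get a 2D grid of values instead of a list of objects, combine a query
with the -grid option. The attribute names and collection contents in this
invocation are purely illustrative.

~~~
{app_name} -grid 'id,title' data.ds "SELECT src FROM data"
~~~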

`

	DSImporterHelpText = `` /* 1191-byte string literal not displayed */

)
View Source
const (
	Sqlite3SchemaName = "sqlite"
	Sqlite3DriverName = "sqlite"

	PostgresSchemaName = "postgres"
	PostgresDriverName = "postgres"

	MySQLSchemaName = "mysql"
	MySQLDriverName = "mysql"

	// None means versioning is turned off for collection
	None = 0
	// Major means increment the major semver value on creation or update
	Major = 1
	// Minor means increment the minor semver value on creation or update
	Minor = 2
	// Patch means increment the patch semver value on creation or update
	Patch = 3
)
View Source
const (
	// Version number of release
	Version = "2.3.2"

	// ReleaseDate, the date version.go was generated
	ReleaseDate = "2025-07-11"

	// ReleaseHash, the Git hash when version.go was generated
	ReleaseHash = "76950ed"
	LicenseText = `` /* 1525-byte string literal not displayed */

)
View Source
const (

	// License is the formatted license text for dataset package based command line tools
	License = `` /* 1545-byte string literal not displayed */

)

Variables

This section is empty.

Functions

func Analyzer

func Analyzer(cName string, verbose bool) error

Analyzer checks the collection version and analyzes current state of collection reporting on errors.

NOTE: the collection MUST BE CLOSED when Analyzer is called otherwise the results will not be accurate.

func ApiVersion

func ApiVersion(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

ApiVersion returns the version of the web service running. This will normally be the same version of dataset you installed.

```shell

curl -X GET http://localhost:8485/api/version

```

func Attach

func Attach(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

Attach will add or replace an attachment for a JSON object in the collection.

```shell

KEY="123"
FILENAME="mystuff.zip"
curl -X POST \
   http://localhost:8585/api/journals.ds/attachment/$KEY/$FILENAME \
     -H "Content-Type: application/zip" \
     --data-binary "@./mystuff.zip"

```

func AttachmentVersions

func AttachmentVersions(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

func Attachments

func Attachments(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

Attachments lists the attachments available for a JSON object in the collection.

```shell

KEY="123"
curl -X GET http://localhost:8585/api/journals.ds/attachments/$KEY

```

func BytesProcessor

func BytesProcessor(varMap map[string]string, text []byte) []byte

BytesProcessor takes a text and replaces all the keys (e.g. "{app_name}") with their values (e.g. "dataset"). It is used to prepare command line and daemon documentation for display.
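
The replacement behavior can be sketched with the standard library. `processText` below is a hypothetical stand-in for illustration, not the package's implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// processText replaces every "{key}" found in text with the matching
// value from varMap, mimicking the documented behavior of
// BytesProcessor / StringProcessor.
func processText(varMap map[string]string, text string) string {
	pairs := make([]string, 0, len(varMap)*2)
	for k, v := range varMap {
		pairs = append(pairs, "{"+k+"}", v)
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	varMap := map[string]string{"app_name": "dataset", "version": "2.3.2"}
	fmt.Println(processText(varMap, "{app_name} {version} ready"))
}
```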

func CliDisplayHelp

func CliDisplayHelp(in io.Reader, out io.Writer, eout io.Writer, args []string) error

CliDisplayHelp writes out help on a supported topic

func CliDisplayUsage

func CliDisplayUsage(out io.Writer, appName string, flagSet *flag.FlagSet)

CliDisplayUsage displays a usage message.

func Codemeta

func Codemeta(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Codemeta returns the codemeta JSON for a specific collection. Example collection name "journals.ds"

```shell

curl -X GET http://localhost:8485/api/collection/journals.ds

```

func Collections

func Collections(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Collections returns a list of dataset collections supported by the running web service.

```shell

curl -X GET http://localhost:8485/api/collections

```

func Create

func Create(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Create deposits a JSON object in the collection for a given key.

In this example the JSON document is in a file in the working directory called "record-123.json" and the environment variable KEY holds the document key, which is the string "123".

```shell

KEY="123"
curl -X POST http://localhost:8585/api/journals.ds/object/$KEY \
     -H "Content-Type: application/json" \
     --data-binary "@./record-123.json"

```

func Delete

func Delete(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Delete removes a JSON object from the collection for a given key.

In this example the environment variable KEY holds the document key which is the string "123".

```shell

KEY="123"
curl -X DELETE http://localhost:8585/api/journals.ds/object/$KEY

```

func DeleteVersion

func DeleteVersion(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

func DisplayLicense

func DisplayLicense(out io.Writer, appName string)

DisplayLicense returns the license associated with dataset application.

func DisplayVersion

func DisplayVersion(out io.Writer, appName string)

DisplayVersion returns the version of the dataset application.

func FixMissingCollectionJson

func FixMissingCollectionJson(cName string) error

FixMissingCollectionJson will scan the collection directory and environment, making an educated guess as to the collection type in order to restore a missing collection.json.

func FmtHelp added in v2.1.3

func FmtHelp(src string, appName string, version string, releaseDate string, releaseHash string) string

FmtHelp lets you process a text block with simple curly brace markup.

func JSONIndent added in v2.1.4

func JSONIndent(src []byte, prefix string, indent string) []byte

JSONIndent takes a byte slice of JSON source and returns an indented version.

func JSONMarshal added in v2.1.2

func JSONMarshal(data interface{}) ([]byte, error)

JSONMarshal provides a custom JSON encoder to solve an issue with HTML entities getting converted to UTF-8 code points by json.Marshal() and json.MarshalIndent().

func JSONMarshalIndent added in v2.1.2

func JSONMarshalIndent(data interface{}, prefix string, indent string) ([]byte, error)

JSONMarshalIndent provides a custom JSON encoder to solve an issue with HTML entities getting converted to UTF-8 code points by json.Marshal() and json.MarshalIndent().

func JSONUnmarshal added in v2.1.2

func JSONUnmarshal(src []byte, data interface{}) error

JSONUnmarshal is a custom JSON decoder so we can treat numbers more easily.

func Keys

func Keys(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Keys returns the available keys in a collection as a JSON array. Example collection name "journals.ds"

```shell

curl -X GET http://localhost:8485/api/journals.ds/keys

```

func MakeCSV added in v2.1.6

func MakeCSV(src []byte, attributes []string) ([]byte, error)

MakeCSV takes JSON source holding an array of objects and uses the attribute list to render a CSV file from the list. It returns the CSV content as a byte slice along with an error value.
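
The kind of transformation MakeCSV performs can be sketched with the standard library. `makeCSV` below is a hypothetical, simplified stand-in, not the package's implementation:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
)

// makeCSV decodes a JSON array of objects, then writes a CSV where the
// first row holds the attribute names and each following row holds the
// matching attribute values (empty when an attribute is missing).
func makeCSV(src []byte, attributes []string) ([]byte, error) {
	var objects []map[string]interface{}
	if err := json.Unmarshal(src, &objects); err != nil {
		return nil, err
	}
	buf := &bytes.Buffer{}
	w := csv.NewWriter(buf)
	if err := w.Write(attributes); err != nil {
		return nil, err
	}
	for _, obj := range objects {
		row := make([]string, len(attributes))
		for i, attr := range attributes {
			if val, ok := obj[attr]; ok {
				row[i] = fmt.Sprintf("%v", val)
			}
		}
		if err := w.Write(row); err != nil {
			return nil, err
		}
	}
	w.Flush()
	return buf.Bytes(), w.Error()
}

func main() {
	src := []byte(`[{"id":"mojo","name":"Mojo Sam"},{"id":"jack"}]`)
	out, err := makeCSV(src, []string{"id", "name"})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```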

func MakeGrid added in v2.1.5

func MakeGrid(src []byte, attributes []string) ([]byte, error)

MakeGrid takes JSON source holding an array of objects and uses the attribute list to render a 2D grid of values where the columns match the attribute name list provided. If an attribute is missing a nil is inserted. MakeGrid returns the grid as JSON source along with an error value.

func MakeYAML added in v2.1.18

func MakeYAML(src []byte, attributes []string) ([]byte, error)

MakeYAML takes JSON source holding an array of objects and uses the attribute list to render a new YAML file from the targeted list. It returns the YAML content as a byte slice along with an error.

func Migrate

func Migrate(srcName string, dstName string, verbose bool) error

Migrate migrates a dataset v1 collection to a v2 collection. Both collections need to already exist. Records will be read out of the v1 collection and created in the v2 collection.

NOTE: Migrate does not currently copy attachments.

func ObjectVersions

func ObjectVersions(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

func ParseDSN

func ParseDSN(uri string) (string, error)

func Prune

func Prune(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

Prune removes an attachment from a JSON object in the collection.

```shell

KEY="123"
FILENAME="mystuff.zip"
curl -X DELETE \
   http://localhost:8585/api/journals.ds/attachment/$KEY/$FILENAME

```

func PruneVersion

func PruneVersion(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

func Query added in v2.1.13

func Query(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Query returns the results of a SQL function stored in MySQL or Postgres. The query takes a query name followed by path parts that map the order of the fields. This is needed because SQL prepared statements rely on parameter order, which is mostly common across SQL dialects.

In this example we're running the SQL statement named "journal_search" with title mapped to `$1` and journal mapped to `$2`.

```shell

curl -X POST http://localhost:8485/api/journals.ds/query/journal_search/title/journal \
     --data "title=Princess+Bride" \
     --data "journal=Movies+and+Popculture"

```

NOTE: the SQL query must conform to the same constraints as dsquery SQL constraints.

func Read

func Read(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Read retrieves a JSON object from the collection for a given key.

In this example the retrieved JSON will be saved as "record-123.json" and the environment variable KEY holds the document key as the string "123".

```shell

KEY="123"
curl -o "record-123.json" -X GET \
     http://localhost:8585/api/journals.ds/object/$KEY

```

func ReadKeys

func ReadKeys(keysName string, in io.Reader) ([]string, error)

ReadKeys reads a list of keys from a given filename or an io.Reader (e.g. standard input) as fallback. The key file should be formatted one key per line with a line delimiter of "\n".

```

keys, err := dataset.ReadKeys(keysFilename, os.Stdin)
if err != nil {
   ...
}

```

func ReadSource

func ReadSource(fName string, in io.Reader) ([]byte, error)

ReadSource reads the source text from a filename or io.Reader (e.g. standard input) as fallback.

```

src, err := ReadSource(inputName, os.Stdin)
if err != nil {
   ...
}

```

func ReadVersion

func ReadVersion(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

func Repair

func Repair(cName string, verbose bool) error

Repair takes a collection name, walks the pairtree and repairs collection.json as appropriate.

NOTE: the collection MUST BE CLOSED when repair is called otherwise the repaired collection may revert.

func Retrieve

func Retrieve(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

Retrieve retrieves an attachment from a JSON object in the collection.

```shell

KEY="123"
FILENAME="mystuff.zip"
curl -X GET \
   http://localhost:8585/api/journals.ds/attachment/$KEY/$FILENAME

```

func RetrieveVersion

func RetrieveVersion(w http.ResponseWriter, r *http.Request, api *API, cName, verb string, options []string)

func RowInterfaceToString added in v2.1.0

func RowInterfaceToString(r []interface{}) []string

RowInterfaceToString takes a 1D slice of interface{} and returns a 1D slice of string. If a cell cannot be converted it will be set to an empty string.

func RowStringToInterface added in v2.1.0

func RowStringToInterface(r []string) []interface{}

RowStringToInterface takes a 1D slice of string and returns a 1D slice of interface{}

func RunAPI

func RunAPI(appName string, settingsFile string, debug bool) error

RunAPI takes a settings file and opens all the collections to be used by the web service.

```

appName := filepath.Base(os.Args[0])
settingsFile := "settings.yaml"
if err := api.RunAPI(appName, settingsFile, false); err != nil {
   ...
}

```

func RunCLI

func RunCLI(in io.Reader, out io.Writer, eout io.Writer, args []string) error

RunCLI implements the functionality used by the cli.

func StringProcessor

func StringProcessor(varMap map[string]string, text string) string

StringProcessor takes a text and replaces all the keys (e.g. "{app_name}") with their values (e.g. "dataset"). It is used to prepare command line and daemon documentation for display.

func TableInterfaceToString added in v2.1.0

func TableInterfaceToString(t [][]interface{}) [][]string

TableInterfaceToString takes a 2D slice of interface{} holding simple types (e.g. string, int, int64, float, float64, rune) and returns a 2D slice of string suitable for working with the csv encoder package. Uses ValueInterfaceToString() for conversion, storing an empty string if there is an error.

func TableStringToInterface added in v2.1.0

func TableStringToInterface(t [][]string) [][]interface{}

TableStringToInterface takes a 2D slice of string and returns an 2D slice of interface{}.

func Update

func Update(w http.ResponseWriter, r *http.Request, api *API, cName string, verb string, options []string)

Update replaces a JSON object in the collection for a given key.

In this example the JSON document is in a file in the working directory called "record-123.json" and the environment variable KEY holds the document key, which is the string "123".

```shell

KEY="123"
curl -X PUT http://localhost:8585/api/journals.ds/object/$KEY \
     -H "Content-Type: application/json" \
     --data-binary "@./record-123.json"

```

func ValueInterfaceToString added in v2.1.0

func ValueInterfaceToString(val interface{}) (string, error)

ValueInterfaceToString takes an interface{} and renders it as a string.

func ValueStringToInterface added in v2.1.0

func ValueStringToInterface(s string) (interface{}, error)

ValueStringToInterface takes a string and returns an interface{}

func WriteKeys

func WriteKeys(keyFilename string, out io.Writer, keys []string) error

WriteKeys writes a list of keys to given filename or to io.Writer as fallback. The key file is formatted as one key per line using "\n" as a separator.

```

keys := ...
if err := WriteKeys(keyFilename, out, keys); err != nil {
   ...
}

```

func WriteSource

func WriteSource(fName string, out io.Writer, src []byte) error

WriteSource writes source text to a file, or to the io.Writer if filename is not set.

func YAMLMarshal added in v2.1.11

func YAMLMarshal(data interface{}) ([]byte, error)

YAMLMarshal provides a custom YAML encoder, analogous to JSONMarshal, to solve an issue with HTML entities getting converted to UTF-8 code points during marshaling.

func YAMLMarshalIndent added in v2.1.11

func YAMLMarshalIndent(data interface{}, spaces int) ([]byte, error)

YAMLMarshalIndent provides a custom YAML encoder with indentation, analogous to JSONMarshalIndent, to solve an issue with HTML entities getting converted to UTF-8 code points during marshaling.

func YAMLUnmarshal added in v2.1.11

func YAMLUnmarshal(src []byte, data interface{}) error

YAMLUnmarshal is a custom YAML decoder so we can treat numbers more easily.

Types

type API

type API struct {
	// AppName is the name of the running application. E.g. os.Args[0]
	AppName string
	// SettingsFile is the path to the settings file.
	SettingsFile string
	// Version is the version of the API running
	Version string
	// Settings is the configuration reading from SettingsFile
	Settings *Settings
	// CMap is a map to the collections supported by the web service.
	CMap map[string]*Collection

	// Routes holds a double map of prefix path and HTTP method that
	// points to the function that will be dispatched if found.
	//
	// The first level map identifies the prefix path for the route
	// e.g. "api/version".  No leading slash is expected.
	// The second level map is organized by HTTP method, e.g. "GET",
	// "POST". The second map points to the function to call when
	// the route and method matches.
	Routes map[string]map[string]func(http.ResponseWriter, *http.Request, *API, string, string, []string)

	// Debug if set true will cause more verbose output.
	Debug bool

	// Process ID
	Pid int
}

API holds the information for running a web service instance. One web service may host many collections.

func (*API) Init

func (api *API) Init(appName string, settingsFile string) error

Init sets up the API to run.

func (*API) RegisterRoute

func (api *API) RegisterRoute(prefix string, method string, fn func(http.ResponseWriter, *http.Request, *API, string, string, []string)) error

RegisterRoute assigns a prefix path to a route handler.

prefix is the url path prefix, minus the leading slash, that is targeted by this handler.

method is the HTTP method the handler will process. fn is the function that handles this route.

```

func Version(w http.ResponseWriter, r *http.Request, api *API, verb string, options []string) {
   ...
}

...

err := api.RegisterRoute("version", http.MethodGet, Version)
if err != nil {
   ...
}

```

func (*API) Reload

func (api *API) Reload(sigName string) error

Reload performs a Shutdown and an init after re-reading in the settings.yaml file.

func (*API) Router

func (api *API) Router(w http.ResponseWriter, r *http.Request)

func (*API) Shutdown

func (api *API) Shutdown(sigName string) int

Shutdown attempts a graceful shutdown of the service and returns an exit code.

func (*API) WebService

func (api *API) WebService() error

WebService starts and runs a web server implementation of dataset.

type Attachment

type Attachment struct {
	// Name is the filename and path to be used inside the generated tar file
	Name string `json:"name"`

	// Size remains to help us migrate pre v0.0.61 collections.
	// It should reflect the last size added.
	Size int64 `json:"size"`

	// Sizes is the sizes associated with the version being attached
	Sizes map[string]int64 `json:"sizes"`

	// Current holds the semver to the last added version
	Version string `json:"version"`

	// Checksum, current implemented as a MD5 checksum for now
	// You should have one checksum per attached version.
	Checksums map[string]string `json:"checksums"`

	// HRef points at last attached version of the attached document
	// If you moved an object out of the pairtree it should be a URL.
	HRef string `json:"href"`

	// VersionHRefs is a map to all versions of the attached document
	// {
	//    "0.0.0": "... /photo.png",
	//    "0.0.1": "... /photo.png",
	//    "0.0.2": "... /photo.png"
	// }
	VersionHRefs map[string]string `json:"version_hrefs"`

	// Created is a date string in RFC3339 format
	Created string `json:"created"`

	// Modified is a date string in RFC3339 format
	Modified string `json:"modified"`

	// Metadata is a map for application specific metadata about attachments.
	Metadata map[string]interface{} `json:"metadata,omitempty"`
}

Attachment is a structure for holding non-JSON content metadata you wish to store alongside a JSON document in a collection. Attachments reside in their own pairtree within the collection directory (even when using a SQL store for the JSON documents). The attachment metadata is read as needed from disk where the collection folder resides.
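
A hypothetical attachment metadata record, using the JSON field names from the struct above (all values are illustrative):

```

{
  "name": "report.pdf",
  "size": 1024,
  "sizes": {"0.0.1": 1024},
  "version": "0.0.1",
  "checksums": {"0.0.1": "0123456789abcdef0123456789abcdef"},
  "href": "attachments/report.pdf",
  "version_hrefs": {"0.0.1": "attachments/0.0.1/report.pdf"},
  "created": "2025-07-11T00:00:00Z",
  "modified": "2025-07-11T00:00:00Z"
}

```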

type Collection

type Collection struct {
	// DatasetVersion of the collection
	DatasetVersion string `json:"dataset,omitempty"`

	// Name of collection
	Name string `json:"name"`

	// StoreType can be either "pairtree" (default or if attribute is
	// omitted) or "sqlstore".  If sqlstore the connection string, DSN URI,
	// will determine the type of SQL database being accessed.
	StoreType string `json:"storage_type,omitempty"`

	// DsnURI holds protocol plus dsn string. The protocol can be
	// "sqlite://", "mysql://" or "postgres://"and the dsn conforming to the Golang
	// database/sql driver name in the database/sql package.
	DsnURI string `json:"dsn_uri,omitempty"`

	// Model holds an experimental schema expressed in YAML
	// used to validate objects in a collection. By default it is nil and not used,
	// but if a "model.yaml" file exists in the collection root directory it'll be loaded,
	// allowing possible verification of structured data.
	Model *models.Model `json:"-"`

	// Created
	Created string `json:"created,omitempty"`

	// Repaired
	Repaired string `json:"repaired,omitempty"`

	// PTStore the point to the pairtree implementation of storage
	PTStore *PTStore `json:"-"`
	// SQLStore points to a SQL database with JSON column support
	SQLStore *SQLStore `json:"-"`

	// Versioning holds the type of versioning implemented in the collection.
	// It can be set to an empty string (the default) which means no versioning.
	// It can be set to "patch" which means objects and attachments are versioned by
	// a semver patch value (e.g. 0.0.X where X is incremented), "minor" where
	// the semver minor value is incremented (e.g. 0.X.0 where X is incremented),
	// or "major" where the semver major value is incremented (e.g. X.0.0 where X is
	// incremented). Versioning affects storage of JSON objects and their attachments
	// across the whole collection.
	Versioning string `json:"versioning,omitempty"`
	// contains filtered or unexported fields
}

Collection holds the operational metadata for collection level operations on collections of JSON objects. General metadata is stored in a codemeta.json file in the root directory alongside the collection.json file.

func Init

func Init(name string, dsnURI string) (*Collection, error)

Init creates a new collection and opens it. It takes a name (e.g. the directory holding the collection.json and codemeta.json files) and an optional DSN in URI form. The default storage engine is a pairtree (i.e. PTSTORE) but some SQL storage engines are supported.

If the DSN URI is a non-empty string then the SQL storage engine is used. The database and user access in the SQL engine need to be set up before you can successfully initialize your dataset collection. Currently three SQL database engines are supported: SQLite3, MySQL 8 and Postgres. You select the SQL storage engine by forming a URI consisting of a "protocol" (e.g. "sqlite", "mysql", "postgres"), the protocol delimiter "://" and a Go SQL supported DSN based on the database driver implementation.

A MySQL 8 DSN URI would look something like

`mysql://DB_USER:DB_PASSWD@PROTOCOL_EXPR/DB_NAME`

The one for SQLite3

`sqlite://FILENAME_FOR_SQLITE_DATABASE`
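
And one for Postgres (the host and database name placeholders here are illustrative)

`postgres://DB_USER:DB_PASSWD@HOST/DB_NAME`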

NOTE: The DSN URI is stored in the collection.json file. The file should NOT be world readable as that will expose your database password. You can remove the DSN URI after initializing your collection but will then need to provide the DATASET_DSN_URI environment variable so you can open your database successfully.

For PTSTORE the access value can be left blank.

```

var (
   c *Collection
   err error
)
name := "my_collection.ds"
c, err = dataset.Init(name, "")
if err != nil {
  // ... handle error
}
defer c.Close()

```

For a sqlstore collection we need to pass the "access" value. This is the DSN URI, or environment variables forming a DSN.

```

var (
   c *Collection
   err error
)
name := "my_collection.ds"
dsnURI := "sqlite://collection.db"
c, err = dataset.Init(name, dsnURI)
if err != nil {
  // ... handle error
}
defer c.Close()

```

func Open

func Open(name string) (*Collection, error)

Open reads in a collection's operational metadata and returns a new collection structure and error value.

```

var (
   c *dataset.Collection
   err error
)
c, err = dataset.Open("collection.ds")
if err != nil {
   // ... handle error
}
defer c.Close()

```

func (*Collection) AttachFile

func (c *Collection) AttachFile(key string, filename string) error

AttachFile reads a filename from file system and attaches it.

```

key, filename := "123", "report.pdf"
err := c.AttachFile(key, filename)
if err != nil {
   ...
}

```

func (*Collection) AttachStream

func (c *Collection) AttachStream(key string, filename string, buf io.Reader) error

AttachStream is for attaching a non-JSON file via an io.Reader. It requires the JSON document key, the filename and an io.Reader. It does not close the reader. If the collection is versioned then the attached document is automatically versioned per the collection versioning setting.

Example: attach the file "report.pdf" to JSON document "123"
in an open collection.

```

key, filename := "123", "report.pdf"
buf, err := os.Open(filename)
if err != nil {
   ...
}
err = c.AttachStream(key, filename, buf)
if err != nil {
   ...
}
buf.Close()

```

func (*Collection) AttachVersionFile

func (c *Collection) AttachVersionFile(key string, filename string, version string) error

AttachVersionFile attaches a file to a JSON document in the collection. This does NOT increment the version number of the attachment(s). It is used to explicitly replace an attached version of a file. It does not update the symbolic link to the "current" attachment.

```

key, filename, version := "123", "report.pdf", "0.0.3"
err := c.AttachVersionFile(key, filename, version)
if err != nil {
   ...
}

```

func (*Collection) AttachVersionStream

func (c *Collection) AttachVersionStream(key string, filename string, version string, buf io.Reader) error

AttachVersionStream is for attaching a non-JSON file buffer (via an io.Reader) to a specific version of a file. If the attached file exists it is replaced.

Example: attach the file "report.pdf", version "0.0.3" to
JSON document "123" in an open collection.

```

key, filename, version := "123", "helloworld.txt", "0.0.3"
buf, err := os.Open(filename)
if err != nil {
   ...
}
err = c.AttachVersionStream(key, filename, version, buf)
if err != nil {
   ...
}
buf.Close()

```

func (*Collection) AttachmentPath

func (c *Collection) AttachmentPath(key string, filename string) (string, error)

AttachmentPath takes a key and filename and returns the file system path to the attached file (if found). For versioned collections this is the path of the symbolic link for the "current" version.

```

key, filename := "123", "report.pdf"
docPath, err := c.AttachmentPath(key, filename)
if err != nil {
   ...
}

```

func (*Collection) AttachmentVersionPath

func (c *Collection) AttachmentVersionPath(key string, filename string, version string) (string, error)

AttachmentVersionPath takes a key, filename and semver returning the path to the attached versioned file (if found).

```

key, filename, version := "123", "report.pdf", "0.0.3"
docPath, err := c.AttachmentVersionPath(key, filename, version)
if err != nil {
   ...
}

```

func (*Collection) AttachmentVersions

func (c *Collection) AttachmentVersions(key string, filename string) ([]string, error)

AttachmentVersions returns a list of versions for an attached file to a JSON document in the collection.

Example: retrieve a list of versions of an attached file.
"key" is a key in the collection, filename is name of an
attached file for the JSON document referred to by key.

```

versions, err := c.AttachmentVersions(key, filename)
if err != nil {
   ...
}
for _, version := range versions {
   fmt.Printf("key: %q, filename: %q, version: %q", key, filename, version)
}

```

func (*Collection) Attachments

func (c *Collection) Attachments(key string) ([]string, error)

Attachments returns a list of filenames for a key name in the collection.

Example: "c" is a dataset collection previously opened,
"key" is a string.  The "key" is for a JSON document in
the collection. It returns a slice of filenames and err.

```

filenames, err := c.Attachments(key)
if err != nil {
   ...
}
// Print the names of the files attached to the JSON document
// referred to by "key".
for _, filename := range filenames {
   fmt.Printf("key: %q, filename: %q", key, filename)
}

```

func (*Collection) Close

func (c *Collection) Close() error

Close closes a collection. For a pairtree that means flushing the keymap to disk. For a SQL store it means closing a database connection. Close is often called in conjunction with "defer" keyword.

```

c, err := dataset.Open("my_collection.ds")
if err != nil { /* .. handle error ... */ }
/* do some stuff with the collection */
defer func() {
  if err := c.Close(); err != nil {
     /* ... handle closing error ... */
  }
}()

```

func (*Collection) Codemeta

func (c *Collection) Codemeta() ([]byte, error)

Codemeta returns a copy of the codemeta.json file content found in the collection directory. The collection must be previously opened.

```

name := "my_collection.ds"
c, err := dataset.Open(name)
if err != nil {
   ...
}
defer c.Close()
src, err := c.Codemeta()
if err != nil {
   ...
}
ioutil.WriteFile("codemeta.json", src, 0664)

```

func (*Collection) Create

func (c *Collection) Create(key string, obj map[string]interface{}) error

Create stores an object in the collection. The object will be converted to JSON source then stored. The collection must be open. A Go `map[string]interface{}` is a common way to handle ad-hoc JSON data in Go. Use `CreateObject()` to store structured data.

```

key := "123"
obj := map[string]interface{}{ "one": 1, "two": 2 }
if err := c.Create(key, obj); err != nil {
   ...
}

```

func (*Collection) CreateJSON

func (c *Collection) CreateJSON(key string, src []byte) error

CreateJSON is used to store JSON directly into a dataset collection. NOTE: the JSON is NOT validated.

```

import (
  "fmt"
  "os"
)

func main() {
    c, err := dataset.Open("friends.ds")
    if err != nil {
         fmt.Fprintf(os.Stderr, "%s", err)
         os.Exit(1)
    }
    defer c.Close()

    src := []byte(`{ "ID": "mojo", "Name": "Mojo Sam", "EMail": "mojo.sam@cosmic-cafe.example.org" }`)
    if err := c.CreateJSON("modo", src); err != nil {
         fmt.Fprintf(os.Stderr, "%s", err)
         os.Exit(1)
    }
    fmt.Printf("OK\n")
    os.Exit(0)
}

```

func (*Collection) CreateObject

func (c *Collection) CreateObject(key string, obj interface{}) error

CreateObject is used to store structured data in a dataset collection. The object needs to be defined as a Go struct annotated appropriately with JSON struct tags.

```

import (
  "encoding/json"
  "fmt"
  "os"
)

type Record struct {
    ID string `json:"id"`
    Name string `json:"name,omitempty"`
    EMail string `json:"email,omitempty"`
}

func main() {
    c, err := dataset.Open("friends.ds")
    if err != nil {
         fmt.Fprintf(os.Stderr, "%s", err)
         os.Exit(1)
    }
    defer c.Close()

    obj := &Record{
        ID: "mojo",
        Name: "Mojo Sam",
        EMail: "mojo.sam@cosmic-cafe.example.org",
    }
    if err := c.CreateObject(obj.ID, obj); err != nil {
         fmt.Fprintf(os.Stderr, "%s", err)
         os.Exit(1)
    }
    fmt.Printf("OK\n")
    os.Exit(0)
}

```

func (*Collection) CreateObjectsJSON added in v2.1.0

func (c *Collection) CreateObjectsJSON(keyList []string, src []byte) error

CreateObjectsJSON takes a list of keys and creates a default object for each key as quickly as possible. This is useful in very narrow situations, like quickly creating test data. Use with caution.

NOTE: if an object already exists, creation is skipped without reporting an error.
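
A minimal sketch of seeding a collection with test records, assuming `c` is an open collection and that `src` holds the default JSON object used for every key:

```

keyList := []string{"101", "102", "103"}
src := []byte(`{"status": "placeholder"}`)
if err := c.CreateObjectsJSON(keyList, src); err != nil {
   // ... handle error
}

```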

func (*Collection) Delete

func (c *Collection) Delete(key string) error

Delete removes an object from the collection. If the collection is versioned then all versions are deleted. Any attachments to the JSON document are also deleted including any versioned attachments.

```

key := "123"
if err := c.Delete(key); err != nil {
   // ... handle error
}

```

func (*Collection) DocPath added in v2.1.0

func (c *Collection) DocPath(key string) (string, error)

DocPath method provides access to a PTStore's document path. If the collection is not a PTStore then an empty path and an error are returned. NOTE: the path returned is a full path including the JSON document stored.

```

c, err := dataset.Open(cName, "")
// ... handle error ...
key := "2488"
s, err := c.DocPath(key)
// ... handle error ...
fmt.Printf("full path to JSON document %q is %q\n", key, s)

```

func (*Collection) Dump added in v2.2.0

func (c *Collection) Dump(out io.Writer) error

Dump writes the collection to the provided io.Writer as JSON lines, one key and object pair per line, e.g.

```jsonl

{"key": "mirtle", "object": {"one": "1"}}

```

Here is how you would use Dump in a Go project.

```go

cName := "mycollection.ds"
c, err := dataset.Open(cName)
if err != nil {
    // ... handle error
}
defer c.Close()
if err := c.Dump(os.Stdout); err != nil {
    // ... handle error
}

```

func (*Collection) HasKey

func (c *Collection) HasKey(key string) bool

HasKey checks if a key exists in the collection. NOTE: the collection must be open otherwise false will always be returned.

```

key := "123"
if c.HasKey(key) {
   ...
}

```

func (*Collection) Keys

func (c *Collection) Keys() ([]string, error)

Keys returns an array of strings holding all the keys in the collection.

```

keys, err := c.Keys()
for _, key := range keys {
   ...
}

```

func (*Collection) KeysJSON added in v2.2.4

func (c *Collection) KeysJSON() ([]byte, error)

KeysJSON returns a JSON encoded list of keys.

```

src, err := c.KeysJSON()
if err != nil {
    // ... handle error ...
}
fmt.Printf("%s\n", src)

```

func (*Collection) Length

func (c *Collection) Length() int64

Length returns the number of objects in a collection. NOTE: Returns a -1 (as int64) on error, e.g. collection not open or Length not available for the storage type.

```

var x int64
x = c.Length()

```

func (*Collection) Load added in v2.2.0

func (c *Collection) Load(in io.Reader, overwrite bool, maxCapacity int) error

Load reads JSONL from an io.Reader. Each JSONL object should have two attributes. The first, "key", should be a unique string; the second, "object", is the JSON object to be stored in the collection. The collection needs to exist. If the overwrite parameter is set to true then each object read will overwrite any object with the same key. If overwrite is false you will get a warning message that the object was skipped due to a duplicate key. The third parameter is the size of the input buffer scanned in megabytes. If the value is less than or equal to zero it defaults to a 1 megabyte buffer.

```

cName := "mycollection.ds"
c, err := dataset.Open(cName)
if err != nil {
   // ... handle error
}
defer c.Close()
// don't overwrite duplicates, use the default buffer size
err = c.Load(os.Stdin, false, 0)
if err != nil {
   // ... handle error
}

```

func (*Collection) Prune

func (c *Collection) Prune(key string, filename string) error

Prune removes an attached document from the JSON record given a key and filename. NOTE: In versioned collections this includes removing all versions of the attached document.

```

key, filename := "123", "report.pdf"
err := c.Prune(key, filename)
if err != nil {
   ...
}

```

func (*Collection) PruneAll

func (c *Collection) PruneAll(key string) error

PruneAll removes all attachments from a JSON record in the collection. When the collection is versioned it removes all versions of all attachments too.

```

key := "123"
err := c.PruneAll(key)
if err != nil {
   ...
}

```

func (*Collection) PruneVersion

func (c *Collection) PruneVersion(key string, filename string, version string) error

PruneVersion removes an attached version of a document.

```

key, filename, version := "123", "report.pdf", "0.0.3"
err := c.PruneVersion(key, filename, version)
if err != nil {
   ...
}

```

func (*Collection) Query added in v2.2.4

func (c *Collection) Query(sqlStmt string, debug bool) ([]interface{}, error)

Query implements a SQL query against a SQLStore or the SQLite3 index of a pairtree collection.
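
A minimal sketch of running a query, assuming `c` is an open SQL-backed collection; the table and column names here are illustrative and depend on the storage engine's SQL dialect:

```

sqlStmt := `SELECT src FROM my_collection LIMIT 10`
objects, err := c.Query(sqlStmt, false)
if err != nil {
   // ... handle error
}
for _, obj := range objects {
   // do something with each decoded object
}

```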

func (*Collection) QueryJSON added in v2.2.4

func (c *Collection) QueryJSON(sqlStmt string, debug bool) ([]byte, error)

QueryJSON implements a SQL query against a SQLStore and returns JSON encoded results.

func (*Collection) Read

func (c *Collection) Read(key string, obj map[string]interface{}) error

Read retrieves a map[string]interface{} from the collection. The stored JSON document is unmarshaled into the map provided.

```

obj := map[string]interface{}{}

key := "123"
if err := c.Read(key, obj); err != nil {
   ...
}

```

func (*Collection) ReadJSON

func (c *Collection) ReadJSON(key string) ([]byte, error)

ReadJSON retrieves JSON stored in a dataset collection for a given key. NOTE: It does not validate the JSON

```

key := "123"
src, err := c.ReadJSON(key)
if err != nil {
   // ... handle error
}

```

func (*Collection) ReadJSONVersion added in v2.1.1

func (c *Collection) ReadJSONVersion(key string, semver string) ([]byte, error)

ReadJSONVersion retrieves versioned JSON record stored in a dataset collection for a given key and semver. NOTE: It does not validate the JSON

```

key := "123"
semver := "0.0.2"
src, err := c.ReadJSONVersion(key, semver)
if err != nil {
   // ... handle error
}

```

func (*Collection) ReadObject

func (c *Collection) ReadObject(key string, obj interface{}) error

ReadObject retrieves structured data via Go's general interface{} type. The JSON document is retrieved from the collection, unmarshaled, and the variable holding the struct is updated.

```

type Record struct {
    ID string `json:"id"`
    Name string `json:"name,omitempty"`
    EMail string `json:"email,omitempty"`
}

// ...

obj := &Record{}

key := "123"
if err := c.ReadObject(key, obj); err != nil {
   // ... handle error
}

```

func (*Collection) ReadObjectVersion

func (c *Collection) ReadObjectVersion(key string, version string, obj interface{}) error

ReadObjectVersion retrieves a specific version of the given object from the collection.

```

type Record struct {
    // ... structure def goes here.
}

var obj Record

key, version := "123", "0.0.1"
if err := c.ReadObjectVersion(key, version, &obj); err != nil {
   ...
}

```

func (*Collection) ReadVersion

func (c *Collection) ReadVersion(key string, version string, obj map[string]interface{}) error

ReadVersion retrieves a specific version of the given object from the collection.

```

obj := map[string]interface{}{}

key, version := "123", "0.0.1"
if err := c.ReadVersion(key, version, obj); err != nil {
   ...
}

```

func (*Collection) RetrieveFile

func (c *Collection) RetrieveFile(key string, filename string) ([]byte, error)

RetrieveFile retrieves a file attached to a JSON document in the collection.

```

key, filename := "123", "report.pdf"
src, err := c.RetrieveFile(key, filename)
if err != nil {
   ...
}
err = os.WriteFile(filename, src, 0664)
if err != nil {
   ...
}

```

func (*Collection) RetrieveStream

func (c *Collection) RetrieveStream(key string, filename string, out io.Writer) error

RetrieveStream takes a key, filename and io.Writer; it writes the attached file's content to the writer and returns an error. If the collection is versioned then the stream is for the "current" version of the attached file.

```

key, filename := "123", "report.pdf"
buf := &bytes.Buffer{}
err := c.RetrieveStream(key, filename, buf)
if err != nil {
   // ... handle error
}
err = os.WriteFile(filename, buf.Bytes(), 0664)
if err != nil {
   // ... handle error
}

```

func (*Collection) RetrieveVersionFile

func (c *Collection) RetrieveVersionFile(key string, filename string, version string) ([]byte, error)

RetrieveVersionFile retrieves a file version attached to a JSON document in the collection.

```

key, filename, version := "123", "report.pdf", "0.0.3"
src, err := c.RetrieveVersionFile(key, filename, version)
if err != nil  {
   ...
}
err = os.WriteFile(filename + "_" + version, src, 0664)
if err != nil {
   ...
}

```

func (*Collection) RetrieveVersionStream

func (c *Collection) RetrieveVersionStream(key string, filename string, version string, buf io.Writer) error

RetrieveVersionStream takes a key, filename and version, writes the attached file's content to the provided io.Writer and returns an error.

```

key, filename, version := "123", "helloworld.txt", "0.0.3"
buf := &bytes.Buffer{}
err := c.RetrieveVersionStream(key, filename, version, buf)
if err != nil {
   // ... handle error
}
err = os.WriteFile(filename + "_" + version, buf.Bytes(), 0664)
if err != nil {
   // ... handle error
}

```

func (*Collection) Sample

func (c *Collection) Sample(size int) ([]string, error)

Sample takes a sample size and returns a list of randomly selected keys and an error. The sample size must be greater than zero and less than or equal to the number of keys in the collection. The collection needs to be previously opened.

```

sampleSize := 1000
keys, err := c.Sample(sampleSize)

```

func (*Collection) SetVersioning

func (c *Collection) SetVersioning(versioning string) error

SetVersioning sets the versioning on a collection. The version string can be "major", "minor", "patch". Any other value (e.g. "", "off", "none") will turn off versioning for the collection.
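
For example, to turn patch-level versioning on and later off again (a sketch; `c` is assumed to be an open collection):

```

if err := c.SetVersioning("patch"); err != nil {
   // ... handle error
}
// later, turn versioning off for the collection
if err := c.SetVersioning("off"); err != nil {
   // ... handle error
}

```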

func (*Collection) Update

func (c *Collection) Update(key string, obj map[string]interface{}) error

Update replaces a JSON document in the collection with a new one. If the collection is versioned then it creates a new versioned copy and updates the "current" version to use it.

```

key := "123"
obj := map[string]interface{}{"one": 1, "two": 2}
obj["three"] = 3
if err := c.Update(key, obj); err != nil {
   ...
}

```

func (*Collection) UpdateJSON

func (c *Collection) UpdateJSON(key string, src []byte) error

UpdateJSON replaces a JSON document in the collection with a new one. NOTE: It does not validate the JSON

```

src := []byte(`{"Three": 3}`)
key := "123"
if err := c.UpdateJSON(key, src); err != nil {
   // ... handle error
}

```

func (*Collection) UpdateMetadata

func (c *Collection) UpdateMetadata(fName string) error

UpdateMetadata imports new codemeta citation information replacing the previous version. Collection must be open.

```

name := "my_collection.ds"
codemetaFilename := "../codemeta.json"
c, err := dataset.Open(name)
if err != nil {
   ...
}
defer c.Close()
c.UpdateMetadata(codemetaFilename)

```

func (*Collection) UpdateObject

func (c *Collection) UpdateObject(key string, obj interface{}) error

UpdateObject replaces a JSON document in the collection with a new one. If the collection is versioned then it creates a new versioned copy and updates the "current" version to use it.

```

type Record struct {
    // ... structure def goes here.
    Three int `json:"three"`
}

key := "123"
obj := &Record{
  Three: 3,
}
if err := c.UpdateObject(key, obj); err != nil {
   // ... handle error
}

```

func (*Collection) UpdatedKeys

func (c *Collection) UpdatedKeys(start string, end string) ([]string, error)

UpdatedKeys takes a start and end time and returns a list of keys for records that were modified in that time range. The start and end values are expected to be in YYYY-MM-DD HH:MM:SS notation or empty strings.

NOTE: This currently only supports SQL stored collections.
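
A sketch of listing keys modified during June 2022, assuming `c` is an open SQL stored collection:

```

start := "2022-06-01 00:00:00"
end := "2022-06-30 23:59:59"
keys, err := c.UpdatedKeys(start, end)
if err != nil {
   // ... handle error
}
for _, key := range keys {
   // process each recently updated key
}

```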

func (*Collection) UpdatedKeysJSON added in v2.2.4

func (c *Collection) UpdatedKeysJSON(start string, end string) ([]byte, error)

UpdatedKeysJSON takes a start and end time and returns a JSON encoded list of keys for records that were modified in that time range. The start and end values are expected to be in YYYY-MM-DD HH:MM:SS notation or empty strings.

NOTE: This currently only supports SQL stored collections.

func (*Collection) Versions

func (c *Collection) Versions(key string) ([]string, error)

Versions retrieves a list of versions available for a JSON document if versioning is enabled for the collection.

```

key := "123"
versions, err := c.Versions(key)
if err != nil {
   // ... handle error
}

```

func (*Collection) WorkPath

func (c *Collection) WorkPath() string

WorkPath returns the working path to the collection.

type Config

type Config struct {
	// CName holds the dataset collection name/path.
	CName string `json:"dataset,omitempty" yaml:"dataset,omitempty"`

	// DsnURI describes how to connect to the SQL storage engine
	// used by the collection(s).
	// e.g. "sqlite://my_collection.ds/collection.db".
	//
	// The Dsn URI may be passed in from the environment via the
	// variable DATASET_DSN_URI. E.g. where all the collections
	// are stored in a common database.
	DsnURI string `json:"dsn_uri,omitemtpy" yaml:"dsn_uri,omitempty"`

	// QueryFn maps a query name to a SQL statement used to query the
	// dataset collection. Multiple query statements can be defined. They
	// need to conform to the SQL dialect of the store. NOTE: Only collections
	// using SQL stores are supported.
	QueryFn map[string]string `json:"query,omitempty" yaml:"query,omitempty"`

	// Model describes the record structure to store. It is used to validate
	// URL encoded POST and PUT requests to the collection.
	Model *models.Model `json:"model,omitempty" yaml:"model,omitempty"`

	// SuccessPage is used for form submissions that are successful, i.e. HTTP Status OK (200)
	SuccessPage string `json:"success_page,omitempty" yaml:"success_page,omitempty"`

	// FailPage is used for form submissions that are unsuccessful, e.g. an HTTP response other than OK
	FailPage string `json:"fail_page,omitempty" yaml:"fail_page,omitempty"`

	// Keys lets you get a list of keys in a collection
	Keys bool `json:"keys,omitempty" yaml:"keys,omitempty"`

	// Create allows you to add objects to a collection
	Create bool `json:"create,omitempty" yaml:"create,omitempty"`

	// CreateSuccess defines a redirect URL or relative path for successful POST to datasetd
	CreateSuccess string `json:"create_success,omitempty" yaml:"create_success,omitempty"`

	// CreateError defines a redirect URL or relative path for unsuccessful POST to datasetd
	CreateError string `json:"create_error,omitempty" yaml:"create_error,omitempty"`

	// Read allows you to retrieve an object from a collection
	Read bool `json:"read,omitempty" yaml:"read,omitempty"`

	// Update allows you to replace objects in a collection
	Update bool `json:"update,omitempty" yaml:"update,omitempty"`

	// Delete allows you to remove objects, object versions,
	// and attachments from a collection
	Delete bool `json:"delete,omitempty" yaml:"delete,omitempty"`

	// Attachments allows you to list attached documents for an object in the
	// collection.
	Attachments bool `json:"attachments,omitempty" yaml:"attachments,omitempty"`

	// Attach allows you to store an attachment for an object in
	// the collection
	Attach bool `json:"attach,omitempty" yaml:"attach,omitempty"`

	// Retrieve allows you to get an attachment in the collection for
	// a given object.
	Retrieve bool `json:"retrieve,omitempty" yaml:"retreive,omitempty"`

	// Prune allows you to remove an attachment from an object in
	// a collection
	Prune bool `json:"prune,omitempty" yaml:"prune,omitempty"`

	// FrameRead allows you to see a list of frames, check for
	// a frame's existence and read the content of a frame, e.g.
	// its definition, keys, object list.
	FrameRead bool `json:"frame_read,omitempty" yaml:"frame_read,omitempty"`

	// FrameWrite allows you to create a frame, change the frame's
	// content or remove the frame completely.
	FrameWrite bool `json:"frame_write,omitempty" yaml:"frame_write,omitempty"`

	// Versions allows you to list versions, read and delete
	// versioned objects and attachments in a collection.
	Versions bool `json:"versions,omitempty" yaml:"versions,omitempty"`
}

Config holds the collection specific configuration.

type DSImport added in v2.1.7

type DSImport struct {
	Comma            string
	Comment          string
	Overwrite        bool
	LazyQuotes       bool
	TrimLeadingSpace bool
}

func (*DSImport) Run added in v2.1.7

func (app *DSImport) Run(in io.Reader, out io.Writer, eout io.Writer, cName string, keyColumn string) error
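
DSImport imports delimited data (e.g. CSV) into a collection. A sketch of running an import from standard input, keyed on a hypothetical "id" column (the field values shown are assumptions, not documented defaults):

```

app := &DSImport{
    Comma: ",",
    Overwrite: true,
}
if err := app.Run(os.Stdin, os.Stdout, os.Stderr, "my_collection.ds", "id"); err != nil {
   // ... handle error
}

```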

type DSQuery added in v2.1.4

type DSQuery struct {
	CName      string   `json:"c_name,omitempty"`
	Stmt       string   `json:"stmt,omitempty"`
	Pretty     bool     `json:"pretty,omitempty"`
	AsGrid     bool     `json:"as_grid,omitempty"`
	AsCSV      bool     `json:"csv,omitempty"`
	AsYAML     bool     `json:"yaml,omitempty"`
	Attributes []string `json:"attributes,omitempty"`
	PTIndex    bool     `json:"pt_index,omitempty"`
	// contains filtered or unexported fields
}

func (*DSQuery) Run added in v2.1.4

func (app *DSQuery) Run(in io.Reader, out io.Writer, eout io.Writer, cName string, stmt string, params []string) error
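
A sketch of running a query via DSQuery; the SQL statement and table name are illustrative and depend on the collection's storage engine:

```

app := &DSQuery{Pretty: true}
stmt := `SELECT src FROM my_collection LIMIT 5`
if err := app.Run(os.Stdin, os.Stdout, os.Stderr, "my_collection.ds", stmt, nil); err != nil {
   // ... handle error
}

```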

type PTStore

type PTStore struct {
	// Working path to the directory where the collections.json is found.
	WorkPath string

	// Versioning holds the type of versioning active for the stored
	// collection. The options are None (no versioning, the default),
	// Major (major value in semver is incremented), Minor (minor value
	// in semver is incremented) and Patch (patch value in semver is incremented)
	Versioning int
	// contains filtered or unexported fields
}

func PTStoreOpen

func PTStoreOpen(name string, dsnURI string) (*PTStore, error)

Open opens the storage system and returns a storage struct and an error. It is passed a directory name that holds collection.json. The second parameter is a DSN URI, which is ignored in a pairtree implementation.

```

name := "testout/T1.ds" // a collection called "T1.ds"
store, err := PTStoreOpen(name, "")
if err != nil {
   ...
}
defer store.Close()

```

func (*PTStore) Close

func (store *PTStore) Close() error

Close closes the storage system freeing resources as needed.

```

if err := store.Close(); err != nil {
   ...
}

```

func (*PTStore) Create

func (store *PTStore) Create(key string, src []byte) error

Create stores a new JSON object in the collection. It takes a string as a key and a byte slice of encoded JSON.

```

err := store.Create("123", []byte(`{"one": 1}`))
if err != nil {
   ...
}

```

func (*PTStore) Delete

func (store *PTStore) Delete(key string) error

Delete removes all versions of the JSON document and attachments indicated by the key provided.

```

key := "123"
if err := store.Delete(key); err != nil {
   ...
}

```

NOTE: If you're versioning your collection then you never really want to delete. An approach could be to update using an empty JSON document to indicate the document is retired, thus avoiding the deletion problem of versioned content.


func (*PTStore) DocPath

func (store *PTStore) DocPath(key string) (string, error)

func (*PTStore) HasKey

func (store *PTStore) HasKey(key string) bool

HasKey will look up and make sure the key is in the collection. The PTStore must be open or false will always be returned.

```

key := "123"
if store.HasKey(key) {
   ...
}

```

func (*PTStore) Keymap

func (store *PTStore) Keymap() map[string]string

func (*PTStore) KeymapName

func (store *PTStore) KeymapName() string

func (*PTStore) Keys

func (store *PTStore) Keys() ([]string, error)

Keys returns all keys in a collection as a slice of strings.

```

var keys []string
keys, _ = store.Keys()
/* iterate over the keys retrieved */
for _, key := range keys {
   ...
}

```

NOTE: the error will always be nil, this func signature needs to match the other storage engines.

func (*PTStore) Length

func (store *PTStore) Length() int64

Length returns the number of records (len(store.keys)) in the collection. Requires the collection to be open.

```

var x int64

x = store.Length()

```

func (*PTStore) Read

func (store *PTStore) Read(key string) ([]byte, error)

Read takes a string as a key and returns the encoded JSON document from the collection. If versioning is enabled this is always the "current" version of the object. Use Versions() and ReadVersion() for versioned copies.

```

src, err := store.Read("123")
if err != nil {
   ...
}
obj := map[string]interface{}{}
if err := json.Unmarshal(src, &obj); err != nil {
   ...
}

```

func (*PTStore) ReadVersion

func (store *PTStore) ReadVersion(key string, version string) ([]byte, error)

ReadVersion retrieves a specific version of a JSON document stored in a collection.

```

key, version := "123", "0.0.1"
src, err := store.ReadVersion(key, version)
if err != nil {
   ...
}

```

func (*PTStore) SetVersioning

func (store *PTStore) SetVersioning(setting int) error

SetVersioning sets the type of versioning associated with the stored collection.

func (*PTStore) Update

func (store *PTStore) Update(key string, src []byte) error

Update takes a key and encoded JSON object and updates a JSON document in the collection.

```

key := "123"
src := []byte(`{"one": 1, "two": 2}`)
if err := store.Update(key, src); err != nil {
   ...
}

```

func (*PTStore) UpdateKeymap

func (store *PTStore) UpdateKeymap(keymap map[string]string) error

func (*PTStore) Versions

func (store *PTStore) Versions(key string) ([]string, error)

Versions retrieves a list of semver version strings available for a JSON document.

```

key := "123"
versions, err := store.Versions(key)
if err != nil {
   ...
}
for _, version := range versions {
     // do something with version string.
}

```

type SQLStore

type SQLStore struct {
	// WorkPath holds the path to where the collection definition is held.
	WorkPath string

	// versioning
	Versioning int
	// contains filtered or unexported fields
}

func SQLStoreInit

func SQLStoreInit(name string, dsnURI string) (*SQLStore, error)

SQLStoreInit creates a table to hold the collection if it doesn't already exist.
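
A sketch of initializing a SQLite backed store; the collection name and DSN URI here are illustrative:

```

dsnURI := "sqlite://collections.db"
store, err := SQLStoreInit("my_collection.ds", dsnURI)
if err != nil {
   // ... handle error
}
defer store.Close()

```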

func SQLStoreOpen

func SQLStoreOpen(name string, dsnURI string) (*SQLStore, error)

SQLStoreOpen opens the storage system and returns a storage struct and an error. It is passed a name; for a pairtree this would be the path to collection.json, for a SQL store a file holding a DSN URI. The DSN URI is formed from a protocol prefixed to the DSN, e.g. for a SQLite connection to the test.ds database the DSN URI might be "sqlite://collections.db".

```

store, err := c.Store.Open(c.Name, c.DsnURI)
if err != nil {
   ...
}

```

func (*SQLStore) Close

func (store *SQLStore) Close() error

Close closes the storage system freeing resources as needed.

```

if err := storage.Close(); err != nil {
   ...
}

```

func (*SQLStore) Create

func (store *SQLStore) Create(key string, src []byte) error

Create stores a new JSON object in the collection. It takes a string as a key and a byte slice of encoded JSON.

```

err := storage.Create("123", []byte(`{"one": 1}`))
if err != nil {
   ...
}

```

func (*SQLStore) Delete

func (store *SQLStore) Delete(key string) error

Delete removes a JSON document from the collection

```

key := "123"
if err := storage.Delete(key); err != nil {
   ...
}

```

func (*SQLStore) HasKey

func (store *SQLStore) HasKey(key string) bool

HasKey will look up and make sure the key is in the collection. The SQLStore must be open or false will always be returned.

```

key := "123"
if store.HasKey(key) {
   ...
}

```

func (*SQLStore) Keys

func (store *SQLStore) Keys() ([]string, error)

Keys returns all keys in a collection as a slice of strings.

```

var keys []string
keys, _ = storage.Keys()
/* iterate over the keys retrieved */
for _, key := range keys {
   ...
}

```

func (*SQLStore) Length

func (store *SQLStore) Length() int64

Length returns the number of records (count of rows in collection). Requires collection to be open.

func (*SQLStore) Read

func (store *SQLStore) Read(key string) ([]byte, error)

Read takes a string as a key and returns the encoded JSON document from the collection.

```

src, err := storage.Read("123")
if err != nil {
   ...
}
obj := map[string]interface{}{}
if err := json.Unmarshal(src, &obj); err != nil {
   ...
}

```

func (*SQLStore) ReadVersion

func (store *SQLStore) ReadVersion(key string, version string) ([]byte, error)

ReadVersion returns a specific version of a JSON object.

func (*SQLStore) SetVersioning

func (store *SQLStore) SetVersioning(setting int) error

SetVersioning sets versioning to Major, Minor, Patch or None. If versioning is set to Major, Minor or Patch a table in the open SQL storage engine will be created.

func (*SQLStore) Update

func (store *SQLStore) Update(key string, src []byte) error

Update takes a key and encoded JSON object and updates a JSON document in the collection.

```

key := "123"
src := []byte(`{"one": 1, "two": 2}`)
if err := storage.Update(key, src); err != nil {
   ...
}

```

func (*SQLStore) UpdatedKeys

func (store *SQLStore) UpdatedKeys(start string, end string) ([]string, error)

UpdatedKeys returns all keys updated in a time range

```

var (
   keys []string
   start = "2022-06-01 00:00:00"
   end = "2022-06-30 23:59:59"
)
keys, _ = storage.UpdatedKeys(start, end)
/* iterate over the keys retrieved */
for _, key := range keys {
   ...
}

```

func (*SQLStore) Versions

func (store *SQLStore) Versions(key string) ([]string, error)

Versions return a list of semver strings for a versioned object.

type Settings

type Settings struct {
	// Host holds the URL to listen to for the web API
	Host string `json:"host" yaml:"host"`

	// Htdocs holds the path to static content that will be
	// provided by the web service.
	Htdocs string `json:"htdocs" yaml:"htdocs"`

	// Collections holds an array of collection configurations that
	// will be supported by the web service.
	Collections []*Config `json:"collections" yaml:"collections"`
}

Settings holds the specific settings for the web service.

func ConfigOpen

func ConfigOpen(fName string) (*Settings, error)

ConfigOpen reads the JSON or YAML configuration file provided, validates it and returns a Settings structure and error.

NOTE: if the dsn string isn't specified

```

fName := "settings.yaml"
settings, err := ConfigOpen(fName)
if err != nil {
   ...
}

```

func (*Settings) GetCfg added in v2.1.13

func (settings *Settings) GetCfg(cName string) (*Config, error)

GetCfg retrieves a collection configuration from a Settings object using dataset name.
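
A sketch of looking up one collection's configuration by name, assuming `settings` was loaded with ConfigOpen and the collection name is illustrative:

```

cfg, err := settings.GetCfg("my_collection.ds")
if err != nil {
   // ... handle error
}
fmt.Printf("collection %q, dsn %q\n", cfg.CName, cfg.DsnURI)

```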

func (*Settings) String

func (settings *Settings) String() string

String renders the configuration as a JSON string.

func (*Settings) WriteFile

func (settings *Settings) WriteFile(name string, perm os.FileMode) error

WriteFile will save a configuration to the filename provided using the file permissions given.

```

fName := "new-settings.yaml"
mysql_dsn_uri := os.Getenv("DATASET_DSN_URI")

settings := new(Settings)
settings.Host = "localhost:8001"
settings.Htdocs = "/usr/local/www/htdocs"

cfg := &Config{
   CName: "my_collection.ds",
   DsnURI: mysql_dsn_uri,
   Keys: true,
   Create: true,
   Read: true,
   Update: true,
   Delete: true,
   Attach: false,
   Retrieve: false,
   Prune: false,
}
settings.Collections = append(settings.Collections, cfg)

if err := settings.WriteFile(fName, 0664); err != nil {
   ...
}

```

type StorageSystem

type StorageSystem interface {

	// Open opens the storage system and returns a storage struct and an error.
	// It is passed a name; for a pairtree this would be the
	// path to collection.json and for a sql store a file holding a DSN
	//
	// ```
	//  store, err := c.Store.Open(c.Access)
	//  if err != nil {
	//     ...
	//  }
	// ```
	//
	Open(name string, dsnURI string) (*StorageSystem, error)

	// Close closes the storage system freeing resources as needed.
	//
	// ```
	//   if err := storage.Close(); err != nil {
	//      ...
	//   }
	// ```
	//
	Close() error

	// Create stores a new JSON object in the collection
	// It takes a string as a key and a byte slice of encoded JSON
	//
	//   err := storage.Create("123", []byte(`{"one": 1}`))
	//   if err != nil {
	//      ...
	//   }
	//
	Create(string, []byte) error

	// Read takes a string as a key and returns the encoded
	// JSON document from the collection
	//
	//   src, err := storage.Read("123")
	//   if err != nil {
	//      ...
	//   }
	//   obj := map[string]interface{}{}
	//   if err := json.Unmarshal(src, &obj); err != nil {
	//      ...
	//   }
	Read(string) ([]byte, error)

	// Versions returns a list of semver formatted version strings available for a JSON object
	Versions(string) ([]string, error)

	// ReadVersion takes a key and semver version string and return that version of the
	// JSON object.
	ReadVersion(string, string) ([]byte, error)

	// Update takes a key and encoded JSON object and updates a
	// JSON document in the collection.
	//
	//   key := "123"
	//   src := []byte(`{"one": 1, "two": 2}`)
	//   if err := storage.Update(key, src); err != nil {
	//      ...
	//   }
	//
	Update(string, []byte) error

	// Delete removes all versions and attachments of a JSON document.
	//
	//   key := "123"
	//   if err := storage.Delete(key); err != nil {
	//      ...
	//   }
	//
	Delete(string) error

	// Keys returns all keys in a collection as a slice of strings.
	//
	//   var keys []string
	//   keys, _ = storage.Keys()
	//   /* iterate over the keys retrieved */
	//   for _, key := range keys {
	//      ...
	//   }
	//
	Keys() ([]string, error)

	// HasKey returns true if collection is open and key exists,
	// false otherwise.
	HasKey(string) bool

	// Length returns the number of records in the collection
	Length() int64
}

StorageSystem describes the functions required to implement a dataset storage system. Currently two types of storage systems are supported -- pairtree and SQL storage (via MySQL 8 and JSON columns). If the funcs described are not supported by the storage system they must return a "Not Implemented" error value.

Directories

Path Synopsis
cmd
dataset command
dataset is a command line tool, Go package, shared library and Python package for working with JSON objects as collections on local disc.
datasetd command
datasetd implements a web service for working with dataset collections.
dsquery command
dsquery.go is a command line program for working an dataset collections using the dataset v2 SQL store for the JSON documents (e.g.
