lsa

module
v1.0.0-beta12 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 25, 2022 License: Apache-2.0

README

GoDoc Go Report Card Build Status

Layered Schemas

Layered Schema Architecture (LSA) enables semantic interoperability between heterogeneous systems. LSA uses a schema (such as FHIR schemas for health data) to define data structures, and overlays to add semantics, contextual metadata, and processing directives. A schema variant is the composition of the schema base and use-case specific overlays. Different schema variants can be used to ingest data from or export data to disparate systems. Each variant encodes source specific metadata and rules to ingest data into a knowledge graph, or target specific metadata and rules to translate the knowledge graph into another format.

A layered schema defines a mapping from structured data (in JSON, XML, tabular format) to a labeled property graph. For instance, when a JSON document is ingested using a layered schema, a labeled property graph is constructed by combining schema information with the input data elements. The resulting graph is self-contained, and includes both the input data and the schema. This allows processing data based on its semantic attributes instead of its attributes names or other structural properties. For example, it is possible to use layered schema to ingest health data, mark certain attributes as "identifiable", and then, remove all attributes that are marked as such. The algorithm to remove "identifiable" elements can be written without any knowledge of the input structure. Same algorithm can be used to de-identify health data as well as marketing data.

The main documentation site for layered schemas is:

https://layeredschemas.org

This Go module contains the reference implementation of the layered schema specification. It contains a layered schema processor to slice/compose schemas, import JSON schemas, and ingest data from JSON/CSV/XML sources.

Overview

A traditional schema (a JSON schema, an XML schema, etc.) defines one or more data types. Such schemas are primarily used for documentation, data validation, and automated code generation. A schema includes definitions about the shape of the data object such as the list of fields, nested fields, or constraints on field contents. A layered schema adds additional interchangeable layers to a traditional schema so the same schema can be adjusted for different use cases, or can be processed with different processing directives to handle data variations. The layered schema architecture provides the framework to extend traditional schemas using such layers.

Schemas, overlays, and bundles

A schema defines a data structure (attributes). A schema has a valueType which is the the type of data structure defined by the schema (e.g. Person).

An overlay defines semantics for attributes, and may override schema definitions. A Person schema can only be composed with Person overlays.

A schema variant is a schema that is the result of the composition of a schema with zero or more overlays.

A bundle defines the schema variants for multiple valueTypes. That is, a bundle specifies all the schemas and overlays for each type of object used in a particular use case. For instance, a bundle may specify that Person means the composition of a Person schema and an overlay containing terms from data privacy vocabulary. Data ingested using this bundle will have annotations using data privacy vocabulary terms.

Below is an example schema for a Person data type containing firstName and lastName.

person.schema.json:
{
  "@context": "https://lschema.org/ls.json",
  "@id": "https://example.org/Person/schema",
  "@type": "Schema",
  "valueType": "Person",
  "layer": {
    "@type": "Object",
    "@id": "https://example.org/Person",
    "attributes": {
      "firstName": {
        "@type": "Value"
      },
      "lastName": {
        "@type": "Value"
     }
   }
 }
}

The Person object is defined as an Object that contains two Value attributes: firstName and lastName. These are the unique attribute IDs. An overlay can be defined to add new semantics to these attributes. The following overlay adds personal data category Name to the lastName attribute:

person-dpv.overlay.json:
{
  "@context": "https://lschema.org/ls.json",
  "@id": "https://example.org/Person/dpv-ovl",
  "@type": "Overlay",
  "valueType": "Person",
  "layer": {
    "lastName": {
      "@type": "Value",
      "http://www.w3.org/ns/dpv#hasPersonalDataCategory": "http://www.w3.org/dpv#Name"
    }
  }
}

A bundle that defines Person by composing the person schema and the dpv overlay is:

{
    "typeNames" : {
        "Person": {
            "schema" : "person.schema.json",
            "overlays" : [
                {
                    "schema" : "person-dpv.overlay.json"
                }
            ]
        }
    }
}

Building

Once you clone the repository, you can build the schema compiler using the Go build system.

cd layers
go build

This will build the layers binary in the current directory.

Examples

The examples/ directory contains some example data processing scenarios.

Commercial Support

Commercial support for this library is available from Cloud Privacy Labs: support@cloudprivacylabs.com

Directories

Path Synopsis
examples
fhir command
cmd
pkg
csv
dot
ls
xml

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL