orientdb/

directory
v0.0.0-...-c5cfbba Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 18, 2025 License: Apache-2.0

README

This repo contains code for trying to load 21M dataset into OrientDB. We try to convert the data into JSON format and load it. We'll need one file for each entity e.g. Film, Director, Starring, Genre, Country etc.

Preparing JSON files

To prepare the JSON files, we need a list of ids for the entity and need to know what edges we need to include in JSON. For e.g. the xids in sorted_files.txt were obtained from 21M rdf dataset and all have a starring edge, hence are films.

These can be supplied to the transform binary to get the relevant JSON.

./transform -data ~/Downloads/21million.rdf.gz -properties initial_release_date,name -input sorted_films.txt -edges genre,starring,country

This would generate a output.json file for films (sample file in data/output.json). Similar files have to be generated for other entities.

Loading the data into OrientDB

OrientDB ETL tool supports loading data in JSON format. It can be run with a config file similar to config.json.

Querying the data

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL