Schema inference for property graphs

Hanâ Lbath (1), Angela Bonifati (1), Russ Harmer (2)
(1) BD - Base de Données, LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
(2) PLUME - Preuves et Langages, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract: Property graph instances are typically populated without defining a schema beforehand. Although this ensures great flexibility, the lack of a schema means missing opportunities for query optimization, data integration and analytics, to name a few. Since many graph instances exist before any schema is defined, extracting the schema from those instances in a principled way can become a significant yet daunting task. In this paper, we present a novel end-to-end inference method for property graph schemas that handles complex and nested property values, multi-labeled nodes and node hierarchies. Our method consists of three main steps. The first step uses Cypher queries to extract a serialization of the nodes and edges of a property graph. The second step builds on a MapReduce type inference system that operates on the serialized output of the first step. The third step analyzes subtypes and supertypes to infer node hierarchies. We describe our schema inference pipeline and its implementation in two variants, one label-oriented and one property-oriented. Finally, we experimentally evaluate and compare the scalability and accuracy of our approaches on several real-life datasets. To the best of our knowledge, our work is the first to tackle the problem of schema inference for property graphs.
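To make the three-step outline in the abstract more concrete, the following is a minimal Python sketch of a label-oriented inference pass. It is not the authors' implementation: the node record shape (`labels`, `properties`), the type names, the function names, and the subset-based hierarchy rule are all assumptions chosen for illustration. The input stands in for the serialized output of the Cypher extraction step, the map and fuse functions imitate a MapReduce-style type inference, and the last function derives a candidate node hierarchy from label-set inclusion.

```python
def infer_value_type(value):
    """Infer a simple (hypothetical) type name for one property value."""
    if isinstance(value, bool):  # bool must be tested before int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        inner = sorted({infer_value_type(v) for v in value})
        return "List[" + " | ".join(inner) + "]"
    if isinstance(value, dict):
        return "Map"
    return "Unknown"


def map_node(node):
    """Map phase: turn one serialized node into (label set, partial type)."""
    key = frozenset(node["labels"])
    props = {name: {infer_value_type(v)} for name, v in node["properties"].items()}
    seen = {name: 1 for name in props}
    return key, {"props": props, "seen": seen, "count": 1}


def fuse(a, b):
    """Reduce phase: merge two partial types that share a label set."""
    props, seen = {}, {}
    for name in a["props"].keys() | b["props"].keys():
        props[name] = a["props"].get(name, set()) | b["props"].get(name, set())
        seen[name] = a["seen"].get(name, 0) + b["seen"].get(name, 0)
    return {"props": props, "seen": seen, "count": a["count"] + b["count"]}


def infer_node_types(nodes):
    """Group nodes by label set and fuse their property types."""
    grouped = {}
    for node in nodes:
        key, partial = map_node(node)
        grouped[key] = fuse(grouped[key], partial) if key in grouped else partial
    return {
        key: {
            name: {
                "types": sorted(types),
                # A property is mandatory if every node of the group has it.
                "mandatory": t["seen"][name] == t["count"],
            }
            for name, types in t["props"].items()
        }
        for key, t in grouped.items()
    }


def infer_hierarchy(schema):
    """Assumed rule: a label set is a subtype of any strict subset of itself."""
    return sorted(
        (tuple(sorted(sub)), tuple(sorted(sup)))
        for sub in schema
        for sup in schema
        if sup < sub
    )
```

Under these assumptions, nodes labeled `Person` and `Person, Student` would yield two node types, with the second reported as a subtype of the first. A property-oriented variant could group nodes by their property names instead of their labels.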
Document type :
Preprints, Working Papers, ...

Cited literature: 20 references

https://hal.inria.fr/hal-02929153
Contributor : Russ Harmer
Submitted on : Thursday, September 3, 2020 - 11:14:23 AM
Last modification on : Saturday, September 5, 2020 - 3:32:46 AM

File

PGSchemaInferencePaper_BigData...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02929153, version 1

Citation

Hanâ Lbath, Angela Bonifati, Russ Harmer. Schema inference for property graphs. 2020. ⟨hal-02929153⟩
