Challenges in the Design of a Graph Database Benchmark

42.98 min.

Graph databases are one of the leading drivers in the emerging, highly heterogeneous landscape of database management systems for non-relational data management and processing. The recent interest in and success of graph databases arise mainly from the growing interest in social media analysis and in the exploration and mining of relationships in social media data. However, because the graph-based data model is very flexible, a graph database can serve a large variety of scenarios from different domains, such as travel planning, supply chain management, and package routing.

In recent months, many vendors have designed and implemented solutions to satisfy the need to efficiently store, manage, and query graph data. These solutions are very diverse in terms of the supported graph data model, query languages, and APIs. With a growing number of vendors offering graph processing and graph management functionality, there is also an increased need to compare the solutions on a functional level as well as on a performance level with the help of benchmarks.

Graph database benchmarking is a challenging task. Existing graph database benchmarks are limited in their functionality and in their portability to different graph-based data models and application domains. Existing benchmarks and their supported workloads are typically based on a proprietary query language and on a specific graph-based data model derived from the mathematical notion of a graph. The variety and lack of standardization in the logical representation and retrieval of graph data make it hard to define a portable graph database benchmark.

In this talk, we present a proposal and design guideline for a graph database benchmark. Typically, a database benchmark consists of a synthetically generated data set of varying size and varying characteristics and a workload driver.
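
The two components named above, a synthetic data generator and a workload driver, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark proposed in the talk: the function names (`generate_graph`, `k_hop_neighbors`, `run_workload`) are hypothetical, an in-memory adjacency list stands in for a real graph database, the generator uses uniformly random edges with an integer average out-degree, and the k-hop traversal is an assumed example workload.

```python
import random
import statistics
import time
from collections import deque


def generate_graph(num_vertices, avg_out_degree, seed=0):
    """Build a random directed graph as an adjacency list (dict of lists).

    Assumption: edges are drawn uniformly at random; avg_out_degree is an int.
    """
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    adjacency = {v: [] for v in range(num_vertices)}
    for _ in range(num_vertices * avg_out_degree):
        src = rng.randrange(num_vertices)
        dst = rng.randrange(num_vertices)
        adjacency[src].append(dst)
    return adjacency


def k_hop_neighbors(adjacency, start, k):
    """Return vertices reachable from `start` in at most k hops, excluding `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


def run_workload(adjacency, num_queries, k, seed=1):
    """Hypothetical workload driver: time k-hop queries from random start vertices."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(num_queries):
        start = rng.randrange(len(adjacency))
        began = time.perf_counter()
        k_hop_neighbors(adjacency, start, k)
        latencies.append(time.perf_counter() - began)
    return {
        "queries": num_queries,
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    graph = generate_graph(num_vertices=1000, avg_out_degree=5)
    print(run_workload(graph, num_queries=100, k=2))
```

Varying `num_vertices` and `avg_out_degree` changes the size and density of the generated data set, while the driver reports response times in seconds. A real benchmark would replace the in-memory traversal with calls to the interface of the system under test.
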
To generate graph data sets, we present parameters from graph theory that influence the characteristics of the generated graph data set. The workload driver then issues a set of queries against a well-defined interface of the graph database and gathers relevant performance numbers. We propose a set of performance measures to determine the response time behavior on different workloads, as well as initial suggestions for typical workloads in graph data scenarios.

Our main objective in this session is to open the discussion on graph database benchmarking. We believe that there is a need for a common understanding of different workloads for graph processing from different domains and for the definition of a common subset of core graph functionality in order to provide a general-purpose graph database benchmark. We encourage vendors to participate, to contribute their domain-dependent knowledge, and to define a graph database benchmark proposal.

Talk by Marcus Paradies, a first-year Ph.D. student in the Database Technology Group at the Technische Universität Dresden. His advisor is Wolfgang Lehner. Before coming to Dresden, Marcus Paradies completed his diploma degree at the Ilmenau University of Technology. His thesis, on distributed entity matching on semistructured data sources, was co-advised by Susan Malaika (IBM Software Group) and Jerome Simeon (IBM T.J. Watson Research). His current research focus is on graph processing and graph data management.
Tags: database Benchmark Fosdem 2012 2012 dev room dev room talks Graph Database Database Engineers
