76. FAST IN-MEMORY SQL ANALYTICS ON GRAPHS
Name: Chunbin Lin
Grad Year: 2017
We study a class of graph analytics SQL queries, which we call relationship queries. A relationship query performs joins, selections and semijoins (WHERE clause subqueries) over tables that correspond to entities and binary relationships (i.e., edges). Intuitively, it discovers target entities that are reachable from source entities specified by the query. Relationship queries also compute aggregated scores for the target entities by applying aggregation functions on measure attributes found on the target entities, the source entities and the paths from the sources to the targets. Relationship queries are a superset of graph reachability queries and of tree pattern queries. We present real-world OLAP scenarios, where efficient relationship queries are needed. However, row stores, column stores and graph databases are unacceptably slow in such OLAP scenarios. The FastR in-memory analytics engine utilizes a novel form of bottom-up fully pipelined query processing execution strategy. The plans run on a novel data organization that combines salient features of column-based organization, indexing and compression. Furthermore, FastR compiles its query plans into query-aware executable C++ source code. Besides achieving runtime efficiency, FastR also reduces main memory requirements because, unlike column databases, FastR selectively allows more dense forms of compression including heavy-weighted compressions, which do not support random access. We used FastR to accelerate queries for two OLAP dashboards in the biomedical field. The first dashboard runs queries on the PubMed dataset and the second one on the SemMedDB dataset. FastR outperforms Postgres by 2-4 orders of magnitude and MonetDB by 1-3 orders of magnitude, when all of them are running on RAM. Our experiments dissect the FastR advantage between (i) the novel FastR execution strategy, (ii) data structure choices and (iii) the use of compiled code. The provided analysis and experiments show the space savings of FastR due to appropriate use of compression methods. Finally, we show that as the number of CPU cores raises, the runtime penalty incurred by the dense compression methods of FastR decreases.
Industry Application Area(s)