A comparison chart of ad-hoc query tools for interactive analysis of large data sets. Each tool is good and serves its particular niche well. That said, Apache Drill lets a user query many different types of large data sources directly, and hence removes the need for expensive and error-prone ETL (Extract-Transform-Load) of the data.
Please also see my earlier post on NoSQL database comparison.
Feature | Apache Drill | Apache Hive | BigQuery | CitusDB | Hadapt | HAWQ | Impala | Phoenix |
Owner | Community | Community | Google | CitusData | Hadapt | Greenplum | Cloudera | Salesforce |
Low-latency | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
Operational mode | On-premise | On-premise | Hosted, SaaS offering | On-premise | On-premise | Part of Pivotal HD appliance | On-premise | On-premise |
Data shapes | Nested, tabular | Nested, tabular | Nested, tabular | Nested, tabular | Tabular | Tabular | Tabular | Tabular |
Data sources | Extensible, incl. HDFS, HBase, Cassandra, MongoDB, RDBMS, etc. | HDFS, HBase | N/A | PostgreSQL, MongoDB, HDFS | HDFS/RDBMS | HDFS, HBase | HDFS, HBase | HDFS, HBase |
Hadoop dependent | No | Yes | No | No | Yes | No | Yes | Yes |
Schema | Optional | Required | Required | Required | Required | Required | Required | Required |
License | Apache 2.0 | Apache 2.0 | ToS/SLA | Commercial | Commercial | Commercial | Apache 2.0/Open Source | Proprietary |
Source code | Open | Open | Closed | Closed | Closed | Closed | Open | Open |
Query languages | Extensible, incl. SQL 2003, MongoQL, DSL, etc. | HiveQL | SQL subset | SQL | SQL subset | SQL subset | SQL/HiveQL subset | SQL subset |
Columnar storage | Yes | Possible | Yes | No | No | Yes | Yes | No |
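
To shortlist tools from the chart above, the rows can be encoded as data and filtered by requirement. The following is a minimal sketch, not part of the referenced article: the `Tool` class, its field names, and the `shortlist` helper are hypothetical names chosen for illustration, and the values are transcribed from the "Low-latency", "Hadoop dependent", "Schema" and "Source code" rows only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One column of the comparison chart (hypothetical structure)."""
    name: str
    low_latency: bool       # "Low-latency" row
    hadoop_dependent: bool  # "Hadoop dependent" row
    schema_required: bool   # "Schema" row: Required -> True, Optional -> False
    open_source: bool       # "Source code" row: Open -> True, Closed -> False


# Values transcribed from the chart; column order is the same as in the table.
TOOLS = [
    Tool("Apache Drill", True, False, False, True),
    Tool("Apache Hive", False, True, True, True),
    Tool("BigQuery", True, False, True, False),
    Tool("CitusDB", True, False, True, False),
    Tool("Hadapt", True, True, True, False),
    Tool("HAWQ", True, False, True, False),
    Tool("Impala", True, True, True, True),
    Tool("Phoenix", True, True, True, True),
]


def shortlist(tools, **required):
    """Return the names of tools whose fields match every keyword requirement."""
    return [
        t.name
        for t in tools
        if all(getattr(t, field) == value for field, value in required.items())
    ]


if __name__ == "__main__":
    # Open-source tools that answer interactively.
    print(shortlist(TOOLS, low_latency=True, open_source=True))
    # Open-source tools that do not depend on Hadoop.
    print(shortlist(TOOLS, open_source=True, hadoop_dependent=False))
```

Filtering on `schema_required=False` or on `open_source=True, hadoop_dependent=False` leaves only Apache Drill, which matches the point made in the introduction.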
Reference
- http://online.liebertpub.com/action/showPopup?citid=citart1&id=T1&doi=10.1089%2Fbig.2013.0011