BlinkDB - query engine with bounded response times and errors
BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas:
(1) An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from original data over time, and
(2) A dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements.
We have evaluated BlinkDB on the well-known TPC-H benchmarks, a real-world analytic workload derived from Conviva Inc. and are in the process of deploying it at Facebook Inc. (Source: BlinkDB homepage)
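To make the second idea concrete, here is a minimal Python sketch of error-driven sample selection. It is not BlinkDB's code or API: the helper names (build_samples, approx_avg), the uniform random sampling, and the normal-approximation error bar are all assumptions chosen for illustration. The idea is to keep samples of several sizes and answer an AVG query from the smallest one whose 95% error bar fits the requested bound.

```python
import math
import random
import statistics

Z_95 = 1.96  # normal critical value for a 95% confidence interval


def build_samples(data, sizes, seed=0):
    """Pre-build uniform random samples of several sizes (hypothetical helper)."""
    rng = random.Random(seed)
    return {n: rng.sample(data, n) for n in sorted(sizes)}


def approx_avg(samples, max_error):
    """Answer AVG from the smallest sample whose 95% error bar fits max_error.

    Returns (estimate, half_width, sample_size). If no sample meets the
    bound, the result from the largest sample is returned.
    """
    for n in sorted(samples):
        sample = samples[n]
        estimate = statistics.fmean(sample)
        # Normal approximation: half-width of the interval is z * s / sqrt(n).
        half_width = Z_95 * statistics.stdev(sample) / math.sqrt(n)
        if half_width <= max_error:
            break
    return estimate, half_width, n


# Synthetic "table" of 200,000 values (made-up data, mean 100, stddev 15).
rng = random.Random(42)
data = [rng.gauss(100.0, 15.0) for _ in range(200_000)]

samples = build_samples(data, sizes=[100, 1_000, 10_000])
estimate, err, n = approx_avg(samples, max_error=1.5)
print(f"AVG ~ {estimate:.2f} +/- {err:.2f} (from {n} of {len(data)} rows)")
```

A tighter max_error pushes the query onto a larger sample and therefore a slower scan, which is the accuracy-for-response-time trade described in the quote.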
Statistical Error Convergence
This may be the way to deal with ever-mounting piles of data when speed is of the essence and the results only need to fall within a statistical error band. For life-saving and other critical types of queries, where an exact answer is required, it may not be the best option.
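How quickly the error band narrows can be estimated with a back-of-the-envelope calculation. The sketch below assumes a uniform random sample and the usual normal approximation for a mean, where the 95% error bar is about 1.96 * stddev / sqrt(n). The function name and the standard deviation of 15 are made up for illustration, and BlinkDB's own error estimation may differ.

```python
import math


def required_sample_size(stddev, max_error, z=1.96):
    """Rows needed so that z * stddev / sqrt(n) <= max_error."""
    return math.ceil((z * stddev / max_error) ** 2)


for max_error in (2.0, 1.0, 0.5):
    rows = required_sample_size(stddev=15.0, max_error=max_error)
    print(f"error <= {max_error}: about {rows} rows")
# error <= 2.0: about 217 rows
# error <= 1.0: about 865 rows
# error <= 0.5: about 3458 rows
```

Under these assumptions the required sample size depends on the spread of the data and the target error, not on the size of the full table, and halving the error costs roughly four times as many rows.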