Thursday, May 29, 2014

6 questions to ask your NoSQL vendor

Many people I talk to are considering making the switch from traditional RDBMSes to NoSQL-style data stores. There are many inherent differences between the two. If you are at the point of considering a NoSQL data store, chances are that you have done your homework or are being pushed in this direction by your management or customers. Hence I will make no attempt to present the high-level pros and cons of each type of data store.

Before you make the final leap, here are 6 questions that you must ask your NoSQL vendor. All of the questions relate to the consistency of data read from the data store. It should come as no surprise that data consistency is the single most critical element of any database. If you answered "Performance" instead, chances are that you are grossly overestimating your data volume, since most companies have nowhere near the amount of data needed to push any of the leading NoSQL offerings into the "red zone".

Here is the list of questions:
  1. What is the probability of observing an accurate value a fixed amount of time, say t seconds, after a write occurs? This is termed freshness confidence.
  2. What percentage of reads observe a value other than the one expected? This is quantified as the percentage of unpredictable data.
  3. How much time is required for an updated value to be visible to all subsequent reads? This is termed the inconsistency window.
  4. What is the probability of a read observing a value fresher than the one seen by the previous read of the same data item? This is termed monotonic read consistency.
  5. What is the mean age of a value read from the updated data item? This might be quantified in terms of versions or time.
  6. How different is the value of a data item from its actual value? For example, for a member with 1000 friends, one solution may return 998 friends whereas a different solution may return 20. An application may prefer the first.
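If you would rather measure than take the vendor's word for it, several of these numbers can be estimated from a trace of reads. Below is a minimal sketch in Python, not the method used in the report cited at the end of this post. It assumes you can log, for every read, how long ago the last write happened, which version the read returned, and which version was actually the latest. The `Read` record, its field names, the function names and the sample trace are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical trace record; field names are assumptions for illustration.
    delay: float    # seconds between the most recent write and this read
    observed: int   # version number the read returned
    latest: int     # version number produced by the most recent write


def freshness_confidence(reads, t):
    """Question 1: fraction of reads issued at least t seconds after a
    write that returned the latest version (None if no such reads)."""
    eligible = [r for r in reads if r.delay >= t]
    if not eligible:
        return None
    return sum(r.observed == r.latest for r in eligible) / len(eligible)


def stale_percentage(reads):
    """Question 2: percentage of reads that returned something other
    than the latest version."""
    stale = sum(r.observed != r.latest for r in reads)
    return 100.0 * stale / len(reads)


def inconsistency_window(reads):
    """Question 3: largest delay at which a stale read was still seen.
    This is only a lower bound taken from the sampled reads."""
    return max((r.delay for r in reads if r.observed != r.latest), default=0.0)


def mean_age_in_versions(reads):
    """Question 5: mean number of versions a read lags behind the latest."""
    return sum(r.latest - r.observed for r in reads) / len(reads)


# Made-up sample trace: version 2 was just written, some reads still see 1.
reads = [
    Read(delay=0.1, observed=1, latest=2),
    Read(delay=0.5, observed=2, latest=2),
    Read(delay=1.0, observed=1, latest=2),
    Read(delay=2.0, observed=2, latest=2),
    Read(delay=3.0, observed=2, latest=2),
]

print("freshness confidence at t=1s:", freshness_confidence(reads, 1.0))
print("stale reads (%):", stale_percentage(reads))
print("inconsistency window (s):", inconsistency_window(reads))
print("mean age (versions):", mean_age_in_versions(reads))
```

The point is that each question maps to a number you can compute yourself, so a vendor's answers can be checked against your own workload rather than taken on faith.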

Benchmarking Correctness of Operations in Big Data Applications, Sumita Barahmand and Shahram Ghandeharizadeh, Database Laboratory Technical Report 2014-05, Computer Science Department, USC, Los Angeles, California 90089-0781.