6 provocative questions for the big data enthusiast

I recently read a wonderful essay (titled "CRITICAL QUESTIONS FOR BIG DATA") that raises some serious and provocative questions. I would like to encourage the reader to first form an opinion on the following 6 questions before clicking on this link for a more thoughtful and nuanced discussion of each.

  1. Big Data changes the definition of knowledge
  2. Claims to objectivity and accuracy are misleading
  3. Bigger data are not always better data
  4. Taken out of context, Big Data loses its meaning
  5. Just because it is accessible does not make it ethical
  6. Limited access to Big Data creates new digital divides

Each one of the items above can be a gateway to the wonderful and occasionally frustrating world of big data.

Here is my take on the 6 questions above:
  1. Big Data changes the definition of knowledge - Big data can be used for many purposes, but I do not believe that big data is knowledge or will be any time soon. Only a newbie would mistake correlation for causation.
  2. Claims to objectivity and accuracy are misleading - Extrapolating from the data in a set to a general population is fraught with danger and often misleading. But if the data-gathering parameters are well known, the big data results can be reliable.
  3. Bigger data are not always better data - This is true to an extent. If the noise-to-signal ratio is too high, it creates more work, and in some cases it can be nearly impossible to separate the two.
  4. Taken out of context, Big Data loses its meaning - Totally true. Context is everything. It's true of marketing data as well as data related to drug discovery or adverse impact analysis.
  5. Just because it is accessible does not make it ethical - If you cannot trace the lineage of the data (very clearly), do not use the data. Also check the local laws. European laws are much stricter than their US counterparts.
  6. Limited access to Big Data creates new digital divides - Not true. If it's your data, you get to use it. If you can purchase data, you get to use it. Much like a car or a bike. I do not feel that the "data rich" need to share their data with the "have-nots". That said, a case can be made for open sharing of data for academic and non-commercial purposes.
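
To make points 2 and 3 a little more concrete, here is a small sketch in Python. Everything in it is made up for illustration: the "population" is one hypothetical person of each age from 0 to 99, the "big" data set only ever sees people aged 18 to 40, and the small data set is a handful of people spread evenly across all ages. It is a toy, not a statement about any real data set.

```python
from statistics import mean

# Hypothetical population: one person of each age from 0 to 99.
population = list(range(100))

# A "big" but biased data set: only people aged 18 to 40 are captured,
# and each of them shows up 1,000 times.
biased_big_sample = [age for age in population if 18 <= age <= 40] * 1000

# A small data set spread evenly over the population:
# every tenth person, starting at age 5.
small_spread_sample = population[5::10]

def estimation_error(sample):
    """Absolute gap between the sample's mean age and the population's."""
    return abs(mean(sample) - mean(population))

print("rows:", len(biased_big_sample), "error:", estimation_error(biased_big_sample))
print("rows:", len(small_spread_sample), "error:", estimation_error(small_spread_sample))
```

In this toy setup the large data set is off by far more than the ten-row one, because piling up more rows from the same narrow slice does nothing to fix how the rows were gathered.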
Happy New Year to all!