A lot has been written about what distinguishes a cloud-based offering from a hosted (aka on-demand) offering and from an on-premise offering. Why is this important in the age of big data? It is relevant because these three operational models for software strongly influence big data governance and practices. Let's consider each of the three in turn, starting with on-premise offerings, which were the mainstay of software products until recently.
On-premise software is sometimes also referred to as shrink-wrapped software. The idea behind installation is simple: the IT/Operations team (and in many cases individuals) either download the software from a central server or install it from a CD. Once it is installed, some configuration is needed, and that is all. Neither the vendor of the software nor the corporation installing it has much access to the data and metadata associated with the software. The information is spread out across machines, and the only option is to periodically back it up to a central server where you hope it can be analyzed. That is where the problems begin. Some worth noting are: 1) privacy issues, 2) data that is not structured for multi-user support, and 3) a lack of tools that can analyze siloed data. Hence trying to make use of this kind of data is pretty much a non-starter and is not recommended.
Let's now consider a close cousin of on-premise offerings, the on-demand solutions mentioned earlier. Some smart people figured that on-premise solutions are not for everyone, so they rented data center space and hosted servers. Conceptually, the software installed on one server is used by one of their tenants. This was, and is, a business solution to a lack of in-house IT skills. It worked great: people loved the companies that offered such services, and those companies are still in existence today and making money. Without taking any further detour into the business side of on-demand solutions, let's dive back into the data aspects. The problems of on-premise solutions do not go away. On top of the concerns listed earlier, there is a new issue to tackle: legal problems, since each "slice" of data is owned by a different tenant. Some people have tried to centralize some of the data for further analysis, but success has been limited to certain use cases.
Now let's move on to the third option on the list: cloud-based offerings. What is a cloud-based offering? In many ways it looks like an on-demand solution. You do not need to buy any software, and you do not need to install it. You get a URL to log in, and it is available anywhere you have network connectivity, much like the on-demand flavor. The biggest difference is in the organization of the data behind the curtain. Your data sits in the same schema as every other tenant's; I will skip over the security and privacy guards that are implemented, since they are not relevant to this discussion. The biggest benefit is that your data is in a format that Business Intelligence tools can readily use for analysis. This is not a coincidence but the result of conscious design choices made by the vendor. And why is that? Because data intelligence and data analytics add-ons can be a big revenue stream for the vendor. And if you are lucky and your vendor is savvy enough, you might even benefit from the network effect of all the other tenants of the cloud.
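To make the shared-schema idea a bit more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The `documents` table, the `tenant_id` column, the tenant names, and the sample rows are all made up for illustration and do not reflect any particular vendor's design. The only point is that when every tenant's rows sit in one table, a per-tenant view and a cross-tenant aggregate are both a single query away.

```python
import sqlite3

# Hypothetical shared schema: every tenant's rows live in the same table,
# distinguished only by a tenant_id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, doc_type TEXT, size_kb INTEGER)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("acme", "budget", 120),
        ("acme", "forecast", 80),
        ("globex", "budget", 200),
        ("globex", "inventory", 50),
        ("initech", "budget", 100),
    ],
)


def tenant_view(conn, tenant_id):
    """Rows visible to one tenant: the same table, filtered by tenant_id."""
    return conn.execute(
        "SELECT doc_type, size_kb FROM documents "
        "WHERE tenant_id = ? ORDER BY doc_type",
        (tenant_id,),
    ).fetchall()


def doc_type_counts(conn):
    """Cross-tenant aggregate: one query, no per-tenant export or merge step."""
    rows = conn.execute(
        "SELECT doc_type, COUNT(*) FROM documents GROUP BY doc_type"
    ).fetchall()
    return dict(rows)


print(tenant_view(conn, "acme"))
print(doc_type_counts(conn))
```

With on-premise or on-demand deployments, the equivalent of `doc_type_counts` would first require collecting and reconciling data from each separate installation.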
Having a cloud-based spreadsheet (with superior collaboration and backup features) that your employees can use also gives you the option of running analytics on the types of spreadsheets they are working on, and even of comparing your results with summary stats for the general user of the cloud-based offering. It is hard to do the same with either on-premise or on-demand offerings. It even opens the door to some of the big data techniques that can give you a better understanding of your business.
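As a rough illustration of that comparison, the sketch below computes the share of each spreadsheet category for one tenant and sets it next to the share across all tenants. The usage log, the tenant names, and the categories are invented for the example; a real vendor would presumably expose only the aggregated figures, not other tenants' rows.

```python
from collections import Counter

# Hypothetical usage log: (tenant, spreadsheet category) pairs.
usage = [
    ("acme", "budget"), ("acme", "budget"),
    ("acme", "forecast"), ("acme", "inventory"),
    ("globex", "budget"), ("globex", "inventory"),
    ("globex", "inventory"), ("globex", "inventory"),
    ("initech", "forecast"), ("initech", "forecast"),
    ("initech", "budget"), ("initech", "inventory"),
]


def category_shares(rows):
    """Fraction of spreadsheets in each category for the given rows."""
    counts = Counter(category for _, category in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def compare_to_benchmark(usage, tenant):
    """Map each category to (this tenant's share, share across all tenants)."""
    mine = category_shares([row for row in usage if row[0] == tenant])
    overall = category_shares(usage)
    return {c: (mine.get(c, 0.0), overall[c]) for c in sorted(overall)}


for category, (mine, overall) in compare_to_benchmark(usage, "acme").items():
    print(f"{category}: tenant {mine:.0%}, all tenants {overall:.0%}")
```

In this made-up data, the tenant's employees spend a larger share of their spreadsheets on budgets than the general user does, which is the kind of signal that is hard to obtain when each installation's data sits in its own silo.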