Windows offers the largest and richest set of enterprise-wide end-user business tools. These tools can be used for analysis, visualization, and slicing and dicing the underlying data. As more enterprises embrace big data and Hadoop, the need for interoperability between the two is growing. Hadoop, however, comes from the Linux stable, so it should come as no surprise that getting the two to interoperate can be daunting. This post presents the basics of your interoperability options.
The diagram below depicts the relationship between various elements of the technology stacks for Windows, Hadoop and deployment cloud architectures.
Quick takeaways
- If you have only Windows/.NET developers and administrators, go with the HDInsight product on the Windows Azure cloud.
- If you have Linux admins but Windows/.NET developers, you can either run a private Linux cloud or use a service like Rackspace to host your Hadoop cluster.
- There is no one-size-fits-all approach here. You will need to look around your company and decide.
Source: Microsoft.com
Interoperability Options
APIs and Drivers
- Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, along with some PowerShell scripts for cluster management. It also includes libraries for MapReduce and LINQ to Hive. LINQ to Hive is especially interesting because it builds on LINQ, the established technology .NET developers already use to query most data sources, to deliver the capabilities of Hadoop.
- Microsoft Hive ODBC Driver. This driver lets Excel connect to Hive.
- Microsoft .NET Map Reduce API for Hadoop. Microsoft recently released this .NET API for working with the Map/Reduce functionality of Hadoop Streaming.
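
To make the Hadoop Streaming connection more concrete: Streaming runs any executable as a mapper or reducer, feeding it records on standard input and reading tab-separated key/value lines from standard output. The sketch below illustrates that contract with a word count. It is written in Python rather than .NET purely for brevity, it is not the Microsoft API itself, and the function names and the "map"/"reduce" command-line argument are assumptions made for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Assumes the input is sorted by key, which Hadoop does between
    the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: the first argument selects the stage.
    stage = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in stage(sys.stdin):
        print(record)
```

Because both stages only read and write text streams, the same script can be tried locally by piping a file through the map stage, a sort, and the reduce stage before submitting it to a cluster.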