The diagram below depicts the relationship between various elements of the technology stacks for Windows, Hadoop and deployment cloud architectures.
- If you have only Windows/.NET skilled developers and administrators, then you should go with HDInsight product using Windows Azure cloud.
- If you have Linux admins but Windows/.NET developers then you have the option of either having a private Linux cloud or use a service like Rackspace to host your Hadoop cluster.
- There is no one-size-fits-all approach here. You will need to look around your company and decide.
APIs and Drivers
- Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalag, Oozie and Ambari, and also some Powershell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is really interesting as it builds on the established technology for .NET developers to access most data sources to deliver the capabilities of Hadoop.
- Microsoft Hive ODBC Driver. This driver enables the connection from Hive to Excel.
- Microsoft .NET Map Reduce API For Hadoop Recently Microsoft released the .NET API to connect with Map/Reduce functionality of Hadoop Streaming.