Tuesday, June 4, 2013

Top 3 Things to Consider When Deploying a Hadoop Cluster

You have been tasked with setting up a Hadoop cluster. What are the three things you need to focus on?

Size

Buying top-of-the-line servers might seem like the smart thing to do, but it is not. Go with a "regular" server configuration that gives you the best return on your investment. Hadoop is designed to scale out, so if you ever outgrow your servers you can always add another one. Plus, your finance department will love you for being a judicious user of company money.

Consistency

Keep all the servers in the cluster as similar to each other as possible in every aspect, hardware and software. If you have standardized on a 64-bit CentOS server with 32 GB of memory, dual quad-core CPUs and RAID 1+0, ensure that every server in the cluster meets the same specification. Some minor differences are OK, since hardware manufacturers change their server specs a couple of times a year.
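To make the consistency rule something you can check rather than just aim for, here is a minimal Python sketch that compares each node's profile against the standard build described above and reports any drift. The `NodeSpec` fields, the hostnames and the idea of feeding it pre-collected inventory data are assumptions for illustration; in practice you would gather these values with your own inventory tooling.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeSpec:
    """Hardware/software profile of one cluster node (hypothetical fields)."""
    os: str
    arch: str
    memory_gb: int
    cpu_sockets: int
    cores_per_socket: int
    raid_level: str


# The standard build from the text: 64-bit CentOS, 32 GB RAM, dual quad-core, RAID 1+0.
BASELINE = NodeSpec(os="CentOS", arch="x86_64", memory_gb=32,
                    cpu_sockets=2, cores_per_socket=4, raid_level="1+0")


def spec_drift(node: NodeSpec, baseline: NodeSpec = BASELINE) -> dict[str, tuple]:
    """Return {field: (expected, actual)} for every field that differs from the baseline."""
    drift = {}
    for f in fields(baseline):
        expected = getattr(baseline, f.name)
        actual = getattr(node, f.name)
        if expected != actual:
            drift[f.name] = (expected, actual)
    return drift


def audit(cluster: dict[str, NodeSpec]) -> dict[str, dict[str, tuple]]:
    """Map each non-conforming hostname to its drift report; conforming nodes are omitted."""
    report = {}
    for host, spec in cluster.items():
        drift = spec_drift(spec)
        if drift:
            report[host] = drift
    return report


if __name__ == "__main__":
    # Hypothetical inventory: dn03 was bought later with less RAM and RAID 5.
    cluster = {
        "dn01": BASELINE,
        "dn02": NodeSpec("CentOS", "x86_64", 32, 2, 4, "1+0"),
        "dn03": NodeSpec("CentOS", "x86_64", 16, 2, 4, "5"),
    }
    for host, diffs in audit(cluster).items():
        for name, (expected, actual) in diffs.items():
            print(f"{host}: {name} expected {expected!r}, found {actual!r}")
```

Running it prints one line per mismatched field (`dn03: memory_gb expected 32, found 16` and `dn03: raid_level expected '1+0', found '5'`), while the conforming nodes stay silent. Minor, accepted differences such as a newer CPU model can simply be left out of `NodeSpec`.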

Monitor Everything

If you have followed #2 above, you will find that monitoring the cluster is much easier. Monitoring servers with different configurations is hard. When an alert for a disk with bad sectors suddenly pops up on the Nagios screen at 2 AM, your NOC will love you for keeping their run-book simple, since the disk sizes and replacement procedures are the same on every server.
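Uniform hardware also means a single check definition works on every node. Below is a minimal sketch of a Nagios-style disk check in Python, assuming the bad-sector count has already been collected (for example from SMART data; that collection step is left out). The exit codes 0 to 3 and the `| label=value;warn;crit` performance-data suffix follow the Nagios plugin conventions; the thresholds, function names and command-line arguments are hypothetical.

```python
from typing import Optional

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical thresholds. With identical disks on every node, one pair of
# thresholds (and one replacement run-book) covers the whole cluster.
WARN_SECTORS = 1
CRIT_SECTORS = 10


def check_bad_sectors(device: str, bad_sectors: Optional[int],
                      warn: int = WARN_SECTORS,
                      crit: int = CRIT_SECTORS) -> tuple[int, str]:
    """Classify a disk's bad-sector count and return (exit_code, status_line)."""
    if bad_sectors is None:
        return UNKNOWN, f"UNKNOWN - {device}: could not read bad-sector count"
    if bad_sectors >= crit:
        code = CRITICAL
    elif bad_sectors >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data (value;warn;crit) that Nagios can graph.
    perf = f"bad_sectors={bad_sectors};{warn};{crit}"
    return code, f"{STATUS_NAMES[code]} - {device}: {bad_sectors} bad sectors | {perf}"


def main(argv: list[str]) -> int:
    """Hypothetical CLI: <device> <bad_sector_count>; prints one status line, returns exit code."""
    device = argv[0] if argv else "unknown-device"
    try:
        count: Optional[int] = int(argv[1])
    except (IndexError, ValueError):
        count = None
    code, line = check_bad_sectors(device, count)
    print(line)
    return code
```

For example, `main(["/dev/sdc", "12"])` prints `CRITICAL - /dev/sdc: 12 bad sectors | bad_sectors=12;1;10` and returns 2. A real plugin entry point would end with `sys.exit(main(sys.argv[1:]))` so that Nagios receives the exit code.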