Sunday, June 15, 2014

Not all data mining packages are created equal (Comparison of 6 major free tools)

For anyone looking to compare the characteristics, pros, and cons of the six most commonly used free data mining software tools, please refer to the expert paper titled "An overview of free software tools for general data mining" by A. Jović, K. Brkić and N. Bogunović, Faculty of Electrical Engineering and Computing, University of Zagreb / Department of Electronics, Microelectronics, Computer and Intelligent Systems, Zagreb, Croatia. The six tools covered extensively in the paper are as follows:
  • RapidMiner
  • R
  • Weka
  • KNIME
  • Orange
  • scikit-learn
The paper compares the algorithms implemented in each tool across all areas of data mining, including:
  • classification
  • regression
  • clustering
  • association rules
  • feature selection
  • evaluation criteria
  • visualization
The paper also covers advanced and specialized research topics, such as:
  • big data
  • data streams
  • text mining
In short, it is a treasure trove of information for novices and experts alike who might be wondering whether they chose the right tool.

