
Guest forum post by Mirko Krivanek

This is of course the wrong question. I use R because I'm more familiar with it than with SAS or Python, and I use R mostly for graphics and visualization. Though things have changed, I consider R mostly a tool for ad-hoc analysis or EDA (exploratory data analysis) rather than a component of enterprise analytic applications or production code running in batch mode or accessed via APIs. Is there an enterprise version of R? Also, R used to be limited by the amount of available RAM, and I'm not sure how easy it is to get around this limitation. RHadoop is R for Hadoop; I suppose it is a possible solution for big data, though I'm not familiar with the product.

Picture from Kunal Jain's blog

I used SAS a while back, and I know it has improved significantly over the last 10 years, including a better sort, hash tables, and very fast SAS for really big data. If your client uses SAS, SAS is a great option. You also get more support with SAS than with R.

My favorite would be Python, but since I code my own applications (as opposed to working with a team), I still use Perl for its automatic memory allocation, nice string-processing features (though many languages now do as well as Perl with NLP and regular expressions), and high flexibility. Clear, scalable, portable code is more important than the choice of language. But I definitely like programming (and scripting) languages more than R or SAS, because I develop proprietary techniques and don't like black boxes: you never know when they fail or what kind of data makes them fail, which is not an issue if you write your own code. Also, speed of execution (fast C versus relatively slow Perl, R, or Python) is no longer a big issue with big data, because most of the computing time is spent not on running algorithms (if they are well optimized) but on data transfers.

There are also many other tools for data mining, for instance RapidMiner or Mahout (a Java library for machine learning). What about Excel? In practice, I use Perl to summarize data (the big data processing), R for graphics, and Excel as the top layer.
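The Perl / R / Excel split described here boils down to reducing a large raw file to a small summary table that a graphics tool or a spreadsheet can open comfortably. Below is a minimal sketch of that summarization step, written in Python rather than the author's Perl; the two-column `key,value` input format and the function names `summarize` and `write_summary` are hypothetical, chosen only for illustration. It streams the input once, so memory grows with the number of distinct keys rather than with the size of the raw file.

```python
from __future__ import annotations

import csv
import io
import sys
from collections import defaultdict
from typing import Iterable, TextIO


def summarize(rows: Iterable[list[str]]) -> dict[str, tuple[int, float]]:
    """Aggregate hypothetical (key, value) rows in a single streaming pass.

    Only per-key counters are kept in memory, never the raw rows.
    """
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, float] = defaultdict(float)
    for key, value in rows:
        counts[key] += 1
        totals[key] += float(value)
    return {key: (counts[key], totals[key]) for key in counts}


def write_summary(summary: dict[str, tuple[int, float]], out: TextIO) -> None:
    """Write the compact summary as CSV, which R or Excel can open directly."""
    writer = csv.writer(out)
    writer.writerow(["key", "count", "total", "mean"])
    for key in sorted(summary):
        n, total = summary[key]
        writer.writerow([key, n, total, total / n])


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large raw input file.
    raw = io.StringIO("a,1\nb,4\na,3\n")
    write_summary(summarize(csv.reader(raw)), sys.stdout)
```

In a real pipeline, `raw` would be an open file handle and the output would go to a CSV file that R reads for graphics and Excel uses as the top layer.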

What about you?
