Guest blog post by Chris Towers
Hadoop has been the foundation of data programmes since Big Data hit the big time, and it has been the launching point for almost every company that is serious about its data offerings.
However, as we predicted, the rise of in-memory databases has created a need for companies to adopt frameworks that harness this power effectively.
It was therefore no surprise to see Apache launch Spark, a new framework that uses in-memory primitives to deliver performance up to 100 times faster than Hadoop’s two-stage, disk-based MapReduce for certain applications.
This kind of product is becoming increasingly important as we move into a world where the volume and velocity of data are increasing exponentially.
So is Spark going to be the Hadoop beater that it seems to be?
Technology that lets us make decisions faster, using larger amounts of data, is something that companies will be clamouring for.
Nor is it only in principle that this platform will bring about change. As an open-source platform, it has more developers working on it than any other Apache project.
Their willingness to dedicate their time to it suggests that people support the idea. It is well known that many of the data scientists contributing to Apache projects also use these tools in their day-to-day roles at different companies, which suggests that those companies may well adopt Spark in the future.
One of the main reasons for Hadoop’s success in the last few years has been not only its ease of use, but also the fact that companies can get it for nothing. The basics of Hadoop run on regular commodity hardware, so companies only need to upgrade when they ramp up their data programmes.
Spark, by contrast, relies on in-memory processing, which requires high-performance systems with large amounts of memory, something that companies new to data initiatives are unlikely to invest in.
So which outcome is more likely?
In my opinion, Hadoop will always be the foundation of data programmes, and with more companies looking to adopt it as the basis for their implementations, this is unlikely to change.
Spark may well become the upgrade that companies adopt once they reach a stage where they want, or need, improved performance. As Spark can work alongside Hadoop, this also seems to have been in the minds of the guys at Apache when they came up with the product in the first place.
Therefore, it is unlikely to be a Hadoop beater; instead, it will become more like its big brother. Capable of doing more, but at increased cost and only necessary for certain data volumes and velocities, it is not going to be a replacement.