Guest blog post by Ashu Kumar
Let's get over the hype. Unless you are a data-driven giant like Amazon or Yahoo, there is no need to spend millions to dig and fill a big data lake. You might be better off with a data junk yard strategy. It lets you start early and cheap so you don’t miss the big data bus. It may not sound impressive, but play it right and a junk yard will end up a gold mine with an optimum balance of cost, benefit and speed. Here is how it makes sense:
Hadoop, the core of the big data platform, promises to acquire and store anything digital on low-cost hardware with minimal effort. This was not possible with traditional data platforms, which sit on expensive hardware and demand substantial skilled resources to organize the data before loading. Hadoop lets you ingest and load data without transforming it in advance. It’s cheap to store, and you don’t have to clean anything until you need it. A case in point: a data junk yard.
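To make the "store now, clean later" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: a local folder stands in for Hadoop storage, and the names ingest_raw, read_with_schema and the "tweets" sample records are hypothetical, not part of Hadoop or any real pipeline. Raw lines are dumped as they arrive, and a schema is applied only when someone actually reads the data.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def ingest_raw(yard: Path, source_name: str, payload: str) -> Path:
    """Append the payload exactly as received: no parsing, no cleaning.

    Hypothetical helper; a local directory stands in for cluster storage.
    """
    yard.mkdir(parents=True, exist_ok=True)
    target = yard / f"{source_name}.raw"
    with target.open("a", encoding="utf-8") as fh:
        fh.write(payload.rstrip("\n") + "\n")
    return target


def read_with_schema(path: Path, fields: list[str]) -> list[dict]:
    """Apply a schema at read time, skipping lines that do not fit it."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # junk stays in the yard; it is ignored for this read only
        if isinstance(record, dict) and all(f in record for f in fields):
            rows.append({f: record[f] for f in fields})
    return rows


def demo() -> list[dict]:
    """Ingest four made-up records, then read back only the usable ones."""
    with tempfile.TemporaryDirectory() as tmp:
        yard = Path(tmp) / "junk_yard"
        path = ingest_raw(yard, "tweets", '{"user": "a", "text": "hi", "lang": "en"}')
        ingest_raw(yard, "tweets", "not json at all")
        ingest_raw(yard, "tweets", '{"user": "b"}')
        ingest_raw(yard, "tweets", '{"user": "c", "text": "yo"}')
        return read_with_schema(path, ["user", "text"])


if __name__ == "__main__":
    print(demo())
```

Nothing is thrown away at ingest time. The malformed line and the incomplete record are still sitting in the raw file, and a different question tomorrow can read them with a different schema.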
It’s not all about the Use Case
No need to overthink. For most use cases you come up with, there will always be an argument that it could be done in your traditional darling databases. The use cases that get traction will be either around large public datasets or around complicated forms of unstructured data. Both require some hands-on research as well as use-and-throw flexibility. There is no point in investing too much time and effort in use cases. Bring in the data sets with some perceived value and invest in data discovery. What's junk and what's gold will emerge in the process. Ingest. Digest. Divest.
The yard sale is on
There is a host of public data that might be valuable, but it’s difficult to assess its real value without being able to play around with it, or to explore and analyze it alongside your internal data sets. It's like bringing home a side table from a yard sale to see if it fits with your couch. Also, public and social data is like water flowing through a river. Most of it is free for now. If you do not capture it in time, it will flow past you. Investing in big data is about the future, not the present. It’s a 401(k).
The key is to start early and keep steady, with the purpose of building a reservoir of data that has a perceived value. Who cares what you call it? It could be a lake, a reservoir, a winery or a junk yard. The bottom line is to capitalize on the two basic big data capabilities: an inexpensive way to store any digital form of data, and the freedom to load it without transforming it first. I am sure you will find some gold nuggets in that junk yard.
Good Luck and Happy Hoarding!