Originally posted on Data Science Central
Click on the image for full view
Google recently replaced its AdWords MySql Database with a Database that they built in-house namely F1 Database. AdWords serves thousand of users, " which all share a database over 100TB serving up hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day," Google said.
After reading Google's paper on its F1 Database (not open source), I started thinking about its ramifications for Databases in general and Big Data in particular. Google F1 Database paper might trigger new initiatives that eventuate in materializing the phantom (next paragraph). The paper mentions few challenges with F1 DB that need to be addressed. I came away with two lingering issues. First, there is no mention of security. Secondly, it states, "Hide RPC latency, Buffer writes in client, send as one RPC". What will happen if the network connection between the client and the Database goes down? Will the data be lost? This is a serious problem for operations that need to commit as fast as possible; Airline reservation is one. I probably misunderstood.
The system resembles a hybrid between Relational and Hierarchical (think mainframe) Databases. What is the Holy Grail in the Database world? Relational Databases (RDBMS) are like high-rises comprising many apartments. What if there are no vacancies and people have lined up to rent from us. The way RDBMS has handled the demand is by adding more floors on top of the high-rise. It is expensive and slows down the day-to-day operations. A new technology (NoSql) emerged a few years ago and solved the space allocation problem. Instead of building new floors we place the tenants in inexpensive houses. Once we run out of vacant houses we give the tenants new houses. The downside? It makes managing the place more difficult and we might unwittingly reserve the same house for two different individuals. There are ways to prevent that, but it's a perplexing task and it places a lot of pressure on the engineers who design the housing complex. The Holy Grail is to discover a method by which we can combine the best of both worlds and remove the negative.
Following Google's invaluable tips in the paper, no doubt some engineers are working hard to figure out how to build an F1++ Database. What if they succeed? What will happen to NoSql and NewSql if they produce an open source Database System? The confluence of several forces that are currently shaping open source, Big Data, Mobile, and Cloud technologies might in time make NoSql and the existing NewSql irrelevant-- flash-aware applications, shared-nothing architecture, Mapreduce methods, software-defined storage, in-memory computing, shared virtual storage array networks, new compression algorithms, atomic writes, horizontal scalability, software-defined networking, columnar technology, progress in fault tolerance, database sharding, and solid state drives.
There is one very powerful force that in my view will keep NoSql alive and well for years to come and that is the power of developers. The genie is out of the bottle and all the nuclear fusion combined in the world cannot put it back in there. Speaking from personal experience as a Developer/DBA, I know that developers hate roadblocks. Once they start on something they like to continue working. To get them away from what they are deeply involved in is like taking a pacifier from a baby. For the first time in history, they can get on their generally free and open source bikes and run without the hassle of calling the DBA's to open the gates for them every 40 miles. NoSql pushed the Database inside the developers' world and they love it! Is it good for the industry? Perhaps not, but it might just create millions of programming jobs. After all, somebody has to untangle the convoluted code (not to the fault of developers) left behind. Separation of Database and code, as painful as it might be for developers is a necessity. It establishes checks and balances. According to Google’s paper, they have taken those factors into account. Google F1 is a developer friendly Database. Hopefully the trend will continue.
See Big Data Studio