Summary: In general, NOSQL databases can do everything that an RDBMS can do, and when the data is ‘big’ they can almost always do it faster and cheaper. There is one exception where you’ll need to pay close attention.
In a technical discussion we would launch into the details of how RDBMS have been designed around the principles of ACID (Atomicity, Consistency, Isolation, and Durability). NOSQL databases, however, have been designed around the principles of BASE (Basically Available, Soft State, Eventually Consistent).
We promised this would not get overly technical, so here’s what you need to know. NOSQL databases achieve their speed and low cost by spreading the data, and the work of processing it, over many different servers (an approach known as MPP, massively parallel processing) and by keeping several copies of the data on different servers. Your IT staff will specify how many copies (replicas) of the data to keep; three or four is typical. Your data and its replicated copies may be spread over just those three or four servers, or across hundreds or even thousands of servers if your project is really big.
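For readers who want to see the idea in miniature, here is a rough Python sketch of spreading copies of one record across a pool of servers. The function name place_replicas, the server names, and the "hash the key, then take the next servers in line" scheme are all assumptions made for illustration; real NOSQL products use their own placement strategies.

```python
import hashlib


def place_replicas(key, servers, replication_factor=3):
    """Pick `replication_factor` distinct servers to hold copies of `key`.

    Hypothetical scheme for illustration only: hash the key to choose a
    starting server, then take the following servers, wrapping around.
    """
    if replication_factor > len(servers):
        raise ValueError("replication factor cannot exceed the number of servers")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replication_factor)]


# Ten hypothetical servers, three copies of each record.
servers = [f"server-{n}" for n in range(10)]
copies = place_replicas("customer:42", servers, replication_factor=3)
print(copies)
```

The same key always lands on the same three servers, which is what lets the system find the copies again later.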
Here’s the core of the issue. When data arrives in the system, the NOSQL controller decides where to store it first. There may then be a delay, ranging from milliseconds to minutes depending on volume, before the data is the same in all three or four replicated locations. If one of your nodes is offline or requires manual intervention, the delay could be hours, but this would be rare.
So it is POSSIBLE, though not probable, that if the data is queried during that window the user MIGHT NOT see the same data in each location. In NOSQL terminology, this is ‘eventual consistency’ (though ‘eventual’ can be a very short period of time). In the traditional RDBMS, by contrast, the data is stored only once and promises to be immediately and consistently correct.
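The window described above can be mimicked in a few lines of Python. This is a toy model, not how any particular product works: the class names Replica and EventuallyConsistentStore, the node names, and the explicit sync() step standing in for background replication are all assumptions for illustration.

```python
class Replica:
    """One copy of the data, held on one server."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica first and reaches the rest later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []  # writes not yet copied to every replica

    def write(self, key, value):
        # The first replica is updated immediately; the others are queued.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, key, replica_index):
        # A read is answered by whichever replica it happens to reach.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Stand-in for background replication catching up.
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["node-a", "node-b", "node-c"])
store.write("balance", 100)
store.sync()                      # all three copies now say 100

store.write("balance", 600)       # new value reaches node-a only
before = [store.read("balance", i) for i in range(3)]
store.sync()                      # replication catches up
after = [store.read("balance", i) for i in range(3)]

print(before)  # [600, 100, 100]
print(after)   # [600, 600, 600]
```

Between the write and the sync, the answer depends on which copy is read; afterwards, every copy agrees.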
The primary business cases where you are most likely to encounter a challenge are financial transactions and inventory (stock availability).
Take the common example in which a deposit is made to fund A, followed immediately by a transfer from fund A to fund B. Because NOSQL distributes the processing, these two separate transactions may be applied to the copies of fund A and fund B at different times, so the result depends on which of the three or four copies is being read. The eventual consistency feature of NOSQL will ensure that all the points of storage come to agree, but there may be a lapse of a few milliseconds to a few minutes before all reads from the system agree. If your customer immediately looks up his balance or initiates a third transaction, the system could conceivably show different balances depending on which node is read at exactly which instant.
Lest you think this is an abstract problem, here’s one you have probably read about. Understanding the time lag of eventual consistency, a skilled hacker with an account balance of, say, $500 could write a program that debits his account for $500 and sends that command 20,000 times in a single second. If every request is checked against a copy that still shows $500, all of them are approved and he actually withdraws $10,000,000. The account balance will eventually become consistent (perhaps only a few seconds or minutes later) and show the correct balance of negative $9,999,500. This is exactly what happened, at web scale, to the cash that used to reside in a now-defunct Bitcoin exchange.
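The arithmetic in this example is easy to check with a small Python sketch. It is a simplified model under stated assumptions: the function names are hypothetical, the "stale" case assumes every one of the 20,000 requests is checked against a copy that still reports $500, and the "immediate" case assumes each check sees the result of the debit before it.

```python
BALANCE = 500        # starting account balance in dollars
WITHDRAWAL = 500     # amount of each debit request
REQUESTS = 20_000    # requests sent inside the consistency window


def withdraw_with_stale_reads(balance, amount, requests):
    """Every request is checked against the same out-of-date balance."""
    stale_balance = balance           # what a lagging copy still reports
    paid_out = 0
    for _ in range(requests):
        if stale_balance >= amount:   # the check passes every time
            paid_out += amount
    return paid_out, balance - paid_out


def withdraw_with_immediate_consistency(balance, amount, requests):
    """Each check sees the balance left by the previous debit."""
    paid_out = 0
    for _ in range(requests):
        if balance >= amount:         # passes once, then fails
            balance -= amount
            paid_out += amount
    return paid_out, balance


stale_result = withdraw_with_stale_reads(BALANCE, WITHDRAWAL, REQUESTS)
strict_result = withdraw_with_immediate_consistency(BALANCE, WITHDRAWAL, REQUESTS)

print(stale_result)   # (10000000, -9999500)
print(strict_result)  # (500, 0)
```

With stale checks the account pays out $10,000,000 and ends at negative $9,999,500; with immediately consistent checks only the first $500 request succeeds.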
First, remember that the market is changing quickly and many players are stepping up to the need for atomicity and immediate consistency in their NOSQL offerings. Among NOSQL offerings, MongoDB, MarkLogic, and Splice Machine claim immediate consistency. Among NewSQL offerings, Aerospike and FoundationDB are two that specifically claim to solve this problem. There are probably more in each category, and it won’t be long before there are many more. If you’re dealing with financial transactions or anything else that sounds suspiciously like our example, be sure to probe this carefully.
Second, from a business perspective, this window of potential disagreement in the data is quite short. Common sense tells us that for most applications we will never tell a customer, “Sorry, I can’t take your money right now because our data isn’t consistent.” Take the order and, if necessary, correct and apologize later. Unless you have a specific financial exposure as in the example, NOSQL and NewSQL databases are delivering as promised.
July 23, 2014
Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2014, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist of Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at:
This original blog can be viewed at:
All nine lessons can be seen as a White Paper available at:
Originally posted on Data Science Central