Guest blog post Bill Vorhies
Summary: What if I told you there’s a database in wide use today that does everything RDBMS and Hadoop can do but is 50 years old? Never heard of MUMPS? Check out these startling facts.
- MUMPS was born in 1966 to solve the problem of massive data flowing into multi-user systems in the healthcare industry.
- It predates RDBMS but has all the features of NoSQL including (in its modern form) massive parallel processing, horizontal scaling, and runs on commodity hardware.
- It easily models all four types of NoSQL DBs (key-value, column, document, graph). In the 70s and 80s it was modified to model RDBMS and handle SQL queries.
- It shares features with NewSQL in that it is fully ACID compliant.
- It’s alive and well today used in a wide variety of healthcare patient information systems, banking, the European Stock Exchange, and the travel industry among others.
If you’ve been to your doctor or to a hospital, or used an ATM it’s likely that the data was processed and stored in a MUMPS-based system. Despite the fact that 2016 represents its 50th anniversary the original design basics of MUMPS are still meeting commercial needs today and show little evidence of being displaced in healthcare or large financial institutions by either RDBMS or NoSQL. It would not be inaccurate to say that MUMPS is/was NoSQL long before ever becoming a gleam in the eye of Google researchers.
A Little Background.
Originally designed in 1966 and constantly updated over the years, MUMPS derives its name from Massachusetts General Hospital Utility Multi-Programming System) or alternatively M. As you should gather from the name the original need was driven by large hospitals (and ultimately banks) to drive high-throughput multi-user transaction processing. As RDBMS emerged (and ultimately NoSQL and NewSQL) MUMPS remained not only viable but superior in performance and capabilities that includes even today.
The original problem to be solved was how to receive, store, and process the wide array of tests and other variables being rapidly generated and collected on a single ICU patient in just one day. That would include at least 12 different variables including temperature, heart rate, blood oxy, blood pH, and others. The data generated via sensors (electrodes) measure many factors in real time, plus lab tests done multiple times per day per patient. On average the data needs to be accessed by about 20 doctors and medical staff for each patient and there are hundreds of thousands of patients.
The thing that particularly strikes me is how this resembles streaming data problems of IoT that we are only recently solving with Spark and Storm, but were solved perfectly adequately 40 and 50 years ago by MUMPS.
MUMPS by Any Other Name
The original copyrights on MUMPS expired about a decade ago. An improved successor version is actively marketed by InterSystems Corp. under the name Caché. A version known as GT.M is available for Linux under a Free Open Source license. Googling either of these names will be as efficient as looking under MUMPS. There was also a movement some years back to simply call it “M” and you will sometimes see it identified as MUMPS/M.
Is It a Data Base With Its Own Language or a Language With Its Own Data Base?
This begins to look like that old peanut butter in my chocolate or cholate in my peanut butter meme but it’s important for understanding why MUMPS is both efficient and successful. MUMPS is both things at once. Specifically it’s a database with an integrated language optimized for accessing and manipulating that database. Although the language has been criticized as archaic, modern users compare it favorably to Python.
Having a ‘built in’ database enables MUMPS high level access to storage not available in other programs or DBs. This access uses ‘variables’ (keys) and ‘arrays’ (tables) which are sparse. The default structure is key-value (though MUMPS can easily be scripted to work as document, columnar, graph, or even RDBMS) and has a modern parallel in JSON. The structure is schema-less and the data is stored in multidimensional hierarchical sparse arrays (also known as key-value nodes, sub-trees, or associative memory). Each array may have up to 32 subscripts, or dimensions. Holy cow Batman! Sounds like we just found the sacred headwaters of Hadoop.
A key to its speed and efficiency is that the database is accessed directly through the variables rather than queries or retrievals. A feature of the MUMPS language/DB is that accessing volatile memory and non-volatile storage use the same basic syntax, enabling a function to work on either local (volatile) or global (non-volatile) variables. Practically, this provides for extremely high performance data access. Michael Byrne, writing on Motherboard does a good job of explaining this.
"Variables (or keys, in this case) are just addresses of different memory locations within those arrays, which are called globals in MUMPS-speak. A MUMPS system, which might be made up of many computers, has its own collection of global arrays stored in non-volatile memory. So, unlike an array created in a language like C++, which exists only for the duration of the program or the program's existence within a computer's RAM address space, a MUMPS global sticks around on a server, accessible at any given time to a computer within the system. We say that it's persistent."
"The result of this is that a MUMPS programmer can tap a database directly rather than using a query. This is faster on its face, eliminating the query abstraction, but direct access also allows a bunch of alternative programming ideas. For one thing, as a programmer, I can take an item stored in one of those globals and give it "children," which might be some additional properties of that item. So, we wind up with lists of different things that can be described and added to in different ways on the fly. The relationships are hierarchical."
Who’s Using It Today
The MUMPS claim to fame is the Veterans Health Information Systems and Technology Architecture (VistA), which is a vast suite of some 80 different software modules supporting the largest medical system in the United States. It maintains the electronic health records for 8 million veterans used by some 180,000 medical personnel across 163 hospitals, over 800 clinics, and 135 nursing homes. It's considered a model for current efforts to create a nationwide medical health records network.
- Indian Health Service
- Major parts of the Department of Defense CHCS hospital system
Large healthcare companies currently using MUMPS include
- Care Centric
- Coventry Healthcare
- Partners HealthCare (including Massachusetts General Hospital)
- GE Healthcare (formerly IDX Systems and Centricity)
- Sunquest Information Systems
- Many reference laboratories such as DASA
- Quest Diagnostics
Among financial institutions
- Ameritrade, the largest online trading service in the US with over 12 billion transactions per day
- Bank of England
- Barclays Bank
In 2010, the European Space Agency selected MUMPS/Cache to support the Gaia mission to map the Milky Way with unprecedented precision.
MUMPS checks all the same boxes as NoSQL and is clearly very mature.
- Elastic horizontal scaling across multiple low-cost commodity servers.
- Designed to support Big Data quantities of data beyond the capabilities of RDBMS with extremely high performance.
- Extremely simple to administer requiring essentially no DBAs.
- Very low cost resulting from commodity hardware and open source code.
- Flexible data modeling able to easily duplicate the features of RDBMS, key value, columnar, document, and graph architectures.
- Readily supports advanced analytics and BI with SQL.
- Full ACID for OLTP.
- Not widely known. Small market share outside of healthcare and finance.
- Small programmer cadre.
- Pretty much no code libraries outside of what InterSystems (the commercial vendor) supplies.
Is this a Technology in Need of an Upgrade?
You might be tempted to think why not marry MUMPS with Hadoop? The fact is that MUMPS will scale and perform in all the ways that Hadoop will. Trying to bolt these together just seems unnecessarily complicated all for no appreciable gain in scale or performance. Plus the requirement to have technical people who understand both systems to keep them in synch. So no, MUMPS is fine just the way it is.
Is There an Opportunity Here?
On the one hand, would anyone choose to start a new project electing MUMPS instead of Hadoop or RDBMS? Probably not. There’s not enough awareness or for that matter enough programmers to go around.
However, if you’re working in healthcare or finance, especially where MUMPS is already in use consider this. A search of LinkedIn yields only 699 MUMPS developers and 77 Cache developers in all of LinkedIn. If you’ve already mastered NoSQL and are looking for a competitive edge this is a vanishingly small pool of competitors. Mastering MUMPS could easily leverage into good pay and job security. There’s a good MUMPS coding tutorial here.
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at: