What is a temporal database?
A temporal database is a database with built-in support for handling data involving time.
This definition is from Wikipedia. It is simple and straightforward. Since we expect every database to have some kind of support for time operations, we could say that all databases are temporal databases based on this definition.
The reality is a lot more subtle and complex. Temporal databases enable you to see the data as it was seen in the past, while also enabling you to update even the past in the future. A temporal database will show you the actual value back then, as it was known back then, and the actual value back then, as it is known now. A temporal database allows you to know what your organization was forecasting for the future at a certain time in the past. Temporal databases support a multi-faceted view of time that more closely mirrors real life.
Why does anyone need a temporal database?
Most organizations manage data that changes over time. If you are in a regulated industry that does not allow you to physically delete data, you must maintain a temporal database that considers when a record is/was/will be valid (valid time), in addition to the dates on which the data was edited and the data itself (transaction time). If you are required to show what your data looked like on a certain date – to identify what information your organization possessed at a certain time in response to an investigation, audit or legal inquiry – you may not be able to do this without a temporal database. A very good description of this concept – transaction time vs. valid time – is available here. A temporal database records facts about when the data was valid (valid time) as well as the data itself (transaction time), and provides structured queries to interact with both aspects via bitemporal tables involving both concepts of time. This concept is very useful when data changes over time.
An even more utilitarian example that every organization can relate to is the storage of customer addresses. For a lot of applications it is acceptable to store only your current address – your previous addresses aren’t needed. However, there are many organizations that need to track historical data for customer service, financial, security and audit purposes. This is a classic case of requiring two concepts of time. When did you provide a certain address (transaction time) and over what period did you say that you lived there (valid time). Many companies (and even government agencies) have issues tracking your previous data after you change your address, your name, or make corrections to personal data. To avoid a true temporal implementation, many companies store the latest data in one table, and all other historical data is stored in different tables, or even in other systems. Although the information is there somewhere, they lack the tools to properly connect the data.
What if I don’t implement a temporal database?
Richard Snodgrass is a member of the Department of Computer Science at the University of Arizona and is a thought leader in the temporal database space. His book, Developing Time-Oriented Database Applications in SQL , provides a very good example of the consequences of a failure to implement a time-varying database. In June, 1997, Hudson Foods had to recall 25 million pounds of beef in an attempt to stem an outbreak of E. coli, at that time the largest recall of food in the United States. Back then, the product that was recalled was valued at $25 million – around one fifth of Hudson’s annual production. Hudson knew that only a much smaller portion of the beef needed to be recalled based on a specific supplier. However, because their database only supported a current version of the data – which beef came from which sources today – and didn’t show a view of the data as it existed on the day the processing occurred several weeks prior – the entire product set had to be recalled. In short, the absence of an appropriate time-varying database cost Hudson Foods 20% of their annual production.
The challenges of time in a regular database
With the development of SQL and its use in real-life applications, database users realized that when they added date columns to key fields, some issues arose. Adding temporal considerations to traditional databases created a lot of complexity. For example, if a table has a primary key and some attributes, adding a date to the primary key to track historical changes can lead to creation of more rows than intended. Deletes must also be handled differently when rows are tracked in this way.
Time has remained one of the neglected areas in the SQL standard. For decades the database vendors did not agree on a de facto standard for defining time-based tables and their corresponding operations and tools. Thus, every vendor implemented its own libraries, used its own semantics, and applied data type concepts that may not be mutually compatible.
In 1992, this issue was recognized but standard database theory was not yet up to resolving this issue, and neither was the then-newly formalized SQL-92 standard . In 1992, Richard Snodgrass presented a proposal to the ISO/IEC committee: an extension to SQL-92 for temporal databases. The committee developed these extensions, named TSQL2, in 1993. After many revisions, a definitive version was published in 1994.
There was an attempt to incorporate a subset of TSQL2 into the SQL:1999 standard, a subset named SQL/Temporal. TSQL2 was heavily criticized and was not incorporated. One idea proposed was the inclusion of an OVERLAPS predicate that would test if two time periods match or overlap at least in one point of time. For example, something like “(DATE ’2014-01-01′, DATE ’2014-01-03′) OVERLAPS (DATE ’2014-01-02′ DATE ’2014-01-04′)” would yield true. OVERLAPS never made it into the language.
A new Standard – SQL:2011 – Not Widely Supported Yet
SQL:2011 was adopted in December of 2011, and it finally included clauses for the definition of temporal tables as part of the SQL/Foundation: valid time table, transactional time table, and bitemporal table. There were many new features included for temporal databases, which finally opened the way for standard support from the major database vendors.
The problem is that as of today – August 2014 – it is not yet widely adopted. Only a few vendors support SQL:2011, and only in their most recent versions. If you want to use a real temporal database, then you will have to use the latest version provided by your database vendor, or a database extension, or a special-purpose temporal database. However, if that is not possible, you can incorporate temporal considerations into your current database.
Not every database requires a temporal database implementation. But some do. Adding temporal considerations adds complexity in the database structure. There are many cases where it is OK to keep only the latest valid data.
However, when designing a database, it is always good to ask yourself the following question: for what tables might I need to go back in time if I have issues? You don’t want to find out the answer when it is too late.
Our next article in this series will explain how you can make changes to the design of your existing tables in order to make your database (or a least a subset of your database) a temporal database.
Read the next article in the series -> Temporal Databases: Why you should care and how to get started (Part 2 of 3)
Authors: Abel Morelos, Angela Holmes
Originally posted on Data Science Central