Guest blog post by John Fairweather
Following the big data 'buzz' and trends, it appears to me that there is a disconnect between our analytical goals (i.e., the types of questions our customers are trying to answer) and the computational substrate on which we build in order to answer them.
NoSQL technologies, while far more scalable than relational databases, are fundamentally a 'data level' (DL) technology; that is, they are at heart document-based. Relational databases are an 'information level' (IL) technology capable of answering somewhat more complex questions. The problem is that most of the questions we now seek to answer are 'knowledge level' (KL): they depend largely on the myriad connections between people and things, and less on record content or text search. Relational databases are bad at handling such questions; NoSQL technologies are worse (see here).
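To make the DL/KL distinction concrete, here is a minimal sketch in Python, assuming a hypothetical set of person records that each store their own content plus a list of 'knows' links. A data level question reads one record; a knowledge level question ("how is Alice connected to Dave?") has to follow links across many records, shown here as a plain breadth-first search. The data, field names, and function names are illustrative only and do not describe the author's system.

```python
from collections import deque

# Hypothetical records: each "document" holds its own content plus links.
documents = {
    "alice": {"employer": "acme", "knows": ["bob"]},
    "bob": {"employer": "globex", "knows": ["alice", "carol"]},
    "carol": {"employer": "initech", "knows": ["bob", "dave"]},
    "dave": {"employer": "acme", "knows": ["carol"]},
}

def lookup(doc_id):
    """Data level question: what does this one record contain?"""
    return documents[doc_id]

def connection_path(start, goal):
    """Knowledge level question: how are two entities connected?

    Breadth-first search over the 'knows' links; returns the shortest
    chain of entities from start to goal, or None if they are unconnected.
    """
    parents = {start: None}          # visited set plus back-pointers
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in documents[node]["knows"]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

print(lookup("alice")["employer"])       # acme
print(connection_path("alice", "dave"))  # ['alice', 'bob', 'carol', 'dave']
```

In a document store each hop above would typically be a separate fetch, and in a relational database a separate self-join, which is one way to see why such questions strain DL and IL technologies as the number of hops grows.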
It seems that we should be moving upward in the knowledge pyramid (see here) in terms of the technology underpinnings we use. Rather than moving from an IL (i.e., relational) database model down to a DL (i.e., document) technology, we need to move up to a massively scalable knowledge level (i.e., connection-based) database.
I have been actively developing just such a system (see blog here) for over twenty years, targeted at what we now call 'big data' problems. Over that time we built federated solutions to answer massively scaled knowledge level problems by combining relational and other IL/DL technologies. In the end, all such combinations failed to scale because of the shortcomings of the underlying technologies when applied to KL questions. It became necessary to re-examine the very underpinnings we use to build integrated systems. A fully integrated and uniform ontology-driven (see here) KL substrate, all the way from the database to the GUI, was needed to achieve the simultaneous goals of scalability, adaptability, and knowledge level operation.
Our software systems and the analytics they provide must ultimately be designed to facilitate the organization's decision cycle, or OODA loop (see here); if they are not, they will not provide the help needed in a timely manner. Closing such a loop requires KL technology in the 'Orient' stage and Wisdom Level (WL), or human-in-the-loop, operation in the 'Decide' stage. The image to the left shows the knowledge levels required at each stage of an integrated system designed to support any full organizational OODA loop. Just as importantly, the systems themselves must adapt rapidly in the face of inevitable change in the environment and in the data they contain (see discussion here); otherwise they will clog up the organization's OODA loop and fall into obsolescence.
A fundamental first step in any software development should be to analyze the OODA loop the software is designed to support and the knowledge level of the questions that will be asked at each step in the cycle(s). This in turn leads to an understanding of which underlying data technologies are appropriate for building a solution. From what I can see, our data scientists are still not doing this. Analytics front ends with elaborate graphs and visualizers can go only so far to overcome the shortcomings of a limited data architecture.
I know this is heresy, but I've been in the 'big data' game for over two decades, so I may know of what I speak. For more of my heretical thoughts on 'big data' and big data analytics, see: http://mitosystems.blogspot.com.