Analyst Coverage: Philip Howard
Originally a NoSQL database was conceived to be a database that was not relational and did not support SQL. Practice has determined that people want SQL, so the “No” in NoSQL now stands for “not only”. On that reading, a NoSQL database is one that does not store data in relational tables (either row-based or columnar). Unfortunately, this definition is not especially useful: it means that databases such as Adabas or IMS, not to mention object-oriented and XML databases (just to give another couple of examples), are, technically, NoSQL databases.
Leaving this question aside, it is perhaps better to define what NoSQL databases are, rather than what they are not. The database architectures that are commonly thought of as falling under the NoSQL banner are key-value stores, document stores and column stores (not to be confused with column-based relational databases). In addition, graph and RDF databases may be considered to be NoSQL databases, but they have some very distinct characteristics that differentiate them from other NoSQL products and they are treated as a separate subject in their own right by Bloor Research. There are also hybrid databases that share features of different NoSQL models, such as document-graph stores.
There are NoSQL databases targeted at batch-based analytics, real-time analytics and transaction processing so it is not possible to generalise about relevant use cases except to say that products aimed at transaction processing will be ACID compliant and the others won’t be. Also in this context many NoSQL databases are “eventually consistent”: this is not good enough for true OLTP environments.
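To illustrate why eventual consistency falls short for OLTP, the following is a minimal sketch, not a model of any particular product. The class names, the `replicate` step and the `balance:42` key are all hypothetical, and replication is triggered by hand to stand in for the asynchronous propagation a real cluster would perform.

```python
class Replica:
    """One node's copy of the data (hypothetical, in-memory only)."""
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on the primary and reach the secondary later."""
    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        # A read routed to the secondary may not see recent writes.
        return self.secondary.data.get(key)

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance:42", 100)
stale = store.read_from_secondary("balance:42")  # secondary has not caught up
store.replicate()
fresh = store.read_from_secondary("balance:42")  # replicas now agree
```

Between the write and the replication step, a reader can be shown a value that is out of date. That window is tolerable for many analytic workloads but not where a transaction must see the latest committed state.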
One feature that characterises what most people think of as NoSQL databases (but excluding a number of graph database products) is that they are designed to scale out rather than up. That is, they run across distributed, usually low cost, clustered environments. If you want more capacity you add new nodes rather than expanding a particular server. Commonly this process involves sharding, which is the way that you distribute data across the nodes in the cluster in order to optimise performance. However, unless additional measures are provided, sharding is only useful if you know in advance how you are going to access the data. Note that some (column-based) relational databases also use sharding: these should not be confused with NoSQL databases.
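The dependence of sharding on a known access path can be shown with a small sketch. It assumes simple hash-based sharding on a customer identifier; the shard count, function names and sample records are invented for illustration, and real products use a variety of other schemes (range-based, consistent hashing and so on).

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

# Each shard is modelled as a plain dict standing in for one node.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(customer_id, record):
    shards[shard_for(customer_id)][customer_id] = record


def get_by_id(customer_id):
    """Lookup on the shard key: exactly one shard is consulted."""
    return shards[shard_for(customer_id)].get(customer_id)


def find_by_city(city):
    """Lookup on any other attribute: every shard has to be scanned."""
    return [r for shard in shards for r in shard.values() if r["city"] == city]


put("c-1001", {"name": "Ada", "city": "London"})
put("c-1002", {"name": "Grace", "city": "New York"})
put("c-1003", {"name": "Alan", "city": "London"})

one = get_by_id("c-1001")
londoners = sorted(r["name"] for r in find_by_city("London"))
```

A query by customer identifier goes straight to one node, whereas a query by city gains nothing from the sharding and touches the whole cluster. This is why the choice of shard key needs to reflect how the data will actually be accessed, unless secondary indexes or similar measures are available.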
Depending on the type of NoSQL database there are several potential benefits that can be derived from using them. The first is that they run on low cost hardware. However, the expertise and management costs of administering and programming a NoSQL deployment may exceed those of a conventional environment to the extent that any savings are more than swallowed up.
Secondly, NoSQL databases are better suited to storing a variety of structured and unstructured data types that traditional relational environments have not supported in the past. However, the major relational vendors are adding things such as support for JSON documents so this advantage is shrinking and may eventually disappear.
Thirdly, many (not all) NoSQL databases are schema-free. This makes them significantly more flexible when it comes to adding new types of information: you don’t have to change the schema definition.
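As a rough illustration of what schema-free means in practice, the sketch below treats a collection as a list of JSON documents. The `collection`, `insert` and `find` names and the `loyalty_tier` field are hypothetical and do not correspond to any specific product's API.

```python
import json

# A hypothetical "customers" collection: each document is just a JSON object.
collection = []


def insert(doc):
    # Round-trip through JSON to mimic storing a serialised document.
    collection.append(json.loads(json.dumps(doc)))


def find(field, value):
    # Documents that lack the field are simply skipped.
    return [d for d in collection if field in d and d[field] == value]


insert({"id": 1, "name": "Ada"})
# A later document carries a field that was never declared anywhere.
insert({"id": 2, "name": "Grace", "loyalty_tier": "gold"})

gold = find("loyalty_tier", "gold")
```

No schema change was needed to introduce the new field, and the older document remains valid without it. The trade-off is that the application, rather than the database, has to cope with documents of differing shapes.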
Finally, there are specific advantages related to NoSQL databases that address the transaction processing and real-time analytics markets. In the case of the former these are akin to NewSQL databases: true distributed capability, much smaller footprint, easier management and so on. With respect to real-time analytics, NoSQL databases simply do something that you can’t do with a conventional database. In practice, these sorts of solutions can be thought of as low-end streaming analytics platforms. They won’t cope with the sort of volumes (millions of events per second) that true event streaming products handle, but they will do very nicely, thank you, for tens of thousands of events per second, which is more than you could expect a comparably priced relational product to achieve.
Perhaps the biggest trend in the NoSQL space is towards Apache Spark and away from Hadoop and MapReduce (though Spark will run with the Hadoop distributed file system, among others). Spark provides significantly better performance (orders of magnitude) for some applications as well as supporting SQL, graph analytics and streaming analytics.
It has been suggested elsewhere that the term “NoSQL” will no longer be a differentiating factor by 2017, as the major database players add more capabilities to their own database products. While the NoSQL tag will no doubt remain, we agree with this view.
More generally we have now got to the stage where vendors are starting to disappear, either going out of business altogether or being acquired by larger companies, whether to leverage the technology internally or to incorporate it into their own product stack.
Everybody and his uncle is playing in the NoSQL space in one way or another but we are already starting to see consolidation. For example, Apple acquired Acunu (Cassandra based) and Experian acquired 4Store (a graph database). In both cases these were for internal use only so these are just two examples of products that have now disappeared. We expect many more to follow.
The following companies offer solutions:
- Berkeley DB
- Lexis Nexis
- Splice Machine
- Tokyo Cabinet
Further resources to broaden your knowledge:
- All things Hadoop: Discussing the Open Data Platform and Apache Spark
- Big Data roundup: Some latest news from vendors in the Hadoop and/or big data space
- Graph databases and NoSQL: The second in a series of five articles about graph databases
- Problems with Hadoop: This is part of the series about big data, discussing various issues, and their resolution, around Hadoop
- What is Hadoop?: Since some people seem to think that Hadoop equates to big data it is pertinent to discuss the former before talking about the latter.