Zizo is a UK-based provider of analytics as a service. For those with long memories, the company used to be called Data-Re: it was founded back in 2002 and changed its name to Zizo in 2014. A couple of things are interesting about Zizo, the first of which is the database that it uses, which is the subject of this blog. A further discussion of what Zizo does will follow in a second blog.
Zizo uses a patented (in Europe and the United States) database known as a pattern database. As far as I know this is the only commercial implementation of a pattern database so I'd best describe how it works. Basically, suppose you want to insert a row of data into your database. What a pattern database does is to examine each field and, whenever it encounters a value that it does not currently have in its database - whether that is a company name, a town, an account value, an email address or whatever - it creates a "pattern" to represent that value. If it has seen that value before then the database creates a pointer to the existing pattern and increments a counter that records how many times that pattern currently occurs within the database. With the possible exception of this last element, this is exactly the same way that tokenisation works, so if you are familiar with the use of tokens to support compression then, at a simple level, you can think of patterns as being equivalent to tokens. Of course, patterns do more than tokens so they are not really the same, but you do get all the advantages of advanced compression. Zizo claims typical rates of 30 to 1, or as much as 100 to 1 for call detail records (CDRs).
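To make the insert step concrete, here is a minimal Python sketch of the idea as I understand it. It is not Zizo's implementation: the class name PatternStore and its methods are my own inventions for illustration, and a real pattern database will do a great deal more. All it shows is each distinct value being stored once, repeat values becoming pointers (here, integer ids) to the existing pattern, and a counter tracking occurrences.

```python
from typing import Any, Dict, List, Tuple


class PatternStore:
    """Toy store (hypothetical): each distinct field value is held once."""

    def __init__(self) -> None:
        self.ids: Dict[Any, int] = {}   # value -> pattern id
        self.values: List[Any] = []     # pattern id -> value
        self.counts: List[int] = []     # pattern id -> number of occurrences

    def intern(self, value: Any) -> int:
        """Return the pattern id for a value, creating the pattern if new."""
        pid = self.ids.get(value)
        if pid is None:
            # First sighting of this value: create a new pattern for it.
            pid = len(self.values)
            self.ids[value] = pid
            self.values.append(value)
            self.counts.append(0)
        # New or already seen, record one more occurrence.
        self.counts[pid] += 1
        return pid

    def insert_row(self, row: Tuple[Any, ...]) -> Tuple[int, ...]:
        """Store a row as a tuple of pattern ids rather than raw values."""
        return tuple(self.intern(v) for v in row)


store = PatternStore()
rows = [
    store.insert_row(("Acme Ltd", "Leeds", 250)),
    store.insert_row(("Acme Ltd", "York", 250)),
]
```

After the second insert, "Acme Ltd" and 250 each exist once with a count of two, and the second row only adds one new pattern ("York"). That reuse is where the compression comes from.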
There are actually several ways in which Zizo's approach differs from that of tokenisation. One (I'll come to another a little bit later) is that you do not need to reverse the tokenisation because Zizo transforms the query so that it runs against the pattern space. This makes a lot of sense: very much in line with taking the query to the data rather than vice versa.
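A small Python sketch may help illustrate what "running the query against the pattern space" could look like. This is my own simplification, not a description of Zizo's engine, and the names (dictionary, town_column, rows_matching) are hypothetical. The point is that the search value is translated into a pattern id once, and the stored data is then compared id against id without ever being decoded.

```python
# Hypothetical encoded column: each entry is a pattern id, not the value.
dictionary = {"Leeds": 0, "York": 1, "Bath": 2}   # value -> pattern id
town_column = [0, 1, 0, 2, 0]                     # one pattern id per row


def rows_matching(value, dictionary, column):
    """Find row numbers where the column equals value, without decoding."""
    pid = dictionary.get(value)
    if pid is None:
        # The value has never been stored, so no row can match.
        return []
    # Compare pattern ids directly; the stored data is never decoded.
    return [i for i, p in enumerate(column) if p == pid]


matches = rows_matching("Leeds", dictionary, town_column)
```

Note that a value that has never been seen can be rejected from the dictionary alone, without touching the rows at all.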
Patterns may be compounded. Thus the row you have inserted consists of a series of patterns that, in effect, you can think of as being concatenated into a "root record". Perhaps more pertinently, queries equate to patterns. For example, "sales by region" would equate to the compound pattern consisting of the sales pattern and the region pattern. It's not hard to see that answering queries of this type - and more complex queries - can be handled very efficiently in a pattern database.
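As a rough illustration of why a query like "sales by region" suits this layout, here is a toy Python sketch. It rests on my own assumptions: rows are held as pairs of pattern ids, and the names (regions, amounts, sales_by_region) are invented for the example. I am not suggesting this is how Zizo builds or stores compound patterns. The aggregation groups on the region pattern id and only translates back to readable values at the very end.

```python
from collections import defaultdict

# Hypothetical pattern tables: a pattern id is an index into each list.
regions = ["North", "South"]   # region pattern id -> region name
amounts = [100, 250, 40]       # amount pattern id -> sale value

# Each row is a compound of two patterns: (region id, amount id).
rows = [(0, 0), (1, 1), (0, 2), (1, 0)]

# Group on the region pattern id; no region name is touched here.
totals = defaultdict(int)
for region_id, amount_id in rows:
    totals[region_id] += amounts[amount_id]

# Translate pattern ids back to readable values only for the final result.
sales_by_region = {regions[r]: total for r, total in totals.items()}
```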
Zizo operates in-memory so there are considerations about total memory size. The fact that the database is pattern-based reduces storage requirements and the compression available does the same. Nevertheless, memory is expensive and you do not want to have to pay for excessive amounts of it so, to optimise memory usage, Zizo constantly monitors the patterns that are held in memory. This monitoring not only does conventional things like tracking how frequently a particular query is run but also takes into account the complexity of queries and their resulting patterns, as well as how expensive it would be to rebuild a pattern if it were to be rolled out of memory. Thus the software optimises the use of the patterns in memory. This is a second way in which Zizo's pattern database differs from tokenisation: Zizo is dynamic and adapts to the way that users access the database, while conventional tokenisation is static.
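I do not know the formula Zizo actually uses, but the general idea of weighing usage against rebuild cost can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the CachedPattern class, the retention score (hits multiplied by rebuild cost, divided by size) and the greedy eviction loop are mine, not Zizo's.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedPattern:
    name: str
    size: int            # memory the pattern occupies (arbitrary units)
    hits: int            # how often queries have used it
    rebuild_cost: float  # estimated cost of recreating it if evicted

    def retention_score(self) -> float:
        # Assumed heuristic: value of keeping the pattern per unit of memory.
        return self.hits * self.rebuild_cost / self.size


def choose_evictions(patterns: List[CachedPattern], budget: int) -> List[str]:
    """Evict the lowest-scoring patterns until usage fits the budget."""
    used = sum(p.size for p in patterns)
    evicted = []
    for p in sorted(patterns, key=CachedPattern.retention_score):
        if used <= budget:
            break
        evicted.append(p.name)
        used -= p.size
    return evicted


patterns = [
    CachedPattern("sales_by_region", size=40, hits=100, rebuild_cost=5.0),
    CachedPattern("one_off_report", size=50, hits=2, rebuild_cost=1.0),
    CachedPattern("complex_join", size=30, hits=10, rebuild_cost=20.0),
]
evicted = choose_evictions(patterns, budget=80)
```

In this toy example the rarely used, cheap-to-rebuild pattern goes first, while the infrequently used but expensive one survives, which is the behaviour a plain least-recently-used policy would not give you.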
Anyway, enough about the database: this should have given you at least a flavour of how it works. However, I should add that while Zizo provides a proprietary environment for accessing the database, that environment supports SQL, so you can use your tool of choice if you want to. Moreover, the query engine, while it works with patterns under the covers, will transform results into rows and columns so that the environment looks relational even if it is not.
In my next blog I will discuss Zizo's offering more generally but the database is the secret sauce behind the analytics that Zizo is providing, hence my starting with the database.