YarcData

Content Copyright © 2012 Bloor. All Rights Reserved.
Also posted on: The IM Blog

This is the fourth in my series of articles about graph databases. Now I want to discuss uRiKA from YarcData, which is a spin-off from Cray (Yarc is Cray spelled backwards). As far as I know, uRiKA is unique within the graph database space, primarily because of its focus on pattern-based recognition and analytics and secondly because of its ability to scale. As a comparison, uRiKA is to graph databases what Netezza was to data warehousing when it first appeared on the market: appliance-based, scalable (uRiKA more so) and focused solely on high-performance analytics. Unlike traditional relational databases, which are really only good at query processing when those queries can be predicted in advance, both Netezza and uRiKA were/are designed for ad hoc and complex analytics. The difference between the two is that graph databases are schema-free and model relationships directly, thereby allowing the user to pose arbitrary queries (in the case of uRiKA, typically looking for patterns of relationships) without needing database optimisation.

Let me give you an example. Suppose you want to identify three or more people who are connected in some way (directly or indirectly), at least one of whom has rented or bought a truck, one of whom has bought fertiliser even though he doesn’t own or work on a farm, one of whom has visited a website dealing with bomb making, and one of whom has been seen visiting national monuments. Graph analytics allows you to search for this pattern in a sea of relationships that might otherwise appear innocuous but that, once identified, together suggest a plot. Then, once you have detected these persons of interest, you can graphically visualise the relationships between these people and things and search for more evidence of this possible plot. And, according to YarcData, uRiKA can do this in (near) real-time, compared to the days or weeks that might be required using conventional methods.
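To make the pattern concrete, here is a minimal Python sketch of the kind of search described above. It is only an illustration and not how uRiKA itself works: the names, links and behaviour labels are invented toy data, and the approach (find each connected group, then check whether its members collectively show all four behaviours) is my own simplification.

```python
from collections import defaultdict

# Hypothetical toy data: who is linked to whom, and what each person was observed doing.
LINKS = [("ann", "bob"), ("bob", "cy"), ("cy", "dee"), ("eve", "fay")]
FLAGS = {
    "ann": {"truck"},
    "bob": {"fertiliser_no_farm"},
    "cy": {"bomb_website"},
    "dee": {"monuments"},
    "eve": {"truck"},
    "fay": {"monuments"},
}
REQUIRED = {"truck", "fertiliser_no_farm", "bomb_website", "monuments"}


def connected_groups(links):
    """Return the connected components of an undirected graph as sets of nodes."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over direct and indirect connections
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def suspicious_groups(links, flags, required, min_size=3):
    """Groups of at least min_size people who together show every required behaviour."""
    hits = []
    for group in connected_groups(links):
        observed = set().union(*(flags.get(person, set()) for person in group))
        if len(group) >= min_size and required <= observed:
            hits.append(group)
    return hits


if __name__ == "__main__":
    print(suspicious_groups(LINKS, FLAGS, REQUIRED))
```

On this toy data only the four-person group matches; eve and fay are connected to each other but are too few and show only two of the four behaviours. The point of a system like uRiKA is to run this sort of query over billions of relationships rather than six people.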

Clearly, for this sort of analysis the more data you have the better (I discussed Metcalfe’s so-called law in a previous article), so scalability is of paramount importance as well as, of course, performance. So, uRiKA can be huge. The uRiKA product, which is delivered as an appliance, can support up to 512TB (!) of memory shared across up to 8,192 processors, each running 128 simultaneously active threads. The YarcData appliance’s hardware is proprietary, but the software on top is based on an industry-standard stack (Java, Apache, SPARQL and so on) so that common skill sets can be utilised. The company is also developing specific business solutions for the public sector, financial services and healthcare markets.

Finally, I should provide an insight into how uRiKA is actually being used. The Institute for Systems Biology is an early adopter of YarcData and aims to use the technology to look for new cancer treatments. Another example, about which I can provide more detail, relates to one of YarcData’s customers, a leading healthcare organisation, which is using uRiKA to analyse patients’ healthcare profiles. Ten million of these patient records, including anonymised historical data spanning all events, symptoms, diseases, treatments, prescriptions, genetics and family history, are being compared with one another in order to identify “similar patients”. Similarity, its degree, and the conditions under which it holds form the basis of the graph stored within uRiKA.

This graph is then used to support doctors and to provide information in real-time about which treatments were most successful for similar patients. As the doctors are consulting with patients live, they can query the uRiKA system on iPads or other devices to get guidance and tweak their search parameters on the fly during the patient visit. The organisation’s goal is to deliver truly personalised medicine and to ensure the consistent selection of the most effective treatment for each patient by each doctor. Although it was not their mission, the solution should also reduce their risk (of selecting the wrong prescription or treatment).
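For readers who want a feel for the “similar patients” idea, here is a small Python sketch. I should stress that it rests on my own assumptions: the article’s source gives no detail on how similarity is actually computed, so I have used Jaccard overlap of feature sets as a stand-in, and the records, threshold and function names are all invented.

```python
from itertools import combinations

# Hypothetical anonymised records: a feature set plus one treatment outcome each.
PATIENTS = {
    "p1": {"features": {"diabetes", "hypertension", "age_60s"},
           "treatment": "drug_a", "success": True},
    "p2": {"features": {"diabetes", "hypertension", "age_50s"},
           "treatment": "drug_b", "success": False},
    "p3": {"features": {"diabetes", "hypertension", "age_60s", "smoker"},
           "treatment": "drug_a", "success": True},
    "p4": {"features": {"asthma", "age_20s"},
           "treatment": "drug_c", "success": True},
}


def jaccard(a, b):
    """Shared features divided by all features (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(patients, threshold=0.5):
    """Weighted edges between every pair of patients at or above the threshold."""
    edges = {}
    for x, y in combinations(patients, 2):
        score = jaccard(patients[x]["features"], patients[y]["features"])
        if score >= threshold:
            edges[(x, y)] = score
    return edges


def treatment_success(patients, edges, patient_id):
    """Success rate per treatment among the given patient's graph neighbours."""
    tallies = {}
    for x, y in edges:
        if patient_id not in (x, y):
            continue
        other = y if x == patient_id else x
        record = patients[other]
        wins, total = tallies.get(record["treatment"], (0, 0))
        tallies[record["treatment"]] = (wins + record["success"], total + 1)
    return {name: wins / total for name, (wins, total) in tallies.items()}


if __name__ == "__main__":
    graph = similarity_graph(PATIENTS)
    print(graph)
    print(treatment_success(PATIENTS, graph, "p1"))
```

Comparing every pair of records, as this sketch does, is trivial for four patients but grows quadratically, which hints at why ten million records call for hardware on uRiKA’s scale.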

This is a cool application. Of course, the idea has been floating around in the ether for a long time, but I don’t think anyone has successfully implemented it previously. That’s probably because they didn’t have a graph database and didn’t have a system as powerful as uRiKA to run it. The product has only been generally available for a few weeks, but I think it’s aptly named.