Content Copyright © 2014 Bloor. All Rights Reserved.
This blog was originally posted under: The IM Blog
I recently posted a blog about an interview-style webcast I was doing with Informatica on the uses and costs associated with data integration tools. I’m not sure that John Donne was right when he said that it was strange, let alone fatal, but somewhat surprisingly I have had a significant amount of feedback following this webinar. I say “surprisingly” because the truth is that I very rarely get direct feedback (most of it, I assume, goes to the vendor), so when a number of people tell me that the research we conducted was both unique and valuable, it’s a bit of a thrill (yes, I know, I’m easily pleased).
A number of questions arose from our discussions, of which probably the most interesting was whether moving data into Hadoop (or some other NoSQL database) should be treated as a separate use case. We certainly didn’t include it as such in our original research. With hindsight I’m not sure that the answer I gave at the time was fully correct. I acknowledged that you certainly need some different functionality to integrate with a Hadoop environment and that some vendors have more comprehensive capabilities than others when it comes to Hadoop. The same applies (but with different suppliers) when it comes to integrating with, say, MongoDB or Cassandra or graph databases. However, as I pointed out in my previous blog, functionality is ephemeral: just because a particular capability isn’t supported today doesn’t mean it won’t be tomorrow. So that doesn’t really affect use cases.
However, where I was inadequate in my reply was that I only referenced Hadoop as a platform for data warehousing, stating that moving data into Hadoop was not essentially different from moving it into Oracle Exadata or Teradata or HP Vertica. And that’s true. What I forgot was the use of Hadoop as an archiving platform. As it happens we didn’t have an archiving use case in our survey either. Why not? Because archiving is essentially a form of data migration—you have some information lifecycle management and access and security issues that are relevant to archiving once it is in place but that is after the fact: the process of discovering and moving the data is exactly the same as with data migration. So: my bad.
Aside from that little caveat, I quite enjoyed the whole event. Somebody or other (there’s always one!) didn’t quite get how quantifying the number of end points in a data integration scenario was a surrogate measure for complexity (something we took into account) and so I had to explain that. Of course, it’s not perfect as a metric, but the only alternative is to ask ‘eye of the beholder’ type questions, which aren’t very satisfactory.
Anyway, if you want to listen to the whole thing you can find it at http://www.informatica.com/us/company/informatica-talks/?commid=130859&utm_source=Bloor&utm_medium=Blog&utm_campaign=14Q4-Wbr-NA-BTalk-DI-Bloor Research Nov5