At the Apache Big Data 2015 conference in Budapest, hosted by the Linux Foundation, I was very impressed by the Apache OSS projects, which generally seem to run themselves quite well. There may be one or two exceptions, but the Apache "Attic" should ensure that no IP is actually lost, even from a less-than-successful project (or one that has simply run its course), when its developers move on.
The "big data" noise is all around Hadoop, of course, but the story is a little more complicated than people often think. A good starting point is Philip Howard's analysis here, with some further thoughts here on Spark, which seems to be the current hot spot.
Philip seems to have covered all this rather well, but at Apache Big Data I was particularly excited by a new Hadoop-related database project, currently in the Apache "Incubator". Apache describes the Incubator as possibly its most important project: it's where software donations are checked for compliance with the Apache Software Foundation's legal standards, and where new communities are developed in accordance with "The Apache Way" and its guiding principles.
This new database project, Trafodion (Welsh for "transactions"), offers SQL on Hadoop with full ACID properties, which are sometimes compromised in new and "affordable" databases. Among other things, this promises to let you run both operational transaction-processing workloads and "big data" analytics against the same Hadoop datastore. This sort of capability is where databases need to be going: real-time analytics are becoming more important, and the old split between OLTP and the data warehouse, with its associated latencies, can't deliver the goods. The separate data warehouse isn't dictated by the laws of physics; it is a technology fix, and one we are beginning to need no longer as technology overcomes its old limitations.
Why do I think Trafodion might pull this trick off? Because its authors persuaded HP to release, as open source under Apache, some impressive database technology IP, largely associated with HP's extensions to the classic Tandem NonStop database product line. Its old wiki pages (not affiliated with Apache) are here.
Apart from running on Hadoop with a rich SQL implementation (including ACID support), a key feature of Trafodion is something that some supposedly industrial-strength databases lack: a genuinely first-class rule-based and cost-based optimiser. It also supports distributed transaction management with two-phase commit, and parallel data-flow execution. Performance promises to be good, approaching that of Apache HBase, with linear scalability over a wide range of workloads. It will probably not compete with, say, Oracle (which doesn't have the overhead of Hadoop underneath) in absolute performance, but its backers confidently expect it to compete on performance normalised by total cost, including hardware. Large-scale, high-performance Oracle transaction-processing systems aren't cheap.
Trafodion will be packaged commercially by Esgyn Corporation, which was spun out of HP and has offices in Silicon Valley and in Shanghai, China. There will be an enterprise product, EsgynDB, that includes Apache Trafodion, backed by a 24x7 enterprise support subscription plus consulting and other support services. This is a database to keep an eye on, I think.