Infoworks
Analyst Coverage: Philip Howard
Infoworks was founded in 2014 and has offices in both California and India; its head office is in the former, while the core engineering team is in the latter. The company is backed by venture capital. Unusually for a VC-backed company, Infoworks was encouraged to work with its beta customers for a prolonged period (a year) before coming to market more generally, in order to ensure that the product was “enterprise-ready”. Infoworks partners with all the major Hadoop distributors and with cloud-based providers of big data platforms.
Last Updated: 13th February 2019
Mutable Award: Gold 2018
Infoworks is a big data automation platform. That is, it automates, with no coding required, all the steps involved in a Hadoop deployment, from ingestion through to the generation of OLAP (online analytic processing) cubes and/or the creation of data science models. This is the primary use case at which Infoworks is targeted, but there are two further use cases for which the product is well suited: the migration of all or part of an enterprise data warehouse, or data mart, onto Hadoop; and the creation and management of data lakes more generally.
Infoworks has multiple competitors whose products overlap with its capabilities, but we are not aware of any other vendor that offers the full range of functionality that Infoworks does.
Customer Quotes
“Infoworks reduced our time to introduce new end-to-end analytics models from 6 months to a couple of weeks, without IT involvement.”
Fortune 500 CPG Company
“With Infoworks we were able to complete our project plan for the entire year, in just a few days!”
Major retail healthcare company
Figure 1 illustrates an Infoworks workflow, and it is easiest to describe how the product works by reference to this diagram. In this example, analytics are required that relate product sales to the weather. Sales data is extracted from a source database via a crawler that runs on that source, accessing the database catalogue and leveraging a native connector built by Infoworks. A number of these connectors are available, and the company plans to introduce an SDK so that you can develop your own. Weather data is ingested from the Internet. Static data is loaded in parallel and in batch mode, while rapidly changing data can be loaded incrementally (on a schedule that you define) using change data capture. High-speed merge capabilities are provided.
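To make the change data capture step concrete, the sketch below shows, in plain Python, the kind of keyed upsert/delete merge that such an incremental load performs. It is purely illustrative: the function and field names are our own, not Infoworks APIs, and the real product performs this merge in parallel on the cluster.

```python
# Illustrative sketch (not Infoworks code): merging captured change
# records into a previously loaded base snapshot, keyed on a primary key.

def merge_changes(base, changes, key="id"):
    """Apply insert/update/delete change records to a base snapshot.

    base    -- list of row dicts already loaded onto the cluster
    changes -- list of (op, row) tuples, op in {"insert", "update", "delete"}
    """
    merged = {row[key]: row for row in base}
    for op, row in changes:
        if op == "delete":
            merged.pop(row[key], None)
        else:  # inserts and updates are both upserts on the key
            merged[row[key]] = row
    return list(merged.values())

base = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
changes = [("update", {"id": 2, "qty": 7}), ("insert", {"id": 3, "qty": 1})]
result = merge_changes(base, changes)
```

The point of automating this is that, as noted below, Hadoop does not natively support in-place merges of change data, so each product team would otherwise have to build something like this themselves.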
Having defined your source data and how you want to load it, you select the Hive schema (in the case of a Hadoop environment) that you want the data to be mapped to and define the location of the HDFS cluster. The software automatically normalises the data to match this schema. It also automatically parallelises the ingest process and handles the merge of change data onto the cluster (something that Hadoop does not do natively). You can then prepare the data, with data profiling built into the product. Data blending and other transformations are defined by dragging and dropping widgets onto a canvas, creating what the company calls “pipelines”. An example is shown in Figure 2, where the grey boxes represent input data, the blue circles transformation widgets, and the green boxes output data. If all you were doing was migrating data from a warehouse into Hadoop, or simply creating a data lake, you might stop at this point.
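Conceptually, a pipeline of the kind shown in Figure 2 is just a chain of transformations applied to input data. The sketch below illustrates this with two hypothetical widgets, a blend (join) and a filter, using the sales-and-weather example; none of these names come from Infoworks itself.

```python
# Illustrative sketch of a "pipeline": input data (grey boxes) flows
# through transformation widgets (blue circles) to an output (green box).
# Widget and column names are hypothetical.

def blend(sales, weather, key="date"):
    """Join two inputs on a shared key (a blend widget)."""
    by_key = {w[key]: w for w in weather}
    return [{**s, **by_key.get(s[key], {})} for s in sales]

def filter_rows(rows, predicate):
    """Keep only rows matching a condition (a filter widget)."""
    return [r for r in rows if predicate(r)]

sales = [{"date": "2019-02-01", "units": 40},
         {"date": "2019-02-02", "units": 90}]
weather = [{"date": "2019-02-01", "temp_c": -2},
           {"date": "2019-02-02", "temp_c": 12}]

# Chain the widgets: blend sales with weather, keep warm days only.
pipeline_output = filter_rows(blend(sales, weather), lambda r: r["temp_c"] > 0)
```

In Infoworks the equivalent pipeline is assembled by drag and drop rather than written by hand; the sketch simply shows what the assembled graph computes.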
To support business intelligence and analytics, a further step is provided whereby you graphically create your fact and dimension tables, for either a star or a snowflake schema. From these you can generate the relevant OLAP cubes, which can be visualised using your choice of front-end tools: ODBC and JDBC connectors are supported, so you can use Tableau, Qlik, MicroStrategy or any other appropriate tool. Alternatively, you may wish to create a predictive model rather than simply report on sales by weather. In this case you select a suitable algorithm, again represented as a widget, and generate the relevant data, which, again, you can visualise in the tool of your choice. Infoworks supports the Spark ML library for these predictive models, but you could also use other libraries or add your own. Once development is complete, Infoworks has built-in notification capabilities so that you can alert business analysts or data scientists when a new cube or model is available.
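For readers unfamiliar with OLAP cube generation, the sketch below shows the essence of what it amounts to: pre-aggregating a measure from the fact table across every combination of the chosen dimensions, so that front-end tools can slice and dice without rescanning the raw data. The data and column names are illustrative only; Infoworks generates such cubes automatically from the fact and dimension tables you define.

```python
# Illustrative sketch (not Infoworks code): building a full OLAP cube by
# aggregating a measure over every subset of the chosen dimensions.
from collections import defaultdict
from itertools import combinations

def build_cube(facts, dims, measure):
    """Aggregate `measure` over every subset of `dims` (a full cube)."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(int)
            for row in facts:
                cell = tuple(row[d] for d in group)
                totals[cell] += row[measure]
            cube[group] = dict(totals)
    return cube

facts = [
    {"product": "umbrella", "weather": "rain", "units": 30},
    {"product": "umbrella", "weather": "sun", "units": 5},
    {"product": "sunscreen", "weather": "sun", "units": 20},
]
cube = build_cube(facts, ("product", "weather"), "units")
```

The empty dimension group gives the grand total, single dimensions give subtotals, and the full pair gives the detail cells, which is exactly the set of aggregates a BI tool drills through.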
Finally, Infoworks provides automated facilities to push pipelines into test and then production with a mouse click. Orchestration and monitoring facilities are provided for the live environment. Fault tolerance is built-in and role-based security (by domain) is provided, along with integration options for products such as Apache Ranger.
Deploying a data workflow into full production on a Hadoop cluster is a non-trivial exercise. According to Gartner, only 15% of Hadoop deployments are in production, and a large part of the reason for this is the complexity involved in moving from prototype to production. Many moving parts (products) are typically involved in setting up a data lake, and many of them do not work well with one another or, at least, are not as tightly integrated as one might like. What Infoworks does is remove this complexity by providing a single, integrated and highly automated environment that spans everything from data ingestion through to feeding your favourite visualisation tool. Moreover, it does this without requiring any coding.
The Bottom Line
Infoworks overlaps with many other products from many different parts of the market. Its key differentiators are that none of these potential rivals does everything that Infoworks does, and that Infoworks does all of it in a model-driven and automated fashion. While we know of many competitors to Infoworks, we do not know of any other vendor that has the breadth of capability, or the automation, that Infoworks can provide.