InfluxData is a VC-backed company with offices in San Francisco and Austin, and a sales office in London that covers Europe. It is the software company behind InfluxDB, the leading open source time series database targeted at “metrics and events”. Although the company was only founded in April 2013, it has already made a big impact, with more than 600 customers running its products across more than 350,000 database instances. These customers include a variety of big names such as Cisco, Tesla, Siemens, PayPal, Salesforce.com and IBM. ThingWorx is a notable partner in the Internet of Things (IoT) space.
The product is available both on premises and in the cloud, with InfluxDB Cloud 2.0 representing a major recent release, including a shift to usage-based pricing. The database is available on AWS and Google Cloud Platform. The core functionality for each InfluxData product is open source. However, a variety of advanced features are only available via the closed source, proprietary offerings provided by InfluxData. For example, clustering and high availability are popular features that are only available via a commercial offering. There is a rate-limited free version of the software available as a Platform as a Service.
Last Updated: 28th February 2020
Mutable Award: Gold 2019
InfluxDB is a time series database that has been designed that way, as opposed to a relational (or other) database that supports time series. It supports schema on read, which has significant advantages in some IoT environments. Its architecture is illustrated in Figure 1, where Telegraf is a collection agent (over 200 plugins are available) for the metrics and event data that feed InfluxDB. For the purposes of IoT, Telegraf is often run as close to the edge as possible, as it is very lightweight.
In addition to Telegraf, two further elements are Kapacitor, a real-time streaming engine, and Chronograf, a graphical front end that provides monitoring and dashboard capability. In InfluxDB 1.0 these existed outside the main platform, with their own APIs. However, in 2.0 they are incorporated within the architecture. Nevertheless, they are still separate and independent components of the stack. This means that they can scale separately, as required, while queries running on InfluxDB won’t affect the performance of Kapacitor, and vice versa.
“If we hadn’t adopted InfluxData, we wouldn’t have been able to scale to the capacity or requirements of customers we have today. Running 900 nodes across 30-node clusters, Elasticsearch would have been extremely painful. We probably would have lost business.”
“The name of the game is all about how fast we can build high-quality software in production. I don’t want to be worried about building my own time series database. I don’t want to even be worried about managing my own time series database… By using InfluxDB Cloud, we’ve got that whole responsibility lifted off our shoulders.”
InfluxDB uses in-memory indexing along with time-structured merge trees. The latter includes a write-ahead log and read-only files that contain sorted, compressed time-series data. Spill to disk is also available in the event that memory is not adequate.
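The interplay of write-ahead log, in-memory index and sorted, compressed read-only files can be pictured with a toy model. The sketch below is purely illustrative and is not InfluxDB's storage engine: the class name `ToyTimeStructuredStore`, the `flush_threshold` parameter and the use of JSON plus zlib are all assumptions chosen to keep the example short, and everything lives in memory rather than on disk.

```python
import json
import zlib


class ToyTimeStructuredStore:
    """Toy model: a write-ahead log, an in-memory cache, and immutable segments."""

    def __init__(self, flush_threshold=4):
        self.wal = []        # append-only log of raw writes (hypothetical, in memory)
        self.cache = {}      # in-memory index: series name -> list of (ts, value)
        self.segments = []   # "read-only files": (series, compressed sorted points)
        self.flush_threshold = flush_threshold
        self._cached_points = 0

    def write(self, series, timestamp_ns, value):
        # Log the write first, so it could be replayed after a crash.
        self.wal.append((series, timestamp_ns, value))
        # Then make it queryable from memory.
        self.cache.setdefault(series, []).append((timestamp_ns, value))
        self._cached_points += 1
        if self._cached_points >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort each series by time and compress it into an immutable segment.
        for series, points in self.cache.items():
            payload = json.dumps(sorted(points)).encode("utf-8")
            self.segments.append((series, zlib.compress(payload)))
        self.cache.clear()
        self.wal.clear()  # logged writes are now covered by the segments
        self._cached_points = 0

    def read(self, series):
        # Merge points from the compressed segments with those still in memory.
        points = []
        for seg_series, blob in self.segments:
            if seg_series == series:
                points.extend(tuple(p) for p in json.loads(zlib.decompress(blob)))
        points.extend(self.cache.get(series, []))
        return sorted(points)


store = ToyTimeStructuredStore(flush_threshold=3)
for ts, value in [(30, 0.7), (10, 0.5), (20, 0.6), (40, 0.8)]:
    store.write("cpu", ts, value)

print(store.read("cpu"))  # points come back in time order
```

In this toy, out-of-order writes are sorted at flush time, and a read merges the flushed segments with whatever is still held in memory, which is the general idea behind a merge-tree design.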
The product employs a schema-on-read approach and supports both regular and irregular time series down to nanosecond precision. All data is stored, though we would like to see an option to refrain from storing data that is unchanging or that changes only within (user-defined) tolerance levels.
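To make the tolerance idea concrete, here is a minimal sketch of the kind of filter we have in mind, often called a deadband. It is not a feature of the product: the function name `deadband_filter` and its `tolerance` parameter are hypothetical, and the filter is imagined as running on the client before data is written.

```python
def deadband_filter(points, tolerance):
    """Yield only points whose value moved more than `tolerance` since the last kept point."""
    last_kept = None
    for timestamp_ns, value in points:
        if last_kept is None or abs(value - last_kept) > tolerance:
            last_kept = value
            yield timestamp_ns, value


# Irregularly spaced readings with nanosecond timestamps (made-up sample data).
readings = [
    (1_000_000_000, 20.0),
    (2_000_000_000, 20.1),
    (3_500_000_000, 20.05),
    (4_000_000_000, 21.0),
    (9_000_000_000, 21.0),
]

kept = list(deadband_filter(readings, tolerance=0.5))
print(kept)  # [(1000000000, 20.0), (4000000000, 21.0)]
```

Only two of the five readings survive here. The trade-off is that the dropped points can no longer be distinguished from missing data unless the tolerance is recorded alongside the series.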
The environment supports user-defined functions that can be written in a variety of languages including Go and Python, and there are SDKs for Java, Scala and R, and support for Jupyter Notebooks. Historically, the company has also offered InfluxQL, which is SQL-like, and TICKScript, which is used by Kapacitor. In its latest release the company has introduced Flux, which is a superset of these two. Flux is extensible; a significant feature is that it allows queries to access external data sources to pull in contextual information about, say, devices. From a machine learning perspective, the company focuses on providing a plug-in framework that integrates with third-party platforms, such as TensorFlow, and with anomaly detection tools. While there are significant capabilities for storing, indexing and manipulating time-series data, as one would expect, the product is relatively weak when it comes to geo-spatial data, which will limit the company’s ability to address some IoT use cases. We understand that the company is looking at how it can address this issue.
The graphical user interface supports query building and visualisation, dashboarding, and alerting. It allows you to quickly create queries (as seen in Figure 2) and use these to create real-time visualisations and dashboards. It also provides a variety of prebuilt dashboards that are ready to go out of the box. A rules engine is available, allowing you to build rules via the same interface as the query builder, before leveraging these to set up alerts or other actions based on the outcome of those rules. The latter case is particularly interesting, allowing you to, for example, automatically scale your cloud deployments based on a variety of metrics and statistics (such as, in a microservices environment, elastically scaling the number of containers based on the number of application requests, something which is not easy to do using Kubernetes).
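The elastic scaling example reduces to a rule that maps a metric to an action. The sketch below shows only that logic, under stated assumptions: the function name `desired_containers`, the target of 100 requests per second per container and the container limits are all hypothetical, and in practice the rule would be defined through the product's own rules engine rather than in hand-written Python.

```python
import math


def desired_containers(requests_per_second, target_per_container,
                       min_containers=1, max_containers=20):
    """Return how many containers are needed to keep each one at or below its target load."""
    needed = math.ceil(requests_per_second / target_per_container)
    # Clamp to the allowed range so the rule never scales to zero or without bound.
    return max(min_containers, min(max_containers, needed))


# Hypothetical request rates, as might be returned by a query over recent data.
for rate in (0, 450, 5000):
    print(rate, "->", desired_containers(rate, target_per_container=100))
```

An alert or action would then fire whenever the desired count differs from the number of containers currently running.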
More generally, the platform is designed to be a tool for the developer, who can either use the tools provided for that purpose or some other tool such as Grafana. However, there is currently no support for ODBC or JDBC connectivity – the company is working on these – so you cannot use something like Tableau at present. On the other hand, there are various other connectivity options available such as integration with Kafka and support for Slack (widely used by developers), amongst others.
There are two particularly prominent use cases for InfluxDB. The first is IoT, where it is used to support the processing of large quantities of sensor data in real time. The second is in operations (including DevOps) and data centre environments, where you wish to monitor and analyse your infrastructure and/or business processes, also in real time, and then take actions based on the results. The bulk of the company’s users are in the latter category, though IoT use cases are increasing.
More generally, time series databases are an intuitively good match for stream processing and analytics. Considering that both time series databases and stream processing are designed to deal with large volumes of time-based events and time stamped data, combining the two is a natural fit. For example, all IoT based data is time dependent. But it is also often the case that you want to be able to monitor and react to trend-based information, combining historic with real-time data. Many stream-based environments either do not support this or only do so via disparate products that were not designed from the outset to work together.
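As a rough illustration of combining historic with real-time data, the following sketch compares each incoming value against a baseline built from recent history. It is a generic technique, not the product's implementation: the class name `TrendMonitor`, the window size and the three-standard-deviation threshold are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


class TrendMonitor:
    """Flag values that deviate sharply from a rolling baseline of recent history."""

    def __init__(self, window=4, threshold=3.0):
        self.history = deque(maxlen=window)  # most recent `window` values
        self.threshold = threshold           # allowed deviation, in standard deviations

    def observe(self, value):
        """Return True if `value` is anomalous against the baseline, then record it."""
        alert = False
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            spread = pstdev(self.history)
            alert = spread > 0 and abs(value - baseline) > self.threshold * spread
        self.history.append(value)
        return alert


monitor = TrendMonitor(window=4, threshold=3.0)
stream = [10, 11, 10, 11, 10.5, 20]  # made-up sensor readings
alerts = [monitor.observe(v) for v in stream]
print(alerts)  # [False, False, False, False, False, True]
```

In a real deployment the baseline would more likely come from a query over stored history than from an in-process buffer, which is where having the database and the streaming engine designed to work together pays off.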
The Bottom Line
The market for pure-play time-series databases is relatively new but there is growing interest in it. InfluxData has, justifiably, established itself as the leader in this emerging market.