Streamlio as a whole, and Heron, Pulsar and BookKeeper in particular, are designed to deliver high performance at scale while maintaining a light footprint. To this end, the entire platform uses a fully distributed micro-services architecture. The platform is also multi-tenant throughout, so that a single Streamlio instance can serve multiple teams, jobs and applications.
The stream processing itself is truly event-based (and emphatically not micro-batch). Moreover, Streamlio offers two distinct methods for processing streaming data. The first is Pulsar stream-native functions: relatively simple, lightweight functions that are managed by Pulsar itself. The second is Heron, which provides fully-fledged real-time processing. Although Heron is not as easy to set up or use as Pulsar functions, it supports far more complex processing, including multi-stage processing. Heron also features resource isolation, an extensible stream engine (which supports other technologies, such as Apache Storm and Apache Beam, via APIs), and intelligent self-regulation (which provides, for example, automatic flow control).
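To make the event-based versus micro-batch distinction concrete, the following is a minimal, library-free Python sketch. It does not use Pulsar's or Heron's APIs; `process`, `run_event_at_a_time` and `run_micro_batch` are hypothetical names chosen purely for illustration. What it shows is when results become available: an event-based engine emits a result as soon as each record arrives, whereas a micro-batch engine holds records back until a batch fills.

```python
from typing import Callable, Iterable, List, Tuple


def process(event: str) -> str:
    """A lightweight, stateless per-event function (hypothetical example logic)."""
    return event.upper()


def run_event_at_a_time(events: Iterable[str], fn: Callable[[str], str]) -> List[Tuple[int, str]]:
    """Apply fn to each event as it arrives; record (arrival step, output)."""
    emitted = []
    for i, event in enumerate(events):
        emitted.append((i, fn(event)))  # output is released on the same step the event arrives
    return emitted


def run_micro_batch(events: Iterable[str], fn: Callable[[str], str], batch_size: int) -> List[Tuple[int, str]]:
    """Buffer events and process them only when a batch fills (or the stream ends)."""
    emitted: List[Tuple[int, str]] = []
    buffer: List[str] = []
    last_step = -1
    for i, event in enumerate(events):
        buffer.append(event)
        last_step = i
        if len(buffer) == batch_size:
            emitted.extend((i, fn(e)) for e in buffer)  # all outputs released at the batch boundary
            buffer.clear()
    if buffer:  # flush a final partial batch when the stream ends
        emitted.extend((last_step, fn(e)) for e in buffer)
    return emitted


if __name__ == "__main__":
    stream = ["a", "b", "c", "d", "e"]
    print(run_event_at_a_time(stream, process))
    print(run_micro_batch(stream, process, batch_size=2))
```

In this toy run the record `"a"` is emitted at step 0 by the event-based runner but only at step 1 by the micro-batch runner; with larger batches the delay grows accordingly.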
Streamlio’s messaging and queuing system is based on Pulsar. Its major features include: the ability to handle many operational management issues during messaging, which removes the need for streaming applications to do so themselves; built-in support for geographically distributed applications via geo-replication; guaranteed, asynchronous data protection and replication; and a unified messaging model that supports both queuing and publish-and-subscribe semantics. It is compatible with Kafka and supports Java, C++, Python and WebSocket API clients.
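The idea of one model covering both queuing and publish-and-subscribe can be illustrated with a toy in-memory topic. Pulsar expresses this through subscription types such as exclusive and shared (it offers others too, such as failover); the sketch below models only a simplified version of those two. The `Topic` class, its `subscribe`/`publish` methods and the `inbox` field are hypothetical and are not the Pulsar client API.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, List, Tuple


class Topic:
    """Toy in-memory topic (hypothetical, not the Pulsar client API).

    Every subscription receives every published message (publish-and-subscribe);
    consumers attached to the same *shared* subscription split those messages
    between them round-robin (queuing)."""

    def __init__(self) -> None:
        self._subs: Dict[str, dict] = {}
        # Delivered messages, keyed by (subscription name, consumer name).
        self.inbox: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def subscribe(self, subscription: str, consumer: str, mode: str = "exclusive") -> None:
        sub = self._subs.setdefault(subscription, {"mode": mode, "consumers": []})
        if sub["mode"] == "exclusive" and sub["consumers"]:
            raise ValueError(f"subscription {subscription!r} is exclusive and already has a consumer")
        sub["consumers"].append(consumer)
        # Rebuild the round-robin iterator over the current set of consumers.
        sub["next"] = cycle(list(sub["consumers"]))

    def publish(self, message: str) -> None:
        for name, sub in self._subs.items():
            consumer = next(sub["next"])  # exclusive: the sole consumer; shared: next in turn
            self.inbox[(name, consumer)].append(message)


if __name__ == "__main__":
    topic = Topic()
    topic.subscribe("audit", "auditor", mode="exclusive")  # sees every message
    topic.subscribe("workers", "w1", mode="shared")       # w1 and w2 split the work
    topic.subscribe("workers", "w2", mode="shared")
    for m in ["m1", "m2", "m3", "m4"]:
        topic.publish(m)
    print(dict(topic.inbox))
```

Here the `audit` subscription behaves like a publish-and-subscribe listener, while `w1` and `w2` behave like competing consumers on a queue, both fed from the same topic.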

Figure 2 – Pipeline monitoring using Streamlio
Streamlio’s storage is built on BookKeeper and is used by both Pulsar (for message persistence) and Heron (for state storage). It takes an append-only approach and, importantly, its segment-based storage architecture keeps it entirely decoupled from the stream processing. This architecture allows logical partitions to be broken up and stored as multiple segments that can be distributed as needed across your physical systems. By contrast, the traditional stream processing storage architecture requires each copy of a partition to be stored in its entirety on a single physical storage location. As a result, the Streamlio storage architecture is much more scalable and flexible than the traditional approach. The Streamlio platform also features performance isolation, realised via intelligent resource management.
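The segment-based approach can be sketched as an append-only logical partition that rolls over to a new fixed-size segment, with each segment placed independently on a subset of storage nodes. This is a toy model under stated assumptions: `SegmentedPartition`, `segment_size`, `replicas` and the simple rotating placement rule are all hypothetical, and BookKeeper's real placement policies are considerably more sophisticated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    nodes: List[str]                                   # storage nodes holding a replica of this segment
    entries: List[str] = field(default_factory=list)   # append-only contents


class SegmentedPartition:
    """Toy append-only logical partition split into fixed-size segments (hypothetical).

    Each new segment is placed on `replicas` nodes chosen by rotating through the
    current cluster, so the partition's replicas are spread across the cluster
    instead of each being pinned, whole, to a single node."""

    def __init__(self, nodes: List[str], segment_size: int, replicas: int) -> None:
        if replicas > len(nodes):
            raise ValueError("not enough nodes for the requested replica count")
        self.nodes = list(nodes)
        self.segment_size = segment_size
        self.replicas = replicas
        self.segments: List[Segment] = []
        self._next = 0  # rotating start offset used for placement

    def add_node(self, node: str) -> None:
        # New capacity is available to future segments; existing segments are not moved.
        self.nodes.append(node)

    def _new_segment(self) -> Segment:
        chosen = [self.nodes[(self._next + k) % len(self.nodes)] for k in range(self.replicas)]
        self._next = (self._next + 1) % len(self.nodes)
        seg = Segment(nodes=chosen)
        self.segments.append(seg)
        return seg

    def append(self, entry: str) -> None:
        if not self.segments or len(self.segments[-1].entries) >= self.segment_size:
            self._new_segment()  # roll over once the current segment is full
        self.segments[-1].entries.append(entry)  # existing entries are never modified

    def read_all(self) -> List[str]:
        return [e for seg in self.segments for e in seg.entries]


if __name__ == "__main__":
    p = SegmentedPartition(["n1", "n2", "n3"], segment_size=2, replicas=2)
    for i in range(6):
        p.append(f"e{i}")
    for s in p.segments:
        print(s.nodes, s.entries)
```

With three nodes and two replicas per segment, no node ends up holding every segment of the partition. Because placement is decided per segment, a node added later can be used by future segments without first copying a whole partition onto it.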
Finally, these features are made available through a web-based dashboard. It provides management and monitoring (including usage statistics) of your streaming processes, as well as of the data pipelines and processing flows through your system, all visualised and displayed in real time. An example of pipeline monitoring is shown in Figure 2.