Content Copyright © 2024 Bloor. All Rights Reserved.
Also posted on: Bloor blogs
Corporations increasingly need to be aware of Environmental, Social and Governance (ESG) issues. They need to be able to transparently explain their own ESG situation and risk exposure to investors, customers and, in some cases, governments. The data involved includes a company's carbon emissions, its recycling record, and its policies on diversity and inclusion. This ESG data presents a particular challenge since it is not typically stored in the usual corporate databases that house financial information. Instead, it is frequently scattered across spreadsheets, HR systems, specialist databases and third-party data sources. Because the scope of ESG is quite broad, companies will probably take an iterative approach to capturing and reporting this data, “thinking big but starting small”. With this in mind, it is important to have a flexible data platform that can easily accommodate new and potentially volatile data sources.
Traditional data warehouses are set up for corporate reporting on areas such as sales, products and customers, and may need to handle very large volumes of data: the sheer quantity of individual retail sales transactions or telco call logs in a large company creates major challenges of scalability and performance. ESG data presents different issues: it is typically not large in volume, but it can be complex, come from many different systems, and change format frequently. I was recently shown an interesting demonstration of a new data warehouse tool that suits this kind of problem very well, using ESG data as an example.
Exasol is known as a vendor of a very fast, scalable in-memory database aimed at analytics. In late 2023 they launched a complementary product called Yotilla, a tool aimed at greatly simplifying the design and deployment of data warehouses. With Yotilla, a semantic layer of business terminology is defined that shields the end user from the physical structure of the database. For example, in Yotilla an end user sees an entity called “customer” rather than a physical database table (in SAP, for instance, the customer master table has the catchy name “KNA1”). A user can add data sources via a menu that creates a staging area; Yotilla then interprets the database schemas that it finds and presents a visual representation of the data structure, in either tabular or graphical form, showing key relationships. The user can select the data that they are interested in, and Yotilla generates the SQL statements needed to produce a new data mart containing just that data. The underlying database could be Exasol, but it could also be one of a range of partner databases, such as Snowflake.
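The idea of a semantic layer that maps business terms onto physical tables and generates SQL from them can be sketched roughly as follows. This is purely illustrative and not Yotilla's actual design or API: only the SAP table name KNA1 comes from the article, and the column names and mapping structure are assumptions for the sake of the example.

```python
# Hypothetical semantic-layer mapping: business entity "customer"
# resolves to SAP's physical customer master table KNA1. The column
# mappings (KUNNR, LAND1, etc.) are illustrative assumptions.
SEMANTIC_MODEL = {
    "customer": {
        "table": "KNA1",
        "columns": {"customer_id": "KUNNR", "name": "NAME1", "country": "LAND1"},
    },
}

def generate_mart_sql(entity: str, fields: list[str], mart_table: str) -> str:
    """Translate business-level field names into SQL against physical tables."""
    model = SEMANTIC_MODEL[entity]
    # Build a SELECT list that aliases physical columns back to business names.
    select_list = ", ".join(f'{model["columns"][f]} AS {f}' for f in fields)
    return f"CREATE TABLE {mart_table} AS SELECT {select_list} FROM {model['table']}"

print(generate_mart_sql("customer", ["customer_id", "country"], "mart_customer"))
# → CREATE TABLE mart_customer AS SELECT KUNNR AS customer_id, LAND1 AS country FROM KNA1
```

The point of the pattern is that the user only ever chooses business-level names; the SQL against the physical schema is generated, not hand-written, so it can be regenerated whenever the mapping changes.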
Once the data mart structure is defined, Yotilla generates the necessary data mart tables and populates them; it is then possible to plug in a reporting or analytics tool such as Tableau to produce whatever reports you need. Crucially, because Yotilla holds its metadata in a catalogue, it can detect changes in the underlying source structure and rebuild the data mart without technical intervention from the IT department: Yotilla just regenerates the SQL. In the ESG demonstration that I saw, even a fairly simple set of data structures resulted in some quite complex SQL statements to regenerate the necessary data mart tables when the structure of an underlying source changed. Without such a tool, a database administrator would need to be involved to handle the schema changes in the sources. In reality, changing the underlying structure of a populated data warehouse schema is a non-trivial task, so a tool that can radically speed up this process will have a significant effect on support effort and on the timeliness of responding to business change. In an area such as ESG, with many diverse data sources, some of them from third parties, the ability to deal rapidly with changes in underlying data structures is very useful. Naturally, Yotilla is not restricted to ESG data: it would apply to any situation with complex, diverse and changing data sources. As the commercial rollout of Yotilla begins in 2024, it will be interesting to see the level of take-up amongst customers in such situations.
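The catalogue-driven change detection described above can be sketched in miniature: compare the column set recorded in the metadata catalogue against the source's current schema, and flag that the downstream SQL needs regenerating. All the table and column names here are hypothetical, and this is an assumption about the general technique, not Yotilla's implementation.

```python
# Sketch of metadata-catalogue change detection. Column names and
# types are illustrative; a real tool would read both sides from
# system catalogues rather than hard-coded dictionaries.
def detect_schema_changes(catalogued: dict[str, str], current: dict[str, str]):
    """Return (added, removed, retyped) columns for one source table."""
    added = {c: t for c, t in current.items() if c not in catalogued}
    removed = {c: t for c, t in catalogued.items() if c not in current}
    retyped = {
        c: (catalogued[c], current[c])
        for c in catalogued.keys() & current.keys()
        if catalogued[c] != current[c]
    }
    return added, removed, retyped

# Example: a third-party ESG feed gains a column and widens a type.
catalogued = {"site_id": "VARCHAR(10)", "co2_tonnes": "DECIMAL(10,2)"}
current = {"site_id": "VARCHAR(10)", "co2_tonnes": "DECIMAL(18,2)", "scope": "VARCHAR(1)"}

added, removed, retyped = detect_schema_changes(catalogued, current)
if added or removed or retyped:
    # In a Yotilla-like tool, this is the trigger to regenerate the
    # data mart SQL rather than call in a DBA.
    print("Source schema changed; regenerating data mart SQL")
```

The value of holding this comparison in a catalogue is exactly the point the article makes: the rebuild becomes a mechanical regeneration step rather than a manual schema-migration project.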