Analyst Coverage: Philip Howard
Experian is one of the world’s leading global information services companies. Its roots go back to 1826, but it was not officially founded until 1996, when distinct US and UK-based credit scoring companies were merged to form Experian under the ownership of Great Universal Stores. A decade later the company was de-merged and became independent. It is now headquartered in Ireland and is a public company listed in London as part of the FTSE 100. Revenues, which are reported in dollars, were $4.66bn in the year ending March 2018. The company has more than 16,000 employees working in 37 countries. Apart from Ireland, the company has operational headquarters in the UK, the US and Brazil.
Experian is perhaps most well-known for its credit scoring but, more broadly, offers a range of information services. It offers a number of products and solutions spanning analytics (for example, Experian PowerCurve), business information, consumer credit, data quality management (for example, Experian Aperture Data Studio), identity and fraud, marketing services and payment delivery.
Experian Aperture Data Studio
Last Updated: 29th March 2021
Mutable Award: Gold 2021
Experian Aperture Data Studio is a data quality and enrichment platform that was first formally released in 2018 to replace Experian Pandora, a product that Experian inherited through the acquisition of X88. The company targets this at the use cases illustrated in Figure 1, which apply across industries and across the globe.
At heart it is a data quality solution that is able to perform in-depth data profiling and data quality analysis. The design elements of the product support all the sorts of functional requirements that are shown in Figure 2. The product is persona-based and, in the latest release (2.2), there is a strong emphasis on supporting collaboration between the people who need to use your data for operational and analytic reasons and those who have a more technical relationship with, and understanding of, that data.
“It is very cool and very powerful. The things you are doing are pretty good, very top notch.”
“I’m impressed. Clean and easy to use, fast install.”
“It is pretty. It is visually appealing. It is inviting.”
“Like Excel on steroids. It is nice. That’s good.”
“The workflows. They are absolutely really good.”
The functions that Aperture Data Studio can perform are detailed in Figure 2. Notable are the enrichment functions that can leverage other aspects of Experian’s data product portfolio for things like geospatial, demographic and credit-based information.
Experian Aperture Data Studio is browser-based, has strong data workflow capabilities and supports user personas. The product has been designed to work within existing technology stacks. Thus, for example, the product’s data preparation capabilities, which are largely targeted at operational rather than analytic environments, are complementary to both data cataloguing and data governance solutions from third-party vendors such as Collibra and Alation. The software will store relevant metadata defining such integration.
As far as data workflow is concerned, Experian Aperture Data Studio separates data profiling (which is done on all your data, not just samples) from data loading. Thus, data profiling is an explicit workflow step or action during data discovery, and therefore distinct from data manipulation and load processes. Moreover, during profiling you can choose which columns to profile rather than profiling the whole table, and interactive drilldown is supported. Address and email verification can also be defined as steps within a workflow, as can functions you have written using the comprehensive Software Developer’s Kit (SDK). Further, it is worth noting that you can explore and drill down from a workflow without having to run it, which is not the case with some other products; this means users can validate a workflow without spending time executing it. Other features include the ability to chart results, compare data between runs and undertake version management for workflows. Exhaustive object versioning is a new capability introduced in the latest release, and it also supports reversions. Data workflow annotation is supported, as is the ability to embed workflows within other workflows.
As mentioned, Experian Aperture Data Studio is persona-based, and there is a particular emphasis on business users and collaboration. In particular, there is the concept of “spaces”, where you can define and use functions that are specific to a particular area such as a marketing department. Experian is increasingly adding pre-packaged sets of these functions, for example for credit information quality checking or single customer view templates. Individual functions may come out of the box or you can define your own. As an example, you might have a function that obfuscates credit card numbers. If you want something more sophisticated, such as format-preserving encryption, there is an SDK that will enable this. Alongside spaces, Experian also supports pre-configured views, consisting of trusted data, that are specific to a particular space or sub-space, such as a “view for marketing”, so that the marketing department only sees the data that is relevant to its function.
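To make the obfuscation example concrete, the sketch below shows the kind of user-defined function described above: masking a credit card number so that only the last four digits remain visible. The function name and logic are purely illustrative assumptions, not Experian's SDK.

```python
def obfuscate_card_number(value: str) -> str:
    """Replace all but the last four digits of a card number with '*'.

    Illustrative only: a real data quality function would likely also
    validate the number (e.g. with a Luhn check) before masking.
    """
    digits = [c for c in value if c.isdigit()]
    if len(digits) < 12:  # not a plausible card number: leave untouched
        return value
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

print(obfuscate_card_number("4111 1111 1111 1111"))  # -> ************1111
print(obfuscate_card_number("N/A"))                  # -> N/A (passed through)
```

A function like this could be registered once within a "space" and then reused across that department's workflows, which is the collaboration pattern the product emphasises.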
From a matching perspective, an example of duplicate matching rules is illustrated in Figure 3. These are applied when looking for and evaluating potential duplicates. Experian Aperture Data Studio does not apply machine learning to the matching process itself, but it does apply machine learning to the matching rules to obtain the best set of those rules.
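To illustrate what rule-based duplicate matching of this kind looks like in principle, the sketch below implements two hypothetical rules (exact email match, or fuzzy name match plus matching postcode) using Python's standard library. The rules and threshold are assumptions for illustration, not Experian's actual matching logic.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Simple case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_potential_duplicate(rec_a: dict, rec_b: dict,
                           name_threshold: float = 0.85) -> bool:
    # Rule 1: an exact match on a non-empty email wins outright
    if rec_a.get("email") and rec_a.get("email") == rec_b.get("email"):
        return True
    # Rule 2: fuzzy name match combined with the same postcode
    return (similarity(rec_a["name"], rec_b["name"]) >= name_threshold
            and rec_a.get("postcode") == rec_b.get("postcode"))

a = {"name": "Jon Smith", "postcode": "NG1 1AA", "email": ""}
b = {"name": "John Smith", "postcode": "NG1 1AA", "email": ""}
print(is_potential_duplicate(a, b))  # -> True
```

The machine learning described in the text would sit a level above code like this: given labelled examples of true and false duplicates, it tunes which rules to apply and with what thresholds, rather than replacing the rules themselves.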
After investigating the first release of Experian Aperture Data Studio we wrote that “most vendors, in any market, try to meet modern requirements by bolting on extra capabilities. If that is simply a question of adding on a feature here or there, that is not a problem. However, when it comes to fundamentals such as self-service and collaboration, these are not the type of services that are amenable to bolting on. Experian is therefore to be applauded for biting the bullet and developing a product suitable for the third decade of this century.” This is the second time that we have reviewed the product since its original release and it continues to build on its early promise.
The Bottom Line
Experian Aperture Data Studio is a modern application for data management and offers the sort of features that businesses require from such a solution. We said something similar in our first review of the product but were forced to qualify our endorsement because of the lack of some capabilities. Since then, we have been happy to remove that qualification: Experian Aperture Data Studio is a product we are happy to recommend.
Mutable Award: Gold 2021
Experian Data Quality
Last Updated: 14th July 2014
Experian Data Quality provides a number of different product offerings: data capture and validation solutions; data cleansing and standardisation; data matching and deduplication; and the Data Quality Platform.
The data capture and validation solutions, which are often embedded in web-based applications, provide validation capabilities not just for physical addresses but also for email addresses and mobile phone numbers. The company’s data cleansing solution, QAS Batch, is a batch-processing application for cleansing, standardising and enriching data. Thirdly, there is QAS Unify which provides a matching engine and rules-based deduplication.
The Data Quality Platform has three main areas of functionality. Firstly, it provides data profiling and discovery capability. Secondly, it provides prototyping, which allows you to use a graphical rule builder to cleanse and transform your data interactively, the sort of operations that are normally associated with both ETL (extract, transform and load) tools and data cleansing tools, thus creating a 'prototype' of the data you need and generating a specification of what you did to produce it. Thirdly, the Data Quality Platform can be used to instantiate these rules so that the product may be used for data quality and data governance purposes as well as for data migrations. In this last case this is enabled by the fact that the product can generate load files that can be used in conjunction with native application and database loaders. In other words, it enables data migrations without the need for an ETL tool.
Experian has an extensive direct sales force and also a significant partner base. The company’s name change, along with the introduction of the Data Quality Platform, marks a change in direction for the company. Historically it has been a market leader in the name and address space but it wants to move away from party data into the more general data quality market.
Experian Data Quality has some 9,000 users around the world for QAS Batch, QAS Unify and the company’s capture and validation services. The Data Quality Platform has, in fact, been white-labelled from another vendor, and although Experian has only a few customers for this product as yet, there are approaching 500 customers using the original product. Needless to say, Experian has not only integrated its other products with the Data Quality Platform but is also working to extend the product.
The Data Quality Platform is a Java software product that runs under Windows (client and server), Linux (Red Hat and SUSE) as well as Solaris, AIX and HP-UX. Other Java-compliant platforms are available upon request. The product supports JDBC for database connectivity as well as offering support for both flat files and Microsoft Excel. However, it does not provide support for non-relational databases (including NoSQL sources).

The product is underpinned by a proprietary correlation database. This stores data based on unique values (each value is stored just once) rather than tables or columns. This means that it uses less disk space than traditional databases, as well as enabling unique functionality and improving performance. As an indication of the latter, the Data Quality Platform supports as many as two billion records on-screen with full browsing and filtering capabilities. Another advantage that derives from having its own database is that there is no need to embed a third-party database engine within it, so there are no bugs, compatibility, administration or performance issues related to that. As far as functionality is concerned, the product can distinguish all (sub)types of data and one particularly interesting feature is the ability to assign monetary weightings during ongoing monitoring. This is useful for justifying and prioritising remediation.
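The idea behind value-based storage of this kind can be sketched in a few lines. The minimal example below (an illustrative assumption, not Experian's implementation) stores each distinct value exactly once and lets columns hold only small integer references, which is why repeated values cost almost no extra space.

```python
class ValueStore:
    """Toy value-deduplicating store: each distinct value kept once."""

    def __init__(self):
        self._values = []   # each distinct value, stored exactly once
        self._index = {}    # value -> integer id

    def intern(self, value):
        """Return the id for value, registering it on first sight."""
        vid = self._index.get(value)
        if vid is None:
            vid = len(self._values)
            self._values.append(value)
            self._index[value] = vid
        return vid

    def lookup(self, vid):
        """Recover the original value from its id."""
        return self._values[vid]

store = ValueStore()
column = [store.intern(v) for v in ["London", "Paris", "London", "London"]]
print(column)              # -> [0, 1, 0, 0]
print(len(store._values))  # -> 2 (only two distinct values are stored)
```

A real correlation database would add persistence, indexing and cross-column correlation on top of this, but the space saving and fast value-level operations follow directly from the dedupe-and-reference structure shown here.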
Another major feature is that it supports prototyping of the sort of business rules that are used within a data quality context or transformation rules within a data migration environment. In the latter case the product supports the generation of ETL (extract, transform and load) specifications and can be used as a standalone solution for data migration. The big advantage for the Experian technology set is that you don’t need a separate ETL tool with its associated staff, infrastructure and project timescales.
More generally, the Data Quality Platform supports full cleansing, enrichment and de-duplication using reference data, patterns, synonyms, fuzzy matching and parsing via over 300 native functions and any number of customer-specified functions. Functions can also be called by external applications via the REST API, allowing enterprise-wide re-use. It is also very flexible with respect to both data and metadata and supports customisations such as the construction of a business glossary associated with data assets. The product lacks support for external authentication mechanisms such as Active Directory or LDAP, using its own role-based security instead.
Experian Data Quality has a significant services business, both in the UK and USA. As an example, the business is worth around £3m in revenues in the UK. In addition to training, support and so forth, this division offers integration services for the company’s products, data strategy services, and what might be called bureau services for off-site, one-off cleansing initiatives.
Experian PowerCurve
Last Updated: 17th January 2020
Experian PowerCurve is a decision management platform upon which the company delivers a variety of customer-centric and analytically driven decisioning solutions, as illustrated in Figure 1.
The PowerCurve platform makes use of a common component called PowerCurve Strategy Management and there are three core components: Design Studio, which provides the user interface and a common design environment for decision strategies and simulation; Decision Agent, which is the decisioning engine; and a common repository (not shown) for sharing reusable resources such as previously used strategies and so forth. It has an open architecture built on HDFS and Apache Spark that supports plug-ins from third-party machine learning environments. There is also a core element called the Analytic Component Extension (ACE) framework, discussed below.
Additional capabilities are provided within the PowerCurve platform and associated solutions. For example, there is a Virtual Assistant to support customer engagement, and there are various pre-built templates to support particular functions.
“Today Alfa Bank’s employees are actively using PowerCurve to assess applications for all unsecured cash loans, credit cards, and refinancing products. We have substantially increased the application processing speed and decisioning accuracy, while maintaining the credit portfolio quality. Due to its flexibility, PowerCurve has made it possible to implement several new products in a short timeframe.”
Experian has recognised that there are four trends within the market related to operationalising analytics: a demand to use machine/deep learning models for predictive functions; issues around the operationalisation of those models; requirements around the monitoring, management and replacement (or retraining) of models; and governance issues with respect to both explainability and bias. As a result, Experian is in the process of enhancing PowerCurve, which was originally released in 2011, to support these capabilities. For example, it has introduced the ACE Framework. This provides an environment for plugging in predictive models developed within third-party environments. Currently, ACE supports PMML (Predictive Model Markup Language) based models as well as those developed using R or Python. H2O support is forthcoming and other plug-ins will be introduced based on customer demand.
However, there are some elements of what Bloor Research calls AnalyticOps that are not yet fully implemented, though Experian is by no means alone in this. For example, there are facilities for model auditing today, but it is not an easy and simple process. Experian plans to enhance both its explainability and model governance capabilities, adding model performance monitoring. While there are facilities for automatically deploying and testing models in production, there are no workflow/approvals processes to complement this yet. The company is considering whether to implement such processes within PowerCurve or to leave this to third-party tools.
AnalyticOps concerns the operationalisation, monitoring, improvement, management and governance of augmented intelligent models. As such models become more and more widely deployed, AnalyticOps capabilities will become more and more vital. Where most companies today have a handful of deployed models, if that, we expect large organisations to have thousands or tens of thousands of models to deploy and manage in the near future.
In practice, there are very few vendors that can address all, or even some, of the AnalyticOps requirements of a modern data science environment. Experian PowerCurve is one of these: it has a business user-friendly design environment enabling business users to easily include models in decision strategies, and it is in the process of adding and enhancing capabilities for identifying bias, model explainability, model performance monitoring and model management. In this context, it is noteworthy that the company supports a methodology known as FACT: fairness, accuracy, (responsibility to the) customer, and transparency (including both explainability and auditability). By way of contrast, most vendors in this space have no model management capabilities at all, few have any capabilities with respect to bias, and explainability is by no means common.
The Bottom Line
PowerCurve has not historically been marketed as a generic decisioning platform but only as providing underlying capabilities to support the various decision management solutions offered by Experian. While the latter remains a valid strategy, it means that PowerCurve remains a well-kept secret. Given that many of the well-known data science platforms cannot match the capabilities of PowerCurve when it comes to AnalyticOps – at least at this time – we would like to see PowerCurve marketed more aggressively to a wider market.