A preview of the next version of Greenplum

There are many different use cases for data warehousing, but some are highly demanding on resources, either because the volumes of data involved are very large, because the analytical processing is complex, or both. Many traditional databases, such as SQL Server and Oracle, were designed as symmetric multiprocessing (SMP) products, in which the processors of a single server share its memory, software and I/O resources. Other databases, including Greenplum, were designed for massively parallel processing (MPP), in which the work of a query is split across many processors that operate at the same time. Writing a database optimizer for MPP is a significant engineering effort, but it opens up the possibility of efficiently handling very complex queries on very large datasets, which SMP products can struggle with beyond a certain scale.

Greenplum is based on the open-source PostgreSQL database, and its next major version, Greenplum 7, due in 2023, will move to PostgreSQL 12. In particular, this version of Greenplum will include a just-in-time (JIT) compiler that compiles parts of each qualifying query into native machine code, which can speed up expensive queries. Greenplum also already supports in-database analytics, such as mathematical functions written in Python, and handles complex data types such as geospatial and time-series data. Greenplum is worth considering if your data warehouse requirements are particularly demanding.
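The benefit of JIT compilation can be illustrated by analogy. The following Python sketch uses Python source generation as a stand-in for native machine code. It is not how PostgreSQL or Greenplum implement JIT, which relies on LLVM. The sketch shows the general idea. An interpreter walks an expression tree again for every row, whereas a JIT compiler translates the tree once into a specialized function and then calls that function for every row. The tuple-based expression format and the names `interpret`, `to_source` and `jit_compile` are hypothetical.

```python
from typing import Any, Callable

# Hypothetical expression nodes for this sketch:
#   ("col", name) | ("const", value) | (op, left, right) with op in OPS
OPS = {"add": "+", "sub": "-", "mul": "*"}


def interpret(expr: tuple, row: dict) -> Any:
    """Interpreted path: re-walk the whole tree for every single row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    left = interpret(expr[1], row)
    right = interpret(expr[2], row)
    if kind == "add":
        return left + right
    if kind == "sub":
        return left - right
    if kind == "mul":
        return left * right
    raise ValueError(f"unknown node {kind!r}")


def to_source(expr: tuple) -> str:
    """Turn the tree into equivalent Python source text (done once)."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {OPS[kind]} {to_source(expr[2])})"


def jit_compile(expr: tuple) -> Callable[[dict], Any]:
    """Compiled path: build one specialized function, reused for every row."""
    source = f"lambda row: {to_source(expr)}"
    return eval(compile(source, "<query-expr>", "eval"))


# a + b * 2
expr = ("add", ("col", "a"), ("mul", ("col", "b"), ("const", 2)))
rows = [{"a": 1, "b": 3}, {"a": 10, "b": 0}]
compiled = jit_compile(expr)
print([compiled(r) for r in rows])  # [7, 10]
```

Compilation has an up-front cost that only pays off when it is amortized over many rows. For this reason, PostgreSQL applies JIT compilation only to queries whose estimated cost exceeds a configurable threshold (`jit_above_cost`).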