GPUs are coming to your database!
Database technologies for transaction processing and for analytics have, kinda, moved in different directions, largely from a “separation of concerns” perspective.
Transactions are all about velocity, with people speeding up databases by making them in-memory (e.g. HANA), and/or adding columnar capabilities (e.g. Postgres, Oracle, and, well, everybody).
Analytics, OTOH, are usually about volume — having massively parallel or distributed back-ends that can run through gobs of data very fast (e.g. Vertica, Redshift, Netezza, etc.) is pretty much table-stakes in this world.
Mind you, when one talks about “massively parallel,” GPUs should come to mind, and they clearly did for Kinetica, SQream, and MapD, who are all using GPUs to run the parallelized parts of SQL queries in their analytics databases. The result, as you might imagine, is a huge speed-up in query processing for analytics.
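To make that concrete, here's a minimal CUDA sketch of what “running the parallelized part of a SQL query on a GPU” might look like: every thread filters one row, and the matches are folded into an aggregate. It's purely illustrative (the table, column, and numbers are made up), and it's emphatically *not* how Kinetica, SQream, or MapD actually implement their engines.

```cuda
// A toy version of: SELECT SUM(amount) FROM sales WHERE amount > 100
// Every thread filters one "row"; matching values are combined with an
// atomic add. Real engines use columnar layouts and smarter reductions.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void filtered_sum(const float *amount, int n, float threshold,
                             float *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && amount[i] > threshold)
        atomicAdd(result, amount[i]);   // fold this row into the aggregate
}

int main() {
    const int n = 1 << 20;              // a million rows of the "sales" table
    float *amount, *result;
    cudaMallocManaged(&amount, n * sizeof(float));  // visible to CPU and GPU
    cudaMallocManaged(&result, sizeof(float));
    for (int i = 0; i < n; ++i) amount[i] = (float)(i % 200);
    *result = 0.0f;

    filtered_sum<<<(n + 255) / 256, 256>>>(amount, n, 100.0f, result);
    cudaDeviceSynchronize();            // wait for the GPU before reading

    printf("SUM(amount) WHERE amount > 100: %.0f\n", *result);
    cudaFree(amount);
    cudaFree(result);
    return 0;
}
```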
MapD, in particular, has done some seriously interesting stuff in the way they shard data across chunks of a GPU’s frame buffer, and then combine the results after each query — think “MapReduce running on GPUs” (it’s not an exact analogy, but close enough 😀). In fact, the same approach works across higher-level hardware interconnects, like PCI-Express or NVLink, as the graphic below shows.
/via https://www.nextplatform.com/2017/04/26/pushing-trillion-row-database-gpu-acceleration/
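If you squint, the “map” and “reduce” phases fall out pretty naturally in code. Below is a hypothetical sketch of the shard-then-combine idea: a column split into chunks that each fit in a slice of GPU memory, a partial aggregate computed per chunk, and the partials folded together at the end. Again, all the names and the deliberately naive COUNT query are made up; MapD's actual chunking is far more sophisticated.

```cuda
// Shard-then-combine: stream the column through the GPU one chunk at a
// time ("map"), then fold the per-chunk partials on the host ("reduce").
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void partial_count(const int *col, int n, int key, unsigned *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && col[i] == key) atomicAdd(out, 1u);  // per-chunk partial
}

int main() {
    const int total = 1 << 22, chunk = 1 << 20;  // 4M rows, 1M per chunk
    std::vector<int> host(total);                // total is a multiple of
    for (int i = 0; i < total; ++i)              // chunk, so no ragged tail
        host[i] = i % 7;

    int *d_col; unsigned *d_out;
    cudaMalloc(&d_col, chunk * sizeof(int));
    cudaMalloc(&d_out, sizeof(unsigned));

    unsigned long long matches = 0;              // the combined result
    for (int off = 0; off < total; off += chunk) {   // "map" over shards
        cudaMemcpy(d_col, host.data() + off, chunk * sizeof(int),
                   cudaMemcpyHostToDevice);
        cudaMemset(d_out, 0, sizeof(unsigned));
        partial_count<<<(chunk + 255) / 256, 256>>>(d_col, chunk, 3, d_out);
        unsigned part;
        cudaMemcpy(&part, d_out, sizeof(unsigned), cudaMemcpyDeviceToHost);
        matches += part;                         // "reduce" the partials
    }
    printf("COUNT(*) WHERE col = 3: %llu\n", matches);
    cudaFree(d_col);
    cudaFree(d_out);
    return 0;
}
```

Swap the sequential memcpy loop for CUDA streams, or fan the chunks out across multiple GPUs, and you get the same fan-out/fan-in shape the graphic above is describing.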
Where things get interesting, though, is that “in-memory” doesn’t really contra-indicate “GPU-based”! Imagine a hybrid system of sorts, where all the transactional stuff happens in CPU memory, and the analytics happen on the GPU(s) — no reason that couldn’t work, right?
Well, historically it’s been kinda difficult, because of the bandwidth issues between memory and, well, anything else. However, recent CUDA work by Nvidia (unified memory) can keep CPU and GPU memory in sync, which means that now the transactional and analytics sides of a database can talk to each other much more efficiently. Brytlyt, in fact, is working on exactly that, with engines for Postgres and MariaDB. It’s still designed around analytics (for now), but the path forward clearly involves in-memory solutions working hand-in-hand with the GPUs.
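For the curious, CUDA's unified memory is what makes that sharing plausible: a single managed allocation that both the CPU and GPU can touch, with the runtime migrating pages on demand. Here's a sketch of the hybrid idea, where the “transactional” side updates rows on the CPU and the “analytics” side scans them on the GPU, with no explicit copies. Illustrative only, and definitely not Brytlyt's actual implementation.

```cuda
// Hybrid sketch with CUDA Unified Memory: the CPU ("OLTP" side) updates
// rows in place; the GPU ("OLAP" side) scans the very same allocation.
// The CUDA runtime migrates pages on demand -- no cudaMemcpy needed.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_balances(const float *balance, int n, float *total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, balance[i]);
}

int main() {
    const int n = 100000;
    float *balance, *total;
    cudaMallocManaged(&balance, n * sizeof(float));  // one shared allocation
    cudaMallocManaged(&total, sizeof(float));

    for (int i = 0; i < n; ++i) balance[i] = 10.0f;  // "OLTP" writes on CPU
    balance[42] += 5.0f;                             // an in-place update

    *total = 0.0f;
    sum_balances<<<(n + 255) / 256, 256>>>(balance, n, total);  // "OLAP" scan
    cudaDeviceSynchronize();             // let the GPU finish before reading

    printf("SUM(balance) = %.1f\n", *total);         // prints 1000005.0
    cudaFree(balance);
    cudaFree(total);
    return 0;
}
```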
Given all this, I suspect that GPU-based acceleration will soon be standard across databases. When you add in the sea changes coming thanks to the emergence of NVM (non-volatile memory), and machine learning being used for optimizations, it sure looks like there are fun times ahead in the db world 🙌.