ClickHouse tips #3: the transform function
Clickhouse has a powerful feature, {% code-line %}JOIN{% code-line-end %} engines, that allows us to prepare a table to be joined with better performance than a regular table (MergeTree, Log…). It also allows to use {% code-line %}joinGet{% code-line-end %} to get table values using a key.
Somtimes you don’t have a JOIN table but you’d like to use something with the joinGet performance. Unfortunately, you can’t use joinGet with something created on the fly.
However there is a way to do that, using transform
The basic structure would be like this
A real-life example
Given a {% code-line %}sales{% code-line-end %} table like this with 1M rows (download the csv here):
and a {% code-line %}exchange_rate{% code-line-end %} table like this (CSV here), with daily data for 3 years:
Imagine you want to get the total amount of sales in dollars per country. You could do it with a join like this if the data is small:
But if the data is big, using {% code-line %}transform{% code-line-end %} would have a better performance. You’d do something like this:
Run it yourself
We’ve created two CSVs with fake data and those schemas. If you want to replicate the queries above in your account, create the data sources with this command: