Benchmarking datasets
When you provision a Lakehouse node, it comes preconfigured to point to a public S3 bucket in the same region that contains sample benchmarking datasets.
To query a table in one of these datasets, qualify the table name with the dataset's schema name.
Schema Name | Dataset |
---|---|
tpcds_sf_1 | TPC-DS, Scale Factor 1 |
tpcds_sf_10 | TPC-DS, Scale Factor 10 |
tpcds_sf_100 | TPC-DS, Scale Factor 100 |
tpcds_sf_1000 | TPC-DS, Scale Factor 1000 |
tpch_sf_1 | TPC-H, Scale Factor 1 |
tpch_sf_10 | TPC-H, Scale Factor 10 |
tpch_sf_100 | TPC-H, Scale Factor 100 |
tpch_sf_1000 | TPC-H, Scale Factor 1000 |
clickbench | ClickBench, 100 million rows |
brc_1b | Billion row challenge |
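As a minimal sketch, a schema-qualified query might look like the following. It assumes the TPC-H schemas expose the standard TPC-H table names (such as `lineitem`), which this page does not list explicitly.

```sql
-- Assumption: the standard TPC-H table `lineitem` exists in this schema.
select count(*) from tpch_sf_1.lineitem;
```

Swapping `tpch_sf_1` for `tpch_sf_10` or a larger scale factor should run the same query against the bigger copy of the dataset.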
Notes about ClickBench data:
Date columns (EventDate) are integers, not dates.
You must double-quote ClickBench column names because they contain uppercase letters, and Postgres folds unquoted identifiers to lowercase. For example:
✅ select "Title" from clickbench.hits;
🚫 select Title from clickbench.hits;
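If you need real dates from the integer date column, a conversion along these lines may work. This is only a sketch: it assumes `"EventDate"` stores the number of days since 1970-01-01, as in the upstream ClickBench Parquet file, which this page does not confirm.

```sql
-- Assumption: "EventDate" holds days since the Unix epoch (1970-01-01).
-- In Postgres, adding an integer to a date yields a date.
select date '1970-01-01' + "EventDate"::int as event_date
from clickbench.hits
limit 5;
```

Check a few converted values against what you expect before relying on this in a benchmark query.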