DirectScan, fallback modes, and EXPLAIN
Postgres Lakehouse is fastest when it can "push down" your entire query to DataFusion, the vectorized query used for handling queries when possible. (In the future, this will be more fine-grained as we add support for partial pushdowns.)
Postgres Lakehouse can execute your query in two modes. First, it attempts to run the entire query using Seafowl (a dedicated columnar database based on DataFusion). If Seafowl can't run the entire query, for example, because it uses PostgreSQL-specific operations like JSON, then Postgres Lakehouse will fall back to using the PostgreSQL executor, with Seafowl streaming full table contents to it.
If your query is extremely slow, it's possible that's what's happening.
You can check which mode is being used by running an EXPLAIN
on the query and
making sure that the top-most query node is DirectScan
. For example:
In this case, the query is executed by PostgreSQL and Seafowl is only involved when scanning the table (see CompatScan
at the bottom).
The fix in this case is to explicitly name the inner COUNT(*)
column, since Seafowl gives it an implicit name count(*)
whereas PostgreSQL calls it count
:
Here, we can see the DirectScan
at the top, which means that Seafowl is running the entire query.
If you're having trouble rewording your query to make it run fully on Seafowl, open a support ticket.
Could this page be better? Report a problem or suggest an addition!