-
Notifications
You must be signed in to change notification settings - Fork 2
pushdown for nested expressions #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add `should_push_through_operator` method to `ProjectionExprs` that checks whether a projection provides actual benefit when pushed through operators like Filter, Sort, Repartition, etc. A projection should be pushed through when it is: 1. Trivial (no expensive computations to duplicate) 2. AND provides benefit via one of: - Narrowing the schema (fewer output columns than input) - Having field accessors that reduce data size - Having literals that can be absorbed by the datasource Column-only projections that just rename without narrowing the schema are NOT pushed through, as they provide no benefit. This fixes test failures where column-only renaming projections were incorrectly being pushed through filters and other operators. Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements projection pushdown for nested expressions, specifically enabling get_field operations on struct/map columns to be pushed down through various operators (filters, repartitions, sorts) to the data source level. This optimization reduces data processing by allowing field extraction to happen earlier in the query plan.
Changes:
- Introduces a new
ArgTrivialityclassification system to distinguish between trivial expressions (columns, literals, field accessors) and non-trivial expressions (computations) - Implements projection splitting to extract beneficial sub-expressions from mixed projections
- Updates pushdown logic across operators (RepartitionExec, FilterExec, SortExec, etc.) to allow trivial expressions through
- Enhances
update_exprwith two-pass rewriting to handle expression matching and column remapping
Reviewed changes
Copilot reviewed 28 out of 30 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
datafusion/expr-common/src/triviality.rs |
New file defining ArgTriviality enum for expression classification |
datafusion/functions/src/core/getfield.rs |
Implements triviality logic for get_field function |
datafusion/physical-optimizer/src/projection_pushdown.rs |
Adds projection splitting logic to extract beneficial sub-expressions |
datafusion/physical-plan/src/projection.rs |
Adds is_trivial_or_narrows_schema helper and updates projection merging logic |
datafusion/physical-plan/src/filter.rs |
Updates filter pushdown to use new triviality-based checks |
datafusion/physical-plan/src/repartition/mod.rs |
Removes restrictive checks to allow trivial expression pushdown |
datafusion/physical-plan/src/sorts/*.rs |
Updates sort operators to allow pushdown without schema narrowing requirement |
datafusion/physical-expr/src/projection.rs |
Implements two-pass expression rewriting in update_expr |
datafusion/core/tests/physical_optimizer/projection_pushdown.rs |
Adds comprehensive test for nested field accessor pushdown |
*.slt files |
Updates expected query plans to reflect new pushdown behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Allow TrivialExpr as a valid base for get_field triviality check, enabling nested field access like get_field(get_field(col, 'a'), 'b') to be considered trivial. Literal base is explicitly not considered trivial since it would be constant-folded anyway. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Fix typo "expressions" -> "expression" in triviality.rs - Add missing period in coalesce_partitions.rs comment - Fix typo "expressions" -> "expression" in projection_pushdown.rs - Clarify comment in projection.rs about beneficial expressions Co-Authored-By: Claude Opus 4.5 <[email protected]>
When SortExec or SortPreservingMergeExec has a fetch (TopK behavior), they act as filters reducing rows. Pushing non-trivial projections (like literals) through them causes the expression to be evaluated on all input rows instead of just the filtered output rows. Added is_trivial_or_narrows_schema check for sort operators with fetch to prevent pushing literals and computations below TopK operators. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive sqllogictest coverage for projection pushdown optimization with struct field access (get_field) expressions. Test coverage includes: - Basic get_field pushdown into DataSourceExec - Nested struct access (s['outer']['inner']) - Projection through Filter, Sort, and TopK operators - Multi-partition scenarios with SortPreservingMergeExec - Edge cases: nullable structs, common subexpressions, literals Co-Authored-By: Claude Opus 4.5 <[email protected]>
No description provided.