Skip to content

Conversation

@adriangb
Copy link
Member

No description provided.

adriangb and others added 22 commits January 21, 2026 19:45
Add `should_push_through_operator` method to `ProjectionExprs` that
checks whether a projection provides actual benefit when pushed through
operators like Filter, Sort, Repartition, etc.

A projection should be pushed through when it is:
1. Trivial (no expensive computations to duplicate)
2. AND provides benefit via one of:
   - Narrowing the schema (fewer output columns than input)
   - Having field accessors that reduce data size
   - Having literals that can be absorbed by the datasource

Column-only projections that just rename without narrowing the schema
are NOT pushed through, as they provide no benefit.

This fixes test failures where column-only renaming projections were
incorrectly being pushed through filters and other operators.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements projection pushdown for nested expressions, specifically enabling get_field operations on struct/map columns to be pushed down through various operators (filters, repartitions, sorts) to the data source level. This optimization reduces data processing by allowing field extraction to happen earlier in the query plan.

Changes:

  • Introduces a new ArgTriviality classification system to distinguish between trivial expressions (columns, literals, field accessors) and non-trivial expressions (computations)
  • Implements projection splitting to extract beneficial sub-expressions from mixed projections
  • Updates pushdown logic across operators (RepartitionExec, FilterExec, SortExec, etc.) to allow trivial expressions through
  • Enhances update_expr with two-pass rewriting to handle expression matching and column remapping

Reviewed changes

Copilot reviewed 28 out of 30 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
datafusion/expr-common/src/triviality.rs New file defining ArgTriviality enum for expression classification
datafusion/functions/src/core/getfield.rs Implements triviality logic for get_field function
datafusion/physical-optimizer/src/projection_pushdown.rs Adds projection splitting logic to extract beneficial sub-expressions
datafusion/physical-plan/src/projection.rs Adds is_trivial_or_narrows_schema helper and updates projection merging logic
datafusion/physical-plan/src/filter.rs Updates filter pushdown to use new triviality-based checks
datafusion/physical-plan/src/repartition/mod.rs Removes restrictive checks to allow trivial expression pushdown
datafusion/physical-plan/src/sorts/*.rs Updates sort operators to allow pushdown without schema narrowing requirement
datafusion/physical-expr/src/projection.rs Implements two-pass expression rewriting in update_expr
datafusion/core/tests/physical_optimizer/projection_pushdown.rs Adds comprehensive test for nested field accessor pushdown
*.slt files Updates expected query plans to reflect new pushdown behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

adriangb and others added 10 commits January 21, 2026 21:21
Allow TrivialExpr as a valid base for get_field triviality check,
enabling nested field access like get_field(get_field(col, 'a'), 'b')
to be considered trivial.

Literal base is explicitly not considered trivial since it would be
constant-folded anyway.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Fix typo "expressions" -> "expression" in triviality.rs
- Add missing period in coalesce_partitions.rs comment
- Fix typo "expressions" -> "expression" in projection_pushdown.rs
- Clarify comment in projection.rs about beneficial expressions

Co-Authored-By: Claude Opus 4.5 <[email protected]>
When SortExec or SortPreservingMergeExec has a fetch (TopK behavior),
they act as filters reducing rows. Pushing non-trivial projections
(like literals) through them causes the expression to be evaluated
on all input rows instead of just the filtered output rows.

Added is_trivial_or_narrows_schema check for sort operators with fetch
to prevent pushing literals and computations below TopK operators.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add comprehensive sqllogictest coverage for projection pushdown
optimization with struct field access (get_field) expressions.

Test coverage includes:
- Basic get_field pushdown into DataSourceExec
- Nested struct access (s['outer']['inner'])
- Projection through Filter, Sort, and TopK operators
- Multi-partition scenarios with SortPreservingMergeExec
- Edge cases: nullable structs, common subexpressions, literals

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants