
@hariharans29 (Member) commented Jan 24, 2026

Description

Introduces a backend kernel selector config struct in MLAS that lets users control, at runtime, which backend kernels MLAS may select. The immediate use case is letting users opt out of KleidiAI kernels on ARM platforms. The design is intended to scale to other kernel implementation backends in the future.
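A minimal sketch of the idea, assuming a single boolean switch: the review comments below mention a `use_kleidiai` field that defaults to `true`, and note that callers may pass `nullptr` to keep the default behavior. The names `BackendKernelSelectorConfig`, `GemmBackend`, and `SelectGemmBackend` are hypothetical stand-ins, not the MLAS API from this PR, and real MLAS derives KleidiAI availability from CPU features rather than from a parameter.

```cpp
// Hypothetical stand-in for MLAS_BACKEND_KERNEL_SELECTOR_CONFIG (illustration only).
struct BackendKernelSelectorConfig {
  bool use_kleidiai = true;  // default: KleidiAI kernels may be selected
};

// Hypothetical set of backends a GEMM dispatch point could choose between.
enum class GemmBackend { KleidiAI, DefaultMlas };

// Sketch of a dispatch point. A null config means "no preference", so existing
// callers that pass nullptr keep the old behavior. The config can only veto
// KleidiAI; it cannot force it on hardware where it is unavailable.
GemmBackend SelectGemmBackend(const BackendKernelSelectorConfig* config,
                              bool kleidiai_available) {
  const bool allowed = (config == nullptr) || config->use_kleidiai;
  return (allowed && kleidiai_available) ? GemmBackend::KleidiAI
                                         : GemmBackend::DefaultMlas;
}
```

With `use_kleidiai = false`, `SelectGemmBackend(&config, true)` returns `GemmBackend::DefaultMlas`, while `SelectGemmBackend(nullptr, true)` still returns `GemmBackend::KleidiAI`. Treating the config as an opt-out rather than an opt-in leaves the default path unchanged for users who never set it.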

Motivation and Context

Allow users to opt out of KleidiAI kernels on ARM platforms if they choose to.

@hariharans29 requested a review from Copilot on January 24, 2026 20:56
@github-actions bot (Contributor) left a comment

You can commit the suggested changes from lintrunner.

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a backend-kernel selection config that can be plumbed through MLAS/ORT call sites to optionally disable KleidiAI-backed kernels via a new session option.

Changes:

  • Add MLAS_BACKEND_KERNEL_SELECTOR_CONFIG and thread it through MLAS GEMM/QGEMM/DynamicQGEMM APIs (and call sites).
  • Add session option key mlas.disable_kleidiai and propagate it through multiple CPU kernels (RNN/Conv/MatMul/Gemm/Softmax/Einsum/Attention/etc.).
  • Update unit tests and benchmarks to use the new MLAS API signatures.

Reviewed changes

Copilot reviewed 85 out of 85 changed files in this pull request and generated 5 comments.

File Description
onnxruntime/test/mlas/unittest/test_qgemm.h Update MLAS packing/GEMM calls with backend config param.
onnxruntime/test/mlas/unittest/test_qgemm.cpp Update pack-size queries with backend config param.
onnxruntime/test/mlas/unittest/test_fgemm.h Update batch GEMM + pack APIs with backend config param.
onnxruntime/test/mlas/unittest/test_dynamic_qgemm.cpp Update dynamic QGEMM APIs + availability check signature.
onnxruntime/test/mlas/unittest/test_conv2d.h Update MLAS GEMM call signature.
onnxruntime/test/mlas/bench/bench_sgemm.cpp Update GEMM/pack benchmark calls for new signature.
onnxruntime/test/mlas/bench/bench_qgemm.cpp Update QGEMM pack-size query signature.
onnxruntime/test/framework/math_test.cc Update math::Gemm calls to pass backend config param.
onnxruntime/core/util/math_cpu.cc Extend math APIs to accept backend selector config and forward to MLAS.
onnxruntime/core/util/math.h Extend math API declarations and include MLAS config type.
onnxruntime/core/providers/cpu/rnn/uni_directional_lstm.h Store and plumb backend selector config through LSTM implementation.
onnxruntime/core/providers/cpu/rnn/uni_directional_lstm.cc Forward backend selector config into GEMM calls.
onnxruntime/core/providers/cpu/rnn/rnn_helpers.h Extend helper GEMM wrappers to accept backend selector config.
onnxruntime/core/providers/cpu/rnn/rnn_helpers.cc Forward backend selector config into math::Gemm/GemmEx.
onnxruntime/core/providers/cpu/rnn/rnn.h Add session-option-driven config to RNN op kernel.
onnxruntime/core/providers/cpu/rnn/rnn.cc Pass backend selector config into MatMul calls.
onnxruntime/core/providers/cpu/rnn/lstm_base.h Add session-option-driven config to LSTM base.
onnxruntime/core/providers/cpu/rnn/lstm_base.cc Pass backend selector config to UniDirectionalLstm.
onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h Formatting-only constructor change.
onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc Pass backend selector config into pack-size/pack calls.
onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h Add session-option-driven config; plumb into GRU implementation.
onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc Pass backend selector config through packing and GEMM paths.
onnxruntime/core/providers/cpu/reduction/reduction_ops.h Update MatMul call signature and add TODO for config plumbing.
onnxruntime/core/providers/cpu/quantization/qlinearconv.cc Add session-option-driven config and pass to pack-size call.
onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h Introduce backend selector config usage in packing/dynamic quant paths.
onnxruntime/core/providers/cpu/nn/conv_transpose.h Add session-option-driven config to ConvTranspose.
onnxruntime/core/providers/cpu/nn/conv_transpose.cc Pass backend selector config into MatMul call.
onnxruntime/core/providers/cpu/nn/conv.h Add session-option-driven config to Conv.
onnxruntime/core/providers/cpu/nn/conv.cc Attach backend selector config into MLAS conv parameters and GEMM path.
onnxruntime/core/providers/cpu/ml/svmregressor.cc Pass backend selector config into GEMM path.
onnxruntime/core/providers/cpu/ml/svmclassifier.h Add session-option-driven config and pass to GEMM path.
onnxruntime/core/providers/cpu/ml/linearregressor.h Store backend selector config for LinearRegressor.
onnxruntime/core/providers/cpu/ml/linearregressor.cc Read session option and forward backend selector config into GEMM calls.
onnxruntime/core/providers/cpu/ml/linearclassifier.h Store backend selector config for LinearClassifier.
onnxruntime/core/providers/cpu/ml/linearclassifier.cc Read session option and forward backend selector config into GEMM calls.
onnxruntime/core/providers/cpu/math/softmax_shared.h Extend SoftmaxCPU signature to accept backend selector config.
onnxruntime/core/providers/cpu/math/softmax_shared.cc Forward config into math::Gemm where applicable; ignore in float fast path.
onnxruntime/core/providers/cpu/math/softmax.h Add session-option-driven config and plumb through compute paths.
onnxruntime/core/providers/cpu/math/softmax.cc Pass backend selector config into SoftmaxCPU calls.
onnxruntime/core/providers/cpu/math/matmul.h Add session-option-driven config to MatMul kernels.
onnxruntime/core/providers/cpu/math/matmul.cc Pass backend selector config into math::MatMul/GemmBatch/packing.
onnxruntime/core/providers/cpu/math/gemm_matmul_common.h Extend GemmPackBFp32 signature to accept backend selector config.
onnxruntime/core/providers/cpu/math/gemm.h Add session-option-driven config; plumb into GEMM helpers and MLFloat16 path.
onnxruntime/core/providers/cpu/math/gemm.cc Forward backend selector config through packing and GEMM calls.
onnxruntime/core/providers/cpu/math/einsum_utils/einsum_typed_compute_processor.h Store backend selector config pointer in processor.
onnxruntime/core/providers/cpu/math/einsum_utils/einsum_typed_compute_processor.cc Pass backend selector config into Einsum MatMul execution path.
onnxruntime/core/providers/cpu/math/einsum_utils/einsum_auxiliary_ops.h Extend Einsum MatMul callback signatures to accept backend selector config.
onnxruntime/core/providers/cpu/math/einsum_utils/einsum_auxiliary_ops.cc Forward backend selector config into math::MatMul and device MatMul callback.
onnxruntime/core/providers/cpu/math/einsum.h Add session-option-driven config to Einsum kernel.
onnxruntime/core/providers/cpu/math/einsum.cc Pass backend selector config into typed compute processors.
onnxruntime/core/providers/cpu/llm/attention.h Add session-option-driven config to AttentionBase.
onnxruntime/core/providers/cpu/llm/attention.cc Pass backend selector config into GEMM/MatMul calls in attention.
onnxruntime/core/mlas/lib/softmax.h Add session options header include.
onnxruntime/core/mlas/lib/sgemm.cpp Add backend selector config parameter and gate KleidiAI overrides on it.
onnxruntime/core/mlas/lib/qgemm.cpp Add backend selector config parameter for dynamic QGEMM availability/pack/batch.
onnxruntime/core/mlas/lib/mlasi.h Update MLAS internal function pointer typedefs for new signatures.
onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp Respect backend selector config to opt out of KleidiAI conv.
onnxruntime/core/mlas/lib/convolve.cpp Forward backend selector config into GEMM calls within convolution.
onnxruntime/core/mlas/inc/mlas.h Define backend selector config and extend MLAS API signatures.
onnxruntime/contrib_ops/cpu/word_conv_embedding.h Add session-option-driven config and store it in kernel.
onnxruntime/contrib_ops/cpu/word_conv_embedding.cc Pass backend selector config into GEMM call.
onnxruntime/contrib_ops/cpu/transformers/sampling_cpu_helper.h Update SoftmaxCPU call signature (config TODO left as nullptr).
onnxruntime/contrib_ops/cpu/transformers/generation_device_helper.cc Update SoftmaxCPU call signature (config TODO left as nullptr).
onnxruntime/contrib_ops/cpu/sparse/sparse_attention_base.h Add session-option-driven config and pass into GEMM paths.
onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc Add session-option-driven config and pass into GemmBatch.
onnxruntime/contrib_ops/cpu/quantization/matmul_bnb4.cc Add session-option-driven config and pass into GemmBatch.
onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc Add backend selector config usage in dynamic quant matmul path.
onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_lstm.cc Pass backend selector config into pack-size call.
onnxruntime/contrib_ops/cpu/quantization/attention_quant.cc Pass backend selector config into pack-size call.
onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.cc Pass backend selector config into GEMM calls.
onnxruntime/contrib_ops/cpu/moe/moe_cpu.cc Pass backend selector config into MLAS GEMM calls.
onnxruntime/contrib_ops/cpu/moe/moe_base_cpu.h Add session-option-driven config shared by MoE CPU base.
onnxruntime/contrib_ops/cpu/bert/gqa_attention_base.h Add session-option-driven config and pass into GEMM paths.
onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h Pass backend selector config into GEMM/MatMul usage.
onnxruntime/contrib_ops/cpu/bert/attention_base.h Store backend selector config in base attention class.
onnxruntime/contrib_ops/cpu/bert/attention.cc Pass backend selector config into pack-size/pack and GEMM paths.
onnxruntime/contrib_ops/cpu/attnlstm/uni_dir_attn_lstm.h Extend constructors to accept backend selector config.
onnxruntime/contrib_ops/cpu/attnlstm/uni_dir_attn_lstm.cc Store and pass backend selector config into GEMM paths.
onnxruntime/contrib_ops/cpu/attnlstm/deep_cpu_attn_lstm.h Add session-option-driven config.
onnxruntime/contrib_ops/cpu/attnlstm/deep_cpu_attn_lstm.cc Pass backend selector config into attention/LSTM components.
onnxruntime/contrib_ops/cpu/attnlstm/bahdanau_attention.h Extend constructor to accept backend selector config.
onnxruntime/contrib_ops/cpu/attnlstm/bahdanau_attention.cc Store and pass backend selector config into GEMM paths.
onnxruntime/contrib_ops/cpu/attnlstm/attention_wrapper.h Extend constructor to accept backend selector config.
onnxruntime/contrib_ops/cpu/attnlstm/attention_wrapper.cc Store and pass backend selector config into GEMM paths.
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h Add mlas.disable_kleidiai session option key.
Comments suppressed due to low confidence (1)

onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h:61

  • MlasGemmPackBSize is now called with a backend selector config, but the subsequent MlasGemmPackB(...) call in this function still uses the old signature and doesn't pass the config. This will either fail to compile (if the signature changed) or pack B using a potentially different backend selection than the size computation, which can break correctness. Update the MlasGemmPackB call to pass the same mlas_backend_kernel_selector_config_.
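The mismatch this comment describes can be illustrated with a toy packer, assuming (hypothetically) that the KleidiAI path uses a padded packed layout while the default MLAS path does not. `PackBSize`, `PackB`, and `PackWeights` are illustrative names, not the MLAS functions, and the sizes are made up; the point is only that the size query and the pack call must see the same config, or the pack step can write past the buffer the size query allocated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical config stand-in (illustration only).
struct BackendKernelSelectorConfig {
  bool use_kleidiai = true;
};

// Hypothetical packed-B size. Illustrative assumption: the KleidiAI layout pads N
// to a multiple of 16 and appends one extra value per padded column.
std::size_t PackBSize(std::size_t n, std::size_t k,
                      const BackendKernelSelectorConfig* config) {
  const bool kleidiai = (config == nullptr) || config->use_kleidiai;
  if (kleidiai) {
    const std::size_t n_padded = (n + 15) / 16 * 16;
    return n_padded * k + n_padded;
  }
  return n * k;
}

// Hypothetical packer: writes exactly PackBSize(n, k, config) elements.
// It copies B's n * k values, then zero-fills any padding.
void PackB(std::size_t n, std::size_t k, const float* b, float* packed,
           const BackendKernelSelectorConfig* config) {
  const std::size_t size = PackBSize(n, k, config);
  for (std::size_t i = 0; i < size; ++i) {
    packed[i] = (i < n * k) ? b[i] : 0.0f;
  }
}

// The pattern the review asks for: one config pointer feeds both calls.
std::vector<float> PackWeights(std::size_t n, std::size_t k, const float* b,
                               const BackendKernelSelectorConfig* config) {
  std::vector<float> packed(PackBSize(n, k, config));
  PackB(n, k, b, packed.data(), config);
  return packed;
}
```

In this toy model, with `n = 3, k = 2`, querying the size with `use_kleidiai = false` yields 6 floats, but packing with `nullptr` writes 48; routing both calls through one helper that takes a single config pointer rules that out.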

hariharans29 and others added 11 commits January 24, 2026 13:05
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…per.cc

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions bot (Contributor) left a comment

You can commit the suggested changes from lintrunner.

hariharans29 and others added 3 commits January 24, 2026 13:11
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions bot (Contributor) left a comment

You can commit the suggested changes from lintrunner.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 106 out of 106 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

onnxruntime/contrib_ops/cpu/bert/attention_base.h:41

  • mlas_backend_kernel_selector_config_ is never initialized from session config in this class, so the new mlas.disable_kleidiai option will not affect contrib Attention kernels (it will always use the default). Consider setting use_kleidiai in the AttentionBase constructor using OpKernelInfo::GetConfigOptions(), consistent with other kernels in this PR.
  MLAS_BACKEND_KERNEL_SELECTOR_CONFIG mlas_backend_kernel_selector_config_;

  AttentionBase(const OpKernelInfo& info, bool require_same_hidden_size) {
    int64_t num_heads = 0;
    ORT_ENFORCE(info.GetAttr("num_heads", &num_heads).IsOK() && num_heads > 0);
    num_heads_ = static_cast<int>(num_heads);

onnxruntime/core/util/math_cpu.cc:139

  • In the MLAS_SUPPORTS_GEMM_DOUBLE specialization, the call to MlasGemm is missing the new BackendKernelSelectorConfig argument introduced in mlas.h. This will fail to compile (and should pass either nullptr or the provided mlas_backend_kernel_selector_config).

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 106 out of 106 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

onnxruntime/contrib_ops/cpu/bert/attention_base.h:56

  • The new mlas_backend_kernel_selector_config_ member is never initialized from session config options in this base class. As a result, contrib BERT attention will always use the default (use_kleidiai=true) and ignore the mlas.disable_kleidiai session option. Initialize mlas_backend_kernel_selector_config_ in the AttentionBase constructor using info.GetConfigOptions().GetConfigEntry(kOrtSessionOptionsMlasDisableKleidiai) (and add the required header include).
 protected:
  MLAS_BACKEND_KERNEL_SELECTOR_CONFIG mlas_backend_kernel_selector_config_;

  AttentionBase(const OpKernelInfo& info, bool require_same_hidden_size) {
    int64_t num_heads = 0;
    ORT_ENFORCE(info.GetAttr("num_heads", &num_heads).IsOK() && num_heads > 0);
    num_heads_ = static_cast<int>(num_heads);

    is_unidirectional_ = info.GetAttrOrDefault<int64_t>("unidirectional", 0) == 1;
    do_rotary_ = info.GetAttrOrDefault<int64_t>("do_rotary", 0) == 1;
    rotary_embedding_ = static_cast<int>(info.GetAttrOrDefault<int64_t>("rotary_embedding_dim", 0));
    mask_filter_value_ = info.GetAttrOrDefault<float>("mask_filter_value", -10000.0f);
    scale_ = info.GetAttrOrDefault<float>("scale", 0.0f);

    if (!info.GetAttrs<int64_t>("qkv_hidden_sizes", qkv_hidden_sizes_).IsOK()) {
      qkv_hidden_sizes_.clear();
    }

    past_present_share_buffer_ = info.GetAttrOrDefault<int64_t>("past_present_share_buffer", 0LL);

    require_same_hidden_size_ = require_same_hidden_size;
  }
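A sketch of the fix these comments suggest: read the session option in the `AttentionBase` constructor so every derived attention kernel inherits an initialized config. `ConfigOptions`, `OpKernelInfo`, and the config struct below are simplified mocks so the example compiles on its own (the real ORT classes have many more members). The key name `kOrtSessionOptionsMlasDisableKleidiai` and its value `mlas.disable_kleidiai` come from this PR; treating the value `"1"` as "disable" is an assumption.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Key name and value as introduced by this PR; declared locally for the mock.
constexpr const char* kOrtSessionOptionsMlasDisableKleidiai = "mlas.disable_kleidiai";

// Hypothetical config stand-in (illustration only).
struct BackendKernelSelectorConfig {
  bool use_kleidiai = true;
};

// Simplified mock of the session config store, not the real ORT class.
class ConfigOptions {
 public:
  void AddConfigEntry(const std::string& key, const std::string& value) { entries_[key] = value; }
  std::optional<std::string> GetConfigEntry(const std::string& key) const {
    const auto it = entries_.find(key);
    if (it == entries_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::string> entries_;
};

// Simplified mock of OpKernelInfo exposing only the config accessor.
class OpKernelInfo {
 public:
  explicit OpKernelInfo(ConfigOptions options) : options_(std::move(options)) {}
  const ConfigOptions& GetConfigOptions() const { return options_; }

 private:
  ConfigOptions options_;
};

class AttentionBase {
 public:
  bool UsesKleidiAI() const { return mlas_backend_kernel_selector_config_.use_kleidiai; }

 protected:
  BackendKernelSelectorConfig mlas_backend_kernel_selector_config_;

  explicit AttentionBase(const OpKernelInfo& info) {
    // Assumption: "1" disables KleidiAI; a missing key or any other value keeps the default.
    const auto value = info.GetConfigOptions().GetConfigEntry(kOrtSessionOptionsMlasDisableKleidiai);
    mlas_backend_kernel_selector_config_.use_kleidiai = !(value.has_value() && *value == "1");
  }
};

// A derived kernel picks up the option just by calling the base constructor.
class Attention : public AttentionBase {
 public:
  explicit Attention(const OpKernelInfo& info) : AttentionBase(info) {}
};
```

Initializing the member in the base constructor means derived attention kernels pick up the option without each one repeating the lookup.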

@hariharans29 changed the title from "[WIP Prototype / DO NOT REVIEW] : Hari/kleidiai opt out" to "[WIP Prototype] : Hari/kleidiai opt out" on Jan 25, 2026
@hariharans29 changed the title from "[WIP Prototype] : Hari/kleidiai opt out" to "[MLAS/CPU EP]: Introduce a backend kernel selector config in MLAS" on Jan 26, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 108 out of 108 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

onnxruntime/contrib_ops/cpu/bert/attention_base.h:46

  • AttentionBase stores mlas_backend_kernel_selector_config_ and downstream code passes it into MLAS calls, but the constructor never sets use_kleidiai based on the new mlas.disable_kleidiai session option. As a result, users cannot opt out of KleidiAI for this operator. Plumb the config option into this constructor (and add the needed onnxruntime_session_options_config_keys.h include).
  MLAS_BACKEND_KERNEL_SELECTOR_CONFIG mlas_backend_kernel_selector_config_;

  AttentionBase(const OpKernelInfo& info, bool require_same_hidden_size) {
    int64_t num_heads = 0;
    ORT_ENFORCE(info.GetAttr("num_heads", &num_heads).IsOK() && num_heads > 0);
    num_heads_ = static_cast<int>(num_heads);

    is_unidirectional_ = info.GetAttrOrDefault<int64_t>("unidirectional", 0) == 1;
    do_rotary_ = info.GetAttrOrDefault<int64_t>("do_rotary", 0) == 1;
    rotary_embedding_ = static_cast<int>(info.GetAttrOrDefault<int64_t>("rotary_embedding_dim", 0));
    mask_filter_value_ = info.GetAttrOrDefault<float>("mask_filter_value", -10000.0f);

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 114 out of 114 changed files in this pull request and generated 4 comments.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 114 out of 114 changed files in this pull request and generated 3 comments.

#include "core/common/safeint.h"
#include "core/framework/op_kernel.h"
#include "contrib_ops/cpu/utils/dump_tensor.h"
#include "core/mlas/inc/mlas.h"

Copilot AI Jan 27, 2026

core/mlas/inc/mlas.h appears redundant here because this header already includes contrib_ops/cpu/bert/attention_base.h, which defines/uses MLAS_BACKEND_KERNEL_SELECTOR_CONFIG. Consider removing the extra include to reduce compile-time dependencies.

Suggested change (remove this line):
#include "core/mlas/inc/mlas.h"

Comment on lines +7 to +8
#include "core/mlas/inc/mlas.h"
#include "core/session/onnxruntime_session_options_config_keys.h"

Copilot AI Jan 27, 2026

core/session/onnxruntime_session_options_config_keys.h is included here but no config keys are referenced in this header. Please remove this unused include to reduce compile-time dependencies.

Copilot AI (Contributor) commented Jan 27, 2026

@hariharans29 I've opened a new pull request, #27166, to work on those changes. Once the pull request is ready, I'll request review from you.

@damdoo01-arm (Contributor)

Hari, could you also add the flag to the following KleidiAI calls? (I asked an AI assistant to go through the branch and identify them.)
[image: list of KleidiAI calls]

Labels: None yet
Projects: None yet

3 participants