@castorsky castorsky commented Jan 20, 2026

Adaptation of error parsing fix from OpenSearch module #11374 for Elasticsearch module.

Previously, any response containing 'errors=true' was considered successful if at least one message had a 200/201 status, regardless of any error statuses. In addition, every status code other than 409 (including the range [200;299]) caused the FLB_ES_STATUS_ERROR bit to be set in the 'check' flag. As a result, some batches were not retried when they contained errors but had at least one successful message status.

Now the error handling treats only statuses in the range [200;299] as successful and only statuses in the ranges [400;409) and (409;599] as faulty. The message batch is then considered successful if it contained only 2xx or 409 (version conflict) statuses, and is scheduled for retry if there were any errors (a status in [400;409) or (409;599], or a failure to parse the response).
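
The classification described above can be sketched roughly as follows. This is a minimal illustration, not the code from plugins/out_es/es.c: the `DEMO_*` flag values and the `demo_*` function names are hypothetical, and the per-item statuses are passed in as a plain array instead of being unpacked from the bulk response.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only; the real macros are
 * defined in plugins/out_es/es.h and may use different values. */
#define DEMO_STATUS_SUCCESS (1 << 0)
#define DEMO_STATUS_ERROR   (1 << 1)

/* Fold the per-item statuses of one bulk response into a bitmask. */
int demo_classify_items(const int *statuses, size_t n)
{
    int check = 0;
    size_t i;

    assert(statuses != NULL || n == 0);

    for (i = 0; i < n; i++) {
        int s = statuses[i];

        if (s >= 200 && s <= 299) {
            /* [200;299]: item was accepted */
            check |= DEMO_STATUS_SUCCESS;
        }
        else if (s >= 400 && s <= 599 && s != 409) {
            /* [400;409) and (409;599]: item failed */
            check |= DEMO_STATUS_ERROR;
        }
        /* 409 (version conflict) sets neither bit in this sketch */
    }

    return check;
}

/* A batch needs a retry as soon as any item reported an error. */
int demo_needs_retry(int check)
{
    return (check & DEMO_STATUS_ERROR) != 0;
}
```

With this sketch, a batch with statuses 201 and 400 sets both bits and is retried, while a batch with statuses 200, 201 and 409 is not.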


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Corrected status/flag handling to fix incomplete/invalid-argument recognition.
    • Broadened HTTP success range and refined error detection for Elasticsearch responses (better handling of 200–299 and 409, and 400–499 errors).
    • Unified error-path cleanup to ensure resources are released on failures.
    • Tightened success checks so mixed/partial results trigger retries rather than being treated as full success.

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Rename status macros and tighten error/cleanup control flow; HTTP response handling broadened to treat 200–299 and 409 as non-error, 400–499 (except 409) as errors, and cb_es_flush now treats only exact FLB_ES_STATUS_SUCCESS as pure success (mixed flags trigger retry).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Elasticsearch status & flush logic: `plugins/out_es/es.c` | Replace exact-status checks with range checks: success = 200–299 (and treat 409 as non-error); error = 400–499 excluding 409. Change cb_es_flush to require ret == FLB_ES_STATUS_SUCCESS for pure success; mixed success+error flags now lead to retry paths and updated comments. |
| Error parsing & control-flow cleanup: `plugins/out_es/es.c` | On JSON packing/unpacking failures, unify error paths to use goto done for consistent cleanup; adjust returned status codes when invalid key types are found; tighten callback/retry semantics and add explanatory comments. |
| Macro renames / header updates: `plugins/out_es/es.h` | Rename macros: FLB_ES_STATUS_IMCOMPLETE → FLB_ES_STATUS_INCOMPLETE, and FLB_ES_STATUS_INVAILD_ARGUMENT → FLB_ES_STATUS_INVALID_ARGUMENT. |
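
To illustrate why an exact comparison matters for the flush decision, here is a small sketch that contrasts a bitwise test with an equality test on a combined status value. The `DEMO_*` values and both function names are hypothetical and are not taken from es.c or es.h; the "loose" variant only assumes the earlier behavior described in the PR, where one successful item was enough.

```c
/* Hypothetical flag values for illustration only. */
#define DEMO_STATUS_SUCCESS (1 << 0)
#define DEMO_STATUS_ERROR   (1 << 1)

/* Loose test: true as soon as the success bit is present, even if an
 * error bit is set alongside it. */
int demo_is_success_loose(int ret)
{
    return (ret & DEMO_STATUS_SUCCESS) != 0;
}

/* Strict test: true only when the success bit is the only bit set. */
int demo_is_success_strict(int ret)
{
    return ret == DEMO_STATUS_SUCCESS;
}
```

For a mixed result such as `DEMO_STATUS_SUCCESS | DEMO_STATUS_ERROR`, the loose test reports success while the strict test does not, so only the strict test lets the caller fall through to a retry path.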

Sequence Diagram(s)

sequenceDiagram
    participant Plugin as Fluent Bit Plugin
    participant ES as Elasticsearch (HTTP)
    participant Retry as Retry/Emitter

    Plugin->>ES: send bulk request
    ES-->>Plugin: HTTP response (status, body)
    alt status 200-299
        Plugin->>Plugin: set FLB_ES_STATUS_SUCCESS
        Plugin-->>Retry: acknowledge success (no retry)
    else status == 409
        Plugin->>Plugin: treat as conflict (non-error)
        Plugin-->>Retry: acknowledge (no retry)
    else status >= 400 and status < 500
        Plugin->>Plugin: set error flags (not pure SUCCESS)
        Plugin-->>Retry: schedule retry for failed items
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through macros, fixed a name,
200–299 now share the fame,
409 gets a gentle look,
errors send me back to the book,
I patch, I tidy — then I game. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: fixing error treatment logic in response parsing for the Elasticsearch output module, which is the core focus of this PR. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/out_es/es.c`:
- Around line 958-961: The comment above the equality check for
FLB_ES_STATUS_SUCCESS incorrectly mentions "OpenSearch" instead of
"Elasticsearch"; update the comment near the equality check (the block that logs
when ret == FLB_ES_STATUS_SUCCESS and calls flb_plg_debug(ctx->ins,
"Elasticsearch response\n%s", c->resp.payload)) to reference "Elasticsearch" so
it correctly reflects this plugin and the behavior of FLB_ES_STATUS_SUCCESS.

@castorsky castorsky force-pushed the out_elasticsearch_bulk_errors_treat branch from 8c847fb to 7350c49 Compare January 20, 2026 09:04
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/out_es/es.c`:
- Around line 792-799: The current status-check logic (variable check) only sets
FLB_ES_STATUS_SUCCESS for 2xx and treats non-409 4xx/5xx as FLB_ES_STATUS_ERROR,
which causes batches with only 409 responses to neither be marked success nor
error and thus be retried forever; modify the checks in the status aggregation
(where FLB_ES_STATUS_SUCCESS and FLB_ES_STATUS_ERROR are OR'ed into check) to
also mark 409 responses as success (i.e., treat item_val.via.i64 == 409 the same
as the 2xx case) so all-409 batches return FLB_ES_STATUS_SUCCESS and do not
retry indefinitely.
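
One way to read this suggestion is sketched below, under stated assumptions: the `DEMO_*` flag values and `demo_classify_with_conflict` are hypothetical names, the statuses come from a plain array instead of the unpacked response, and the code actually merged in es.c may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
#define DEMO_STATUS_SUCCESS (1 << 0)
#define DEMO_STATUS_ERROR   (1 << 1)

/* Variant of the aggregation in which 409 (version conflict) is counted
 * together with 2xx, so a batch of only 409 items ends up as success. */
int demo_classify_with_conflict(const int *statuses, size_t n)
{
    int check = 0;
    size_t i;

    assert(statuses != NULL || n == 0);

    for (i = 0; i < n; i++) {
        int s = statuses[i];

        if ((s >= 200 && s <= 299) || s == 409) {
            check |= DEMO_STATUS_SUCCESS;
        }
        else if (s >= 400 && s <= 599) {
            /* 409 was already handled by the branch above */
            check |= DEMO_STATUS_ERROR;
        }
    }

    return check;
}
```

In this variant an all-409 batch yields exactly `DEMO_STATUS_SUCCESS`, so an equality check against the success flag passes, while a batch that mixes 409 with a 500 still carries the error bit.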

@castorsky castorsky force-pushed the out_elasticsearch_bulk_errors_treat branch from 7350c49 to c7f2250 Compare January 20, 2026 09:13
Previously, any response containing 'errors=true' was considered successful
if at least one message had a 200/201 status, regardless of any error
statuses. In addition, every status code other than 409 (including the range
[200;299]) caused the FLB_ES_STATUS_ERROR bit to be set in the 'check' flag.
As a result, some batches were not retried when they contained errors but
had at least one successful message status.

Now the error handling treats only statuses in the range [200;299] as
successful and only statuses in the ranges [400;409) and (409;599] as
faulty. The message batch is then considered successful if it contained
only 2xx or 409 (version conflict) statuses, and is scheduled for retry if
there were any errors (a status in [400;409) or (409;599], or a failure to
parse the response).

Signed-off-by: Castor Sky <[email protected]>
@castorsky castorsky force-pushed the out_elasticsearch_bulk_errors_treat branch from c7f2250 to 285c300 Compare January 20, 2026 09:15
Use the cleanup procedure for 'out_buf' and 'result' when JSON parsing errors occur.

Signed-off-by: Castor Sky <[email protected]>
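
The single-exit cleanup pattern this commit refers to can be sketched as follows. This is only an illustration of the `goto done` idiom: `demo_parse_response` and its trivial "parsing" step are hypothetical, and the buffer names `out_buf` and `result` are borrowed from the commit message without implying the same types or ownership as in es.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for response parsing. Returns 0 on success and
 * -1 on any failure; every path leaves through the 'done' label. */
int demo_parse_response(const char *payload)
{
    int ret = -1;          /* assume failure until parsing succeeds */
    char *out_buf = NULL;
    char *result = NULL;
    size_t len;

    if (payload == NULL) {
        goto done;
    }

    len = strlen(payload);
    out_buf = malloc(len + 1);
    if (out_buf == NULL) {
        goto done;
    }
    memcpy(out_buf, payload, len + 1);

    /* Stand-in for unpacking: accept only input that starts like a
     * JSON object. */
    if (out_buf[0] != '{') {
        goto done;
    }

    result = malloc(len + 1);
    if (result == NULL) {
        goto done;
    }
    memcpy(result, out_buf, len + 1);

    ret = 0;

done:
    /* free(NULL) is a no-op, so this is safe on every path */
    free(result);
    free(out_buf);
    return ret;
}
```

Initializing both pointers to NULL and releasing them in one place means an early failure cannot skip a `free`, which is the property the commit aims for on JSON parsing errors.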