-
Notifications
You must be signed in to change notification settings - Fork 247
Don't rollback uncommittable indices on become_follower #7620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes issue #7618 where a backup node could get into a perpetual disagreement with the leader after a network partition. The issue occurred when a backup node acked an entry, became a pre-vote candidate during a partition, and then truncated its log when becoming a follower again. This left the leader's match_idx higher than the backup's actual ledger, causing an infinite nack loop.
Changes:
- Removed rollback of uncommittable entries from
become_follower()to prevent premature log truncation - Moved
last_ack_timeoutreset to occur only after successful append-entries validation - Improved code clarity by using
ccf::VIEW_UNKNOWNconstant instead of literal0
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/raft_scenarios/follower_rollback_match_index | New test scenario that reproduces the partition scenario from issue #7618 |
| src/consensus/aft/raft.h | Core fix: removed rollback from become_follower, moved last_ack_timeout reset, and improved code clarity with VIEW_UNKNOWN constant |
Co-authored-by: Eddy Ashton <[email protected]>
| /\ Len(log[logline.msg.state.node_id]) = logline.msg.state.last_idx | ||
| \* The log is truncated during BecomeLeader to the last committable index, and so membership state may have also changed | ||
| /\ membershipState'[logline.msg.state.node_id] \in ToMembershipState[logline.msg.state.membership_state] | ||
| /\ Len(log'[logline.msg.state.node_id]) = logline.msg.state.last_idx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous behaviour was that when becoming candidate we truncated the ledger to last committable index, and then the truncation during BecomeLeader was a no-op.
Now we only truncate during BecomeLeader, so we need to talk about the subsequent state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Closes #7618
The hypothesis is that a backup had previously acked a regular entry, but was missing any subsequent signature, was partitioned, and became a pre-vote candidate.
When the partition was removed it received an append-entries message and became a follower again, in doing so truncating the previously acked regular entry.
From this point onwards, the backup would keep nacking the append-entries, but the leader who's
match_idxfor that replica is now higher than the last index in the ledger would never reduce thesent_idxpast thematch_idx, and so keep sending messages which could not be ack'd by the backup.The solution is to remove the rollback of uncommittable entries during
become_follower.This leaves the
recv_append_entriesfunction fixup path as the only place backups truncate entries, and only until a non-fixup append-entries is received.Additionally I've moved the
last_ack_timeoutto after the checks for a valid append_entries_response, as if a leader is unhealthy it should step-down via check-quorum, rather than keeping on being a faulty leader.Finally I cleaned up a line number in the documentation, and made the invalid view check more explicit.