Don't rollback uncommittable indices on become_follower #7620

cjen1-msft · 2026-01-26T15:38:50Z

The hypothesis is that a backup had previously acked a regular entry but had not received any subsequent signature, was then partitioned, and became a pre-vote candidate.
When the partition was removed, it received an append-entries message and became a follower again, truncating the previously acked regular entry in the process.

From this point onwards, the backup would keep nacking the append-entries. The leader's match_idx for that replica is now higher than the last index in the backup's ledger, and because the leader never reduces sent_idx below match_idx, it would keep sending messages that the backup could never ack.
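The loop can be reproduced with a heavily simplified model. This is a sketch under stated assumptions: `Leader`, `Backup`, `on_nack` and `converges` are hypothetical names, not the `raft.h` API, terms are not modelled, and the only behaviour taken from the description above is that the leader never rewinds `sent_idx` below `match_idx` after a nack.

```cpp
// Hypothetical, heavily simplified model of the nack loop from #7618.
// Leader/Backup/on_nack/converges are illustrative names, not raft.h API.
#include <algorithm>
#include <cassert>
#include <cstdint>

using Index = std::uint64_t;

struct Backup {
  Index last_idx; // last index in the backup's ledger

  // Simplification: accept if we hold prev_idx (terms are not modelled).
  bool accepts(Index prev_idx) const { return prev_idx <= last_idx; }
};

struct Leader {
  Index match_idx; // highest index the leader believes the backup holds
  Index sent_idx;  // index the next append-entries builds on (its prev_idx)

  // On a nack, rewind towards the backup's reported last index, but never
  // below match_idx: acked entries are assumed to still be present.
  void on_nack(Index backup_last_idx) {
    sent_idx = std::max(match_idx, std::min(sent_idx, backup_last_idx));
  }
};

// Returns true if an append-entries is accepted within max_rounds.
bool converges(Leader leader, const Backup& backup, int max_rounds) {
  for (int round = 0; round < max_rounds; ++round) {
    if (backup.accepts(leader.sent_idx)) {
      return true;
    }
    leader.on_nack(backup.last_idx);
  }
  return false;
}
```

With `Leader{5, 5}` and a backup that truncated back to `Backup{4}`, `converges` returns false for any round limit, because every nack leaves `sent_idx` pinned at 5. If the backup keeps index 5, the first append-entries is accepted.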

The solution is to remove the rollback of uncommittable entries during become_follower.
This leaves the fixup path in recv_append_entries as the only place where backups truncate entries, and only until a non-fixup append-entries message is received.
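A minimal sketch of the resulting split of responsibilities, assuming standard Raft conflict handling: `become_follower` only changes role and term, and entries are dropped only when an incoming entry conflicts with the local one. `Node`, `Entry`, `ledger` and these signatures are illustrative, not the actual `raft.h` code, and the sketch does not model when the fixup path ends.

```cpp
// Hypothetical sketch, assuming standard Raft conflict handling; Node,
// Entry and these signatures are illustrative, not CCF's raft.h.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = std::uint64_t;
using Term = std::uint64_t;

struct Entry {
  Term term;
  bool is_signature; // a signature entry marks a committable index
};

struct Node {
  std::vector<Entry> ledger; // ledger[i - 1] holds the entry at index i
  Term current_term = 0;
  bool is_follower = false;

  Index last_idx() const { return ledger.size(); }

  // No rollback here: uncommittable entries (those after the last
  // signature) survive the role change.
  void become_follower(Term term) {
    current_term = term;
    is_follower = true;
  }

  // Returns false (a nack) if the log does not match at prev_idx.
  bool recv_append_entries(
    Index prev_idx, Term prev_term, const std::vector<Entry>& entries) {
    if (prev_idx > last_idx()) {
      return false;
    }
    if (prev_idx > 0 && ledger[prev_idx - 1].term != prev_term) {
      return false;
    }
    for (std::size_t i = 0; i < entries.size(); ++i) {
      const Index idx = prev_idx + 1 + i;
      if (idx <= last_idx()) {
        if (ledger[idx - 1].term == entries[i].term) {
          continue; // already present, keep it
        }
        ledger.resize(idx - 1); // fixup: drop the conflicting suffix
      }
      ledger.push_back(entries[i]);
    }
    return true;
  }
};
```

In this model a backup holding an unsigned entry at index 3 still has it after `become_follower`, so a leader whose match_idx is 3 can continue appending from there.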

Additionally, I've moved the last_ack_timeout reset to after the checks for a valid append_entries_response: if a leader is unhealthy, it should step down via check-quorum rather than carry on as a faulty leader.
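A sketch of the ordering change on the leader side. Assumptions: "valid" is reduced to a known sender and a matching term, each backup's timer is tracked as elapsed time, and check-quorum requires a strict majority counting the leader itself. `LeaderState`, `tick` and `has_quorum` are hypothetical; only the names `last_ack_timeout` and `append_entries_response` come from the PR.

```cpp
// Hypothetical sketch: reset a backup's ack timer only for validated
// append-entries responses, so check-quorum sees real liveness.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

using Term = std::uint64_t;
using NodeId = std::string;
using Duration = std::chrono::milliseconds;

struct AppendEntriesResponse {
  NodeId from;
  Term term;
};

struct LeaderState {
  Term current_term;
  // Time since each backup last sent a *valid* response (assumed layout).
  std::map<NodeId, Duration> last_ack_timeout;

  // Returns true if the response passed validation and was counted.
  bool recv_append_entries_response(const AppendEntriesResponse& r) {
    auto it = last_ack_timeout.find(r.from);
    if (it == last_ack_timeout.end()) {
      return false; // unknown sender: ignore
    }
    if (r.term != current_term) {
      return false; // stale or mismatched term: ignore
    }
    // Reset only after the checks above, so invalid responses cannot keep
    // an unhealthy leader looking live.
    it->second = Duration::zero();
    return true;
  }

  void tick(Duration elapsed) {
    for (auto& entry : last_ack_timeout) {
      entry.second += elapsed;
    }
  }

  // check-quorum: the leader counts itself, plus every backup heard from
  // within `timeout`; without a strict majority it should step down.
  bool has_quorum(Duration timeout) const {
    std::size_t live = 1;
    for (const auto& entry : last_ack_timeout) {
      if (entry.second < timeout) {
        ++live;
      }
    }
    const std::size_t total = last_ack_timeout.size() + 1;
    return live * 2 > total;
  }
};
```

Because the reset happens last, a stream of rejected responses leaves the timers running, and `has_quorum` eventually reports false so the leader steps down.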

Finally, I cleaned up a line number in the documentation and made the invalid-view check more explicit.

Copilot

Pull request overview

This PR fixes issue #7618 where a backup node could get into a perpetual disagreement with the leader after a network partition. The issue occurred when a backup node acked an entry, became a pre-vote candidate during a partition, and then truncated its log when becoming a follower again. This left the leader's match_idx higher than the backup's actual ledger, causing an infinite nack loop.

Changes:

Removed rollback of uncommittable entries from become_follower() to prevent premature log truncation
Moved last_ack_timeout reset to occur only after successful append-entries response validation
Improved code clarity by using ccf::VIEW_UNKNOWN constant instead of literal 0

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
tests/raft_scenarios/follower_rollback_match_index	New test scenario that reproduces the partition scenario from issue #7618
src/consensus/aft/raft.h	Core fix: removed rollback from become_follower, moved last_ack_timeout reset, and improved code clarity with VIEW_UNKNOWN constant


tests/raft_scenarios/soft_rollback

cjen1-msft · 2026-01-28T15:26:35Z

tla/consensus/Traceccfraft.tla

-    /\ Len(log[logline.msg.state.node_id]) = logline.msg.state.last_idx
+    \* The log is truncated during BecomeLeader to the last committable index, and so membership state may have also changed
+    /\ membershipState'[logline.msg.state.node_id] \in ToMembershipState[logline.msg.state.membership_state]
+    /\ Len(log'[logline.msg.state.node_id]) = logline.msg.state.last_idx


The previous behaviour was that the ledger was truncated to the last committable index when becoming candidate, so the subsequent truncation during BecomeLeader was a no-op.
Now we only truncate during BecomeLeader, so the trace check has to constrain the post-transition state (log' and membershipState') rather than the pre-transition state.
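A hypothetical sketch of where truncation now happens. The names `Node`, `become_candidate`, `become_leader` and `last_committable_index` are illustrative, and "committable" is assumed to mean "ends at a signature entry". The candidate transition leaves the ledger alone and the leader transition truncates it, which is why the logged `last_idx` must be compared against the primed (post-truncation) log.

```cpp
// Hypothetical sketch: names are illustrative, and "committable" is assumed
// to mean "ends at a signature entry".
#include <cassert>
#include <cstdint>
#include <vector>

using Index = std::uint64_t;
using Term = std::uint64_t;

struct Entry {
  Term term;
  bool is_signature;
};

// Largest index whose entry is a signature (0 if there is none).
Index last_committable_index(const std::vector<Entry>& ledger) {
  for (Index i = ledger.size(); i > 0; --i) {
    if (ledger[i - 1].is_signature) {
      return i;
    }
  }
  return 0;
}

struct Node {
  enum class Role { Follower, Candidate, Leader };

  std::vector<Entry> ledger;
  Role role = Role::Follower;

  // Previously the ledger was truncated here; now it is left untouched.
  void become_candidate() { role = Role::Candidate; }

  // Truncation now happens only on this transition, so a trace checker
  // must compare the logged last_idx against the post-truncation ledger.
  void become_leader() {
    ledger.resize(last_committable_index(ledger));
    role = Role::Leader;
  }
};
```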

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

cjen1-msft added 4 commits January 26, 2026 12:50

Failing test (5eaaea5)

Add fixup documentation. (68d83bd)

Don't rollback when become follower. (b0511f6)

Cleanup of raft.h (c86cdf6)

cjen1-msft requested a review from a team as a code owner January 26, 2026 15:38

Copilot AI review requested due to automatic review settings January 26, 2026 15:38

Copilot started reviewing on behalf of cjen1-msft January 26, 2026 15:39

Copilot AI reviewed Jan 26, 2026

eddyashton reviewed Jan 26, 2026

tests/raft_scenarios/follower_rollback_match_index (outdated, resolved)

eddyashton reviewed Jan 26, 2026

tests/raft_scenarios/follower_rollback_match_index (outdated, resolved)

eddyashton reviewed Jan 26, 2026

src/consensus/aft/raft.h (outdated, resolved)

eddyashton reviewed Jan 26, 2026

src/consensus/aft/raft.h (resolved)

cjen1-msft and others added 7 commits January 26, 2026 16:39

Update src/consensus/aft/raft.h (9c2d5cd)
Co-authored-by: Eddy Ashton <[email protected]>

Merge branch 'main' into follower-rollback (e592b08)

Update test to be more realistic (c6758ab)

Remove rollback when advancing term (73604c9)

Fix trace validation failure (4a960bd)

fmt (3bc0805)

Merge branch 'main' into follower-rollback (45fecfc)

achamayou reviewed Jan 28, 2026

tests/raft_scenarios/soft_rollback (outdated, resolved)

cjen1-msft commented Jan 28, 2026

Be explicit about ledger state during tests (ec20ca9)

cjen1-msft requested a review from Copilot January 28, 2026 15:36

Copilot started reviewing on behalf of cjen1-msft January 28, 2026 15:37

Copilot AI reviewed Jan 28, 2026

3 participants

Don't rollback uncommittable indices on become_follower #7620

Are you sure you want to change the base?

Don't rollback uncommittable indices on become_follower #7620

Conversation

cjen1-msft commented Jan 26, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cjen1-msft Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants