Make `find_actor()` delete stale sockaddr entries from registrar on `OSError` by goodboy · Pull Request #366 · goodboy/tractor

goodboy · 2023-08-28T23:23:37Z

All in the title 😎

goodboy · 2023-08-29T04:47:12Z

Lul gotta add the bidict dep?

goodboy · 2025-07-15T20:48:11Z

This needs to be bumped to integrate better with the new multi-ipc-proto stuff landed in #375!

Since stale addrs can be leaked when the actor transport server task crashes but doesn't (successfully) unregister from the registrar, we need a remote way to remove such entries; hence this new (registrar) method. To implement this, make use of the `bidict` lib for the `._registry` table, thus making it super simple to do reverse uuid lookups from an input socket-address.
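To illustrate the reverse-lookup idea, here is a minimal stdlib-only sketch. It is not the code from this PR: `TwoWayRegistry` and `register()` are hypothetical names, and two plain dicts stand in for the `bidict` so the example runs without the extra dep. It also shows why the table values need to be hashable (an addr that arrives as a list is converted to a tuple, which is an assumption about the input here) and why deletes should pop without raising.

```python
class TwoWayRegistry:
    """Hypothetical stand-in for a bidict-backed `._registry` table.

    Keeps a forward map (uid -> addr) and its inverse (addr -> uid)
    so an entry can be found and removed from either side.
    """

    def __init__(self):
        self._fwd = {}  # uid -> addr
        self._inv = {}  # addr -> uid

    def register(self, uid, addr):
        # Inverse-map keys must be hashable, so normalize any
        # list-shaped input (an assumption for this sketch) to tuples.
        uid, addr = tuple(uid), tuple(addr)

        # Drop previous pairings so both maps stay one-to-one.
        old_addr = self._fwd.pop(uid, None)
        if old_addr is not None:
            self._inv.pop(old_addr, None)
        old_uid = self._inv.pop(addr, None)
        if old_uid is not None:
            self._fwd.pop(old_uid, None)

        self._fwd[uid] = addr
        self._inv[addr] = uid

    def delete_addr(self, addr):
        """Remove the entry registered at `addr`.

        Returns the removed uid, or None when no such entry exists;
        it never raises on a missing addr.
        """
        uid = self._inv.pop(tuple(addr), None)
        if uid is None:
            return None
        self._fwd.pop(uid, None)
        return uid
```

With a real `bidict` the inverse map comes for free, so the delete reduces to a single pop on the inverse view instead of the manual bookkeeping above.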

In cases where an actor's transport server task (by default handling new TCP connections) terminates early but does not de-register from the pertaining registry (aka the registrar) actor's address table, the trying-to-connect client actor will get a connection error on that address. In the case where the client handles a (local) `OSError` exception (meaning the target actor address is likely being contacted over `localhost`), make a further call to the registrar to delete the stale entry and `yield None`, gracefully indicating to calling code that no `Portal` can be delivered to the target address.

This issue was originally discovered in `piker`, where the `emsd` (clearing engine) actor would sometimes crash on rapid client re-connects and then leave a `pikerd` stale entry. With this fix, new clients will attempt to connect via an endpoint which will re-spawn the `emsd` when a `None` portal is delivered (via `maybe_spawn_em()`).
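The control flow described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the actual `find_actor()` implementation: `find_actor_sketch`, the plain-dict `registry`, and the injected `connect` callable are all hypothetical stand-ins (the real helper is async and talks to a remote registrar).

```python
from contextlib import contextmanager


@contextmanager
def find_actor_sketch(name, registry, connect):
    """Yield a portal-like object for `name`, or None if unreachable.

    `registry` is a plain dict (name -> addr) standing in for the
    registrar, and `connect` is any callable that opens a connection
    to an addr; both are assumptions made for this sketch.
    """
    addr = registry.get(name)
    if addr is None:
        # Nothing registered under this name at all.
        yield None
        return

    try:
        portal = connect(addr)
    except OSError:
        # The addr is registered but nobody is listening: treat the
        # entry as stale, drop it (without raising if it is already
        # gone) and tell the caller that no portal is available.
        registry.pop(name, None)
        yield None
        return

    # Yield outside the `try` so an OSError raised by the caller's
    # own code is not mistaken for a failed connection.
    yield portal
```

A caller can then branch on the `None` case, for example by re-spawning the target actor, instead of handling a connection error itself.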

By spawning an actor task that immediately shuts down the transport server and then sleeps, verify that attempting to connect via the `._discovery.find_actor()` helper delivers `None` for the `Portal` value. Relates to #184 and #216

goodboy · 2025-09-30T05:25:50Z

I think CI is running clean now?

Only thing left is to maybe also do the `.delete_addr()` call on `ConnectionError`s with UDS?

goodboy changed the title ~~Make find_actor() delete stale sockaddr entries from registar actor on OSError~~ Make find_actor() delete stale sockaddr entries from registrar on OSError Aug 28, 2023

goodboy mentioned this pull request Aug 29, 2023

Switching to pdbpp is making it impossible to install with pypy and git #364

Closed

goodboy added 7 commits September 29, 2025 20:34

Ensure ._registry values are hashable, since bidict!

c93fbcc

Don't unwrap an unwrapped addr, just warn on delete XD

8b82808

Add stale entry deleted from registrar test

0c2fb98

By spawning an actor task that immediately shuts down the transport server and then sleeps, verify that attempting to connect via the `._discovery.find_actor()` helper delivers `None` for the `Portal` value. Relates to #184 and #216

Always no-raise try-to-pop registry addrs

11acdf8

Rename .delete_sockaddr() -> .delete_addr()

c9fda3f

goodboy changed the base branch from asyncio_debugger_support to main September 30, 2025 05:26

goodboy force-pushed the dereg_on_oserror branch from 3a31c9d to c9fda3f Compare September 30, 2025 05:26

goodboy wants to merge 7 commits into main from dereg_on_oserror
