Lock Sliding Sync connections when inserting lazy members, to prevent repeated deadlocks. by reivilibre · Pull Request #19826 · element-hq/synapse

reivilibre · 2026-06-05T02:56:18Z

Got paged today for this. The sliding sync worker in question had loads of deadlocks in the logs.
I restarted it and it got unwedged, but we should have a more robust defence, which this PR proposes.

psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL:  Process 257324 waits for ShareLock on transaction 688227036; blocked by process 254908.
Process 254908 waits for ShareLock on transaction 688222971; blocked by process 256179.
Process 256179 waits for ExclusiveLock on tuple (302352,92) of relation 2962200779 of database 16403; blocked by process 257213.
Process 257213 waits for ShareLock on transaction 688225005; blocked by process 254905.
Process 254905 waits for ShareLock on transaction 688228814; blocked by process 257324.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (183070,103) in relation "sliding_sync_connection_lazy_members"
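
The DETAIL lines above describe a wait-for cycle. As a minimal sketch for reading them, each line can be parsed into a "blocked by" edge and followed until it loops back. The helper names `wait_for_graph` and `follow_cycle` are made up for illustration, and the regex only targets the line shape seen in this log.

```python
import re
from typing import Dict, List

# DETAIL lines copied from the deadlock report above.
DETAIL = """\
Process 257324 waits for ShareLock on transaction 688227036; blocked by process 254908.
Process 254908 waits for ShareLock on transaction 688222971; blocked by process 256179.
Process 256179 waits for ExclusiveLock on tuple (302352,92) of relation 2962200779 of database 16403; blocked by process 257213.
Process 257213 waits for ShareLock on transaction 688225005; blocked by process 254905.
Process 254905 waits for ShareLock on transaction 688228814; blocked by process 257324.
"""

WAIT_RE = re.compile(r"Process (\d+) waits for .*?; blocked by process (\d+)\.")


def wait_for_graph(detail: str) -> Dict[int, int]:
    """Map each waiting process to the process that blocks it."""
    return {int(waiter): int(blocker) for waiter, blocker in WAIT_RE.findall(detail)}


def follow_cycle(graph: Dict[int, int], start: int) -> List[int]:
    """Follow blocked-by edges from `start` until a process repeats or the chain ends."""
    path: List[int] = []
    node = start
    while node in graph and node not in path:
        path.append(node)
        node = graph[node]
    return path


graph = wait_for_graph(DETAIL)
cycle = follow_cycle(graph, 257324)
```

On this log the path visits all five processes and the last one is blocked by the first, so every backend in the report is part of a single cycle.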

I wonder whether an unfortunate side effect is that these repeated attempts leave a lot of dead tuples in the table,
which would then slow down the next attempt to insert the tuples and,
I suspect, make it more likely that they deadlock again (?).

By acquiring a FOR NO KEY UPDATE lock before beginning any work, we can ensure that one
of the transactions gets queued behind the other one, meaning the first one can succeed unimpeded.

FOR NO KEY UPDATE blocks other FOR NO KEY UPDATE locks and is the weakest lock level that blocks itself.
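
As a sanity check of that claim, here is a small sketch that encodes the row-level lock conflict table as I read it from the Postgres explicit-locking docs and looks for the weakest mode that conflicts with itself. The names `MODES`, `CONFLICTS` and `weakest_self_conflicting_mode` are illustrative only; nothing here talks to a database.

```python
from typing import Dict, List, Optional, Set

# Row-level lock modes, ordered weakest to strongest.
MODES: List[str] = ["FOR KEY SHARE", "FOR SHARE", "FOR NO KEY UPDATE", "FOR UPDATE"]

# For each requested mode, the modes held by another transaction that block it.
CONFLICTS: Dict[str, Set[str]] = {
    "FOR KEY SHARE": {"FOR UPDATE"},
    "FOR SHARE": {"FOR NO KEY UPDATE", "FOR UPDATE"},
    "FOR NO KEY UPDATE": {"FOR SHARE", "FOR NO KEY UPDATE", "FOR UPDATE"},
    "FOR UPDATE": set(MODES),
}


def weakest_self_conflicting_mode() -> Optional[str]:
    """Return the first (weakest) mode that blocks a second request of the same mode."""
    for mode in MODES:
        if mode in CONFLICTS[mode]:
            return mode
    return None
```

With this table the function returns `FOR NO KEY UPDATE`: the two share modes can be held by several transactions at once, so they would not serialise concurrent writers on the same connection.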

erikjohnston · 2026-06-05T09:24:46Z

+            # https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS
+            # (We could also consider sorting our insertions, but not clear if Postgres
+            # guarantees to preserve the insertion order)
+            txn.execute(


We probably want to go further than this and block all concurrent writes on a given connection to also avoid serialisation failures. Generally it's also good to do as much locking as possible at the start of the transaction.

We already fetch the connection_key from sliding_sync_connections as the first query in the transaction, so it might just be as easy as adding the locking there?

Yup derp, I did mean to go to the top but it didn't occur to me to check whether this was called as part of something else. I will blame the 3am factor.

MadLittleMods · 2026-06-05T13:34:37Z

+                # Specifically, the statements seen to deadlock against
+                # each other were
+                # `INSERT INTO sliding_sync_connection_lazy_members`
+                # with conflicting tuples on
+                #     "sliding_sync_connection_lazy_members_idx" UNIQUE, btree
+                #     (connection_key, room_id, user_id)


For my own reference, can someone explain the real-life situations that cause this?

Someone is sending multiple concurrent requests with the same connection_key?

It would appear so, I guess? There were multiple users involved.

I suppose it's possible that some of these were retries after a connection dropped/timed out/... or something like that.
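
To make the suspected mechanism concrete, here is a deliberately simplified model (not Postgres's actual lock manager) of concurrent requests inserting overlapping unique keys. Each transaction claims keys one at a time, a transaction that hits a key claimed by an uncommitted transaction waits, and a commit releases everything. The function name `simulate` and the key names are hypothetical.

```python
from typing import Dict, List, Sequence


def simulate(txn_keys: Sequence[Sequence[str]]) -> str:
    """Step the transactions round-robin; return "ok" or "deadlock"."""
    owner: Dict[str, int] = {}              # key -> txn holding it, uncommitted
    progress: List[int] = [0] * len(txn_keys)
    done: List[bool] = [False] * len(txn_keys)

    while not all(done):
        advanced = False
        for i, keys in enumerate(txn_keys):
            if done[i]:
                continue
            if progress[i] == len(keys):
                # Commit: release every key this transaction holds.
                for key in [k for k, o in owner.items() if o == i]:
                    del owner[key]
                done[i] = True
                advanced = True
                continue
            key = keys[progress[i]]
            holder = owner.get(key)
            if holder is None or holder == i:
                owner[key] = i
                progress[i] += 1
                advanced = True
            # Otherwise this transaction is blocked and waits for the holder.
        if not advanced:
            # Every unfinished transaction is waiting on another one.
            return "deadlock"
    return "ok"


opposite_order = simulate([["a", "b"], ["b", "a"]])
same_order = simulate([["a", "b"], ["a", "b"]])
lock_first = simulate([["conn", "a", "b"], ["conn", "b", "a"]])
```

In this model the opposite-order case deadlocks, while the same-order case and the case where both transactions first take a shared `conn` key both finish. The last case corresponds to taking the connection lock before doing any inserts: the second transaction simply queues behind the first.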

erikjohnston · 2026-06-11T15:47:56Z

+                    SELECT 1
+                    FROM sliding_sync_connections
+                    WHERE connection_key = ?
+                    FOR NO KEY UPDATE


I don't suppose we can optionally add this to the query above if it's Postgres? Or does it not work for more complex SELECT statements?

That's on a different table.
We need a lock over the connection_key so it feels like putting it on the sliding_sync_connections table might be best.

I'm not really seeing a sensible way to rejig this otherwise; in the other branch of the if-else there may not be rows on sliding sync connection positions to lock.

The select above joins on sliding_sync_connections so I think it should work? You can also do FOR NO KEY UPDATE OF sliding_sync_connections by the looks of it

Ahhhh right I wasn't aware of OF xxx. That'll do it
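
A rough sketch of what that could look like: append the lock clause to the joined lookup only when running on Postgres, since SQLite has no row-level lock clause. The function name `build_connection_lookup_sql`, the `is_postgres` flag, the `sliding_sync_connection_positions` table name and the `connection_position` column are assumptions for illustration, not the actual Synapse query.

```python
def build_connection_lookup_sql(is_postgres: bool) -> str:
    """Build the first query of the transaction, locking the connection row on Postgres."""
    # Hypothetical query shape: the real lookup may filter on more columns.
    sql = """
        SELECT connection_key
        FROM sliding_sync_connection_positions
        INNER JOIN sliding_sync_connections USING (connection_key)
        WHERE connection_position = ?
    """
    if is_postgres:
        # `OF <table>` limits the row lock to rows from the named table,
        # so only the sliding_sync_connections row is locked.
        sql += "        FOR NO KEY UPDATE OF sliding_sync_connections\n"
    return sql


postgres_sql = build_connection_lookup_sql(is_postgres=True)
sqlite_sql = build_connection_lookup_sql(is_postgres=False)
```

Putting the clause on the first query means the lock is taken before any other work in the transaction, which matches the earlier suggestion to do as much locking as possible up front.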

erikjohnston

Thanks!

reivilibre marked this pull request as ready for review June 5, 2026 02:57

reivilibre requested a review from a team as a code owner June 5, 2026 02:57

erikjohnston reviewed Jun 5, 2026

reivilibre added 2 commits June 5, 2026 12:17

Lock sliding sync connections when inserting lazy members to prevent …

0d1c509

…deadlocks

Newsfile

b1c313e

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>

reivilibre force-pushed the rei/ss_deadlock branch from 8c2d312 to b1c313e June 5, 2026 11:17

Hoist lock to entrypoint of outer txn

57a4d7c

reivilibre requested a review from erikjohnston June 5, 2026 13:11

MadLittleMods added the A-Sync label Jun 5, 2026

MadLittleMods reviewed Jun 5, 2026

erikjohnston reviewed Jun 11, 2026

reivilibre requested a review from erikjohnston June 15, 2026 11:28

Put lock clause in initial query

4374b7b

erikjohnston approved these changes Jun 23, 2026

Merge branch 'develop' into rei/ss_deadlock

6ba291b

reivilibre wants to merge 5 commits into develop from rei/ss_deadlock

reivilibre commented Jun 5, 2026
erikjohnston Jun 5, 2026
reivilibre Jun 5, 2026
MadLittleMods Jun 5, 2026
reivilibre Jun 8, 2026
erikjohnston Jun 11, 2026 (edited)
reivilibre Jun 15, 2026 (edited)
erikjohnston Jun 15, 2026
reivilibre Jun 15, 2026
erikjohnston left a comment


3 participants
