1 change: 1 addition & 0 deletions changelog.d/19826.bugfix
@@ -0,0 +1 @@
Lock Sliding Sync connections when inserting lazy members, to prevent repeated deadlocks.
26 changes: 26 additions & 0 deletions synapse/storage/databases/main/sliding_sync.py
@@ -204,6 +204,32 @@ def persist_per_connection_state_txn(
raise SlidingSyncUnknownPosition()

(connection_key,) = row

if isinstance(self.database_engine, PostgresEngine):
# Lock the sliding sync connection row for update upfront,
# to prevent deadlocks between concurrent transactions
# (which can retry again and again without making progress).
#
# (We don't need to explicitly lock in the other branch,
# where we re-create the connection, as that implies a lock
# anyway)
#
# Specifically, the statements seen to deadlock against
# each other were
# `INSERT INTO sliding_sync_connection_lazy_members`
# with conflicting tuples on
# "sliding_sync_connection_lazy_members_idx" UNIQUE, btree
# (connection_key, room_id, user_id)

Contributor

For my own reference, can someone explain the real-life situations that cause this?

Someone is sending multiple concurrent requests with the same connection_key?

Contributor Author

It would appear so, I guess? There were multiple users involved.

I suppose it's possible that some of these were retries after a connection dropped/timed out/... or something like that.
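
As a rough illustration of why taking one lock upfront helps, here is a minimal sketch using Python threads as an analogy. It is not Synapse code and does not model Postgres exactly: `gate` stands in for the row lock on `sliding_sync_connections`, and `entry_a` / `entry_b` are hypothetical stand-ins for two conflicting index entries that the two transactions touch in opposite order. Without the gate, that ordering is the classic recipe for a deadlock; with it, the two workers simply run one after the other.

```python
import threading
import time


def insert_lazy_members(gate, entries_in_order, done, name):
    """Analogy for one transaction inserting lazy members."""
    # Take the per-connection lock first, mirroring the upfront
    # SELECT ... FOR NO KEY UPDATE on the connection row.
    with gate:
        for entry in entries_in_order:
            entry.acquire()
            time.sleep(0.01)  # widen the window in which a deadlock could occur
        for entry in reversed(entries_in_order):
            entry.release()
    done.append(name)


def run_demo():
    gate = threading.Lock()  # stand-in for the connection row lock
    entry_a = threading.Lock()  # stand-ins for two conflicting index entries
    entry_b = threading.Lock()
    done = []
    threads = [
        threading.Thread(
            target=insert_lazy_members,
            args=(gate, [entry_a, entry_b], done, "txn1"),
            daemon=True,
        ),
        # The second worker touches the same entries in the opposite order.
        threading.Thread(
            target=insert_lazy_members,
            args=(gate, [entry_b, entry_a], done, "txn2"),
            daemon=True,
        ),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=5)
    return sorted(done)


if __name__ == "__main__":
    print(run_demo())
```

Both workers finish because whichever one takes `gate` first completes all of its work before the other starts, so the opposite acquisition order never overlaps.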

# https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS
txn.execute(
"""
SELECT 1
FROM sliding_sync_connections
WHERE connection_key = ?
FOR NO KEY UPDATE

@erikjohnston erikjohnston Jun 11, 2026

Member

I don't suppose we can optionally add this to the query above if it's Postgres? Or does it not work for more complex select statements?

@reivilibre reivilibre Jun 15, 2026

Contributor Author

That's on a different table.
We need a lock over the connection_key so it feels like putting it on the sliding_sync_connections table might be best.

I'm not really seeing a sensible way to rejig this otherwise; in the other branch of the if-else there may not be any rows in sliding sync connection positions to lock.

Member

The select above joins on sliding_sync_connections, so I think it should work? You can also do FOR NO KEY UPDATE OF sliding_sync_connections, by the looks of it.

Contributor Author

Ahhhh right I wasn't aware of OF xxx. That'll do it
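
A minimal sketch of what this thread converges on, assuming the earlier lookup is a join roughly like the one below. The helper name `build_connection_lookup_sql`, the `is_postgres` flag, and the exact columns and join are hypothetical, since the original query is not shown in this diff. The locking clause is only appended on Postgres, and `OF sliding_sync_connections` restricts the row lock to that one table in the join.

```python
def build_connection_lookup_sql(is_postgres: bool) -> str:
    """Build the connection lookup query, locking the connection row on Postgres.

    The table and column names in the SELECT are assumptions for illustration.
    """
    sql = """
        SELECT connection_key
        FROM sliding_sync_connection_positions
        INNER JOIN sliding_sync_connections USING (connection_key)
        WHERE connection_position = ?
    """
    if is_postgres:
        # Lock only the sliding_sync_connections row, not the positions row.
        # SQLite has no row-level locking clause, so nothing is added there.
        sql += " FOR NO KEY UPDATE OF sliding_sync_connections"
    return sql


if __name__ == "__main__":
    print(build_connection_lookup_sql(is_postgres=True))
```

Folding the clause into the existing select would avoid the extra round trip of a separate `SELECT 1 ... FOR NO KEY UPDATE`, at the cost of making the shared query string engine-dependent.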

""",
(connection_key,),
)
else:
# We're restarting the connection, so we clear the previous existing data we
# used to track it. We do this here to ensure that if we get lots of