
Correctly purge the organization profile when updating an org's projects#19917

Open: woodruffw wants to merge 4 commits into pypi:main from woodruffw-forks:ww/fix

@woodruffw (Member) commented Apr 21, 2026

The bug here was actually kind of subtle: we use key_factory to build the purge keys, which can silently fail if the underlying attribute access fails (e.g. Project.organization returns None or raises AttributeError).

This in turn wouldn't happen in most normal cases, but can happen if we build the purge keys during after_flush, since at that point relationships like Project.organization haven't been materialized yet (if the Project was initialized with organization_id instead).

My original theory (above) was wrong.

The bug here stems from an interaction between flag_dirty and #19898. The basic problem is that add_organization_project calls organization.record_event, which adds events to the organization's committed_state. This in turn trips the optimization in #19898, since committed_state == {'events'}, causing us to skip the purges.

(committed_state is otherwise empty, since flag_dirty itself doesn't add anything else.)
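
A minimal sketch of the skip condition described above, assuming the check has the shape quoted later in this thread (`changed <= _NON_CACHE_RELEVANT_ATTRS`). The contents of the set and the `should_skip_purge` helper are assumptions for illustration, not the actual warehouse code:

```python
# Assumed contents; the real set lives in warehouse and may differ.
_NON_CACHE_RELEVANT_ATTRS = frozenset({"events", "observations", "invitations"})


def should_skip_purge(changed: set[str]) -> bool:
    """Return True when only non-cache-relevant attributes changed."""
    return bool(changed) and changed <= _NON_CACHE_RELEVANT_ATTRS


# flag_dirty alone: nothing lands in committed_state, so the purge proceeds.
flag_dirty_only = should_skip_purge(set())

# flag_dirty followed by record_event: committed_state == {"events"},
# which is a subset of the non-relevant set, so the purge is skipped.
flag_dirty_then_event = should_skip_purge({"events"})

# Events plus a cache-relevant attribute: the purge proceeds.
event_and_other = should_skip_purge({"events", "provenance"})
```

The middle case is the one this PR runs into: the explicit `flag_dirty` is meant to force a purge, but the event recorded alongside it makes the change look audit-only.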

Fixes #19911.

Signed-off-by: William Woodruff <william@astral.sh>
Comment on lines +564 to +565
organization=self.db.get(Organization, organization_id),
project=self.db.get(Project, project_id),
Member Author

Flagging: this is the core of the fix, but I'm also kind of unhappy with it (since now we're fetching an entire row from the DB). Maybe not a huge deal in context though, given that adding a project to an org isn't exactly a hot path operation.

Member

If we don't want to fetch the record, we can add a flag_dirty for the organization_project.project instead, but that also feels clunky.

Considering that the upstream caller already has both the Organization and Project objects, I think it's a better choice to pass the objects, which we can then mutate, instead of working with IDs. That way we avoid some of the manual db.add, flush, and flag_dirty calls and let the ORM do its thing.

Signed-off-by: William Woodruff <william@astral.sh>
@woodruffw
Member Author

NB: This effectively "fixes" the purge behavior by bypassing the optimization in #19898 entirely. I'm not sure if that's the best approach 😅

@woodruffw marked this pull request as ready for review April 22, 2026 14:28
@woodruffw requested a review from a team as a code owner April 22, 2026 14:28
@miketheman self-assigned this Apr 24, 2026
@miketheman added the CDN/network (Issues related to our CDN, users having problems connecting to PyPI) and organizations labels Apr 24, 2026
@miketheman (Member) left a comment

Thanks for digging in. There are a few things inline - let me know if anything doesn't make sense.

Comment on lines +124 to +132
# Adding a project to an organization via `add_organization_project` must
# purge both the project profile (`project/{name}`) and the organization
# profile (`org/{name}`), otherwise the cached `/org/{orgname}` project
# list goes stale.
#
# Exercising `add_organization_project` is load-bearing: the
# purge-key factory uses `if_attr_exists`, which resolves to `None` during
# `after_flush` if the row was built with only FK ids instead of
# objects, causing us to silently drop the purges.
Member

Claude insists on using comment blocks over docstrings, I don't know why.

Comment on lines -673 to +680
- organization_id=organization_id, project_id=project.id
+ organization=self.db.get(Organization, organization_id),
+ project=project,
Member

I reverted this change locally, and tests pass. So either we don't need this change, or the tests aren't expressing the condition well enough.

"""
Adds an association between the specified organization and project
"""
from warehouse.packaging.models import Project
Member

I'm still not sure why our isort config doesn't flag inline imports without a disable comment - but this should be placed at the top of the module. (Hey Claude, remember that preference.)

OrganizationProject,
purge_keys=[
key_factory("project/{attr.normalized_name}", if_attr_exists="project"),
key_factory("org/{attr.normalized_name}", if_attr_exists="organization"),
Member

I think this is the main fix - since it was previously unregistered.

Comment on lines +170 to +171
assert f"org/{organization.normalized_name}" in purges
assert f"project/{project.normalized_name}" in purges
Member

This test passes for the wrong reasons, and I also believe it shouldn't work as currently written.

Adding an Event shouldn't dirty an object for cache purge, which is why it's excluded. Dirties that include events and other attributes continue to be purged correctly. One logged example from a few minutes ago:

{
  "obj_class": "File",
  "state": "dirty",
  "keys": [
    "project/ctboost"
  ],
  "changed_attrs": [
    "events",
    "provenance"
  ],
  "event": "cache_purge_keys_generated",
  ...
}

The session purge log shows that the object had events + provenance changed, and thus got a cache purge. See:

# Skip if only non-cache-relevant attributes changed
if changed and changed <= _NON_CACHE_RELEVANT_ATTRS:

maybe the word "only" needs to be bolded 😉

The reason this test is currently passing is that there's no "session purge pop" after the organization_service.add_organization_project(...) call, so the changes made by that call are what trigger the purge.

Moving db_request.db.flush() / db_request.db.info.pop("warehouse.cache.origin.purges", None) to be after the service call ensures the session state is "clean" before attempting the Event addition, and shows the failure.
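
The ordering problem can be sketched without a database. This is a toy model, assuming purge keys accumulate in a set under `db.info["warehouse.cache.origin.purges"]` until popped; `fake_flush`, `run_test`, and the key names are hypothetical stand-ins, not warehouse APIs:

```python
PURGES = "warehouse.cache.origin.purges"


def fake_flush(info: dict, generated: set[str]) -> None:
    """Hypothetical stand-in for a flush: record the purge keys it generated."""
    info.setdefault(PURGES, set()).update(generated)


def run_test(pop_after_service_call: bool) -> set[str]:
    info: dict = {}
    # Service call: adding the project to the org generates real purge keys.
    fake_flush(info, {"org/acme", "project/widget"})
    if pop_after_service_call:
        # Reset so that only the next step's purges are observed.
        info.pop(PURGES, None)
    # Event-only change: skipped by the optimization, so no keys are generated.
    fake_flush(info, set())
    return info.get(PURGES, set())


stale = run_test(pop_after_service_call=False)
clean = run_test(pop_after_service_call=True)
```

Without the pop, the keys left over from the service call satisfy the assertions; with it, the purge set is empty and the assertions fail, which is the behavior the optimization intends for an event-only change.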

Generally I think removing this test case is best, as the general idea is covered by this test:

@pytest.mark.parametrize(
    "trigger",
    [_trigger_event, _trigger_observation, _trigger_invitation],
    ids=["events", "observations", "invitations"],
)
def test_store_purge_keys_skips_audit_only_collection_changes(
    trigger, app_config, db_request
):
    # Audit/admin-only collection mutations (events, observations, invitations)
    # never affect publicly-cached content and should not trigger a Project purge.
    project = ProjectFactory.create()
    db_request.db.flush()
    db_request.db.info.pop("warehouse.cache.origin.purges", None)
    trigger(project, db_request)
    db_request.db.flush()
    purges = db_request.db.info.get("warehouse.cache.origin.purges", set())
    assert f"project/{project.normalized_name}" not in purges

Maybe adding some more parameterizations there could be helpful, but it might be more complexity than it's worth? Your choice.


Development

Successfully merging this pull request may close these issues.

Stale organization project lists
