Skip to content

Issue 7535 - Fix race in test_schema_update_policy_reject#7598

Open
aadhikar wants to merge 1 commit into
389ds:mainfrom
aadhikar:issue-schema-update-policy-flaky
Open

Issue 7535 - Fix race in test_schema_update_policy_reject#7598
aadhikar wants to merge 1 commit into
389ds:mainfrom
aadhikar:issue-schema-update-policy-flaky

Conversation

@aadhikar

@aadhikar aadhikar commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Description: Removed temporary_oc2 fixture that added OC2 before the reject policy was active. Moved OC2 creation after the policy is set and supplier1 is restarted.

Fixes: #7535

Reviewed by: ???

Assisted by: Claude

Summary by Sourcery

Adjust schema update rejection test to create and clean up the temporary object class after the reject policy is active, ensuring replication behavior is validated correctly.

Bug Fixes:

  • Fix a race condition in the schema update rejection test where the temporary object class could be created before the reject policy was applied.

Tests:

  • Update the schema update policy rejection test flow to verify the temporary object class is absent on the consumer before replication, and add explicit cleanup within the test.

Description: Removed temporary_oc2 fixture that added OC2 before
the reject policy was active. Moved OC2 creation after the policy
is set and supplier1 is restarted.

Fixes: 389ds#7535

Reviewed by: ???

Assisted by: Claude

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Consider extracting the repeated cleanup logic for removing oc2_name from both suppliers into a small helper function to keep the test body focused on behavior rather than teardown details.
  • Since oc2_name is used in multiple places, you might want to promote it to a module-level constant (e.g., OC2_NAME = "OC2UpdatePolicy") to avoid hardcoding the same string in several locations and make future updates easier.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting the repeated cleanup logic for removing `oc2_name` from both suppliers into a small helper function to keep the test body focused on behavior rather than teardown details.
- Since `oc2_name` is used in multiple places, you might want to promote it to a module-level constant (e.g., `OC2_NAME = "OC2UpdatePolicy"`) to avoid hardcoding the same string in several locations and make future updates easier.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@packit-as-a-service

Copy link
Copy Markdown

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo dnf install -y 'dnf*-command(copr)'
  • dnf copr enable packit/389ds-389-ds-base-7598
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@droideck droideck left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, @aadhikar!
Since @mreynolds389 is assigned to #7535 and had also been looking into this, I'd leave the final review to him in case he has additional context or a preferred direction.
Besides the small cleanup/teardown comment:)

Comment on lines +239 to +244
log.info(f"Cleaning up {oc2_name}")
for s in [supplier1, supplier2]:
try:
Schema(s).remove_objectclass(oc2_name)
except (ldap.NO_SUCH_OBJECT, ValueError):
pass

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might worth placing it in try/finally block, IMO.

@aadhikar aadhikar Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droideck Can you elaborate? If you mean wrapping the entire test logic in try with cleanup in finally, I can do that. If it's just the cleanup part, finally wouldn't add value since there's nothing to guard above it. Let me know what I am missing here!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think we need a finalizer/try/finally for the whole test clean up.

@mreynolds389

Copy link
Copy Markdown
Contributor

I don't think the problem is the test. I've been working on this the last few days and there is definitely is a race condition in DS (a bug). When it fails, the schema reject policy is set the default policy for printer-uri-oid instead of the defined one for OCPolicyUpdate. Since the policy set and the server is restarted it should be using the correct policy right away, but sometimes it still has the default policy set (I'm still working on it).

So I suspect if you run this test enough times it will still fail.

@mreynolds389

Copy link
Copy Markdown
Contributor

I don't think the problem is the test. I've been working on this the last few days and there is definitely is a race condition in DS (a bug). When it fails, the schema reject policy is set the default policy for printer-uri-oid instead of the defined one for OCPolicyUpdate. Since the policy set and the server is restarted it should be using the correct policy right away, but sometimes it still has the default policy set (I'm still working on it).

So I suspect if you run this test enough times it will still fail.

Ok so I have verified this is a testing issue. It's not a bug in DS. What's happening is that the schema replication of "OCPolicyUpdate" is happening before we set it in the rejection policy. I can not reproduce the problem using this patch. I think the CI test could be hardened a bit more, but its been passing for me with both bdb and lmdb, so Ack!

@tbordaz

tbordaz commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

My understanding is the 'setup_test_env' fixture adds 'OCUpdatePolicy' in the topology (S1 and S2).
That means that before 'test_schema_update_policy_reject' starts 'OCUpdatePolicy' is already on S1 and S2.
So like @mreynolds389 said, rejecting OCUpdatePolicy is set too late.

The purpose of 'temporary_oc2' is just to update the schema and allow the replication of it. So IMHO we should not care about the presence of 'OC2UpdatePolicy' only 'OCUpdatePolicy' needs to be checked. This is the one we want to reject.

An option is

  • reject OCUpdatePolicy on S1
  • run 'temporary_oc2' to update the schema csn
  • add 'OCUpdatePolicy' on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' not on S2
  • allow OCUpdatePolicy on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' on S2

@aadhikar

Copy link
Copy Markdown
Contributor Author

My understanding is the 'setup_test_env' fixture adds 'OCUpdatePolicy' in the topology (S1 and S2). That means that before 'test_schema_update_policy_reject' starts 'OCUpdatePolicy' is already on S1 and S2. So like @mreynolds389 said, rejecting OCUpdatePolicy is set too late.

The purpose of 'temporary_oc2' is just to update the schema and allow the replication of it. So IMHO we should not care about the presence of 'OC2UpdatePolicy' only 'OCUpdatePolicy' needs to be checked. This is the one we want to reject.

An option is

  • reject OCUpdatePolicy on S1
  • run 'temporary_oc2' to update the schema csn
  • add 'OCUpdatePolicy' on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' not on S2
  • allow OCUpdatePolicy on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' on S2

Thanks for the suggestion @tbordaz! However, as per schema.c (line 82-83), the supplier-side reject policy checks the consumer's schema, so OCUpdatePolicy must already be on S2 for the reject to trigger. Adding it only to S1 after the policy won't fire the reject, since S2 won't have it. This matches how your original ticket47676_test.py worked too. Let me know if I am missing something!

@tbordaz

tbordaz commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

My understanding is the 'setup_test_env' fixture adds 'OCUpdatePolicy' in the topology (S1 and S2). That means that before 'test_schema_update_policy_reject' starts 'OCUpdatePolicy' is already on S1 and S2. So like @mreynolds389 said, rejecting OCUpdatePolicy is set too late.
The purpose of 'temporary_oc2' is just to update the schema and allow the replication of it. So IMHO we should not care about the presence of 'OC2UpdatePolicy' only 'OCUpdatePolicy' needs to be checked. This is the one we want to reject.
An option is

  • reject OCUpdatePolicy on S1
  • run 'temporary_oc2' to update the schema csn
  • add 'OCUpdatePolicy' on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' not on S2
  • allow OCUpdatePolicy on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' on S2

Thanks for the suggestion @tbordaz! However, as per schema.c (line 82-83), the supplier-side reject policy checks the consumer's schema, so OCUpdatePolicy must already be on S2 for the reject to trigger. Adding it only to S1 after the policy won't fire the reject, since S2 won't have it. This matches how your original ticket47676_test.py worked too. Let me know if I am missing something!

I am afraid this comment is not inline with the code. Actually it looks to me that if a given attribute/objectclass in the schema is set to REPL_SCHEMA_UPDATE_REJECT_VALUE then the full update of the schema is abandoned independently if the consumer knows this attr/oc.
Regarding test_ticket47676_reject_action, the schema already contains OC_NAME. it sets REPL_SCHEMA_UPDATE_REJECT_VALUE:OC_NAME. Then update the schema+OC2_NAME and force replication then check that the OC2_NAME has not been propagated to the consumer.

@aadhikar

Copy link
Copy Markdown
Contributor Author

My understanding is the 'setup_test_env' fixture adds 'OCUpdatePolicy' in the topology (S1 and S2). That means that before 'test_schema_update_policy_reject' starts 'OCUpdatePolicy' is already on S1 and S2. So like @mreynolds389 said, rejecting OCUpdatePolicy is set too late.
The purpose of 'temporary_oc2' is just to update the schema and allow the replication of it. So IMHO we should not care about the presence of 'OC2UpdatePolicy' only 'OCUpdatePolicy' needs to be checked. This is the one we want to reject.
An option is

  • reject OCUpdatePolicy on S1
  • run 'temporary_oc2' to update the schema csn
  • add 'OCUpdatePolicy' on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' not on S2
  • allow OCUpdatePolicy on S1
  • update an entry (trigger replication)
  • check 'OCUpdatePolicy' on S1 and 'OCUpdatePolicy' on S2

Thanks for the suggestion @tbordaz! However, as per schema.c (line 82-83), the supplier-side reject policy checks the consumer's schema, so OCUpdatePolicy must already be on S2 for the reject to trigger. Adding it only to S1 after the policy won't fire the reject, since S2 won't have it. This matches how your original ticket47676_test.py worked too. Let me know if I am missing something!

I am afraid this comment is not inline with the code. Actually it looks to me that if a given attribute/objectclass in the schema is set to REPL_SCHEMA_UPDATE_REJECT_VALUE then the full update of the schema is abandoned independently if the consumer knows this attr/oc. Regarding test_ticket47676_reject_action, the schema already contains OC_NAME. it sets REPL_SCHEMA_UPDATE_REJECT_VALUE:OC_NAME. Then update the schema+OC2_NAME and force replication then check that the OC2_NAME has not been propagated to the consumer.

@tbordaz I ran your proposed flow as a quick experiment:

# Set reject policy BEFORE OC exists anywhere
policy_entry.add('schemaUpdateObjectclassReject', oc_name)
supplier1.restart()

# Bump schema CSN, then add the OC only on S1
_add_oc(supplier1, "...", oc2_name)
_add_oc(supplier1, "...", oc_name)

# Trigger replication
test_user.replace('description', 'experiment_test')
repl.wait_for_replication(supplier1, supplier2)

# Check
reject_policy = policy_entry.get_attr_val_utf8('schemaUpdateObjectclassReject')
s2_has_oc = Schema(supplier2).query_objectclass(oc_name) is not None

# pdb breakpoint:
# (Pdb) reject_policy
# 'OCTestReject'
# (Pdb) s2_has_oc
# True

The reject policy is set, but S2 still got the OC. It seems the reject only fires when the consumer already has the OC in its schema. Am I missing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI - Stabilize schema_update_policy_reject in BDB schema tests

4 participants