feat: add chaos testing module for fault injection #224
0e06f09 to 28a0679
Review Summary
Assessment: Request Changes
This PR introduces a well-structured chaos testing module with clean separation between effects, scenarios, plugin, and experiment orchestration. The ContextVar-based design for concurrency safety and the composition with the existing
Review Categories
The overall design is thoughtful and the test coverage is good. Addressing the
Review Summary (Follow-up)
Assessment: Comment
All feedback from the previous review has been thoroughly addressed.
Remaining Items
These are straightforward fixes. Once addressed, this looks good to merge.
2b51586 to bb023d6
    # Produces 6 ChaosCase objects: 2 cases × (2 effect maps + 1 baseline)
    """

    effects: dict[str, list[ChaosEffect]] = Field(
Issue: The type annotation dict[str, list[ChaosEffect]] causes Pydantic to serialize only the base-class fields during model_dump(). Fields defined on the concrete effects (error_type, max_length, remove_ratio, corrupt_ratio) are silently dropped:

    c = ChaosCase(name='test', input='hi', effects={'tool': [ToolCallFailure(error_type='timeout')]})
    c.model_dump()['effects']
    # → {'tool': [{'apply_rate': 1.0}]}  ← error_type lost!

Deserialization also fails because ChaosEffect is abstract. This affects the base Experiment's error reporting path (case.model_dump() on line 535 of experiment.py) and any serialization or persistence scenario.
Suggestion: Use a Pydantic discriminated union keyed on a type discriminator field (effect_type in the sketch below):

    from typing import Annotated, Literal, Union

    from pydantic import Discriminator, Tag

    # Add a type discriminator to each concrete effect:
    class ToolCallFailure(ToolEffect):
        effect_type: Literal["tool_call_failure"] = "tool_call_failure"
        ...

    # Then annotate:
    AnyEffect = Annotated[
        Union[
            Annotated[ToolCallFailure, Tag("tool_call_failure")],
            Annotated[TruncateFields, Tag("truncate_fields")],
            Annotated[RemoveFields, Tag("remove_fields")],
            Annotated[CorruptValues, Tag("corrupt_values")],
        ],
        Discriminator("effect_type"),
    ]

    class ChaosCase(Case, Generic[InputT, OutputT]):
        effects: dict[str, list[AnyEffect]] = Field(default_factory=dict)

This ensures full round-trip serialization fidelity.
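To make the data loss and the proposed fix concrete, here is a minimal, self-contained sketch assuming Pydantic v2. The classes are simplified stand-ins rather than the PR's actual definitions: ChaosEffect is a plain (non-abstract) model here, LossyCase and RoundTripCase are hypothetical names, and the union is spelled with Field(discriminator=...), which to my understanding is equivalent to Discriminator("effect_type") for a plain field-name discriminator.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class ChaosEffect(BaseModel):
    """Stand-in base effect (assumption: the real class in the PR is abstract)."""

    apply_rate: float = 1.0


class ToolCallFailure(ChaosEffect):
    effect_type: Literal["tool_call_failure"] = "tool_call_failure"
    error_type: str = "timeout"


class TruncateFields(ChaosEffect):
    effect_type: Literal["truncate_fields"] = "truncate_fields"
    max_length: int = 10


class LossyCase(BaseModel):
    # Annotated with the base class: subclass fields are dropped on dump.
    effects: dict[str, list[ChaosEffect]] = Field(default_factory=dict)


# Discriminated union: Pydantic picks the concrete class from effect_type.
AnyEffect = Annotated[
    Union[ToolCallFailure, TruncateFields],
    Field(discriminator="effect_type"),
]


class RoundTripCase(BaseModel):
    effects: dict[str, list[AnyEffect]] = Field(default_factory=dict)


effects = {
    "tool": [ToolCallFailure(error_type="timeout"), TruncateFields(max_length=5)]
}

# Base-class annotation: only apply_rate survives serialization.
lossy_dump = LossyCase(effects=effects).model_dump()

# Discriminated union: every field survives, and the dump validates back
# into the same concrete effect classes.
case = RoundTripCase(effects=effects)
dumped = case.model_dump()
restored = RoundTripCase.model_validate(dumped)
```

In this sketch, lossy_dump keeps only apply_rate for each effect, while restored compares equal to case and holds ToolCallFailure and TruncateFields instances again.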
    """
    import asyncio

    if asyncio.iscoroutinefunction(task):
Issue: The async handling in _wrap_task was added in response to earlier review feedback, but no test exercises run_evaluations_async with an actual async task. Without one, regressions in this code path would go undetected.
Suggestion: Add a test like:

    @pytest.mark.asyncio
    async def test_run_evaluations_async_with_async_task(self, cases, effect_maps, evaluator):
        chaos_cases = ChaosCase.expand(cases, effect_maps)
        experiment = ChaosExperiment(cases=chaos_cases, evaluators=[evaluator])

        async def async_task(case: ChaosCase):
            # The experiment should have set the ContextVar to the case being run.
            active = _current_chaos_case.get()
            assert active is case
            return "async_output"

        reports = await experiment.run_evaluations_async(task=async_task, max_workers=2)
        assert len(reports) >= 1
Review Summary (Round 3)
Assessment: Request Changes
All prior feedback has been resolved. One important serialization issue remains that would cause data loss in error reporting and persistence scenarios.
Details
The architecture, ContextVar design, plugin implementation, and test structure are all solid.
Description
Introduces a chaos testing module for fault injection during agent evaluation. Enables systematic testing of agent resilience under tool failures and response corruption without modifying agent code.
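As a rough illustration of the case expansion listed under the key capabilities, the sketch below builds the product of base cases and named effect maps using only the standard library. It is an assumption-laden stand-in, not the module's ChaosCase.expand: the Case dataclass, the tool names, and the string placeholders for effects are hypothetical, and keeping the original name for the baseline case is a guess. The case|condition naming follows the book_a_flight|search_timeout example given in this description.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Case:
    """Hypothetical, simplified stand-in for a chaos-aware case."""

    name: str
    input: str
    effects: dict = field(default_factory=dict)


def expand(cases, effect_maps, include_baseline=True):
    """Pair every base case with every named effect map, plus a baseline."""
    expanded = []
    for case in cases:
        if include_baseline:
            # Assumption: the baseline keeps the original name and no effects.
            expanded.append(replace(case, effects={}))
        for condition, effects in effect_maps.items():
            expanded.append(
                replace(case, name=f"{case.name}|{condition}", effects=effects)
            )
    return expanded


cases = [
    Case(name="book_a_flight", input="Book a flight to Lisbon"),
    Case(name="cancel_booking", input="Cancel my booking"),
]

# Keys are condition names; values map a tool name to the effects applied to it.
effect_maps = {
    "search_timeout": {"search_flights": ["tool_call_failure:timeout"]},
    "truncated_results": {"search_flights": ["truncate_fields:5"]},
}

chaos_cases = expand(cases, effect_maps)
names = [c.name for c in chaos_cases]
```

With two base cases and two effect maps this yields six cases, matching the 2 cases × (2 effect maps + 1 baseline) count quoted in the reviewed docstring.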
Key capabilities:
- ToolCallFailure, TruncateFields, RemoveFields, CorruptValues
- Case with an effects field that carries the failure injection config. Provides ChaosCase.expand(cases, effect_maps) to generate the Cartesian product of base cases × named effect maps (a dict[str, dict[str, list[ChaosEffect]]] where keys are short human-readable condition names)
- ChaosPlugin hooks into Strands' native BeforeToolCallEvent/AfterToolCallEvent system; reads the active ChaosCase from a ContextVar (zero chaos concepts in user task code)
- ChaosExperiment composes the base Experiment to run ChaosCase objects, managing ContextVar lifecycle per case for thread/async safety

Design principles:
- plugins=[chaos] to the agent
- ChaosCase extends Case (rather than modifying it): a stable extension point for future chaos-specific fields without breaking the base framework
- ChaosScenario class: simpler API surface with readable case names in reports (e.g., book_a_flight|search_timeout)
- Case, Evaluator, EvaluationReport)

Related Issues
#114
Documentation PR
strands-agents/docs#836
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.