-
Notifications
You must be signed in to change notification settings - Fork 34
Add fact parser validation, tests, and bulk fact loading with tests #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 16 commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
972b263
Add fact parser validation, tests, and bulk fact loading with tests
ColtonPayne e498f4a
Don't hardcode default values
ColtonPayne 0e2e395
Prevent predicates from starting with a digit
ColtonPayne cf592e2
Fix typo in MAKEFILE
ColtonPayne 0908471
Improve CSV loader tests
ColtonPayne 5a78cea
Add fact string formatting rules in docstring
ColtonPayne 83a2452
Remove extranious f string for linter
ColtonPayne ee0ea04
Fix api test file loading
ColtonPayne f1395dc
Add test for example with no header
ColtonPayne 0e3db89
Make invalid csv file loads raise exceptions by default
ColtonPayne a402321
Upd tests
ColtonPayne d1cb309
Add support for negated interval and negated explicit true/false
ColtonPayne b903281
Load facts from json instead of csv
ColtonPayne cad2d95
Update file loading tests
ColtonPayne 3eecf32
Revert
ColtonPayne 05b3748
Final cleanup
ColtonPayne 0c79988
Add back csv file loading and add duplicate name checks
ColtonPayne b1ea06c
Merge branch 'main' into input-validation
ColtonPayne 01a20a4
CSV Formatting
ColtonPayne 42d54e2
Add back load rules from file
ColtonPayne f4fbcb0
Merge branch 'input-validation' of github.com:lab-v2/pyreason into in…
ColtonPayne 3ba07ec
Requrie exact header match for csv headers
ColtonPayne 930c32b
Update examples and remove numeric string fact loading
ColtonPayne 4928d98
Merge branch 'main' into input-validation
ColtonPayne e14eb9f
Add back static string bulk csv
ColtonPayne 8fa0ef7
Merge branch 'input-validation' of github.com:lab-v2/pyreason into in…
ColtonPayne 27add9f
Merge branch 'main' into input-validation
ColtonPayne b975b08
Set default end_time to start_time
ColtonPayne a6fb069
Merge branch 'input-validation' of github.com:lab-v2/pyreason into in…
ColtonPayne File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,40 +1,163 @@ | ||
| import pyreason.scripts.numba_wrapper.numba_types.interval_type as interval | ||
| import re | ||
|
|
||
|
|
||
| # Input validation work was implemented with the help of Claude Sonnet 4.5. | ||
| def parse_fact(fact_text): | ||
| # Validate input is not empty or whitespace only | ||
| if not fact_text or not fact_text.strip(): | ||
| raise ValueError("Fact text cannot be empty or whitespace only") | ||
|
|
||
| f = fact_text.replace(' ', '') | ||
|
|
||
| # Check for multiple colons | ||
| colon_count = f.count(':') | ||
| if colon_count > 1: | ||
| raise ValueError(f"Fact text contains multiple colons ({colon_count}), expected at most 1") | ||
|
|
||
| # Check for double negation | ||
| if f.startswith('~~'): | ||
| raise ValueError("Double negation is not allowed") | ||
|
|
||
| # Separate into predicate-component and bound. If there is no bound it means it's true | ||
| negate_interval = False | ||
| if ':' in f: | ||
| pred_comp, bound = f.split(':') | ||
| parts = f.split(':') | ||
| if len(parts) != 2: | ||
| raise ValueError("Invalid fact format: expected at most one colon separator") | ||
| pred_comp, bound = parts | ||
|
|
||
| # Check for negation with explicit bound | ||
| if pred_comp.startswith('~'): | ||
| pred_comp = pred_comp[1:] | ||
| if bound.lower() == 'true': | ||
| bound = 'False' | ||
| elif bound.lower() == 'false': | ||
| bound = 'True' | ||
| else: | ||
| negate_interval = True | ||
| else: | ||
| pred_comp = f | ||
| if pred_comp[0] == '~': | ||
| if pred_comp.startswith('~'): | ||
| bound = 'False' | ||
| pred_comp = pred_comp[1:] | ||
| else: | ||
| bound = 'True' | ||
|
|
||
| # Check if bound is a boolean or a list of floats | ||
| bound = bound.lower() | ||
| if bound == 'true': | ||
| bound = interval.closed(1, 1) | ||
| elif bound == 'false': | ||
| bound = interval.closed(0, 0) | ||
| else: | ||
| bound = [float(b) for b in bound[1:-1].split(',')] | ||
| bound = interval.closed(*bound) | ||
| # Validate predicate-component is not empty | ||
| if not pred_comp: | ||
| raise ValueError("Predicate-component cannot be empty") | ||
|
|
||
| # Validate parentheses exist and are properly formed | ||
| if '(' not in pred_comp: | ||
| raise ValueError("Missing opening parenthesis in fact") | ||
| if ')' not in pred_comp: | ||
| raise ValueError("Missing closing parenthesis in fact") | ||
|
|
||
| # Check for nested or multiple parentheses | ||
| open_count = pred_comp.count('(') | ||
| close_count = pred_comp.count(')') | ||
| if open_count != 1 or close_count != 1: | ||
| raise ValueError(f"Invalid parentheses: found {open_count} '(' and {close_count} ')', expected exactly 1 of each") | ||
|
|
||
| # Check parentheses are in correct order | ||
| open_idx = pred_comp.find('(') | ||
| close_idx = pred_comp.find(')') | ||
| if open_idx >= close_idx: | ||
| raise ValueError("Invalid parentheses order: '(' must come before ')'") | ||
|
|
||
| # Check closing parenthesis is at the end | ||
| if close_idx != len(pred_comp) - 1: | ||
| raise ValueError("Closing parenthesis must be at the end of predicate-component") | ||
|
|
||
| # Split the predicate and component | ||
| idx = pred_comp.find('(') | ||
| pred = pred_comp[:idx] | ||
| component = pred_comp[idx + 1:-1] | ||
|
|
||
| # Validate predicate is not empty | ||
| if not pred: | ||
| raise ValueError("Predicate cannot be empty") | ||
|
|
||
| # Validate predicate contains only valid characters (alphanumeric and underscore) | ||
| # Predicates must start with a letter or underscore (like Python identifiers) | ||
| if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', pred): | ||
| if pred[0].isdigit(): | ||
| raise ValueError(f"Predicate '{pred}' cannot start with a digit. Must start with a letter or underscore") | ||
| else: | ||
| raise ValueError(f"Predicate '{pred}' contains invalid characters. Only letters, digits, and underscores allowed, must start with letter or underscore") | ||
|
|
||
| # Validate component is not empty | ||
| if not component: | ||
| raise ValueError("Component cannot be empty") | ||
|
|
||
| # Check for invalid characters in component | ||
| if '(' in component or ')' in component: | ||
| raise ValueError("Component cannot contain parentheses") | ||
| if ':' in component: | ||
| raise ValueError("Component cannot contain colons") | ||
|
|
||
| # Check if it is a node or edge fact | ||
| if ',' in component: | ||
| fact_type = 'edge' | ||
| component = tuple(component.split(',')) | ||
| components = component.split(',') | ||
|
|
||
| # Validate exactly 2 components for edges | ||
| if len(components) != 2: | ||
| raise ValueError(f"Edge facts must have exactly 2 components, found {len(components)}") | ||
|
|
||
| # Validate no empty components | ||
| for i, comp in enumerate(components): | ||
| if not comp: | ||
| raise ValueError(f"Component {i+1} in edge fact cannot be empty") | ||
|
|
||
| component = tuple(components) | ||
| else: | ||
| fact_type = 'node' | ||
|
|
||
|
kmukherji marked this conversation as resolved.
|
||
| # Check if bound is a boolean or a list of floats | ||
| if bound.lower() == 'true': | ||
| bound = interval.closed(1, 1) | ||
| elif bound.lower() == 'false': | ||
| bound = interval.closed(0, 0) | ||
| else: | ||
| # Validate interval format | ||
| if not bound.startswith('['): | ||
| raise ValueError(f"Invalid bound format: expected '[' at start of interval, got '{bound[0] if bound else 'empty'}'") | ||
| if not bound.endswith(']'): | ||
| raise ValueError(f"Invalid bound format: expected ']' at end of interval, got '{bound[-1] if bound else 'empty'}'") | ||
|
|
||
| # Extract values between brackets | ||
| interval_content = bound[1:-1] | ||
| if not interval_content: | ||
| raise ValueError("Interval cannot be empty") | ||
|
|
||
| # Parse float values | ||
| parts = interval_content.split(',') | ||
| if len(parts) != 2: | ||
| raise ValueError(f"Interval must have exactly 2 values, found {len(parts)}") | ||
|
|
||
| try: | ||
| bound_values = [float(b) for b in parts] | ||
| except ValueError as e: | ||
| raise ValueError(f"Invalid interval values: {e}") | ||
|
|
||
| lower, upper = bound_values | ||
| # Validate bounds are in valid range [0, 1] | ||
| if lower < 0 or lower > 1: | ||
| raise ValueError(f"Interval lower bound {lower} is out of valid range [0, 1]") | ||
| if upper < 0 or upper > 1: | ||
| raise ValueError(f"Interval upper bound {upper} is out of valid range [0, 1]") | ||
|
|
||
| # Validate lower <= upper | ||
| if lower > upper: | ||
| raise ValueError(f"Interval lower bound {lower} cannot be greater than upper bound {upper}") | ||
|
|
||
| # We calculate ~[l,u] = [1-u, 1-l] | ||
| # Round to eliminate floating point precision errors (e.g., 1 - 0.8 = 0.19999999...) | ||
| if negate_interval: | ||
| lower, upper = round(1 - upper, 10), round(1 - lower, 10) | ||
|
|
||
| bound = interval.closed(lower, upper) | ||
|
|
||
|
kmukherji marked this conversation as resolved.
|
||
| return pred, component, bound, fact_type | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.