
Commit 3baeead

Add cybersecurity knowledge graph tutorial and example
1 parent 0266a8b commit 3baeead

3 files changed

Lines changed: 359 additions & 0 deletions


Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
Cybersecurity Knowledge Graph
==============================

In this tutorial, we build a small cybersecurity knowledge graph using PyReason.
We model network assets, the software they run, and real CVEs that affect that
software. We then demonstrate how PyReason infers which assets are at risk and
how it detects two types of inconsistency.

.. note::

    Find the full, executable code `here <https://github.com/lab-v2/pyreason/blob/main/examples/cybersecurity_knowledge_graph_ex.py>`_

Background
----------

A **CVE** (Common Vulnerabilities and Exposures) is a standardised ID for a
known security vulnerability, for example ``CVE-2021-3156``.

A **CVSS score** rates the severity of a CVE on a 0--10 scale. We divide by 10
to normalise it into the [0, 1] range used by PyReason annotation bounds.
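
As a quick illustration of that normalisation, the sketch below converts a
CVSS base score into a point bound. The helper name ``cvss_to_bound`` is
hypothetical and not part of PyReason; it only shows the arithmetic, with
rounding added to avoid floating-point noise such as ``0.8400000000000001``.

```python
from typing import Tuple


def cvss_to_bound(score: float) -> Tuple[float, float]:
    """Map a CVSS base score (0-10) to a point bound [v, v] in [0, 1].

    Hypothetical helper for illustration; not a PyReason function.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    value = round(score / 10.0, 2)  # round away floating-point noise
    return (value, value)


# Scores as listed in this tutorial's table
scores = {"CVE_2021_3156": 7.8, "CVE_2022_0185": 8.4, "CVE_2022_26923": 7.5}
bounds = {cve: cvss_to_bound(score) for cve, score in scores.items()}

for cve, (lower, upper) in bounds.items():
    # Same text form as the fact strings used later in the tutorial
    print(f"severity({cve}):[{lower},{upper}]")
```

The printed strings have the same shape as the fact text passed to
``pr.Fact`` in the Facts section.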

The CVE IDs in this tutorial are taken from the
`National Vulnerability Database <https://nvd.nist.gov/vuln/search>`_:

+------------------+---------------------+-------+----------------------------------+
| CVE ID           | Software            | CVSS  | Description                      |
+==================+=====================+=======+==================================+
| CVE-2021-3156    | sudo 1.9.5p1        | 7.8   | Heap buffer overflow (CWE-122)   |
+------------------+---------------------+-------+----------------------------------+
| CVE-2022-0185    | Linux Kernel 5.1    | 8.4   | Heap buffer overflow (CWE-122)   |
+------------------+---------------------+-------+----------------------------------+
| CVE-2022-26923   | OpenSSL 3.0.1       | 7.5   | Double free (CWE-415)            |
+------------------+---------------------+-------+----------------------------------+

NVD records ``CVE-2022-26923`` as an Active Directory Domain Services
elevation-of-privilege vulnerability rather than an OpenSSL flaw, so treat the
third row as a placeholder and check the NVD entry before reusing it.

Graph
-----

The graph has three layers of nodes connected by directed edges:

.. code-block:: text

    [asset] --runs--> [software] --has_cve--> [CVE]

.. code-block:: python

    import pyreason as pr
    import networkx as nx

    pr.reset()
    pr.reset_rules()

    g = nx.DiGraph()

    # Asset nodes
    g.add_nodes_from(['web_server', 'workstation_1', 'dev_server'])

    # Software nodes
    g.add_nodes_from(['sudo_1_9_5p1', 'linux_kernel_5_1', 'openssl_3_0_1'])

    # CVE nodes
    g.add_nodes_from(['CVE_2021_3156', 'CVE_2022_0185', 'CVE_2022_26923'])

    # Which asset runs which software
    g.add_edge('web_server', 'sudo_1_9_5p1', runs=1)
    g.add_edge('workstation_1', 'linux_kernel_5_1', runs=1)
    g.add_edge('dev_server', 'openssl_3_0_1', runs=1)

    # Which CVE affects which software
    g.add_edge('sudo_1_9_5p1', 'CVE_2021_3156', has_cve=1)
    g.add_edge('linux_kernel_5_1', 'CVE_2022_0185', has_cve=1)
    g.add_edge('openssl_3_0_1', 'CVE_2022_26923', has_cve=1)

We then configure PyReason and load the graph:

.. code-block:: python

    pr.settings.verbose = True
    pr.settings.atom_trace = True
    pr.settings.inconsistency_check = True

    pr.load_graph(g)
83+
We declare ``vulnerable`` and ``patched`` as mutually exclusive predicates.
84+
Setting one automatically updates the other to its negated bound:
85+
86+
.. code-block:: python
87+
88+
pr.add_inconsistent_predicate('vulnerable', 'patched')
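
To make the negated bound concrete, here is a minimal sketch of the
``[1-u, 1-l]`` arithmetic in plain Python. The function ``negate_bound`` is a
hypothetical name used only for illustration; PyReason performs this update
internally and its implementation may differ.

```python
from typing import Tuple

Bound = Tuple[float, float]


def negate_bound(bound: Bound) -> Bound:
    """Return [1-u, 1-l] for a bound [l, u] (illustrative helper only)."""
    lower, upper = bound
    # Rounding hides floating-point noise such as 0.19999999999999996
    return (round(1.0 - upper, 10), round(1.0 - lower, 10))


vulnerable = (0.8, 1.0)             # bound assigned by the vulnerability rule
patched = negate_bound(vulnerable)  # bound implied for the paired predicate
print("patched:", patched)

# Negating twice gives back the original bound
print("round trip:", negate_bound(patched))
```

A tight, high ``vulnerable`` bound therefore forces a tight, low ``patched``
bound, which is the behaviour shown for ``workstation_1`` and ``dev_server``
in the expected output.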

Rules
-----

The rules we want to add are:

1. An asset is ``at_risk`` if it runs software that has a CVE.
2. An asset that is ``at_risk`` is also ``vulnerable`` with confidence [0.8, 1.0].

.. code-block:: python

    pr.add_rule(pr.Rule('at_risk(x) <- runs(x,y), has_cve(y,z)', 'exposure_rule'))
    pr.add_rule(pr.Rule('vulnerable(x):[0.8,1.0] <- at_risk(x)', 'vulnerability_rule'))
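
The two rules amount to a two-hop lookup followed by a fixed annotation. The
plain-Python sketch below mimics that logic without PyReason so the expected
result is easy to check by hand. It is only an approximation of what the
reasoner does, and the ``build_agent`` / ``git_2_40`` entries are hypothetical
additions that show an asset with no CVE edge being left out.

```python
# Edges from the tutorial graph, plus one hypothetical asset without a CVE
runs = {
    "web_server": ["sudo_1_9_5p1"],
    "workstation_1": ["linux_kernel_5_1"],
    "dev_server": ["openssl_3_0_1"],
    "build_agent": ["git_2_40"],  # hypothetical, not in the tutorial graph
}
has_cve = {
    "sudo_1_9_5p1": ["CVE_2021_3156"],
    "linux_kernel_5_1": ["CVE_2022_0185"],
    "openssl_3_0_1": ["CVE_2022_26923"],
}

# exposure_rule: at_risk(x) <- runs(x, y), has_cve(y, z)
at_risk = {
    asset: (1.0, 1.0)
    for asset, software in runs.items()
    if any(has_cve.get(item) for item in software)
}

# vulnerability_rule: vulnerable(x):[0.8, 1.0] <- at_risk(x)
vulnerable = {asset: (0.8, 1.0) for asset in at_risk}

for asset in sorted(at_risk):
    print(asset, "at_risk", at_risk[asset], "vulnerable", vulnerable[asset])
```

Only the three assets from the tutorial graph appear in the output, because
the hypothetical ``build_agent`` runs software with no ``has_cve`` edge.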

Facts
-----

We seed the graph with CVE severity scores from NVD, normalised to [0, 1]:

.. code-block:: python

    pr.add_fact(pr.Fact('severity(CVE_2021_3156):[0.78,0.78]', 'sudo_cve_severity', 0, 2))
    pr.add_fact(pr.Fact('severity(CVE_2022_0185):[0.84,0.84]', 'kernel_cve_severity', 0, 2))
    pr.add_fact(pr.Fact('severity(CVE_2022_26923):[0.75,0.75]', 'openssl_cve_severity', 0, 2))

Inconsistency Demo 1: Monotonic Reasoning Violation
-----------------------------------------------------

PyReason's reasoning is monotonic -- bounds can only get tighter over time.
Two data sources disagree on the severity of ``CVE_2021_3156`` with
non-overlapping bounds, which PyReason cannot reconcile:

.. code-block:: python

    pr.add_fact(pr.Fact('severity(CVE_2021_3156):[0.8,1.0]', 'severity_source_A', 0, 2))
    pr.add_fact(pr.Fact('severity(CVE_2021_3156):[0.0,0.1]', 'severity_source_B', 0, 2))

``[0.8, 1.0]`` and ``[0.0, 0.1]`` do not overlap. PyReason flags the conflict
and resolves the annotation to ``[0.0, 1.0]`` (complete uncertainty).
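
The overlap test behind this demo can be sketched as an interval
intersection. The ``intersect`` helper below is hypothetical and only shows
why the two bounds cannot be tightened into a single one; PyReason's own
inconsistency check may be implemented differently.

```python
from typing import Optional, Tuple

Bound = Tuple[float, float]


def intersect(a: Bound, b: Bound) -> Optional[Bound]:
    """Return the overlap of two bounds, or None if they are disjoint."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower <= upper else None


source_a = (0.8, 1.0)
source_b = (0.0, 0.1)
seeded = (0.78, 0.78)  # point bound from the Facts section

print(intersect(source_a, source_b))        # disjoint bounds
print(intersect(seeded, source_a))          # 0.78 lies just below 0.8
print(intersect((0.7, 1.0), (0.75, 0.9)))   # compatible bounds tighten
```

Besides the two sources, the seeded ``[0.78, 0.78]`` bound for
``CVE_2021_3156`` also falls outside both ranges, so three facts disagree
about this node.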

Inconsistency Demo 2: Inconsistent Predicate List (IPL) Conflict
-----------------------------------------------------------------

An asset management database says ``web_server`` is patched. A vulnerability
scanner says it is vulnerable. Both assert high confidence:

.. code-block:: python

    pr.add_fact(pr.Fact('patched(web_server):[0.9,1.0]', 'patch_db_fact', 0, 2))
    pr.add_fact(pr.Fact('vulnerable(web_server):[0.9,1.0]', 'vuln_scanner_fact', 0, 2))

Because ``vulnerable`` and ``patched`` are in the IPL, these two facts
contradict each other. PyReason resolves both to ``[0.0, 1.0]`` and flags
the conflict in the rule trace.
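
A short worked check shows where the contradiction comes from: the scanner's
``vulnerable`` bound implies a ``patched`` bound that is disjoint from the one
asserted by the patch database. The helper names below are hypothetical and
the code is a sketch of the arithmetic only, not PyReason's internal logic.

```python
from typing import Tuple

Bound = Tuple[float, float]


def implied_pair_bound(bound: Bound) -> Bound:
    """Bound forced on the paired IPL predicate: [1-u, 1-l]."""
    lower, upper = bound
    return (round(1.0 - upper, 10), round(1.0 - lower, 10))


def overlaps(a: Bound, b: Bound) -> bool:
    """True if the two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


patched_from_db = (0.9, 1.0)          # patch_db_fact
vulnerable_from_scanner = (0.9, 1.0)  # vuln_scanner_fact

patched_implied = implied_pair_bound(vulnerable_from_scanner)
conflict = not overlaps(patched_from_db, patched_implied)

print("patched implied by scanner:", patched_implied)
print("conflict:", conflict)
```

The implied ``patched`` bound is ``(0.0, 0.1)``, which cannot be reconciled
with the database's ``[0.9, 1.0]``.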

Running PyReason
----------------

.. code-block:: python

    interpretation = pr.reason(timesteps=2)

Expected Output
---------------

**Assets at risk:**

.. code-block:: text

    TIMESTEP 0:
           component     at_risk
    0     web_server  [1.0, 1.0]
    1  workstation_1  [1.0, 1.0]
    2     dev_server  [1.0, 1.0]

All three assets are marked ``at_risk`` because each runs software with a
known CVE.

**CVE severity (Demo 1):**

.. code-block:: text

    TIMESTEP 0:
            component      severity
    0   CVE_2022_0185  [0.84, 0.84]
    1  CVE_2022_26923  [0.75, 0.75]
    2   CVE_2021_3156    [0.0, 0.1]

The conflict on ``CVE_2021_3156`` is detected and logged in the rule trace.
The other two CVEs retain their precise scores.

**Vulnerable / patched (Demo 2):**

.. code-block:: text

    TIMESTEP 0:
           component  vulnerable     patched
    0  workstation_1  [0.8, 1.0]  [0.0, 0.2]
    1     dev_server  [0.8, 1.0]  [0.0, 0.2]
    2     web_server  [0.0, 1.0]  [0.0, 1.0]

``web_server`` resolves to complete uncertainty on both predicates due to the
IPL conflict. The other two assets show normal IPL behaviour -- setting
``vulnerable:[0.8, 1.0]`` automatically forces ``patched`` to ``[0.0, 0.2]``.

The full rule trace can be saved for inspection:

.. code-block:: python

    node_trace, edge_trace = pr.get_rule_trace(interpretation)
    pr.save_rule_trace(interpretation)

docs/source/tutorials/index.rst

Lines changed: 2 additions & 0 deletions
@@ -16,4 +16,6 @@ Contents
 ./custom_thresholds.rst
 ./infer_edges.rst
 ./annotation_function.rst
+./cybersecurity_knowledge_graph.rst
+
 

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
"""
Example: Cybersecurity Knowledge Graph with Inconsistency Detection
====================================================================
This example builds a small cybersecurity knowledge graph containing
network assets (servers and workstations), the software they run, and
real CVEs (Common Vulnerabilities and Exposures) from the National
Vulnerability Database (NVD) that affect that software.

CVSS scores are normalised to [0, 1] by dividing by 10 to serve as
PyReason annotation bounds.

Graph structure:
    [asset] --runs--> [software] --has_cve--> [CVE]

Two types of inconsistency are demonstrated:

1. Same-predicate non-overlapping bounds (monotonic reasoning violation):
   Two data sources assert conflicting severity scores for the same CVE.
   The bounds do not overlap so PyReason resolves to [0.0, 1.0].

2. Inconsistent Predicate List (IPL) conflict:
   "vulnerable" and "patched" are declared as mutually exclusive.
   Asserting both for the same asset creates a contradiction which
   PyReason resolves to [0.0, 1.0] on both predicates.

Real CVEs used:
    - CVE-2021-3156   sudo 1.9.5p1      CVSS 7.8  CWE-122 (heap buffer overflow)
    - CVE-2022-0185   Linux Kernel 5.1  CVSS 8.4  CWE-122 (heap buffer overflow)
    - CVE-2022-26923  OpenSSL 3.0.1     CVSS 7.5  CWE-415 (double free)
"""

import pyreason as pr
import networkx as nx

# Reset PyReason to a clean state
pr.reset()
pr.reset_rules()

# ================================ CREATE GRAPH ================================
g = nx.DiGraph()

# Asset nodes -- servers and workstations in a small enterprise network
g.add_nodes_from(['web_server', 'workstation_1', 'dev_server'])

# Software nodes -- specific vulnerable versions
g.add_nodes_from(['sudo_1_9_5p1', 'linux_kernel_5_1', 'openssl_3_0_1'])

# CVE nodes -- real vulnerability identifiers from NVD
g.add_nodes_from(['CVE_2021_3156', 'CVE_2022_0185', 'CVE_2022_26923'])

# Asset --> Software edges (which asset runs which software version)
g.add_edge('web_server', 'sudo_1_9_5p1', runs=1)
g.add_edge('workstation_1', 'linux_kernel_5_1', runs=1)
g.add_edge('dev_server', 'openssl_3_0_1', runs=1)

# Software --> CVE edges (which CVE affects which software version)
g.add_edge('sudo_1_9_5p1', 'CVE_2021_3156', has_cve=1)
g.add_edge('linux_kernel_5_1', 'CVE_2022_0185', has_cve=1)
g.add_edge('openssl_3_0_1', 'CVE_2022_26923', has_cve=1)

# ================================ CONFIGURE ===================================
pr.settings.verbose = True
pr.settings.atom_trace = True  # Enable atom trace for full explainability
pr.settings.inconsistency_check = True  # Enable inconsistency detection (default)

# ================================ LOAD GRAPH ==================================
pr.load_graph(g)

# Declare vulnerable and patched as inconsistent predicates
# When vulnerable(x):[l,u] is set, PyReason automatically sets
# patched(x):[1-u, 1-l] -- and vice versa
pr.add_inconsistent_predicate('vulnerable', 'patched')

# ================================ ADD RULES ===================================
# Rule 1: If an asset runs software that has a CVE, the asset is at risk
# This is the core two-hop transitive inference: asset --> software --> CVE
pr.add_rule(pr.Rule(
    'at_risk(x) <- runs(x,y), has_cve(y,z)',
    'exposure_rule'
))

# Rule 2: An asset that is at risk is also vulnerable with high confidence
# This chains off exposure_rule and also triggers the IPL for patched
pr.add_rule(pr.Rule(
    'vulnerable(x):[0.8,1.0] <- at_risk(x)',
    'vulnerability_rule'
))

# ================================ ADD FACTS ===================================
# CVE severity scores from NVD, normalised to [0,1] by dividing by 10
# CVE-2021-3156: CVSS 7.8 / 10 = 0.78
pr.add_fact(pr.Fact('severity(CVE_2021_3156):[0.78,0.78]', 'sudo_cve_severity', 0, 2))
# CVE-2022-0185: CVSS 8.4 / 10 = 0.84
pr.add_fact(pr.Fact('severity(CVE_2022_0185):[0.84,0.84]', 'kernel_cve_severity', 0, 2))
# CVE-2022-26923: CVSS 7.5 / 10 = 0.75
pr.add_fact(pr.Fact('severity(CVE_2022_26923):[0.75,0.75]', 'openssl_cve_severity', 0, 2))

# ---- Inconsistency Demo 1: Monotonic reasoning violation ----
# Two data sources disagree on the severity of CVE_2021_3156
# [0.8, 1.0] and [0.0, 0.1] do not overlap -- PyReason flags the conflict
# and resolves severity(CVE_2021_3156) to [0.0, 1.0] (complete uncertainty)
pr.add_fact(pr.Fact('severity(CVE_2021_3156):[0.8,1.0]', 'severity_source_A', 0, 2))
pr.add_fact(pr.Fact('severity(CVE_2021_3156):[0.0,0.1]', 'severity_source_B', 0, 2))

# ---- Inconsistency Demo 2: Inconsistent Predicate List (IPL) conflict ----
# Asset management DB says web_server was patched -- high confidence
pr.add_fact(pr.Fact('patched(web_server):[0.9,1.0]', 'patch_db_fact', 0, 2))
# Vulnerability scanner says web_server is vulnerable -- also high confidence
# Since vulnerable and patched are in the IPL, this creates a contradiction
# PyReason resolves both to [0.0, 1.0] and logs the conflict in the trace
pr.add_fact(pr.Fact('vulnerable(web_server):[0.9,1.0]', 'vuln_scanner_fact', 0, 2))

# ================================ REASON ======================================
print('=' * 60)
print('Running PyReason -- Cybersecurity Knowledge Graph')
print('=' * 60)
interpretation = pr.reason(timesteps=2)

# ================================ VIEW RESULTS ================================
print('\n' + '=' * 60)
print('Assets at risk (inferred by exposure_rule)')
print('=' * 60)
dataframes = pr.filter_and_sort_nodes(interpretation, ['at_risk'])
for t, df in enumerate(dataframes):
    print(f'\nTIMESTEP {t}:')
    print(df)

print('\n' + '=' * 60)
print('CVE Severity (Demo 1: monotonic violation on CVE_2021_3156)')
print('=' * 60)
dataframes = pr.filter_and_sort_nodes(interpretation, ['severity'])
for t, df in enumerate(dataframes):
    print(f'\nTIMESTEP {t}:')
    print(df)

print('\n' + '=' * 60)
print('Vulnerable / Patched (Demo 2: IPL conflict on web_server)')
print('=' * 60)
dataframes = pr.filter_and_sort_nodes(interpretation, ['vulnerable', 'patched'])
for t, df in enumerate(dataframes):
    print(f'\nTIMESTEP {t}:')
    print(df)

# ================================ VIEW TRACE ==================================
print('\n' + '=' * 60)
print('Rule Trace (full explainability)')
print('=' * 60)
node_trace, edge_trace = pr.get_rule_trace(interpretation)
print('\nNode trace:')
print(node_trace.to_string())

if not edge_trace.empty:
    print('\nEdge trace:')
    print(edge_trace.to_string())

# Save the rule trace to CSV files for further inspection
pr.save_rule_trace(interpretation)
print('\nRule trace saved to current directory.')
