Teradata lineage and dependency graph analysis#292
Open
earthshiner wants to merge 5 commits into Teradata:main
…dency analysis
Introduce a new MCP tool that provides comprehensive object dependency analysis
for Teradata databases with support for wildcards, CSV patterns, and bidirectional
dependency traversal.
Key Features:
- Analyses upstream dependencies (what an object depends on) and downstream
dependencies (what depends on the object)
- Supports single objects, wildcard patterns (%), and CSV pattern lists
- Configurable traversal depth for both upstream (max_depth_up) and downstream
(max_depth_down) analysis (0-10 levels)
- Server-side filtering with exclude_objects and include_containers parameters
- Returns dependency graph as nodes and edges for visualisation
- Multiple output formats: 'detailed', 'summary', 'edges_only'
Use Cases:
- Impact analysis: Determine blast radius before dropping/changing objects
- Data lineage tracing: Track upstream data sources
- Dependency discovery: Understand object relationships
- Pre-deployment validation: Assess impacts before changes
- Documentation: Map database object dependencies
Parameters:
- object_name (required): Object pattern(s) - supports wildcards and CSV
Examples: 'DB.Table', '%WBC%.%', 'DB1.T1,DB2.T2'
- max_depth_up (default: 3): Upstream traversal depth (0-10)
- max_depth_down (default: 3): Downstream traversal depth (0-10)
- exclude_objects (default: ''): CSV patterns to exclude from analysis
- include_containers (default: ''): Whitelist of schemas/databases
- edge_repository (default: 'DEV_01_ODEX_STD_0_V.ODEXRepository'):
ODEX repository table
- return_format (default: 'detailed'): Output format
Technical Implementation:
- Leverages ODEX repository for dependency metadata
- Uses STRTOK_SPLIT_TO_TABLE for server-side CSV parsing
- Automatic whitespace trimming of patterns
- Returns formatted response with dependency graph and metadata
- Performance optimised with proper exclusion patterns (20-50% reduction)
Example Usage:
graph_queryDependenciesAgent(
    object_name="%WBC%.%,%StGeo%.%",
    max_depth_up=5,
    exclude_objects="PRD_%,TST_%"
)
BREAKING CHANGE: None - new feature addition
…etectCycles, graph_connectedComponents and _graph_bfsLevels
- Replaced QueryDependenciesAgent with QueryDependenciesAgentBatch (better performance).
- Added findRootObjects to find the source objects from which to start analysing downstream graphs.
- Added graph_detectCycles to identify circular references.
- Added graph_connectedComponents to identify connected components (groups of closely related objects).
- Added graph_bfsLevels, which uses a breadth-first search, for object migration wave planning.
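The breadth-first levelling mentioned above for migration wave planning can be illustrated with a minimal sketch. It assumes edges are plain (source, target) pairs with the source upstream; the function name bfs_levels and the sample object names are hypothetical and do not reflect the tool's actual implementation, which runs against an edge repository.

```python
from collections import deque


def bfs_levels(edges, roots):
    """Assign each reachable node the BFS depth at which it is first seen."""
    # Build an adjacency list: upstream object -> list of downstream objects.
    children = {}
    for src, tgt in edges:
        children.setdefault(src, []).append(tgt)

    levels = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            # The first visit fixes the level; this check also stops cycles looping.
            if child not in levels:
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


# Hypothetical edges: one table feeding two views that both feed a third view.
edges = [("DB.T1", "DB.V1"), ("DB.T1", "DB.V2"), ("DB.V1", "DB.V3"), ("DB.V2", "DB.V3")]
levels = bfs_levels(edges, ["DB.T1"])
```

Each level here is the shortest distance from a root, so objects sharing a level could be treated as one candidate wave under that assumption.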
…dComponents, detectCycles, findRootObjects, edgeContract
- Replaced monolithic queryDependenciesAgent with modular graph tools
- Added _graph_utils shared utility module
- Removed graph_prompts.yml and legacy documentation
- Updated app.py and profiles.yml for graph tool registration
refactor(graph): compliance pass, contract v1.1, helper consolidation
BREAKING CHANGES
- graph_queryDependenciesAgent renamed to graph_traceLineage (file,
function, constant, tool name string). Update any callers accordingly.
- graph_detectCycles: strategy and max_edges_for_cte parameters removed.
- graph_detectCycles, graph_connectedComponents: object_dependency_table
renamed to edge_repository; excl_patterns renamed to exclude_objects.
- graph_edgeContractDDL: generated DDL column names corrected from
SrcContainer/SrcObject/SrcKind to Src_Container_Name/Src_Object_Name/
Src_Kind (and Tgt equivalents). Previously generated tables were
incompatible with the tool SQL. Contract version bumped to 1.1.
PROGRESSIVE DISCLOSURE COMPLIANCE
- graph_tools.py: graph_analyseDatabase and graph_edgeContractDDL were
missing from GRAPH_TOOLS. All 7 tools now registered in workflow order:
edgeContractDDL → findRootObjects → bfsLevels → traceLineage →
detectCycles → connectedComponents → analyseDatabase.
- GRAPH_EDGE_CONTRACT_DDL_TOOL descriptor added to graph_edge_contract.py
(was absent entirely; tool was unregisterable in static mode).
TERMINOLOGY
- Remove all ODEX references from __init__.py and _graph_utils.py per
standing instruction. Replaced with generic terms (dependency graph,
object dependency graph).
LOGGING
- Replace all f-string logger calls with %s style throughout
graph_findRootObjects.py (5 calls), graph_bfsLevels.py (1 call), and
graph_edge_contract.py (2 calls, including logger.warning).
- Remove stray print() from graph_findRootObjects.py; replaced with
logger.debug.
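As an illustration of the %s logging style adopted above, the sketch below shows that the standard library keeps the template and its arguments separate until a handler formats the record. The logger name, handler class, and sample values are assumptions made for the example only.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("graph_example")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
handler = ListHandler()
logger.addHandler(handler)

root_count, repository = 3, "Sales_Semantic.lineage_graph"  # sample values

# %s style: arguments are passed separately and interpolated lazily,
# unlike an f-string, which is always formatted before the call.
logger.debug("Found %s root objects in %s", root_count, repository)

record = handler.records[0]
```

Because the template is stored unformatted on the record, the interpolation cost is only paid when the message is actually emitted.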
PARAMETER CHANGES
- edge_repository: runtime validation added to all 6 tools that accept it.
Empty string now returns an early error with the AI-Native Data Product
convention hint ({ProductName}_Semantic.lineage_graph).
- graph_bfsLevels, graph_traceLineage, graph_detectCycles,
graph_connectedComponents: stale cross-references to
graph_queryDependenciesAgent updated to graph_traceLineage throughout
docstrings and descriptors.
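The early-error behaviour described for edge_repository might look roughly like the following sketch. The function name validate_edge_repository and the shape of the returned dict are assumptions; only the convention hint text comes from this commit message.

```python
def validate_edge_repository(edge_repository):
    """Return an error dict for a missing value, or None when the value is usable.

    Hypothetical helper: the real tools may structure their error differently.
    """
    if not edge_repository or not edge_repository.strip():
        return {
            "error": "edge_repository is required",
            "hint": (
                "AI-Native Data Product convention: "
                "{ProductName}_Semantic.lineage_graph"
            ),
        }
    return None


missing = validate_edge_repository("")
present = validate_edge_repository("Sales_Semantic.lineage_graph")
```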
GRAPH EDGE CONTRACT v1.1
- Column names corrected throughout: DDL, sample DML, view template,
COMMENT ON COLUMN, canonical contract text, file header.
- Optional enrichment columns added: Edge_Relationship VARCHAR(50),
Transformation_Type VARCHAR(50). Ignored by graph analysis tools;
present in {ProductName}_Semantic.lineage_graph for visualisation
clients. ADDITIONAL COLUMNS section updated accordingly.
- Src_Kind/Tgt_Kind COMPRESS lists expanded to cover both single-letter
codes (T, V, P...) and full-word values (Table, View, Job...) to match
lineage_graph output.
- Sample DML updated: basic examples use 6-column form; new ETL-job
example demonstrates source→job→target two-leg pattern using all 8
columns.
- View template updated: optional columns included as nullable
CAST(NULL AS VARCHAR(50)) placeholders with mapping guidance.
- AI-Native Data Product convention documented in file header, contract
text, docstring, descriptor, and all edge_repository error messages.
HELPER CONSOLIDATION (phase 1 — safe mechanical changes only)
- _graph_utils.py: add parse_csv_patterns() and build_like_or().
- Remove 7 local copies of parse_csv/_parse_csv_patterns
(graph_analyseDatabase, graph_bfsLevels, graph_detectCycles,
graph_connectedComponents, graph_traceLineage,
graph_findRootObjects ×2); replace with shared import.
- Remove 3 local copies of _build_like_or/_build_like_clauses
(graph_analyseDatabase, graph_detectCycles,
graph_connectedComponents); replace with shared import.
- Deferred to phase 2: _UnionFind consolidation (recursion bug in
graph_detectCycles.find()), _build_excl_* parameterisation.
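To make the two shared helpers concrete, here is a minimal sketch of what parse_csv_patterns() and build_like_or() could do. The signatures, the use of ? placeholders, and the sample column name are assumptions; the versions in _graph_utils.py may differ, for example by relying on server-side parsing.

```python
def parse_csv_patterns(raw):
    """Split a CSV string into trimmed, non-empty LIKE patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_like_or(column, patterns):
    """Build a parenthesised 'col LIKE ? OR col LIKE ?' clause plus its parameters.

    Returns an empty clause when there are no patterns, so callers can skip it.
    """
    if not patterns:
        return "", []
    clause = " OR ".join(f"{column} LIKE ?" for _ in patterns)
    return f"({clause})", list(patterns)


# Whitespace and empty entries are dropped before the clause is built.
patterns = parse_csv_patterns(" PRD_% , TST_% ,, ")
clause, params = build_like_or("Src_Container_Name", patterns)
```

Passing the patterns as bound parameters rather than interpolating them into the SQL text is a design choice of this sketch, not something the commit message states.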
… tool descriptor GRAPH_CONNECTED_COMPONENTS_TOOL had a duplicate closing brace at line 481 in the parameters dict, causing a SyntaxError at import time. beartype's import hook surfaced the error during package load, which caused the entire graph package to fail silently: all seven graph tools were unregistered with no server-side warning. Removed the spurious closing brace at line 481.
Root cause: raw dict tool descriptors have no structural validation at definition time. A future refactor to a dataclass-based ToolDescriptor would catch this class of error at module load rather than requiring manual import tracing.
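A dataclass-based descriptor of the kind suggested above might be sketched as follows. The field names and validation rules are assumptions made for illustration; no ToolDescriptor class exists in this PR. Validation of this sort targets malformed descriptor contents and raises when the descriptor is constructed, which for module-level descriptors means at import.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolDescriptor:
    """Hypothetical descriptor that validates itself on construction."""

    name: str
    description: str
    parameters: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.name:
            raise ValueError("tool name must be non-empty")
        for param, spec in self.parameters.items():
            # Each parameter spec must be a dict that declares at least a type.
            if not isinstance(spec, dict) or "type" not in spec:
                raise ValueError(
                    f"parameter {param!r} needs a dict spec with a 'type' key"
                )


descriptor = ToolDescriptor(
    name="graph_connectedComponents",
    description="Group objects into connected components.",
    parameters={"edge_repository": {"type": "string"}},
)
```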
Summary
This PR introduces a complete graph dependency analysis capability to the Teradata MCP Server, comprising seven new tools for directed graph traversal across Teradata object lineage and data pipelines.
It also includes two infrastructure changes to app.py and utils/__init__.py that were developed as prerequisites for the graph tools but affect the broader server. These are called out explicitly below for maintainer review.
These two changes are not graph-specific. They affect the entire server and should be reviewed independently of the graph tool additions.
1. utils/__init__.py: create_response() returns dict instead of JSON string
Why: The MCP framework requires structured_content to be a dict (or None). The prior implementation returned a JSON string, which the framework wrapped in a [{"type": "text", ...}] list and rejected as malformed structured content. This caused silent response failures for tools that returned complex nested structures.
What changed:
- create_response() now returns a dict instead of calling json.dumps()
- _make_serialisable() helper recursively converts all nested values to JSON-native Python types, ensuring None / SQL NULL values survive as None (JSON null) rather than the string "None"
- serialize_teradata_types() gains an explicit None guard for the same reason
- isinstance() calls updated to tuple form for Python 3.9 compatibility
Impact: Every tool that calls create_response() is affected. Existing tools that previously returned a JSON string will now return a dict. This is the correct behaviour per the MCP spec, but it is a breaking change for any caller that was parsing the JSON string directly.
2. app.py: get_tdconn() lazy factory removed; GRAPH_EDGE_CONTRACT resource registered
Why: The lazy get_tdconn() factory was a closure that deferred connection creation but added complexity with no clear benefit, given that TDConn is already constructed eagerly at startup. Simplified to a direct tdconn = td.TDConn(settings=settings) call.
Note on registry system: The diff shows no registry system code in this branch. This is not a deletion: the registry system exists on the db-tool-registry branch and was never merged into main. There is nothing to restore.
Graph-specific addition: The graph://edge-contract MCP resource is registered when the graph_edge_contract resource pattern is present in the active profile. This serves the canonical Graph Edge Contract schema to AI agents so they understand the edge_repository parameter required by all graph tools.
New Capability: Graph Dependency Analysis Tools
Architecture
Seven tools covering the full graph analysis workflow, all stored-procedure-free. The only Teradata privilege required across the entire package is SELECT on an edge repository conforming to the Graph Edge Contract.
- graph_edgeContractDDL
- graph_findRootObjects
- graph_bfsLevels
- graph_traceLineage
- graph_detectCycles
- graph_connectedComponents
- graph_analyseDatabase
Graph Edge Contract
All tools operate on an edge repository: any Teradata table or view exposing six required columns:
- Src_Container_Name, Src_Object_Name, Src_Kind
- Tgt_Container_Name, Tgt_Object_Name, Tgt_Kind
Plus two optional enrichment columns for visualisation clients:
- Edge_Relationship VARCHAR(50)
- Transformation_Type VARCHAR(50)
Edge direction is consistent across all edge types (Src is always upstream, Tgt always downstream), with three context-specific readings:
- CUSTOMER_TABLE → CUSTOMER_VIEW
- CUSTOMER_TABLE → ETL_LOAD_JOB
- ETL_LOAD_JOB → CUSTOMER_FEATURES
The contract is served as an MCP resource at graph://edge-contract for agent discovery.
AI-Native Data Product Integration
The {ProductName}_Semantic.lineage_graph view (Observability Module v1.5 of the AI-Native Data Product standard) already conforms to this contract and can be used directly as edge_repository on any graph tool without generating DDL.
Progressive Disclosure Support
The package supports both MCP registration modes simultaneously:
- graph_tools.py → GRAPH_TOOLS list → registration at startup
- __init__.py → ModuleLoader → ContextCatalog using docstrings
Supporting Infrastructure (Graph Wiring)
module_loader.py
Adds 'graph': 'teradata_mcp_server.tools.graph' to MODULE_MAP, enabling the ModuleLoader to discover and register graph tools when the graph prefix appears in a profile's tool patterns.
profiles.yml
Adds a graph profile entry.
Files Changed
Infrastructure (maintainer review requested)
- tools/utils/__init__.py: create_response() returns dict; _make_serialisable() added; NULL handling fixed
- app.py: get_tdconn() simplified; graph://edge-contract resource registered; minor cleanups
Graph wiring
- tools/module_loader.py: graph prefix added to MODULE_MAP
- config/profiles.yml: graph profile added
New graph package
- tools/graph/__init__.py: handle_* functions for ModuleLoader discovery
- tools/graph/_graph_utils.py: parse_csv_patterns, build_like_or, BFS helpers
- tools/graph/graph_tools.py: GRAPH_TOOLS list
- tools/graph/graph_edge_contract.py: GRAPH_EDGE_CONTRACT text (MCP resource)
- tools/graph/graph_findRootObjects.py
- tools/graph/graph_bfsLevels.py
- tools/graph/graph_traceLineage.py
- tools/graph/graph_detectCycles.py
- tools/graph/graph_connectedComponents.py
- tools/graph/graph_analyseDatabase.py
- tools/graph/README.md
Testing Notes
- graph_edgeContractDDL requires no database connection, so DDL output can be verified without a Teradata instance.
- edge_repository validation can be tested by calling any graph tool without the parameter and confirming the error message and convention hint.
- The graph://edge-contract resource can be verified by fetching it via any MCP client with the graph profile active.
- The create_response() change should be regression-tested against existing tools (base, dba, sec) to confirm response structure is unchanged from the MCP client's perspective.
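For the create_response() regression check, it may help to see the NULL-handling behaviour described for _make_serialisable() in isolation. The sketch below is an assumption about how such a recursive conversion could work, not the code in utils/__init__.py; the function name, the handled types, and the sample row are all illustrative.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def make_serialisable(value):
    """Recursively convert a value into JSON-native Python types."""
    # None is returned unchanged so it serialises as JSON null, not "None".
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, dict):
        return {str(key): make_serialisable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [make_serialisable(item) for item in value]
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    # Fallback for any other type: use its string form.
    return str(value)


# Hypothetical result row containing a SQL NULL, a date, and a decimal.
row = {
    "name": "CUSTOMER_VIEW",
    "owner": None,
    "created": date(2024, 1, 2),
    "size": Decimal("1.5"),
}
clean = make_serialisable(row)
payload = json.dumps(clean)
```

A regression test along these lines could assert that NULL columns arrive at the client as JSON null rather than as a string.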