Teradata lineage and dependency graph analysis #292

Open
earthshiner wants to merge 5 commits into Teradata:main from earthshiner:td_lineage_analysis

@earthshiner

Summary

This PR introduces a complete graph dependency analysis capability to the Teradata MCP Server, comprising seven new tools for directed graph traversal across Teradata object lineage and data pipelines.

It also includes two infrastructure changes to app.py and utils/__init__.py that were developed as prerequisites for the graph tools but affect the broader server. These are called out explicitly below for maintainer review.


⚠️ Infrastructure Changes Requiring Maintainer Review

These two changes are not graph-specific. They affect the entire server and should be reviewed independently of the graph tool additions.

1. utils/__init__.py: create_response() returns dict instead of JSON string

Why: The MCP framework requires structured_content to be a dict (or None). The prior implementation returned a JSON string, which the framework wrapped in a [{"type": "text", ...}] list and rejected as malformed structured content. This caused silent response failures for tools that returned complex nested structures.

What changed:

  • create_response() now returns a dict instead of calling json.dumps()
  • New _make_serialisable() helper recursively converts all nested values to JSON-native Python types, ensuring None / SQL NULL values survive as None (JSON null) rather than the string "None"
  • serialize_teradata_types() gains an explicit None guard for the same reason
  • isinstance() calls updated to tuple form for Python 3.9 compatibility

Impact: Every tool that calls create_response() is affected. Existing tools that previously returned a JSON string will now return a dict — this is the correct behaviour per the MCP spec but is a breaking change for any caller that was parsing the JSON string directly.
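As a rough illustration of the behaviour described above, the sketch below shows one way a recursive serialisation helper and a dict-returning response builder could look. It is not the code from this PR: the function bodies, the `status` / `results` / `metadata` keys, and the handling of `Decimal` and date values are all assumptions made for the example.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def make_serialisable(value: Any) -> Any:
    """Recursively convert a value to JSON-native Python types (illustrative helper)."""
    if value is None:
        return None  # SQL NULL stays None (JSON null), never the string "None"
    if isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, Decimal):
        return float(value)  # assumption: floats are acceptable for numeric columns
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): make_serialisable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [make_serialisable(v) for v in value]
    return str(value)  # last-resort fallback for unknown types


def create_response(data: Any, metadata: Any = None) -> dict:
    """Return a plain dict rather than a JSON string (assumed response shape)."""
    response = {"status": "success", "results": make_serialisable(data)}
    if metadata is not None:
        response["metadata"] = make_serialisable(metadata)
    return response


rows = [{"name": "CUSTOMER_TABLE", "owner": None, "size_gb": Decimal("1.5")}]
response = create_response(rows, {"row_count": 1})
print(json.dumps(response))
```

A caller that previously ran `json.loads()` on the return value would now use the dict directly, which is the breaking change noted above.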

2. app.py: get_tdconn() lazy factory removed; GRAPH_EDGE_CONTRACT resource registered

Why: The lazy get_tdconn() factory was a closure that deferred connection creation but added complexity with no clear benefit given that TDConn is already constructed eagerly at startup. Simplified to a direct tdconn = td.TDConn(settings=settings) call.

Note on registry system: The diff shows no registry system code in this branch. This is not a deletion — the registry system exists on the db-tool-registry branch and was never merged into main. There is nothing to restore.

Graph-specific addition: The graph://edge-contract MCP resource is registered when the graph_edge_contract resource pattern is present in the active profile. This serves the canonical Graph Edge Contract schema to AI agents so they understand the edge_repository parameter required by all graph tools.


New Capability: Graph Dependency Analysis Tools

Architecture

Seven tools covering the full graph analysis workflow, all stored-procedure-free. The only Teradata privilege required across the entire package is SELECT on an edge repository conforming to the Graph Edge Contract.

| Step | Tool | Implementation | Purpose |
|---|---|---|---|
| 0 | graph_edgeContractDDL | Template | Generate edge repository DDL (no DB connection required) |
| 1 | graph_findRootObjects | SQL | Discover objects with no upstream dependencies |
| 2 | graph_bfsLevels | Python BFS | Wave planning, deployment sequencing, blast-radius sizing |
| 3 | graph_traceLineage | Python + recursive CTE | Full lineage tracing, impact path analysis |
| 4 | graph_detectCycles | Python Union-Find + iterative DFS | Circular reference detection, DAG validation |
| 5 | graph_connectedComponents | Python Union-Find | Graph partitioning, isolated sub-graph identification |
| 6 | graph_analyseDatabase | Composite | All four analyses in one call with one shared edge fetch |
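To make the root-discovery and BFS steps concrete, here is a minimal pure-Python sketch of level assignment over a list of (upstream, downstream) edges. It is an illustration only, not the implementation in graph_bfsLevels.py: the function names are hypothetical, edges are assumed to be already fetched into memory, and the level is taken as the shortest hop count from any root, which may differ from the levelling rule the tool actually applies.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (upstream, downstream)


def find_roots(edges: Iterable[Edge]) -> List[str]:
    """Return nodes that appear as a source but never as a target."""
    edges = list(edges)
    sources = {src for src, _ in edges}
    targets = {tgt for _, tgt in edges}
    return sorted(sources - targets)


def bfs_levels(edges: Iterable[Edge], roots: Iterable[str]) -> Dict[str, int]:
    """Assign each reachable node its shortest hop distance from any root."""
    adjacency = defaultdict(list)
    for src, tgt in edges:
        adjacency[src].append(tgt)

    levels: Dict[str, int] = {}
    queue = deque()
    for root in roots:
        if root not in levels:
            levels[root] = 0
            queue.append(root)

    while queue:
        node = queue.popleft()
        for child in adjacency.get(node, ()):
            if child not in levels:  # visited check also guards against cycles
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


edges = [
    ("CUSTOMER_TABLE", "CUSTOMER_VIEW"),
    ("CUSTOMER_TABLE", "ETL_LOAD_JOB"),
    ("ETL_LOAD_JOB", "CUSTOMER_FEATURES"),
]
roots = find_roots(edges)
levels = bfs_levels(edges, roots)
print(roots, levels)
```

Grouping nodes by level then gives candidate migration or deployment waves: level 0 objects first, then level 1, and so on.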

Graph Edge Contract

All tools operate on an edge repository — any Teradata table or view exposing six required columns:

Src_Container_Name  Src_Object_Name  Src_Kind
Tgt_Container_Name  Tgt_Object_Name  Tgt_Kind

Plus two optional enrichment columns for visualisation clients:

Edge_Relationship   Transformation_Type

Edge direction is consistent across all edge types — Src is always upstream, Tgt always downstream — with three context-specific readings:

| Edge type | Reading | Example |
|---|---|---|
| Object dependency | Src is referenced by Tgt | CUSTOMER_TABLE → CUSTOMER_VIEW |
| ETL input | Src is read by Tgt | CUSTOMER_TABLE → ETL_LOAD_JOB |
| ETL output | Src writes to Tgt | ETL_LOAD_JOB → CUSTOMER_FEATURES |

The contract is served as an MCP resource at graph://edge-contract for agent discovery.
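The following sketch shows how a client might check a candidate edge repository against the six required columns and turn a row into a directed (upstream, downstream) pair. The helper names, the case-insensitive column comparison, the `Container.Object` node key, and the `SALES_DB` container are assumptions for illustration; the contract text served at graph://edge-contract remains the authority.

```python
from typing import Dict, Iterable, List, Tuple

# The six required columns named by the Graph Edge Contract.
REQUIRED_COLUMNS = (
    "Src_Container_Name", "Src_Object_Name", "Src_Kind",
    "Tgt_Container_Name", "Tgt_Object_Name", "Tgt_Kind",
)


def missing_columns(columns: Iterable[str]) -> List[str]:
    """List required columns absent from a table or view (case-insensitive check)."""
    present = {name.lower() for name in columns}
    return [name for name in REQUIRED_COLUMNS if name.lower() not in present]


def row_to_edge(row: Dict[str, str]) -> Tuple[str, str]:
    """Build an (upstream, downstream) pair; Src is always upstream of Tgt."""
    src = f"{row['Src_Container_Name']}.{row['Src_Object_Name']}"
    tgt = f"{row['Tgt_Container_Name']}.{row['Tgt_Object_Name']}"
    return src, tgt


# Hypothetical ETL-input row: the table is read by the job.
row = {
    "Src_Container_Name": "SALES_DB", "Src_Object_Name": "CUSTOMER_TABLE", "Src_Kind": "Table",
    "Tgt_Container_Name": "SALES_DB", "Tgt_Object_Name": "ETL_LOAD_JOB", "Tgt_Kind": "Job",
}
gaps = missing_columns(row.keys())
edge = row_to_edge(row)
print(gaps, edge)
```

Because direction is uniform across edge types, the same pair can feed any of the traversal tools without per-type handling.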

AI-Native Data Product Integration

The {ProductName}_Semantic.lineage_graph view (Observability Module v1.5 of the AI-Native Data Product standard) already conforms to this contract and can be used directly as edge_repository on any graph tool without generating DDL.

Progressive Disclosure Support

The package supports both MCP registration modes simultaneously:

  • Static mode: graph_tools.py → GRAPH_TOOLS list → registration at startup
  • Progressive Disclosure mode: __init__.py → ModuleLoader → ContextCatalog using docstrings

Supporting Infrastructure (Graph Wiring)

module_loader.py

Adds 'graph': 'teradata_mcp_server.tools.graph' to MODULE_MAP, enabling the ModuleLoader to discover and register graph tools when the graph prefix appears in a profile's tool patterns.

profiles.yml

Adds a graph profile entry:

graph:
  tool:
    - ^graph_.*
  prompt:
    - ^graph_.*
  resource:
    - ^graph_edge_contract$
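The profile patterns are regular expressions matched against tool, prompt, and resource names. The sketch below shows the selection effect of these patterns, assuming the server applies them with `re.match`-style semantics; the actual matching code is not shown in this PR, and `base_readQuery` and `graph_edge_contract_v2` are hypothetical names used only to show what is excluded.

```python
import re
from typing import Dict, Iterable, List

# Mirror of the graph profile entry above, as a Python dict.
GRAPH_PROFILE: Dict[str, List[str]] = {
    "tool": [r"^graph_.*"],
    "prompt": [r"^graph_.*"],
    "resource": [r"^graph_edge_contract$"],
}


def select_names(names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep the names that match at least one profile pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [name for name in names if any(rx.match(name) for rx in compiled)]


tools = ["graph_bfsLevels", "graph_traceLineage", "base_readQuery"]
resources = ["graph_edge_contract", "graph_edge_contract_v2"]

selected_tools = select_names(tools, GRAPH_PROFILE["tool"])
selected_resources = select_names(resources, GRAPH_PROFILE["resource"])
print(selected_tools, selected_resources)
```

The `$` anchor on the resource pattern is what restricts the match to exactly `graph_edge_contract`, whereas the tool pattern accepts anything with the `graph_` prefix.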

Files Changed

Infrastructure (maintainer review requested)

| File | Change |
|---|---|
| tools/utils/__init__.py | create_response() returns dict; _make_serialisable() added; NULL handling fixed |
| app.py | get_tdconn() simplified; graph://edge-contract resource registered; minor cleanups |

Graph wiring

| File | Change |
|---|---|
| tools/module_loader.py | graph prefix added to MODULE_MAP |
| config/profiles.yml | graph profile added |

New graph package

| File | Purpose |
|---|---|
| tools/graph/__init__.py | Re-exports all handle_* functions for ModuleLoader discovery |
| tools/graph/_graph_utils.py | Shared utilities: parse_csv_patterns, build_like_or, BFS helpers |
| tools/graph/graph_tools.py | Registration hub: all 7 tools in GRAPH_TOOLS list |
| tools/graph/graph_edge_contract.py | DDL generator + canonical GRAPH_EDGE_CONTRACT text (MCP resource) |
| tools/graph/graph_findRootObjects.py | Root object discovery |
| tools/graph/graph_bfsLevels.py | Pure-Python BFS |
| tools/graph/graph_traceLineage.py | Python + recursive CTE lineage analysis |
| tools/graph/graph_detectCycles.py | Python Union-Find + iterative DFS |
| tools/graph/graph_connectedComponents.py | Python Union-Find WCC |
| tools/graph/graph_analyseDatabase.py | Composite single-fetch analysis |
| tools/graph/README.md | Full tool reference documentation |

Testing Notes

  • All tools are SP-free — no stored procedure deployment required to test.
  • graph_edgeContractDDL requires no database connection — DDL output can be verified without a Teradata instance.
  • edge_repository validation can be tested by calling any graph tool without the parameter and confirming the error message and convention hint.
  • The graph://edge-contract resource can be verified by fetching it via any MCP client with the graph profile active.
  • The create_response() change should be regression-tested against existing tools (base, dba, sec) to confirm response structure is unchanged from the MCP client's perspective.

earthshiner and others added 5 commits March 5, 2026 15:51
…dency analysis

Introduce a new MCP tool that provides comprehensive object dependency analysis
for Teradata databases with support for wildcards, CSV patterns, and bidirectional
dependency traversal.

Key Features:
- Analyses upstream dependencies (what an object depends on) and downstream
  dependencies (what depends on the object)
- Supports single objects, wildcard patterns (%), and CSV pattern lists
- Configurable traversal depth for both upstream (max_depth_up) and downstream
  (max_depth_down) analysis (0-10 levels)
- Server-side filtering with exclude_objects and include_containers parameters
- Returns dependency graph as nodes and edges for visualisation
- Multiple output formats: 'detailed', 'summary', 'edges_only'

Use Cases:
- Impact analysis: Determine blast radius before dropping/changing objects
- Data lineage tracing: Track upstream data sources
- Dependency discovery: Understand object relationships
- Pre-deployment validation: Assess impacts before changes
- Documentation: Map database object dependencies

Parameters:
- object_name (required): Object pattern(s) - supports wildcards and CSV
  Examples: 'DB.Table', '%WBC%.%', 'DB1.T1,DB2.T2'
- max_depth_up (default: 3): Upstream traversal depth (0-10)
- max_depth_down (default: 3): Downstream traversal depth (0-10)
- exclude_objects (default: ''): CSV patterns to exclude from analysis
- include_containers (default: ''): Whitelist of schemas/databases
- edge_repository (default: 'DEV_01_ODEX_STD_0_V.ODEXRepository'):
  ODEX repository table
- return_format (default: 'detailed'): Output format

Technical Implementation:
- Leverages ODEX repository for dependency metadata
- Uses STRTOK_SPLIT_TO_TABLE for server-side CSV parsing
- Automatic whitespace trimming of patterns
- Returns formatted response with dependency graph and metadata
- Performance optimised with proper exclusion patterns (20-50% reduction)

Example Usage:
  graph_queryDependenciesAgent(
    object_name="%WBC%.%,%StGeo%.%",
    max_depth_up=5,
    exclude_objects="PRD_%,TST_%"
  )

BREAKING CHANGE: None - new feature addition
…etectCycles, graph_connectedComponents and _graph_bfsLevels

Replace QueryDependenciesAgent with QueryDependenciesAgentBatch (better performance).
Add findRootObjects to find source objects from which to start analysing downstream graphs.
Add graph_detectCycles to identify circular references.
Add graph_connectedComponents to identify connected components (groups of closely related objects).
Add graph_bfsLevels, which uses breadth-first search, for use in object migration wave planning.
…dComponents, detectCycles, findRootObjects, edgeContract

- Replaced monolithic queryDependenciesAgent with modular graph tools
- Added _graph_utils shared utility module
- Removed graph_prompts.yml and legacy documentation
- Updated app.py and profiles.yml for graph tool registration
refactor(graph): compliance pass, contract v1.1, helper consolidation

BREAKING CHANGES
- graph_queryDependenciesAgent renamed to graph_traceLineage (file,
  function, constant, tool name string). Update any callers accordingly.
- graph_detectCycles: strategy and max_edges_for_cte parameters removed.
- graph_detectCycles, graph_connectedComponents: object_dependency_table
  renamed to edge_repository; excl_patterns renamed to exclude_objects.
- graph_edgeContractDDL: generated DDL column names corrected from
  SrcContainer/SrcObject/SrcKind to Src_Container_Name/Src_Object_Name/
  Src_Kind (and Tgt equivalents). Previously generated tables were
  incompatible with the tool SQL. Contract version bumped to 1.1.

PROGRESSIVE DISCLOSURE COMPLIANCE
- graph_tools.py: graph_analyseDatabase and graph_edgeContractDDL were
  missing from GRAPH_TOOLS. All 7 tools now registered in workflow order:
  edgeContractDDL → findRootObjects → bfsLevels → traceLineage →
  detectCycles → connectedComponents → analyseDatabase.
- GRAPH_EDGE_CONTRACT_DDL_TOOL descriptor added to graph_edge_contract.py
  (was absent entirely; tool was unregisterable in static mode).

TERMINOLOGY
- Remove all ODEX references from __init__.py and _graph_utils.py per
  standing instruction. Replaced with generic terms (dependency graph,
  object dependency graph).

LOGGING
- Replace all f-string logger calls with %s style throughout
  graph_findRootObjects.py (5 calls), graph_bfsLevels.py (1 call), and
  graph_edge_contract.py (2 calls, including logger.warning).
- Remove stray print() from graph_findRootObjects.py; replaced with
  logger.debug.

PARAMETER CHANGES
- edge_repository: runtime validation added to all 6 tools that accept it.
  Empty string now returns an early error with the AI-Native Data Product
  convention hint ({ProductName}_Semantic.lineage_graph).
- graph_bfsLevels, graph_traceLineage, graph_detectCycles,
  graph_connectedComponents: stale cross-references to
  graph_queryDependenciesAgent updated to graph_traceLineage throughout
  docstrings and descriptors.

GRAPH EDGE CONTRACT v1.1
- Column names corrected throughout: DDL, sample DML, view template,
  COMMENT ON COLUMN, canonical contract text, file header.
- Optional enrichment columns added: Edge_Relationship VARCHAR(50),
  Transformation_Type VARCHAR(50). Ignored by graph analysis tools;
  present in {ProductName}_Semantic.lineage_graph for visualisation
  clients. ADDITIONAL COLUMNS section updated accordingly.
- Src_Kind/Tgt_Kind COMPRESS lists expanded to cover both single-letter
  codes (T, V, P...) and full-word values (Table, View, Job...) to match
  lineage_graph output.
- Sample DML updated: basic examples use 6-column form; new ETL-job
  example demonstrates source→job→target two-leg pattern using all 8
  columns.
- View template updated: optional columns included as nullable
  CAST(NULL AS VARCHAR(50)) placeholders with mapping guidance.
- AI-Native Data Product convention documented in file header, contract
  text, docstring, descriptor, and all edge_repository error messages.

HELPER CONSOLIDATION (phase 1 — safe mechanical changes only)
- _graph_utils.py: add parse_csv_patterns() and build_like_or().
- Remove 7 local copies of parse_csv/_parse_csv_patterns (graph_
  analyseDatabase, graph_bfsLevels, graph_detectCycles, graph_connected
  Components, graph_traceLineage, graph_findRootObjects ×2); replace
  with shared import.
- Remove 3 local copies of _build_like_or/_build_like_clauses (graph_
  analyseDatabase, graph_detectCycles, graph_connectedComponents);
  replace with shared import.
- Deferred to phase 2: _UnionFind consolidation (recursion bug in
  graph_detectCycles.find()), _build_excl_* parameterisation.
… tool descriptor

GRAPH_CONNECTED_COMPONENTS_TOOL had a duplicate closing brace at line 481
in the parameters dict, causing a SyntaxError at import time. beartype's
import hook surfaced the error during package load, which caused the entire
graph package to fail silently — all seven graph tools were unregistered
with no server-side warning.

Removed the spurious } at line 481.

Root cause: raw dict tool descriptors have no structural validation at
definition time. A future refactor to dataclass-based ToolDescriptor would
catch this class of error at module load rather than requiring manual
import tracing.