Add Superset asset management system with automated sync #1870

Merged: blarghmatey merged 2 commits into main from feat/superset-asset-sync-docs on Jan 31, 2026

@blarghmatey blarghmatey commented Jan 29, 2026

Overview

Implements a complete asset management workflow for Apache Superset using the sup CLI, with OAuth authentication and automated database UUID mapping.

New Features

  • Automated export/import of all Superset assets (datasets, charts, dashboards)
  • OAuth2 PKCE authentication with CSRF token support for self-hosted Superset
  • Automatic database UUID mapping between environments
  • Pagination support to fetch ALL assets (not limited to 50)
  • Environment-agnostic scripts (work with any source/target instance)
  • Continue-on-error for resilient imports
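
The database UUID mapping listed above can be pictured with a small sketch. This is not the code in map_database_uuids.py, which is not shown in this conversation. It assumes exported dataset YAML files carry a database_uuid key on its own line, and the UUIDs in UUID_MAP are made-up placeholders rather than real instance values.

```python
import re

# Hypothetical source -> target database UUIDs, for illustration only.
UUID_MAP = {
    "11111111-1111-1111-1111-111111111111": "22222222-2222-2222-2222-222222222222",
}

# Matches a "database_uuid: <uuid>" line; the lookahead keeps trailing
# whitespace and the newline out of the match so they are preserved.
DATABASE_UUID_LINE = re.compile(
    r"^(?P<prefix>[ \t]*database_uuid:[ \t]*)(?P<uuid>[0-9a-fA-F-]{36})(?=[ \t]*$)",
    re.MULTILINE,
)


def remap_database_uuids(yaml_text: str, uuid_map: dict) -> str:
    """Rewrite database_uuid lines; UUIDs missing from the map stay as-is."""

    def swap(match):
        old = match.group("uuid")
        return match.group("prefix") + uuid_map.get(old.lower(), old)

    return DATABASE_UUID_LINE.sub(swap, yaml_text)


if __name__ == "__main__":
    sample = (
        "table_name: orders\n"
        "database_uuid: 11111111-1111-1111-1111-111111111111\n"
    )
    print(remap_database_uuids(sample, UUID_MAP))
```

Working on the raw text rather than a parsed document keeps comments and key order in the exported files untouched, at the cost of only handling the simple one-line form of the key.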

Scripts Added

  • export_all.sh: Export all assets from any Superset instance with pagination
  • sync_assets.sh: Sync assets between instances with UUID mapping
  • map_database_uuids.py: Automatic database UUID translation
  • validate_assets.sh: YAML validation (from previous workflow)
  • promote_to_production.sh: Legacy promotion script (from previous workflow)
  • export_from_qa.sh: Legacy QA export script (from previous workflow)

Assets Included

  • 76+ datasets from production Trino and Superset Metadata DB
  • 107+ charts covering all visualization types
  • 18+ published dashboards (enrollment, engagement, orders, etc.)
  • 2 database connection configs (Trino + Superset Metadata DB)

Documentation

  • WORKFLOWS.md: Complete workflow guide with diagrams
  • scripts/README.md: Script reference and troubleshooting
  • Both include examples for common operations

Technical Implementation

  • Uses sup CLI (fork: mitodl/superset-sup with self-hosted support)
  • CSRF token handling for POST requests (multipart/form-data)
  • Database UUID mapping: Production Trino → QA Trino translation
  • Pagination loops fetch 100 items per page until complete
  • Regex fallback for malformed JSON from CLI
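
As a rough illustration of the regex fallback mentioned in the last bullet, the sketch below tries a strict JSON parse first and only scans the raw text when that fails. The shape of the CLI output (a JSON list of objects with an integer id field) and the function name extract_ids are assumptions made for this example, not details confirmed by the PR.

```python
import json
import re

# Pulls integer values out of '"id": 123' fragments in raw text.
ID_PATTERN = re.compile(r'"id"\s*:\s*(\d+)')


def extract_ids(cli_output: str) -> list:
    """Return asset ids from CLI output, tolerating malformed JSON."""
    try:
        items = json.loads(cli_output)
        return [int(item["id"]) for item in items]
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fallback: scan the raw text. This also picks up nested "id"
        # fields, so it is only suitable for flat listings.
        return [int(value) for value in ID_PATTERN.findall(cli_output)]


if __name__ == "__main__":
    print(extract_ids('[{"id": 1}, {"id": 2}]'))
    print(extract_ids('Fetching...\n[{"id": 3, "name": "a"}, {"id": 4, "name": '))
```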

Typical Workflows

  1. Production → QA sync (backup/mirroring)
  2. QA → Production promotion (after testing changes)
  3. Weekly backups with git version control

Related Work

Testing

✅ Tested and validated:

  • Full export from production (76 datasets, 107 charts, 18 dashboards)
  • Sync to QA with automatic UUID mapping
  • All assets successfully imported to QA
  • OAuth authentication with CSRF tokens working
  • Pagination fetches complete asset list (no 50-item limit)
  • Continue-on-error handles problematic assets gracefully
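
For readers unfamiliar with the continue-on-error pattern referenced in the last item, here is a minimal sketch. The import_all helper and the import_one callback are hypothetical names used only for illustration; the real scripts in this PR are shell scripts whose contents are not shown here.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def import_all(
    assets: Iterable[str],
    import_one: Callable[[str], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Import every asset, recording failures instead of stopping."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for asset in assets:
        try:
            import_one(asset)
        except Exception as exc:  # deliberately broad: one bad asset must not abort the run
            failed[asset] = str(exc)
            continue
        succeeded.append(asset)
    return succeeded, failed


if __name__ == "__main__":
    def fake_import(name: str) -> None:
        # Stand-in importer that rejects one asset.
        if name == "broken_chart.yaml":
            raise ValueError("missing dataset")

    ok, bad = import_all(["a.yaml", "broken_chart.yaml", "b.yaml"], fake_import)
    print(ok, bad)
```

Returning the failures alongside the successes lets the caller print a summary at the end and still exit non-zero if anything was skipped.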

To run it yourself, install the forked CLI with uv tool install git+https://github.com/mitodl/superset-sup@self_hosted_superset_oidc, then create a configuration file at ~/.sup.config.yml with the following contents:

current_instance_name: superset-production
superset_instances:
  superset-production:
    auth_method: oauth
    oauth_authorization_url: https://sso.ol.mit.edu/realms/ol-data-platform/protocol/openid-connect/auth
    oauth_client_id: ol-superset-cli
    oauth_scope: openid profile email
    oauth_token_url: https://sso.ol.mit.edu/realms/ol-data-platform/protocol/openid-connect/token
    url: https://bi.ol.mit.edu
  superset-qa:
    auth_method: oauth
    oauth_authorization_url: https://sso-qa.ol.mit.edu/realms/ol-data-platform/protocol/openid-connect/auth
    oauth_client_id: ol-superset-cli
    oauth_scope: openid profile email
    oauth_token_url: https://sso-qa.ol.mit.edu/realms/ol-data-platform/protocol/openid-connect/token
    url: https://bi-qa.ol.mit.edu
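
The OAuth settings above feed a PKCE authorization flow. As background, the sketch below shows how a code verifier and S256 challenge are derived per RFC 7636 and how they would appear in an authorization URL. It is not the sup CLI's implementation; the redirect URI and state value are hypothetical, and only the authorization URL, client id and scope are taken from the configuration.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def challenge_for(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple:
    """Return (verifier, challenge); the verifier is 86 URL-safe characters."""
    verifier = secrets.token_urlsafe(64)
    return verifier, challenge_for(verifier)


def build_authorization_url(
    authorization_url: str,
    client_id: str,
    scope: str,
    redirect_uri: str,  # hypothetical: not part of the config shown above
    challenge: str,
    state: str,
) -> str:
    params = {
        "response_type": "code",
        "client_id": client_id,
        "scope": scope,
        "redirect_uri": redirect_uri,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,
    }
    return authorization_url + "?" + urlencode(params)


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    print(
        build_authorization_url(
            "https://sso.ol.mit.edu/realms/ol-data-platform/protocol/openid-connect/auth",
            "ol-superset-cli",
            "openid profile email",
            "http://localhost:8080/callback",  # assumed local callback
            challenge,
            secrets.token_urlsafe(16),
        )
    )
```

The verifier never leaves the client until the token exchange, which is what lets a public client such as a CLI authenticate without a stored client secret.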

Related: Requires sup CLI from ~/src/superset-sup (see PR preset-io/superset-sup#19)

The --limit 1000 flags were preventing pagination from triggering.
Now the script relies on the pagination logic in pull commands to fetch
ALL assets without any artificial limit.

This works with the recent sup CLI fix that makes pagination trigger
when no limit is specified (filters.limit = None).

Result:
- Fetches all datasets via pagination
- Fetches all charts via pagination (123+ instead of 100)
- Fetches all dashboards via pagination
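
The behaviour described in this commit can be modelled with a short sketch. It assumes, based on the "123+ instead of 100" figure, that a single request is capped at 100 items regardless of the requested limit. The fetch_all and fetch_page names are hypothetical and do not come from the sup CLI.

```python
from typing import Callable, List, Optional

PAGE_SIZE = 100  # items requested per page, as stated in the PR description


def fetch_all(
    fetch_page: Callable[[int, int], List],
    limit: Optional[int] = None,
) -> List:
    """Fetch assets; paginate only when no explicit limit is given."""
    if limit is not None:
        # An explicit limit means one request. If the server caps the page
        # size (assumed here), anything beyond the cap is silently dropped.
        return fetch_page(0, limit)

    results: List = []
    page = 0
    while True:
        batch = fetch_page(page, PAGE_SIZE)
        results.extend(batch)
        if len(batch) < PAGE_SIZE:
            # A short (or empty) page means there is nothing left to fetch.
            return results
        page += 1


if __name__ == "__main__":
    items = list(range(123))

    def fake_page(page: int, page_size: int) -> List:
        size = min(page_size, 100)  # assumed server-side cap
        return items[page * size:(page + 1) * size]

    print(len(fetch_all(fake_page, limit=1000)))
    print(len(fetch_all(fake_page)))
```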

@blarghmatey blarghmatey force-pushed the feat/superset-asset-sync-docs branch from 6d2a604 to 00aec8a on January 29, 2026 21:17

@rachellougee rachellougee left a comment

Nice work! I wasn't able to run the export script locally but I can circle back next week.

@blarghmatey blarghmatey merged commit 596d868 into main Jan 31, 2026
5 checks passed
@blarghmatey blarghmatey deleted the feat/superset-asset-sync-docs branch January 31, 2026 01:28