
Terraform template for infra setup for starting a migration #3867

Open
shreyakhajanchi wants to merge 7 commits into GoogleCloudPlatform:main from shreyakhajanchi:infra-setup

Contributor

@shreyakhajanchi shreyakhajanchi commented Jun 2, 2026

Description

This PR introduces a complete Terraform module that automates the end-to-end provisioning of sharded Cloud SQL databases, target Spanner instances, and the dynamic configurations required to run the CDC Data Generator.

This has been tested against a Ck scale data set of 128 * 8 shards.

✨ Key Features

  • Automated Sharded Topology:
    • Dynamically provisions user-defined physical Cloud SQL (MySQL/PostgreSQL) instances.
    • Automatically creates nested logical databases across the physical shards based on the logical_shards_count variable.
  • Target Spanner Provisioning:
    • Automates the creation of the target Spanner instance and database matching the requested regional processing units and SQL dialect.
  • Schema Initialization (import_schema.sh):
    • Automatically uploads a local schema.sql to GCS and securely imports it into every logical database.
    • Features robust exponential backoff and randomized jitter to handle high-parallelism deployments without triggering Cloud SQL Admin API rate limits (429 RESOURCE_EXHAUSTED) or IAM eventual consistency errors.
  • Secure Networking & Secrets Management:
    • Provisions private VPC networks and handles the complex lifecycle of VPC peering connections.
    • Automatically generates database passwords and stores them securely within Google Secret Manager.
  • Dynamic Config Generation:
    • Generates the shard_config.json and bulk_shard_config.json artifacts containing resolved IP addresses, connection strings, and Secret Manager references required by the Dataflow pipeline templates.
  • Clean Teardown Scripts:
    • Implements teardown_vpc_peering.sh and delete_spanner_backups.sh hooks that run during terraform destroy to bypass Google Cloud's native deletion locks and ensure a clean environment teardown.

📝 Documentation & Templates

  • Added comprehensive README.md detailing architecture, prerequisites, and execution commands.
  • Included terraform_simple.tfvars and terraform_advanced.tfvars templates to provide standardized "Getting Started" configurations.

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.31%. Comparing base (3b2def5) to head (7956913).
⚠️ Report is 41 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3867      +/-   ##
============================================
+ Coverage     53.73%   63.31%   +9.58%     
+ Complexity     6743     2315    -4428     
============================================
  Files          1087      514     -573     
  Lines         66794    29929   -36865     
  Branches       7478     3298    -4180     
============================================
- Hits          35890    18950   -16940     
+ Misses        28477    10017   -18460     
+ Partials       2427      962    -1465     
| Components | Coverage Δ |
|---|---|
| spanner-templates | 80.23% <ø> (+7.38%) ⬆️ |
| spanner-import-export | ∅ <ø> (∅) |
| spanner-live-forward-migration | 90.16% <ø> (+9.21%) ⬆️ |
| spanner-live-reverse-replication | 83.91% <ø> (+6.81%) ⬆️ |
| spanner-bulk-migration | 92.58% <ø> (+1.47%) ⬆️ |
| gcs-spanner-dv | 90.39% <ø> (+4.62%) ⬆️ |

see 672 files with indirect coverage changes

@shreyakhajanchi shreyakhajanchi added the addition New feature or request label Jun 2, 2026
@shreyakhajanchi shreyakhajanchi marked this pull request as ready for review June 3, 2026 10:54
@shreyakhajanchi shreyakhajanchi requested a review from a team as a code owner June 3, 2026 10:54
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request provides a robust Terraform-based infrastructure automation solution for setting up complex sharded database environments. It streamlines the provisioning of Cloud SQL and Spanner resources, handles schema distribution, and generates the necessary configuration artifacts for migration pipelines. The solution is designed to be resilient against common cloud API limitations and ensures a clean lifecycle management of resources.

Highlights

  • Automated Infrastructure Provisioning: Introduced a comprehensive Terraform module to automate the end-to-end setup of sharded Cloud SQL instances and target Spanner databases for migration testing.
  • Robust Schema Initialization: Implemented a schema import script with exponential backoff and jitter to handle high-parallelism deployments and avoid API rate limits or IAM consistency issues.
  • Configuration Management: Added automated generation of shard configuration files (JSON) and secure password management using Google Secret Manager.
  • Clean Teardown Hooks: Included specialized teardown scripts for VPC peering and Spanner backups to ensure clean environment destruction during terraform destroy.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Terraform-based infrastructure setup for source database (Cloud SQL) and target Spanner database migration testing, including automated schema importing and cleanup scripts. The review feedback highlights critical Terraform evaluation issues where resources with conditional counts (such as the private network and database password) are indexed directly, which will cause errors when they are not created. Additionally, the feedback suggests improving configuration flexibility and preventing deployment failures by dynamically defaulting the database version and Spanner region when they are not explicitly provided.

Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
Comment thread v2/spanner-common/terraform/samples/infra-setup/variables.tf
Comment thread v2/spanner-common/terraform/samples/infra-setup/variables.tf
fi

# Add a random jitter of 1-5 seconds to prevent thundering herds across parallel shards
jitter=$(( RANDOM % 5 + 1 ))
Contributor

This will run sequentially. In what scenarios would you need this jitter?

Contributor Author

While imports within a single physical instance are sequential, Terraform runs multiple physical instances in parallel (e.g., terraform apply -parallelism=100). If those 100 parallel bash scripts hit a 429 Rate Limit simultaneously and sleep for a fixed duration, they will all wake up at the exact same millisecond and hammer the API again, causing a 'thundering herd'. The random jitter staggers the sleep times across the parallel physical instances.
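
The reply above can be illustrated with a minimal retry loop that combines exponential backoff with the same 1-5 second jitter. This is a sketch, not the actual contents of import_schema.sh: the function name `retry_with_backoff` and the `MAX_ATTEMPTS`, `BASE_DELAY`, and `SLEEP_CMD` variables are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: retry a command with exponential backoff plus random jitter.
# retry_with_backoff, MAX_ATTEMPTS, BASE_DELAY and SLEEP_CMD are illustrative
# names (assumptions), not identifiers taken from import_schema.sh.

MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"
BASE_DELAY="${BASE_DELAY:-2}"
SLEEP_CMD="${SLEEP_CMD:-sleep}"   # overridable so the loop can be dry-run

retry_with_backoff() {
  local attempt=1 delay jitter
  while true; do
    # Run the wrapped command; stop retrying as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if (( attempt >= MAX_ATTEMPTS )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    # Exponential part: BASE_DELAY, 2*BASE_DELAY, 4*BASE_DELAY, ...
    delay=$(( BASE_DELAY * (1 << (attempt - 1)) ))
    # Jitter of 1-5 seconds so parallel scripts do not wake up in lockstep.
    jitter=$(( RANDOM % 5 + 1 ))
    echo "attempt ${attempt} failed; sleeping $(( delay + jitter ))s" >&2
    "$SLEEP_CMD" $(( delay + jitter ))
    attempt=$(( attempt + 1 ))
  done
}

# Hypothetical usage (instance, bucket and database names are placeholders):
#   retry_with_backoff gcloud sql import sql my-instance gs://my-bucket/schema.sql \
#     --database=my_db --quiet
```

Because each parallel script draws its own jitter, two scripts that fail at the same moment are unlikely to retry at the same moment, which is the staggering effect described in the reply.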

# SOURCE DATABASE (Cloud SQL)
# ------------------------------------------------------------------------------
database_provider = "MYSQL" # MYSQL or POSTGRES
database_version = "8_0" # MySQL: 8_0, 5_7 | Postgres: 14, 15, 16
Contributor

Are these the default values?

Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
# Generate the Shard Config json file matching the Shard.java model properties
locals {
  shards = [
    for idx in range(var.physical_shards_count * var.logical_shards_count) : {
Contributor

There should be some logic for this in the sharded migration Terraform setup from a couple of years back. We should look for opportunities to reuse some of it to generate these files.

Contributor Author

We take it as input in the migration pipelines, so there is no existing code for creating this file. I checked the live and reverse samples.
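
For readers following the `range(var.physical_shards_count * var.logical_shards_count)` loop quoted in this thread, the index arithmetic can be sketched in shell. The division step mirrors the `floor(idx / var.logical_shards_count)` expression that appears in main.tf; the modulo step for the logical database index and the function names `shard_location` and `list_shards` are assumptions made for illustration, not code from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: map a flat shard index to (physical instance, logical database).
# shard_location and list_shards are hypothetical helper names.

shard_location() {
  local idx=$1 logical_count=$2
  # Integer division equals floor(idx / logical_count) for non-negative idx.
  local physical=$(( idx / logical_count ))
  # Assumption: the logical database index is the remainder.
  local logical=$(( idx % logical_count ))
  echo "${physical} ${logical}"
}

list_shards() {
  local physical_count=$1 logical_count=$2 idx
  # One line per shard: "<flat index> <physical index> <logical index>".
  for (( idx = 0; idx < physical_count * logical_count; idx++ )); do
    echo "${idx} $(shard_location "$idx" "$logical_count")"
  done
}
```

With 128 physical instances and 8 logical databases each, `list_shards 128 8` would enumerate 1024 entries, and flat index 1023 would land on physical instance 127, logical database 7.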

one([for ip in google_sql_database_instance.instances[tostring(floor(idx / var.logical_shards_count))].ip_address : ip.ip_address if ip.type == "PRIVATE"]),
google_sql_database_instance.instances[tostring(floor(idx / var.logical_shards_count))].ip_address[0].ip_address
),
"127.0.0.1"
Contributor

I'm not sure this would ever be localhost. Can you check in which case this would be required?

Contributor Author

During terraform plan, the Cloud SQL instances don't exist yet, so their ip_address lists are empty. Terraform's strict type evaluator crashes if it tries to index into an empty list. Providing a fallback string safely satisfies the type-checker during the planning phase.
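
The preference order that the quoted expression appears to encode (the PRIVATE address if present, otherwise the first address, otherwise the placeholder) can be sketched as a shell analogy. This is not Terraform and does not reproduce how Terraform evaluates the expression during planning; `pick_host` and its `TYPE IP` input format are hypothetical and exist only to make the ordering concrete.

```shell
#!/usr/bin/env bash
# Sketch only: choose a host from "TYPE IP" lines read on stdin.
# pick_host is a hypothetical helper; the input format is an assumption.

pick_host() {
  local fallback="127.0.0.1" first="" type ip
  while read -r type ip; do
    # Skip malformed or empty lines.
    if [ -z "$ip" ]; then
      continue
    fi
    # Remember the first address seen in case no PRIVATE entry exists.
    if [ -z "$first" ]; then
      first=$ip
    fi
    # A PRIVATE address wins immediately.
    if [ "$type" = "PRIVATE" ]; then
      echo "$ip"
      return 0
    fi
  done
  # No PRIVATE entry: use the first address, or the placeholder if the list was empty.
  echo "${first:-$fallback}"
}
```

An empty input stands in for the case described in the reply, where no address is available yet and only the placeholder can be returned.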

Comment thread v2/spanner-common/terraform/samples/infra-setup/main.tf
## Step-by-Step Guide to Deploying

### Step 1: Prepare Your Local Database Structure
Create a local SQL file named `schema.sql` in this folder. Define the tables and columns you want to load into your source databases. For example:
Contributor

Not immediately relevant, but it will be interesting to see how this extends to schemaless databases.

Contributor Author

Yeah, it would be interesting for data generation, but for infra setup we could just skip this step in those cases.

Comment thread v2/spanner-common/terraform/samples/infra-setup/README.md
Comment thread v2/spanner-common/terraform/samples/infra-setup/variables.tf
Labels

addition New feature or request size/XXL

2 participants