Terraform template for infra setup for starting a migration #3867
Open
shreyakhajanchi wants to merge 7 commits into GoogleCloudPlatform:main from shreyakhajanchi:infra-setup
+1,064
−0
bbfa0c0 Infrastructure setup - first draft (shreyakhajanchi)
c6662ad updating infra setup code (shreyakhajanchi)
9782ecd tfvars fi;e (shreyakhajanchi)
f96812a some fixes (shreyakhajanchi)
fb0d866 minor changes (shreyakhajanchi)
3443fd1 tested against CK setup (shreyakhajanchi)
7956913 some fixes (shreyakhajanchi)
179 changes: 179 additions & 0 deletions
v2/spanner-common/terraform/samples/infra-setup/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,179 @@ | ||
| # Source Database & Spanner Target Setup for Migration Testing | ||
|
|
||
| This folder contains Terraform configuration files to automatically set up, configure, and clean up database resources on Google Cloud Platform (GCP). | ||
|
|
||
| This setup is designed to help you prepare and test database migration pipelines. It automatically creates: | ||
| 1. One or more **source database instances** using Google Cloud SQL (either MySQL or PostgreSQL). | ||
| 2. Multiple **logical databases (shards)** inside those database instances. | ||
| 3. Your database table structure (SQL schema), imported from a local file into all created logical databases. | ||
| 4. A **target Cloud Spanner database instance**. | ||
| 5. Two **sharding configuration files** (`shard-config.json` and `bulk-config.json`) that list the host IP, database name, and credentials for all created database shards. You can pass either file directly as an input parameter to your Dataflow migration jobs. | ||
|
|
||
| --- | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| Before you begin, make sure your computer has the following installed and configured: | ||
|
|
||
| 1. **Terraform CLI** (Version 1.2.0 or newer) | ||
| 2. **Google Cloud SDK (`gcloud` CLI)**: Installed, logged in, and set up with your project: | ||
| ```bash | ||
| gcloud auth login | ||
| gcloud auth application-default login | ||
| ``` | ||
| 3. **Python 3** (installed and accessible from your command line) | ||
| 4. **Google Cloud Project** with billing enabled. | ||
|
|
||
| --- | ||
|
|
||
| ## How the Automated Scripts Work | ||
|
|
||
| This setup includes several helper scripts in the `scripts/` folder to handle database loading, cleanup, and state reconciliation. | ||
|
|
||
| ### 1. Database Schema Loader (`scripts/import_schema.sh`) | ||
| Once the Cloud SQL database instances are created, Terraform runs this bash script **once per physical instance** (the import step uses `for_each`). A failure on one instance therefore re-imports only that instance on the next apply, not all of them. Each run reads your local SQL structure file (like `schema.sql`) and imports it into that instance's logical databases one at a time, because Cloud SQL allows only one import at a time per instance; Terraform processes the instances in parallel. | ||
| * **Why the retries are needed:** Each Cloud SQL instance's service account is granted read access on the schema bucket just before the import runs, but IAM changes take a few seconds to propagate across Google Cloud. An import attempted in that window fails with a permission error, so the script retries each import up to 6 times (waiting 10 seconds between attempts) until the permission has propagated and the schema loads. | ||
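The retry behavior described above can be sketched as a fixed-delay loop. This is a minimal Python illustration, not the contents of `scripts/import_schema.sh`: `retry_import` and the `run_import` callable are hypothetical names, and only the limits (6 attempts, 10 seconds apart) come from the text.

```python
import time
from typing import Callable


def retry_import(run_import: Callable[[], bool],
                 max_attempts: int = 6,
                 delay_seconds: float = 10.0,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run_import until it returns True; return the number of attempts used.

    run_import is a hypothetical stand-in for one schema import attempt.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if run_import():
            return attempt
        # Wait before the next try, but not after the final failure.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise RuntimeError(f"import failed after {max_attempts} attempts")
```

With the defaults, the loop waits at most 50 seconds in total (five pauses of 10 seconds), which is assumed here to be enough for the IAM change to propagate.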
|
|
||
| ### 2. Spanner Backup Cleanup (`scripts/delete_spanner_backups.sh`) | ||
| When you run `terraform destroy` to delete your setup, Google Cloud Spanner will refuse to delete the database instance if there are any automatic database backups present. This script automatically finds and deletes all backups for the Spanner instance right before Terraform deletes the instance. | ||
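As a rough illustration of the cleanup idea, the sketch below lists the backups of a Spanner instance and deletes them one by one through the `gcloud` CLI. It is written in Python for readability and is not the contents of `scripts/delete_spanner_backups.sh`; the function names are hypothetical, and the sketch assumes `gcloud` is installed and authenticated.

```python
import subprocess
from typing import List


def list_backups_cmd(instance: str, project: str) -> List[str]:
    """Build the gcloud command that prints one backup name per line."""
    return ["gcloud", "spanner", "backups", "list",
            f"--instance={instance}", f"--project={project}",
            "--format=value(name)"]


def delete_backup_cmd(backup: str, instance: str, project: str) -> List[str]:
    """Build the gcloud command that deletes one backup without prompting."""
    # Accept either a bare backup ID or a full resource path.
    backup_id = backup.rsplit("/", 1)[-1]
    return ["gcloud", "spanner", "backups", "delete", backup_id,
            f"--instance={instance}", f"--project={project}", "--quiet"]


def delete_all_backups(instance: str, project: str) -> None:
    """Delete every backup of the instance (requires gcloud on PATH)."""
    listing = subprocess.run(list_backups_cmd(instance, project),
                             check=True, capture_output=True, text=True)
    for name in listing.stdout.split():
        subprocess.run(delete_backup_cmd(name, instance, project), check=True)
```

Keeping the command construction separate from execution makes it possible to inspect the commands before anything is deleted.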
|
|
||
| ### 3. Private Connection Cleanup (`scripts/teardown_vpc_peering.sh`) | ||
| If you configure your databases to use private IPs instead of public IPs, Google Cloud creates private networking connections between your network and Cloud SQL. When deleting this infrastructure, Google Cloud occasionally takes time to release these connections. This script cleanly deletes the private network connection using the `gcloud` tool, or safely bypasses it if there are other active resources still using the connection. | ||
|
|
||
|
|
||
| --- | ||
|
|
||
| ## Step-by-Step Guide to Deploying | ||
|
|
||
| ### Step 1: Prepare Your Local Database Structure | ||
| Create a local SQL file named `schema.sql` in this folder. Define the tables and columns you want to load into your source databases. For example: | ||
| ```sql | ||
| CREATE TABLE users ( | ||
| id INT PRIMARY KEY, | ||
| name VARCHAR(100), | ||
| email VARCHAR(100) | ||
| ); | ||
| ``` | ||
|
|
||
| ### Step 2: Configure Your Variables | ||
| There are two variable sample files provided: | ||
| 1. **`terraform_simple.tfvars` (Recommended for beginners)**: A simple, minimal configuration containing only the most important variables. It leverages the automated prefix generation. | ||
| 2. **`terraform.tfvars`**: A comprehensive variable template containing all available settings (such as database user, password, network CIDRs, tags, and Spanner processing units). | ||
|
|
||
| #### Key Naming Variables: | ||
| * **`instance_prefix` (Optional)**: A string prefixed to physical database instances and target Spanner instances. If not provided, a unique random pet name of the form `smt-<word>-<word>` (e.g. `smt-clever-mongoose`) is generated automatically. | ||
| * **`migration_prefix` (Optional)**: A string prefixed to other resources like VPC networks, subnets, Secret Manager secrets, and GCS schema buckets. If not provided, a unique random pet name of the form `smt-<word>-<word>` is generated automatically. | ||
| * **`spanner_instance_name` / `spanner_database_name` (Optional)**: Overrides the target Spanner instance and database names completely. If left blank, they are dynamically derived from your `instance_prefix` and `migration_prefix` respectively. | ||
|
|
||
| Open `terraform_simple.tfvars` or `terraform.tfvars`, replace the placeholders (like `<PROJECT_ID>`) with your actual values, and save the file. | ||
|
|
||
| ### Step 3: Initialize and Deploy | ||
|
|
||
| Run the following commands in your terminal: | ||
|
|
||
| ```bash | ||
| # 1. Download necessary Terraform providers and plugins | ||
| terraform init | ||
|
|
||
| # 2. Deploy the databases and generate the configuration | ||
| # Note: For large-scale deployments (e.g., 128 shards), raise -parallelism above | ||
| # Terraform's default of 10 to speed up resource creation. | ||
| terraform apply -parallelism=100 --var-file=terraform_simple.tfvars | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Outputs & Results | ||
|
|
||
| Once the deployment completes successfully, Terraform will print the resource details on your screen and generate two sharding configuration files in this directory: | ||
|
|
||
| ### 1. Regular Shard Config Format (`shard-config.json`) | ||
| ```json | ||
| [ | ||
| { | ||
| "logicalShardId": "shard-0", | ||
| "host": "198.51.100.5", | ||
|
|
||
| "port": "3306", | ||
| "user": "migration_user", | ||
| "password": null, | ||
| "dbName": "shard_db_0", | ||
| "namespace": "public", | ||
| "secretManagerUri": "projects/my-gcp-project/secrets/smt_clever_mongoose_db_password/versions/latest", | ||
| "connectionProperties": "jdbcCompliantTruncation=true" | ||
| } | ||
| ] | ||
| ``` | ||
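Before passing the generated file to a job, a quick sanity check can catch entries that carry no credentials at all. The sketch below is an assumption about what is worth checking, inferred from the sample above where `password` is `null` and `secretManagerUri` is set; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_shard_config(path: str) -> List[Dict[str, Any]]:
    """Read a shard-config.json file into a list of shard entries."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def shards_missing_credentials(entries: List[Dict[str, Any]]) -> List[str]:
    """Return logicalShardId values with neither a password nor a secret URI."""
    return [entry["logicalShardId"] for entry in entries
            if not entry.get("password") and not entry.get("secretManagerUri")]
```

An empty result means every shard entry names at least one way to obtain its password.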
|
|
||
| ### 2. Bulk Shard Config Format (`bulk-config.json`) | ||
| ```json | ||
| { | ||
| "shardConfigurationBulk": { | ||
| "dataShards": [ | ||
| { | ||
| "host": "198.51.100.5", | ||
| "port": 3306, | ||
| "user": "migration_user", | ||
| "password": null, | ||
| "secretManagerUri": "projects/my-gcp-project/secrets/smt_clever_mongoose_db_password/versions/latest", | ||
| "connectionProperties": "jdbcCompliantTruncation=true", | ||
| "namespace": "public", | ||
| "databases": [ | ||
| { | ||
| "dbName": "shard_db_0", | ||
| "databaseId": "shard-0" | ||
| }, | ||
| { | ||
| "dbName": "shard_db_1", | ||
| "databaseId": "shard-1" | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| } | ||
| } | ||
| ``` | ||
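The two formats appear to carry the same information: the bulk format groups logical databases under their physical host, while the regular format repeats the host details per shard. The sketch below expands a bulk config into regular-style entries. The field mapping (`databaseId` to `logicalShardId`, numeric `port` to a string) is inferred from the two samples above and is an assumption, and `flatten_bulk_config` is a hypothetical helper.

```python
from typing import Any, Dict, List


def flatten_bulk_config(bulk: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a bulk shard config into one entry per logical database."""
    entries: List[Dict[str, Any]] = []
    for data_shard in bulk["shardConfigurationBulk"]["dataShards"]:
        # Host-level fields are shared by every database on that host.
        shared = {k: v for k, v in data_shard.items() if k != "databases"}
        # The regular sample stores the port as a string.
        shared["port"] = str(shared["port"])
        for database in data_shard["databases"]:
            entries.append({"logicalShardId": database["databaseId"],
                            "dbName": database["dbName"],
                            **shared})
    return entries
```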
|
|
||
| --- | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### Handling Creation Timeouts & Operation Dropouts | ||
| When deploying a high number of physical database instances concurrently (e.g., 128 shards), you may occasionally encounter a transient timeout or polling connection dropout error from the Google Cloud API: | ||
| ``` | ||
| Error: Error waiting for Create Instance: ... | ||
| ``` | ||
| Or when running `terraform apply` again after a timeout: | ||
| ``` | ||
| Error: Error, failed to create instance ...: googleapi: Error 409: The Cloud SQL instance already exists., instanceAlreadyExists | ||
| ``` | ||
|
|
||
| #### Why this happens: | ||
| When Terraform requests the creation of 100+ database instances, Google Cloud schedules their creation asynchronously in the background. If the local Terraform process loses connection to the GCP Operation API or hits a client-side wait timeout, Terraform aborts the command and **fails to save those specific instances to your local `terraform.tfstate` file**, even though the creation continues successfully in the background on Google's servers. | ||
|
|
||
| #### How to resolve this: | ||
| 1. **Verify creation in GCP**: Run this CLI command to confirm that the instances are active and running on Google Cloud: | ||
| ```bash | ||
| gcloud sql instances list --project="<YOUR_PROJECT_ID>" --filter="name~smt-sharded" | ||
| ``` | ||
| 2. **Import the affected instances into Terraform State**: For any instances that were successfully created on GCP but are missing from your local state file (causing `409 Already Exists` errors), import them manually back into Terraform. The instances use `for_each`, so the resource address is keyed by the shard index **as a quoted string** (e.g. `["18"]`, not `[18]`): | ||
| ```bash | ||
| terraform import --var-file=terraform_simple.tfvars 'google_sql_database_instance.instances["<INDEX>"]' "projects/<YOUR_PROJECT_ID>/instances/<INSTANCE_NAME>" | ||
| ``` | ||
| *Example:* | ||
| ```bash | ||
| terraform import --var-file=terraform_simple.tfvars 'google_sql_database_instance.instances["18"]' "projects/my-gcp-project/instances/smt-sharded-demo-new-physical-shard-18" | ||
| ``` | ||
| 3. **Resume the Deployment**: Once all missing instances are imported, simply rerun the deployment command with controlled parallelism: | ||
| ```bash | ||
| terraform apply -parallelism=30 --var-file=terraform_simple.tfvars | ||
| ``` | ||
| Terraform then refreshes the state and completes the remaining configuration. | ||
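When many instances are missing from state, typing each import by hand is error-prone, mostly because of the quoted string key. The sketch below generates the import command for one instance name. It assumes the instance name ends in `-<shard index>`, as in the example above; that naming rule is inferred rather than confirmed, and `import_argv` and `import_command` are hypothetical helpers.

```python
import re
import shlex
from typing import List

RESOURCE = "google_sql_database_instance.instances"


def import_argv(project: str, instance_name: str,
                var_file: str = "terraform_simple.tfvars") -> List[str]:
    """Build the terraform import argument list for one Cloud SQL instance."""
    # Assumption: the shard index is the trailing number in the instance name.
    match = re.search(r"-(\d+)$", instance_name)
    if match is None:
        raise ValueError(f"no trailing shard index in {instance_name!r}")
    # for_each keys are strings, so the index is wrapped in double quotes.
    address = f'{RESOURCE}["{match.group(1)}"]'
    return ["terraform", "import", f"--var-file={var_file}", address,
            f"projects/{project}/instances/{instance_name}"]


def import_command(project: str, instance_name: str) -> str:
    """Return the same command as a shell-safe string for copy and paste."""
    return shlex.join(import_argv(project, instance_name))
```

Passing the list from `import_argv` to `subprocess.run` avoids shell quoting entirely; `import_command` is only for printing.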
|
|
||
| --- | ||
|
|
||
| ## Cleaning Up Resources | ||
| To delete all created Google Cloud resources and avoid ongoing charges, run: | ||
| ```bash | ||
| terraform destroy --var-file=terraform_simple.tfvars | ||
| ``` | ||
| All Cloud SQL databases, target Spanner databases, Secret Manager secrets, and networking links will be cleanly removed. | ||
Not immediately relevant, but it will be interesting to see how this extends to schemaless databases.
Yeah, it would be interesting in the case of data generation, but for infra setup we could just skip this step in those cases.