diff --git a/v2/spanner-common/terraform/samples/infra-setup/README.md b/v2/spanner-common/terraform/samples/infra-setup/README.md new file mode 100644 index 0000000000..851d649c1e --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/README.md @@ -0,0 +1,179 @@ +# Source Database & Spanner Target Setup for Migration Testing + +This folder contains Terraform configuration files to automatically set up, configure, and clean up database resources on Google Cloud Platform (GCP). + +This setup is designed to help you prepare and test database migration pipelines. It automatically creates: +1. One or more **source database instances** using Google Cloud SQL (either MySQL or PostgreSQL). +2. Inside those database instances, it creates multiple **logical databases (shards)**. +3. It imports a database table structure (your SQL schema) from a local file into all created logical databases. +4. A **target Cloud Spanner database instance**. +5. Two **sharding configuration files** (`shard-config.json` and `bulk-config.json`) that list the host IP, database name, and credentials for all created database shards. You can pass either file directly as an input parameter to your Dataflow migration jobs. + +--- + +## Prerequisites + +Before you begin, make sure your computer has the following installed and configured: + +1. **Terraform CLI** (Version 1.2.0 or newer) +2. **Google Cloud SDK (`gcloud` CLI)**: Installed, logged in, and set up with your project: + ```bash + gcloud auth login + gcloud auth application-default login + ``` +3. **Python 3** (installed and accessible from your command line) +4. **Google Cloud Project** with billing enabled. + +--- + +## How the Automated Scripts Work + +This setup includes several helper scripts in the `scripts/` folder to handle database loading, cleanup, and state reconciliation. + +### 1. 
Database Schema Loader (`scripts/import_schema.sh`) +Once the Cloud SQL database instances are created, Terraform runs this bash script **once per physical instance** (the import step uses `for_each`), so a failure on one instance only re-imports that instance on the next apply instead of all of them. Each run reads your local SQL structure file (like `schema.sql`) and imports it sequentially into that instance's logical databases (Cloud SQL allows only one import at a time per instance); Terraform runs the instances in parallel. +* **Why the retries are needed:** Terraform grants each Cloud SQL instance's service account read access to the schema bucket just before the import runs, but IAM changes take a few seconds to propagate across Google Cloud. An import attempted in that window fails with a permission error. To handle this, the script retries each import up to 10 times with exponential backoff (starting at 10 seconds, capped at 60 seconds, plus 1-5 seconds of random jitter) until the permission propagates and the schema loads successfully. + +### 2. Spanner Backup Cleanup (`scripts/delete_spanner_backups.sh`) +When you run `terraform destroy` to delete your setup, Google Cloud Spanner will refuse to delete the database instance if there are any automatic database backups present. This script automatically finds and deletes all backups for the Spanner instance right before Terraform deletes the instance. + +### 3. Private Connection Cleanup (`scripts/teardown_vpc_peering.sh`) +If you configure your databases to use private IPs instead of public IPs, Google Cloud creates private networking connections between your network and Cloud SQL. When deleting this infrastructure, Google Cloud occasionally takes time to release these connections. This script cleanly deletes the private network connection using the `gcloud` tool, or safely bypasses it if there are other active resources still using the connection.
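The wait schedule used by the retry loop in `scripts/import_schema.sh` can be sketched roughly as follows. This is a minimal illustration, assuming the values set in that script in this change (`max_attempts=10`, `base_delay=10`, a 60-second cap). `backoff_delay` is a hypothetical helper name, since the script computes the delay inline, and the 1-5 second random jitter is left out so the output is deterministic.

```shell
#!/usr/bin/env bash
# Sketch of the wait between import attempts (jitter omitted).
# backoff_delay is a hypothetical helper; the real script computes this inline.
backoff_delay() {
  local attempt=$1 base_delay=10 cap=60
  # Double the delay on every attempt: 10, 20, 40, 80, ...
  local delay=$(( base_delay * 2 ** (attempt - 1) ))
  # Never wait longer than the cap.
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# Print the base wait for each of the 10 attempts.
for attempt in $(seq 1 10); do
  echo "attempt $attempt: wait $(backoff_delay "$attempt")s (+1-5s jitter)"
done
```

Under these assumptions the waits grow as 10s, 20s, 40s and then stay at 60s, so a single database that never imports successfully keeps the script busy for several minutes before it gives up.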
+ + +--- + +## Step-by-Step Guide to Deploying + +### Step 1: Prepare Your Local Database Structure +Create a local SQL file named `schema.sql` in this folder. Define the tables and columns you want to load into your source databases. For example: +```sql +CREATE TABLE users ( + id INT PRIMARY KEY, + name VARCHAR(100), + email VARCHAR(100) +); +``` + +### Step 2: Configure Your Variables +There are two variable sample files provided: +1. **`terraform_simple.tfvars` (Recommended for beginners)**: A simple, minimal configuration containing only the most important variables. It leverages the automated prefix generation. +2. **`terraform.tfvars`**: A comprehensive variable template containing all available settings (such as database user, password, network CIDRs, tags, Spanner processing units). + +#### Key Naming Variables: +* **`instance_prefix` (Optional)**: A string prefixed to physical database instances and target Spanner instances. If not provided, a unique random pet name of the form `smt-ADJECTIVE-ANIMAL` (e.g. `smt-clever-mongoose`) is generated automatically. +* **`migration_prefix` (Optional)**: A string prefixed to other resources like VPC networks, subnets, Secret Manager secrets, and GCS schema buckets. If not provided, a unique random pet name of the form `smt-ADJECTIVE-ANIMAL` is generated automatically. +* **`spanner_instance_name` / `spanner_database_name` (Optional)**: Overrides the target Spanner instance and database names completely. If left blank, they are dynamically derived from your `instance_prefix` and `migration_prefix` respectively. + +Open `terraform_simple.tfvars` or `terraform.tfvars`, fill in the required values (such as `project_id` and `region`), and save the file. + +### Step 3: Initialize and Deploy + +Run the following commands in your terminal: + +```bash +# 1. Download necessary Terraform providers and plugins +terraform init + +# 2.
Deploy the databases and generate the configuration +# Note: For large scale deployments (e.g., 128 shards), you MUST use the -parallelism flag +# for faster resource creation (default is 10). +terraform apply -parallelism=100 --var-file=terraform_simple.tfvars +``` + +--- + +## Outputs & Results + +Once the deployment completes successfully, Terraform will print the resource details on your screen and generate two sharding configuration files in this directory: + +### 1. Regular Shard Config Format (`shard-config.json`) +```json +[ + { + "logicalShardId": "shard-0", + "host": "198.51.100.5", + "port": "3306", + "user": "migration_user", + "password": null, + "dbName": "shard_db_0", + "namespace": "public", + "secretManagerUri": "projects/my-gcp-project/secrets/smt_clever_mongoose_db_password/versions/latest", + "connectionProperties": "jdbcCompliantTruncation=true" + } +] +``` + +### 2. Bulk Shard Config Format (`bulk-config.json`) +```json +{ + "shardConfigurationBulk": { + "dataShards": [ + { + "host": "198.51.100.5", + "port": 3306, + "user": "migration_user", + "password": null, + "secretManagerUri": "projects/my-gcp-project/secrets/smt_clever_mongoose_db_password/versions/latest", + "connectionProperties": "jdbcCompliantTruncation=true", + "namespace": "public", + "databases": [ + { + "dbName": "shard_db_0", + "databaseId": "shard-0" + }, + { + "dbName": "shard_db_1", + "databaseId": "shard-1" + } + ] + } + ] + } +} +``` + +--- + +## Troubleshooting + +### Handling Creation Timeouts & Operation Dropouts +When deploying a high number of physical database instances concurrently (e.g., 128 shards), you may occasionally encounter a transient timeout or polling connection dropout error from the Google Cloud API: +``` +Error: Error waiting for Create Instance: ... 
+``` +Or when running `terraform apply` again after a timeout: +``` +Error: Error, failed to create instance ...: googleapi: Error 409: The Cloud SQL instance already exists., instanceAlreadyExists +``` + +#### Why this happens: +When Terraform requests the creation of 100+ database instances, Google Cloud schedules their creation asynchronously in the background. If the local Terraform process loses connection to the GCP Operation API or hits a client-side wait timeout, Terraform aborts the command and **fails to save those specific instances to your local `terraform.tfstate` file**, even though the creation continues successfully in the background on Google's servers. + +#### How to resolve this: +1. **Verify creation in GCP**: Run this CLI command to confirm that the instances are active and running on Google Cloud (replace `PROJECT_ID` with your project and `smt-sharded` with your instance prefix): + ```bash + gcloud sql instances list --project="PROJECT_ID" --filter="name~smt-sharded" + ``` +2. **Import the affected instances into Terraform State**: For any instances that were successfully created on GCP but are missing from your local state file (causing `409 Already Exists` errors), import them manually back into Terraform. The instances use `for_each`, so the resource address is keyed by the shard index **as a quoted string** (e.g. `["18"]`, not `[18]`): + ```bash + terraform import --var-file=terraform_simple.tfvars 'google_sql_database_instance.instances["SHARD_INDEX"]' "projects/PROJECT_ID/instances/INSTANCE_NAME" + ``` + *Example:* + ```bash + terraform import --var-file=terraform_simple.tfvars 'google_sql_database_instance.instances["18"]' "projects/my-gcp-project/instances/smt-sharded-demo-new-physical-shard-18" + ``` +3. **Resume the Deployment**: Once all missing instances are imported, rerun the deployment command with controlled parallelism: + ```bash + terraform apply -parallelism=30 --var-file=terraform_simple.tfvars + ``` + Terraform then refreshes the state and completes the remaining configuration.
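When many instances are missing from the state, writing each import command by hand is error-prone. The sketch below only prints the command for one instance so it can be reviewed before running. It assumes instance names end in `-physical-shard-` followed by the shard index, as set in `main.tf` in this change; `print_import_cmd` is a hypothetical helper, not part of the provided scripts.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) the terraform import command for one instance.
# print_import_cmd is a hypothetical helper; it assumes the instance name
# ends in "-physical-shard-" followed by the shard index.
print_import_cmd() {
  local project=$1 instance=$2 var_file=${3:-terraform_simple.tfvars}
  # Strip everything up to and including "-physical-shard-" to get the key.
  local key=${instance##*-physical-shard-}
  printf "terraform import --var-file=%s 'google_sql_database_instance.instances[\"%s\"]' \"projects/%s/instances/%s\"\n" \
    "$var_file" "$key" "$project" "$instance"
}

# Example with the instance name used in the troubleshooting steps above.
print_import_cmd my-gcp-project smt-sharded-demo-new-physical-shard-18
```

The printed command keeps the shard index as a quoted string key, matching the `for_each` addressing described in step 2. Review each printed line before running it.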
+ +--- + +### Cleaning Up Resources +To delete all created Google Cloud resources and avoid ongoing charges, run: +```bash +terraform destroy --var-file=terraform_simple.tfvars +``` +All Cloud SQL databases, target Spanner databases, Secret Manager secrets, and networking links will be cleanly removed. \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/main.tf b/v2/spanner-common/terraform/samples/infra-setup/main.tf new file mode 100644 index 0000000000..d50962c9ce --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/main.tf @@ -0,0 +1,370 @@ +# Random suffix for storage bucket name to ensure global uniqueness +resource "random_id" "bucket_suffix" { + byte_length = 4 +} + +# Random prefixes generated using pet names if none are supplied +resource "random_pet" "migration_id" { + prefix = "smt" +} + +resource "random_pet" "instance_id" { + prefix = "smt" +} + +# Random passwords for database users if a specific password was not provided +resource "random_password" "db_password" { + count = var.database_password != null && var.database_password != "" ? 0 : 1 + length = 16 + special = false +} + +locals { + migration_prefix_resolved = var.migration_prefix != null && var.migration_prefix != "" ? var.migration_prefix : random_pet.migration_id.id + instance_prefix_resolved = var.instance_prefix != null && var.instance_prefix != "" ? var.instance_prefix : random_pet.instance_id.id + + # Spanner instance/database names must match ^[a-z][-a-z0-9]*[a-z0-9]$, so we + # lower() the resolved value. + # Without this, an uppercase prefix would fail Spanner's name validation. + spanner_instance_name_resolved = lower(var.spanner_instance_name != null && var.spanner_instance_name != "" ? var.spanner_instance_name : "${local.instance_prefix_resolved}-spanner") + spanner_database_name_resolved = lower(var.spanner_database_name != null && var.spanner_database_name != "" ?
var.spanner_database_name : "${local.migration_prefix_resolved}-db") + + database_version_reconstructed = "${upper(var.database_provider)}_${upper(var.database_version)}" + resolved_vpc_id = var.vpc_network_id != null ? var.vpc_network_id : google_compute_network.private_network[0].id + + # Stable for_each keys for the sharded resources. Keying by index string + # (instead of count) means changing attributes on one shard never re-indexes + # or recreates the others. + physical_shard_ids = toset([for i in range(var.physical_shards_count) : tostring(i)]) + + # Map of every logical database keyed by its flat global index. Each value + # records the owning physical shard and the database name, so the resource + # and the generated shard configs stay in lockstep. + logical_databases = { + for idx in range(var.physical_shards_count * var.logical_shards_count) : + tostring(idx) => { + physical_key = tostring(floor(idx / var.logical_shards_count)) + name = "${var.logical_shard_prefix}_${idx}" + } + } +} + + + +# Create a VPC network if one is not supplied as an input variable +resource "google_compute_network" "private_network" { + count = var.vpc_network_id == null ? 1 : 0 + name = "${lower(local.migration_prefix_resolved)}-vpc" + auto_create_subnetworks = false + project = var.project_id + depends_on = [ + google_project_service.enabled_apis + ] +} + +# Create a subnetwork within the VPC network +resource "google_compute_subnetwork" "private_subnetwork" { + count = var.vpc_network_id == null ? 1 : 0 + name = "${lower(local.migration_prefix_resolved)}-subnet" + ip_cidr_range = "10.0.0.0/24" + region = var.region + network = google_compute_network.private_network[0].id + project = var.project_id +} + +# Allocate an IP range for private service connection +resource "google_compute_global_address" "private_ip_alloc" { + count = var.vpc_network_id == null ? 
1 : 0 + name = "${lower(local.migration_prefix_resolved)}-pip-alloc" + purpose = "VPC_PEERING" + address_type = "INTERNAL" + prefix_length = 16 + network = google_compute_network.private_network[0].id + project = var.project_id +} + +# Establish the private service connection mapping using gcloud rather than the +# native google_service_networking_connection resource. The native resource has +# a long-standing destroy bug (hashicorp/terraform-provider-google #16275, +# #19908): after a Cloud SQL instance is deleted, the producer side releases the +# peering only after a delay, so the connection reports "Producer services still +# using the connection" and terraform destroy fails/hangs. The teardown script +# below deletes the peering via gcloud and exits cleanly if it is still in use, +# allowing destroy to complete. +resource "null_resource" "private_vpc_connection" { + count = var.vpc_network_id == null ? 1 : 0 + + triggers = { + project_id = var.project_id + network_name = google_compute_network.private_network[0].name + range_name = google_compute_global_address.private_ip_alloc[0].name + } + + provisioner "local-exec" { + environment = { + NETWORK_NAME = self.triggers.network_name + RANGE_NAME = self.triggers.range_name + PROJECT_ID = self.triggers.project_id + } + command = "${path.module}/scripts/connect_vpc_peering.sh" + } + + provisioner "local-exec" { + when = destroy + environment = { + NETWORK_NAME = self.triggers.network_name + PROJECT_ID = self.triggers.project_id + } + command = "${path.module}/scripts/teardown_vpc_peering.sh" + } + + depends_on = [ + google_project_service.enabled_apis, + google_compute_global_address.private_ip_alloc + ] +} + +# Provision Cloud SQL physical database instances +resource "google_sql_database_instance" "instances" { + for_each = local.physical_shard_ids + name = "${lower(local.instance_prefix_resolved)}-physical-shard-${each.key}" + database_version = local.database_version_reconstructed + region = var.region + 
project = var.project_id + + settings { + tier = var.cloudsql_tier + + ip_configuration { + ipv4_enabled = var.enable_public_ip + private_network = local.resolved_vpc_id + enable_private_path_for_google_cloud_services = true + + dynamic "authorized_networks" { + for_each = var.authorized_networks + content { + name = authorized_networks.value.name + value = authorized_networks.value.value + } + } + } + + user_labels = var.resource_labels + } + + deletion_protection = false + + # Large concurrent deployments (e.g. 128 shards) can exceed the default + # client-side wait, causing Terraform to abort while creation continues + # server-side (leading to 409 "already exists" on re-apply). Generous + # timeouts keep Terraform polling instead of dropping the operation. + timeouts { + create = "60m" + update = "60m" + delete = "60m" + } + + depends_on = [ + google_project_service.enabled_apis, + null_resource.private_vpc_connection + ] +} + +# Create the database migration user on all physical database shards +resource "google_sql_user" "users" { + for_each = local.physical_shard_ids + name = var.database_user + instance = google_sql_database_instance.instances[each.key].name + host = length(regexall(".*POSTGRES.*", upper(var.database_provider))) > 0 ? null : "%" + password = var.database_password != null && var.database_password != "" ? 
var.database_password : random_password.db_password[0].result + project = var.project_id +} + +# Provision Secret Manager secrets to store the shard passwords +resource "google_secret_manager_secret" "db_passwords" { + count = 1 + secret_id = "${replace(local.migration_prefix_resolved, "-", "_")}_db_password" + + replication { + auto {} + } + + labels = var.resource_labels + depends_on = [google_project_service.enabled_apis] +} + +# Store database user passwords securely in Secret Manager secret versions +resource "google_secret_manager_secret_version" "db_password_versions" { + count = 1 + secret = google_secret_manager_secret.db_passwords[0].id + secret_data = var.database_password != null && var.database_password != "" ? var.database_password : random_password.db_password[0].result +} + +# Create the logical shard databases distributed across physical instances +resource "google_sql_database" "logical_databases" { + for_each = local.logical_databases + name = each.value.name + instance = google_sql_database_instance.instances[each.value.physical_key].name + project = var.project_id +} + +# Create GCS bucket to upload the schema file for Cloud SQL import +resource "google_storage_bucket" "schema_bucket" { + name = "${lower(local.migration_prefix_resolved)}-schema-${random_id.bucket_suffix.hex}" + location = var.region + project = var.project_id + uniform_bucket_level_access = true + force_destroy = true + labels = var.resource_labels + depends_on = [google_project_service.enabled_apis] +} + +# Upload local schema file to GCS bucket +resource "google_storage_bucket_object" "schema_file" { + name = "schema.sql" + source = var.local_schema_file_path + bucket = google_storage_bucket.schema_bucket.name +} + +# Grant IAM permissions to all Cloud SQL service accounts to read schema from the GCS bucket in a single API call to prevent ETag lock collision delays +resource "google_storage_bucket_iam_binding" "sql_gcs_reader" { + bucket = 
google_storage_bucket.schema_bucket.name + role = "roles/storage.objectViewer" + members = [ + for inst in google_sql_database_instance.instances : + "serviceAccount:${inst.service_account_email_address}" + ] +} + +# Import the schema into each physical instance's logical databases. One +# null_resource per physical shard so a failed import only taints (and re-runs) +# that single instance on the next apply; the script serializes the logical +# imports within an instance (Cloud SQL allows one import operation at a time). +resource "null_resource" "schema_import" { + for_each = local.physical_shard_ids + + triggers = { + schema_md5 = filemd5(var.local_schema_file_path) + instance_name = google_sql_database_instance.instances[each.key].name + database_ids = join(",", [ + for idx in range(var.logical_shards_count) : + google_sql_database.logical_databases[tostring(tonumber(each.key) * var.logical_shards_count + idx)].id + ]) + } + + depends_on = [ + google_storage_bucket_iam_binding.sql_gcs_reader, + google_sql_user.users, + google_sql_database.logical_databases, + google_storage_bucket_object.schema_file + ] + + provisioner "local-exec" { + # Pass parameters via shell environments to avoid shell injection issues + environment = { + PROJECT_ID = var.project_id + BUCKET_NAME = google_storage_bucket.schema_bucket.name + OBJECT_NAME = google_storage_bucket_object.schema_file.name + INSTANCE_NAME = google_sql_database_instance.instances[each.key].name + DATABASE_NAMES = join(",", [ + for idx in range(var.logical_shards_count) : + google_sql_database.logical_databases[tostring(tonumber(each.key) * var.logical_shards_count + idx)].name + ]) + } + + command = "${path.module}/scripts/import_schema.sh" + } +} + +# Provision Spanner Target Instance +resource "google_spanner_instance" "spanner_instance" { + name = local.spanner_instance_name_resolved + config = var.spanner_config + display_name = var.spanner_display_name + processing_units = var.spanner_processing_units + 
project = var.project_id + labels = var.resource_labels + depends_on = [ + google_project_service.enabled_apis + ] + + # Automated teardown of Spanner backups to prevent destroy failures + provisioner "local-exec" { + when = destroy + environment = { + INSTANCE_NAME = self.name + PROJECT_ID = self.project + } + command = "${path.module}/scripts/delete_spanner_backups.sh" + } +} + +# Provision Spanner Target Database +resource "google_spanner_database" "spanner_database" { + instance = google_spanner_instance.spanner_instance.name + name = local.spanner_database_name_resolved + project = var.project_id + database_dialect = var.spanner_database_dialect + deletion_protection = false +} + +# Generate the Shard Config json file matching the Shard.java model properties +locals { + shards = [ + for idx in range(var.physical_shards_count * var.logical_shards_count) : { + logicalShardId = "shard-${idx}" + host = try( + coalesce( + one([for ip in google_sql_database_instance.instances[tostring(floor(idx / var.logical_shards_count))].ip_address : ip.ip_address if ip.type == "PRIVATE"]), + google_sql_database_instance.instances[tostring(floor(idx / var.logical_shards_count))].ip_address[0].ip_address + ), + "127.0.0.1" + ) + port = tostring(var.database_port != null ? var.database_port : (length(regexall(".*POSTGRES.*", upper(var.database_provider))) > 0 ? 
5432 : 3306)) + user = try(google_sql_user.users[tostring(floor(idx / var.logical_shards_count))].name, var.database_user) + password = null + dbName = "${var.logical_shard_prefix}_${idx}" + namespace = "public" + secretManagerUri = try("${google_secret_manager_secret.db_passwords[0].id}/versions/latest", "projects/${var.project_id}/secrets/placeholder/versions/latest") + connectionProperties = var.connection_properties + } + ] + + bulk_shards = { + shardConfigurationBulk = { + dataShards = [ + for p_idx in range(var.physical_shards_count) : { + host = try( + coalesce( + one([for ip in google_sql_database_instance.instances[tostring(p_idx)].ip_address : ip.ip_address if ip.type == "PRIVATE"]), + google_sql_database_instance.instances[tostring(p_idx)].ip_address[0].ip_address + ), + "127.0.0.1" + ) + port = var.database_port != null ? var.database_port : (length(regexall(".*POSTGRES.*", upper(var.database_provider))) > 0 ? 5432 : 3306) + user = try(google_sql_user.users[tostring(p_idx)].name, var.database_user) + password = null + secretManagerUri = try("${google_secret_manager_secret.db_passwords[0].id}/versions/latest", "projects/${var.project_id}/secrets/placeholder/versions/latest") + connectionProperties = var.connection_properties + namespace = "public" + databases = [ + for l_idx in range(var.logical_shards_count) : { + dbName = "${var.logical_shard_prefix}_${p_idx * var.logical_shards_count + l_idx}" + databaseId = "shard-${p_idx * var.logical_shards_count + l_idx}" + } + ] + } + ] + } + } +} + +resource "local_file" "shard_config" { + content = jsonencode(local.shards) + filename = "${path.module}/shard-config.json" +} + +resource "local_file" "bulk_shard_config" { + content = jsonencode(local.bulk_shards) + filename = "${path.module}/bulk-config.json" +} \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/outputs.tf b/v2/spanner-common/terraform/samples/infra-setup/outputs.tf new file mode 100644 index 
0000000000..bd151d818f --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/outputs.tf @@ -0,0 +1,49 @@ +output "spanner_instance_id" { + description = "The ID of the provisioned Spanner instance" + value = google_spanner_instance.spanner_instance.name +} + +output "spanner_database_id" { + description = "The ID of the provisioned Spanner database" + value = google_spanner_database.spanner_database.name +} + +output "cloudsql_instance_names" { + description = "The names of the provisioned physical Cloud SQL database instances" + value = [for inst in google_sql_database_instance.instances : inst.name] +} + +output "cloudsql_instance_ips" { + description = "A map of physical Cloud SQL database instances and their assigned IP addresses" + value = { + for inst in google_sql_database_instance.instances : + inst.name => try( + coalesce( + one([for ip in inst.ip_address : ip.ip_address if ip.type == "PRIVATE"]), + inst.ip_address[0].ip_address + ), + "unknown" + ) + } +} + +output "shard_config_file" { + description = "The filesystem path of the generated shard config JSON file" + value = local_file.shard_config.filename +} + +output "shard_config_content" { + description = "The JSON configuration of the generated shard config matching Shard.java" + value = jsondecode(local_file.shard_config.content) +} + + +output "bulk_shard_config_file" { + description = "The filesystem path of the generated bulk shard config file" + value = local_file.bulk_shard_config.filename +} + +output "bulk_shard_config_content" { + description = "The JSON configuration of the generated bulk shard config" + value = jsondecode(local_file.bulk_shard_config.content) +} \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/schema.sql b/v2/spanner-common/terraform/samples/infra-setup/schema.sql new file mode 100644 index 0000000000..e30ad1e283 --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/schema.sql @@ -0,0 +1 @@ +create table if 
not exists xyz(id int); \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/scripts/connect_vpc_peering.sh b/v2/spanner-common/terraform/samples/infra-setup/scripts/connect_vpc_peering.sh new file mode 100755 index 0000000000..73b1864926 --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/scripts/connect_vpc_peering.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash +set -euo pipefail + +echo "=================================================================" +echo "Creating VPC Service Networking Connection via gcloud..." +echo "=================================================================" + +if [ -z "${NETWORK_NAME:-}" ] || [ -z "${RANGE_NAME:-}" ] || [ -z "${PROJECT_ID:-}" ]; then + echo "[ERROR] Missing required environment variables: NETWORK_NAME, RANGE_NAME, PROJECT_ID must be set." + exit 1 +fi + +gcloud services vpc-peerings connect \ + --service="servicenetworking.googleapis.com" \ + --network="$NETWORK_NAME" \ + --ranges="$RANGE_NAME" \ + --project="$PROJECT_ID" \ + --quiet diff --git a/v2/spanner-common/terraform/samples/infra-setup/scripts/delete_spanner_backups.sh b/v2/spanner-common/terraform/samples/infra-setup/scripts/delete_spanner_backups.sh new file mode 100755 index 0000000000..60956825db --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/scripts/delete_spanner_backups.sh @@ -0,0 +1,31 @@ +#!/usr/bin/env bash +set -euo pipefail + +if [ -z "${INSTANCE_NAME:-}" ] || [ -z "${PROJECT_ID:-}" ]; then + echo "[ERROR] Missing required environment variables: INSTANCE_NAME, PROJECT_ID must be set." + exit 1 +fi + +echo "=================================================================" +echo "Listing and deleting Spanner backups for instance '$INSTANCE_NAME'..." 
+echo "=================================================================" + +BACKUPS=$(gcloud spanner backups list \ + --instance="$INSTANCE_NAME" \ + --project="$PROJECT_ID" \ + --format="value(BACKUP)" 2>/dev/null || true) + +if [ -n "$BACKUPS" ]; then + for backup in $BACKUPS; do + echo "[INFO] Deleting Spanner backup: $backup" + gcloud spanner backups delete "$backup" \ + --instance="$INSTANCE_NAME" \ + --project="$PROJECT_ID" \ + --quiet || true + done + echo "[SUCCESS] Finished deleting Spanner backups." +else + echo "[INFO] No Spanner backups found." +fi + +exit 0 diff --git a/v2/spanner-common/terraform/samples/infra-setup/scripts/import_schema.sh b/v2/spanner-common/terraform/samples/infra-setup/scripts/import_schema.sh new file mode 100755 index 0000000000..f33d6f5ba3 --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/scripts/import_schema.sh @@ -0,0 +1,60 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Imports the schema file into every logical database of a SINGLE Cloud SQL +# instance. Imports within an instance run sequentially +# because Cloud SQL permits only one import operation per instance at a time. + +if [ -z "${PROJECT_ID:-}" ] || [ -z "${BUCKET_NAME:-}" ] || [ -z "${OBJECT_NAME:-}" ] || \ + [ -z "${INSTANCE_NAME:-}" ] || [ -z "${DATABASE_NAMES:-}" ]; then + echo "[ERROR] Missing required environment variables: PROJECT_ID, BUCKET_NAME, OBJECT_NAME, INSTANCE_NAME, DATABASE_NAMES." + exit 1 +fi + +echo "=========================================" +echo "Importing schema into instance '$INSTANCE_NAME'..." +echo "=========================================" + +# Convert the comma-separated database list into a bash array +IFS=',' read -ra DATABASES <<< "$DATABASE_NAMES" + +for db_name in "${DATABASES[@]}"; do + echo "[INFO] Instance $INSTANCE_NAME: importing schema into '$db_name'..." 
+ + success=false + max_attempts=10 + base_delay=10 + + for attempt in $(seq 1 $max_attempts); do + if gcloud sql import sql "$INSTANCE_NAME" "gs://$BUCKET_NAME/$OBJECT_NAME" \ + --database="$db_name" \ + --project="$PROJECT_ID" \ + --quiet; then + success=true + break + fi + + # Calculate exponential backoff with a cap of 60 seconds + delay=$(( base_delay * 2 ** (attempt - 1) )) + if [ $delay -gt 60 ]; then + delay=60 + fi + + # Add a random jitter of 1-5 seconds to prevent thundering herds across parallel shards + jitter=$(( RANDOM % 5 + 1 )) + total_delay=$(( delay + jitter )) + + echo "[WARN] Instance $INSTANCE_NAME: import failed (Rate limit or IAM eventual consistency). Retrying in ${total_delay}s (attempt $attempt/$max_attempts)..." + sleep $total_delay + done + + if [ "$success" = false ]; then + echo "[ERROR] Instance $INSTANCE_NAME: failed to import schema into '$db_name' after $max_attempts attempts." + exit 1 + fi +done + +echo "=========================================" +echo "Schema import completed for instance '$INSTANCE_NAME'." +echo "=========================================" +exit 0 \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/scripts/teardown_vpc_peering.sh b/v2/spanner-common/terraform/samples/infra-setup/scripts/teardown_vpc_peering.sh new file mode 100755 index 0000000000..3c4a1a37ee --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/scripts/teardown_vpc_peering.sh @@ -0,0 +1,28 @@ +#!/usr/bin/env bash +set -euo pipefail + +echo "=================================================================" +echo "Starting Service Networking Connection teardown via gcloud..." +echo "=================================================================" + +if [ -z "${NETWORK_NAME:-}" ] || [ -z "${PROJECT_ID:-}" ]; then + echo "[ERROR] Missing required environment variables: NETWORK_NAME, PROJECT_ID must be set." 
+ exit 1 +fi + +if gcloud services vpc-peerings delete \ + --service="servicenetworking.googleapis.com" \ + --network="$NETWORK_NAME" \ + --project="$PROJECT_ID" \ + --quiet; then + echo "=================================================================" + echo "[SUCCESS] Service Networking Connection deleted successfully!" + echo "=================================================================" +else + echo "=================================================================" + echo "[INFO] Active Producer services (orphans) are still tied to this VPC in GCP." + echo "[INFO] Safely preserving the connection in GCP to avoid blocking Terraform destroy." + echo "=================================================================" +fi + +exit 0 diff --git a/v2/spanner-common/terraform/samples/infra-setup/terraform.tf b/v2/spanner-common/terraform/samples/infra-setup/terraform.tf new file mode 100644 index 0000000000..8589f2baa0 --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/terraform.tf @@ -0,0 +1,55 @@ +terraform { + required_providers { + google = { + source = "hashicorp/google" + version = "~> 6.0" + } + google-beta = { + source = "hashicorp/google-beta" + version = "~> 6.0" + } + random = { + source = "hashicorp/random" + version = "~> 3.0" + } + null = { + source = "hashicorp/null" + version = "~> 3.0" + } + local = { + source = "hashicorp/local" + version = "~> 2.0" + } + external = { + source = "hashicorp/external" + version = "~> 2.0" + } + } + required_version = ">= 1.2" +} + +provider "google" { + project = var.project_id + region = var.region +} + +provider "google-beta" { + project = var.project_id + region = var.region +} + +# Enable needed GCP APIs for Spanner, Cloud SQL, Storage, and IAM. 
+resource "google_project_service" "enabled_apis" { + for_each = toset([ + "iam.googleapis.com", + "sqladmin.googleapis.com", + "spanner.googleapis.com", + "storage.googleapis.com", + "secretmanager.googleapis.com", + "compute.googleapis.com", + "servicenetworking.googleapis.com" + ]) + service = each.key + project = var.project_id + disable_on_destroy = false +} \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/terraform.tfvars b/v2/spanner-common/terraform/samples/infra-setup/terraform.tfvars new file mode 100644 index 0000000000..0cc5d9b47c --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/terraform.tfvars @@ -0,0 +1,71 @@ +# ============================================================================== +# Comprehensive Terraform configuration for Spanner Common Infrastructure Setup +# Every available variable is listed below. Required variables must be set. +# Optional variables show their default value; lines that are commented out use +# the module default (and, where noted, trigger automatic name generation). +# ============================================================================== + +# ------------------------------------------------------------------------------ +# REQUIRED +# ------------------------------------------------------------------------------ +project_id = "" # GCP project where resources are created +region = "" # e.g. "us-central1" +local_schema_file_path = "./schema.sql" # Local SQL schema imported into every shard + +# ------------------------------------------------------------------------------ +# NAMING (optional). Leave commented to auto-generate a unique "smt--" +# name. An empty string ("") is treated the same as unset. 
+# ------------------------------------------------------------------------------ +# instance_prefix = "my-inst" # Prefix for Cloud SQL + Spanner instance names +# migration_prefix = "my-mig" # Prefix for VPC, subnet, secret, and bucket names + +# ------------------------------------------------------------------------------ +# SOURCE DATABASE (Cloud SQL) +# ------------------------------------------------------------------------------ +database_provider = "MYSQL" # MYSQL or POSTGRES +database_version = "8_0" # MySQL: 8_0, 5_7 | Postgres: 14, 15, 16 +physical_shards_count = 1 # Number of physical Cloud SQL instances +logical_shards_count = 2 # Logical databases per physical instance +logical_shard_prefix = "shard_db" # DB name prefix -> shard_db_0, shard_db_1, ... +cloudsql_tier = "db-f1-micro" # Machine tier for each instance +database_user = "migration_user" # DB user created on every shard +database_password = "" # Empty -> a random password is generated +# database_port = 3306 # Optional. Defaults: 3306 (MySQL) / 5432 (Postgres) + +# ------------------------------------------------------------------------------ +# NETWORKING / ACCESS +# ------------------------------------------------------------------------------ +enable_public_ip = true # Set false to use private IP only (more secure) + +# Provide an existing VPC self-link to skip creating a new network + peering. +# Leave commented to create a dedicated VPC, subnet, and private service connection. +# vpc_network_id = "projects//global/networks/" + +# CIDRs allowed to reach Cloud SQL over public IP (only used when enable_public_ip = true). +authorized_networks = [ + { + name = "user-ip" + value = "" # e.g. "192.0.2.1/32" + } +] + +# JDBC connection properties embedded in the generated shard config. 
+connection_properties = "jdbcCompliantTruncation=true" + +# ------------------------------------------------------------------------------ +# TARGET SPANNER +# ------------------------------------------------------------------------------ +spanner_config = "regional-" # e.g. "regional-us-central1"; should match `region` +spanner_display_name = "SMT Spanner Instance" +spanner_processing_units = 100 # Positive multiple of 100 (100 = 0.1 node) +spanner_database_dialect = "GOOGLE_STANDARD_SQL" # GOOGLE_STANDARD_SQL or POSTGRESQL +# spanner_instance_name = "my-spanner" # Optional. Unset -> derived from instance_prefix +# spanner_database_name = "my-db" # Optional. Unset -> derived from migration_prefix + +# ------------------------------------------------------------------------------ +# LABELS +# ------------------------------------------------------------------------------ +resource_labels = { + "env" = "dev" + "template" = "sharded-migration" +} \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/terraform_simple.tfvars b/v2/spanner-common/terraform/samples/infra-setup/terraform_simple.tfvars new file mode 100644 index 0000000000..c43ebdfcd9 --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/terraform_simple.tfvars @@ -0,0 +1,25 @@ +# ============================================================================== +# Simple Terraform configuration for Spanner Common Infrastructure Setup +# Configure the core required and most common variables here. 
+# ============================================================================== + +# Google Cloud Project ID and Region +project_id = "" +region = "" + +# Naming prefixes to keep resources unique (both are optional) +instance_prefix = "" +migration_prefix = "" + +# Local SQL schema file path to initialize databases +local_schema_file_path = "./schema.sql" + +# Cloud SQL Database Setup +database_provider = "MYSQL" +database_version = "8_0" +physical_shards_count = 1 +logical_shards_count = 2 + +# Target Spanner config. Should match `region` above, otherwise Spanner is +# created in a different region than Cloud SQL. Default is regional-us-central1. +spanner_config = "regional-" \ No newline at end of file diff --git a/v2/spanner-common/terraform/samples/infra-setup/variables.tf b/v2/spanner-common/terraform/samples/infra-setup/variables.tf new file mode 100644 index 0000000000..fc2954c823 --- /dev/null +++ b/v2/spanner-common/terraform/samples/infra-setup/variables.tf @@ -0,0 +1,177 @@ +variable "project_id" { + description = "The GCP Project ID where resources will be created" + type = string +} + +variable "region" { + description = "The GCP Region where resources will be created" + type = string +} + +variable "instance_prefix" { + description = "A prefix to apply to physical database and Spanner instance names" + type = string + default = null +} + +variable "migration_prefix" { + description = "A prefix to apply to other migration resources to ensure name uniqueness" + type = string + default = null +} + +variable "database_provider" { + description = "The Cloud SQL database engine provider. Supported providers: MYSQL, POSTGRES" + type = string + default = "MYSQL" + + validation { + condition = contains(["MYSQL", "POSTGRES"], upper(var.database_provider)) + error_message = "database_provider must be either MYSQL or POSTGRES." + } +} + +variable "database_version" { + description = "The Cloud SQL database engine version. 
Supported versions: for MYSQL ('8_0', '5_7'), for POSTGRES ('14', '15', '16')" + type = string + default = "8_0" +} + +variable "physical_shards_count" { + description = "The number of physical Cloud SQL database instances to create" + type = number + default = 1 + + validation { + condition = var.physical_shards_count >= 1 + error_message = "physical_shards_count must be at least 1." + } +} + +variable "logical_shards_count" { + description = "The number of logical databases (shards) to create per physical Cloud SQL instance" + type = number + default = 1 + + validation { + condition = var.logical_shards_count >= 1 + error_message = "logical_shards_count must be at least 1." + } +} + +variable "logical_shard_prefix" { + description = "The database name prefix for each logical database shard" + type = string + default = "shard_db" +} + +variable "cloudsql_tier" { + description = "The machine type/tier for the Cloud SQL instance" + type = string + default = "db-f1-micro" +} + +variable "database_user" { + description = "The username of the database user created on all physical shards" + type = string + default = "migration_user" +} + +variable "database_password" { + description = "The password for the database user. If empty, a random password is created automatically" + type = string + default = "" + sensitive = true +} + +variable "database_port" { + description = "The connection port to define in the shard config output. Defaults to 3306 for MySQL or 5432 for PostgreSQL." 
+ type = number + default = null +} + +variable "enable_public_ip" { + description = "Whether to enable public IP on the Cloud SQL instances" + type = bool + default = true +} + +variable "vpc_network_id" { + description = "The full self-link of the VPC network if using private IP configurations for Cloud SQL" + type = string + default = null +} + +variable "authorized_networks" { + description = "Authorized CIDR networks that can access Cloud SQL via public IP (e.g., [{name = \"all\", value = \"0.0.0.0/0\"}])" + type = list(object({ + name = string + value = string + })) + default = [] +} + +variable "local_schema_file_path" { + description = "The local path to the SQL schema file which will be imported into each logical database shard" + type = string +} + +variable "connection_properties" { + description = "Database connection properties for JDBC string to include in the shard config output" + type = string + default = "jdbcCompliantTruncation=true" +} + +variable "spanner_instance_name" { + description = "The name/ID of the Spanner instance to create. If empty, derived from instance_prefix" + type = string + default = null +} + +variable "spanner_display_name" { + description = "The display name for the Spanner instance" + type = string + default = "SMT Spanner Instance" +} + +variable "spanner_config" { + description = "The Spanner instance config name (e.g. regional-us-central1, multi-region-us)" + type = string + default = "regional-us-central1" +} + +variable "spanner_processing_units" { + description = "The number of processing units to allocate for the Spanner instance (100 units = 0.1 node)" + type = number + default = 100 + + validation { + condition = var.spanner_processing_units > 0 && var.spanner_processing_units % 100 == 0 + error_message = "spanner_processing_units must be a positive multiple of 100." + } +} + +variable "spanner_database_name" { + description = "The name of the Spanner database to create. 
If empty, derived from migration_prefix" + type = string + default = null +} + +variable "resource_labels" { + description = "Key/Value labels to tag and organize GCP resources created by this module" + type = map(string) + default = { + "migration_prefix" = "smt-migration" + } +} + +variable "spanner_database_dialect" { + description = "The dialect for the target Spanner database. Supported: GOOGLE_STANDARD_SQL, POSTGRESQL" + type = string + default = "GOOGLE_STANDARD_SQL" + + validation { + condition = contains(["GOOGLE_STANDARD_SQL", "POSTGRESQL"], upper(var.spanner_database_dialect)) + error_message = "spanner_database_dialect must be GOOGLE_STANDARD_SQL or POSTGRESQL." + } +} \ No newline at end of file
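
Unlike `database_provider` and `spanner_database_dialect`, the `database_version` variable in `variables.tf` has no `validation` block, so a typo would only surface when Cloud SQL rejects the instance. A minimal sketch of a matching check is shown below. It is an illustration, not part of this change, and it assumes the supported values are exactly the ones listed in the variable's description.

```hcl
# Hypothetical hardening of the existing variable; not part of the diff above.
variable "database_version" {
  description = "The Cloud SQL database engine version. Supported versions: for MYSQL ('8_0', '5_7'), for POSTGRES ('14', '15', '16')"
  type        = string
  default     = "8_0"

  validation {
    # The condition refers only to var.database_version itself, so it stays
    # compatible with the module's required_version of ">= 1.2".
    condition     = contains(["8_0", "5_7", "14", "15", "16"], var.database_version)
    error_message = "database_version must be one of 8_0, 5_7 (MySQL) or 14, 15, 16 (PostgreSQL)."
  }
}
```

This sketch does not check that the version matches the chosen `database_provider`. A cross-variable condition would need a newer Terraform release than the `>= 1.2` pinned here, so that pairing is left to the Cloud SQL API.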