Skip to content

feat(Spanner): integrate SourceConfigParser to centralize shard configuration loading for SpannerToSourceDb pipelines#3840

Open
pratickchokhani wants to merge 1 commit into
GoogleCloudPlatform:mainfrom
pratickchokhani:shard-config-reverse
Open

feat(Spanner): integrate SourceConfigParser to centralize shard configuration loading for SpannerToSourceDb pipelines#3840
pratickchokhani wants to merge 1 commit into
GoogleCloudPlatform:mainfrom
pratickchokhani:shard-config-reverse

Conversation

@pratickchokhani
Copy link
Copy Markdown
Contributor

No description provided.

@pratickchokhani pratickchokhani added the addition New feature or request label May 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.47%. Comparing base (d8c6ecc) to head (9014e52).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...cloud/teleport/v2/templates/SpannerToSourceDb.java 62.06% 8 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #3840       +/-   ##
=============================================
+ Coverage     39.94%   55.47%   +15.52%     
- Complexity      684     6509     +5825     
=============================================
  Files           208     1102      +894     
  Lines         14902    67463    +52561     
  Branches       1528     7567     +6039     
=============================================
+ Hits           5953    37425    +31472     
- Misses         8451    27591    +19140     
- Partials        498     2447     +1949     
Components Coverage Δ
spanner-templates 87.80% <66.66%> (∅)
spanner-import-export 68.61% <ø> (∅)
spanner-live-forward-migration 90.22% <100.00%> (∅)
spanner-live-reverse-replication 82.87% <66.66%> (∅)
spanner-bulk-migration 92.62% <100.00%> (∅)
gcs-spanner-dv 89.08% <100.00%> (∅)
Files with missing lines Coverage Δ
...er/migrations/utils/CassandraConfigFileReader.java 100.00% <100.00%> (ø)
...cloud/teleport/v2/templates/SpannerToSourceDb.java 19.51% <62.06%> (ø)

... and 917 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pratickchokhani pratickchokhani force-pushed the shard-config-reverse branch 3 times, most recently from 7a2cdc1 to 6436182 Compare May 21, 2026 09:37
ResultSet mockRs = mock(ResultSet.class);

when(mockConn.createStatement()).thenReturn(mockStmt);
when(mockStmt.executeQuery("SHOW VARIABLES LIKE 'read_only'")).thenReturn(mockRs);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's removed as mokito logs show that these mocks are not used in the test.

@pratickchokhani pratickchokhani force-pushed the shard-config-reverse branch 19 times, most recently from 08bd358 to 213e4aa Compare May 26, 2026 08:41
@pratickchokhani pratickchokhani marked this pull request as ready for review May 26, 2026 08:41
@pratickchokhani pratickchokhani requested a review from a team as a code owner May 26, 2026 08:41
@gemini-code-assist
Copy link
Copy Markdown

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

// cassandra is always a single sharded migration.
// for JDBC, shards size and IsShardedMigration option is used below.
String shardingMode =
options.getSourceType().equals(CASSANDRA_SOURCE_TYPE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to the respective config parsing flow rather than having it as a top level decision

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the non-sharded workflow work here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shardingMode: this is used in AssignShardIdFn to identify if default shard id is needed to be assigned to stream events.

I could have used the size of shards list to identify shardingMode.

The primary concern in doing so was the logic below (at line 672) where this is also dependent on the user input getIsShardedMigration.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can AND that flag here to get to the output. In any case, not a blocker to letting this through. Lets take it up as a follow up

List<Shard> shards;
if (sourceConnectionConfig instanceof JdbcShardConfig) {
shards = ((JdbcShardConfig) sourceConnectionConfig).getShardConfigs();
LOG.info("JDBC config is: {}", shards);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not log this, this might have PII

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

shards =
cassandraConfigFileReader.getCassandraShard(
((CassandraConnectionConfig) sourceConnectionConfig).getOptionsMap());
LOG.info("Cassandra config is: {}", shards.get(0));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not log. It might have PII

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
</exclusion>
<exclusion>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this excluded ? Is it a change in beam ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this exclusion, there is a version conflict with:
org.apache.cassandra
java-driver-core
${cassandra-java-driver-core.version}

This dependency is test scoped. It was not causing any issue earlier as we didn't have tests which was using this package.

Comment thread v2/spanner-to-sourcedb/README.md Outdated
9. Configuration Files Upload
- **For MySQL:**
[Source shards file](./RunnigReverseReplication.md#sample-source-shards-file-for-MySQL) already uploaded to GCS.
[source shards file](#sample-source-shards-file-for-MySQL) already uploaded to GCS.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the capitalization here for uniformity

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* @param sourceShardsFilePath The GCS path to the source shards configuration file.
* @return A list of shards.
*/
public static List<Shard> getShardList(String sourceType, String sourceShardsFilePath) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally this method should move to spanner-common

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic below is specific to Reverse replication. There is no synergy on this between reverse and bulk.

Also, the plan is to remove this in phase 2. I have added a comment to reflect this.

@pratickchokhani pratickchokhani force-pushed the shard-config-reverse branch 3 times, most recently from f52391a to f6ce311 Compare May 27, 2026 10:13
…guration loading for SpannerToSourceDb pipelines
Copy link
Copy Markdown
Contributor

@bharadwaj-aditya bharadwaj-aditya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some follow up work here. But marking as LGTM as those items can be taken up in subsequent PRS

// cassandra is always a single sharded migration.
// for JDBC, shards size and IsShardedMigration option is used below.
String shardingMode =
options.getSourceType().equals(CASSANDRA_SOURCE_TYPE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can AND that flag here to get to the output. In any case, not a blocker to letting this through. Lets take it up as a follow up

Copy link
Copy Markdown
Contributor

@bharadwaj-aditya bharadwaj-aditya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

addition New feature or request size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants