Skip to content

fix(consensus,framework,actuator): use Locale.ROOT for case-insensitive#6698

Open
halibobo1205 wants to merge 1 commit intotronprotocol:developfrom
halibobo1205:fix/locale-sensitive-case
Open

fix(consensus,framework,actuator): use Locale.ROOT for case-insensitive#6698
halibobo1205 wants to merge 1 commit intotronprotocol:developfrom
halibobo1205:fix/locale-sensitive-case

Conversation

@halibobo1205
Copy link
Copy Markdown
Collaborator

Summary

Fix a locale-dependent code hygiene issue: no-arg String.toLowerCase() / toUpperCase() calls use the JVM default locale, and under a minority of locales (tr/az) the folding of 'I' diverges from every other locale. Switching uniformly to Locale.ROOT decouples the behavior from the deployment environment, backed by a compile-time guard and a one-time data normalization.

This PR does three things:

  1. Correctness. Replace every no-arg toLowerCase() / toUpperCase() call with its Locale.ROOT counterpart so that string folding is independent of the JVM's startup locale.
  2. Defense-in-depth. Enable ErrorProne's StringCaseLocaleUsage checker at ERROR level, and add a custom StringCaseLocaleUsageMethodRef checker to cover the String::toLowerCase method-reference form that upstream misses. Any future no-arg call fails compilation.
  3. Data hygiene. MigrateTurkishKeyHelper runs once at Manager.init() to normalize any non-ROOT legacy keys that may exist in AccountIdIndexStore.

What this fix really does

Make the case-insensitive index behavior a pure function of the input bytes, no longer implicitly dependent on the JVM's startup locale. Before this fix, AccountIdIndexStore was, in principle, "case-insensitive relative to whichever locale this JVM happens to use" rather than truly locale-independent. After the fix it is the latter.

This is a data-structure-level correctness convergence — remove an implicit environment dependency and align the implementation with the intended semantics.

Scope

Only one site reaches persistent storage: AccountIdIndexStore.getLowerCaseAccountId — the lowercased result becomes a DB key. All other touched call sites (DB engine selection, disabled-API list, log topic classification, hex display, OS detection, etc.) are in-process runtime comparisons — they neither write to disk nor cross nodes.

The input domain itself is tightly constrained: input to setAccountId is gated by validReadableBytes to printable ASCII (0x21–0x7E). Within this domain, the locale divergence in lowercase only manifests on a single character; every other character (letters, digits, symbols) folds identically under any locale.

Why MigrateTurkishKeyHelper is bundled in

A code-only fix does not fully cover "nodes that were ever started under a non-default locale" — such nodes may carry legacy-format keys in their DB. The cleanest way to keep the code fix and the data state in sync is a one-shot normalization at startup.

Design notes:

  • Follows the MoveAbiHelper pattern, gated by a TURKISH_KEY_MIGRATION_DONE flag in DynamicPropertiesStore so it runs at most once.
  • Full-table scan (AccountIdIndexStore is a sparse index with very low cardinality — only 14 entries on mainnet today; scan cost is negligible).
  • For each non-ROOT-form key: write a new entry under its ROOT-form key, then delete the legacy one.
  • Conflict policy: if the ROOT-form key already exists, keep the ROOT entry and drop the legacy value — simple, auditable, and equivalent to the state of a node that was never affected.

For nodes that were never affected, the scan yields zero candidates and the helper is a no-op.

Why the simplified normalization strategy

In principle one could enumerate every possible legacy variant of each ROOT-canonical key and merge them, but that enters a combinatorial space (per-key variant count can reach 2^k, where k is the occurrence count of the divergent character). Even with full enumeration, picking the correct value across conflicts requires knowing the historical insertion order — information the DB layer does not retain.

The conservative strategy chosen here:

  • Simple and auditable
  • Post-migration local semantics equal those of a node that only ever ran under ROOT
  • In the extremely rare conflict case, one legacy value may be dropped — an acceptable cost under case-insensitive index semantics

Impact

  • Mainnet / Nile. Safe to include in any normal release. The code changes produce no byte-level behavioral difference in the expected locale environment; the migration scans a trivially small table and is a no-op for nodes that were never affected.
  • Node operators. You'll see a one-time migration line in the startup log; it won't run again on subsequent restarts.
  • Long term. The ErrorProne rule ensures no future patch can reintroduce no-arg lowercase/uppercase.

Note for downstream fork chains

A downstream fork chain needs to treat this PR as a non-transparent upgrade only if both of the following hold:

  1. All of its consensus nodes have historically run under the same minority locale that diverges in lowercase (e.g. all on tr/az).
  2. Its history includes a committed SetAccountId transaction whose accountId contains a character that triggers the locale divergence (e.g. I).

On such a chain, the AccountIdIndexStore historical state was produced under non-ROOT folding; after this PR, has() lookups for the same accountId may yield different verdicts, which requires a coordinated hard-fork activation height and operator-defined historical data migration strategy.

Mainnet and Nile are not in this category: condition (1) has never held historically.

Release scope

  • Ship as a normal release — no extra coordination required.
  • The ErrorProne rule takes effect with this PR; any no-arg lowercase/uppercase in future patches will be blocked at compile time.

String.toLowerCase()/toUpperCase() without an explicit Locale uses
Locale.getDefault(), which on Turkish (tr) or Azerbaijani (az) systems
folds 'I' to dotless-ı (U+0131) instead of 'i' (U+0069).
Changes:
- Fix all toLowerCase()/toUpperCase() calls to use Locale.ROOT
- Enable the ErrorProne StringCaseLocaleUsage checker at ERROR level
  to prevent future regressions at compile time
- Add one-time data migration (MigrateTurkishKeyHelper) to normalize
  all Turkish legacy keys (ı → i) at startup.
addresses[i] = randomBytes(21);
String turkishLower = accountIds[i].toLowerCase(TURKISH);
turkishKeys[i] = turkishLower.getBytes(StandardCharsets.UTF_8);
accountIdIndexStore.put(turkishKeys[i], new BytesCapsule(addresses[i]));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test design is really crisp — 14 realistic accountIds covering both divergent (I-containing) and non-divergent names, symmetric pre-/post-migration assertions, and using Locale.forLanguageTag("tr") to faithfully reproduce legacy behavior is lovely. 🎯

One branch that doesn't get exercised today: the if (ArrayUtils.isEmpty(existing)) guard in MigrateTurkishKeyHelper.doWork(). Because this setup only seeds Turkish-form keys and never seeds a pre-existing ROOT-form key for the same accountId, the conflict-recovery path (where a ROOT entry already exists and the legacy Turkish value is silently dropped) is always skipped. That branch encodes the conservative conflict policy you describe in the PR body — would you consider adding a small case that:

  • puts a ROOT-form key for "ISSRWallet" with one address,
  • then puts the Turkish-form key "ıssrwallet" with a different address,
  • runs doWork(),
  • asserts the ROOT value wins and the Turkish-form key is gone?

It would also make the intentional policy visible to future readers who might otherwise wonder whether a "keep legacy" policy would be safer.

Copy link
Copy Markdown
Collaborator

@lxcmyf lxcmyf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review findings:

* <p>Uses raw {@link IRevokingDB} access to bypass the fallback lookup
* logic in {@link AccountIdIndexStore#has(byte[])} — we need exact-match
* checks to determine whether the ROOT key already exists in the DB.
*/
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NIT] The justification here is inaccurate. AccountIdIndexStore#has(byte[]) has no fallback lookup path — it just calls getLowerCaseAccountId (an idempotent ROOT-lowercase) and does a single revokingDB.getUnchecked. There's nothing to "bypass". Using raw IRevokingDB is still reasonable (iteration avoids the BytesCapsule wrap), but the comment should say that instead, so the next reader isn't left looking for a fallback branch that doesn't exist. Also worth noting: other startup migrations (MoveAbiHelper, AssetUpdateHelper) go through the high-level store API — consider aligning for consistency, or call out explicitly why this one is different.

.getBytes(StandardCharsets.UTF_8);
// Only write if ROOT key doesn't already exist
byte[] existing = revokingDB.getUnchecked(rootKey);
if (ArrayUtils.isEmpty(existing)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD] ArrayUtils.isEmpty(byte[]) returns true for both null and zero-length arrays. If a ROOT-form entry ever holds a zero-length value, this branch would treat it as "ROOT key doesn't exist" and overwrite it with the Turkish-form value — which directly contradicts the conflict policy documented in the PR description ("keep the ROOT entry and drop the legacy value"). In practice AccountIdIndexStore stores 21-byte addresses so the case is unlikely, but the contract of this migration should match what the PR body claims. Suggest if (existing == null) so the semantics are exactly "key is absent".

byte[] topicHash = Hash.sha3(ByteArray.fromString(topic));
if (Arrays.equals(topicsBytes, topicHash)) {
if (topic.toLowerCase().contains("mint")) {
if (topic.toLowerCase(Locale.ROOT).contains("mint")) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NIT] The same topic string is lowercased up to 5 times in this branch chain (minttransferburnleaftoken). This is on the event-subscribe topic-classification path that runs for every emitted log, so extracting it once is both clearer and cheaper:

String lower = topic.toLowerCase(Locale.ROOT);
if (lower.contains("mint")) return 1;
else if (lower.contains("transfer")) return 2;
else if (lower.contains("burn")) {
  if (lower.contains("leaf")) return 3;
  else if (lower.contains("token")) return 4;
}

String method = request.getMethod();

if (HttpMethod.GET.toString().toUpperCase().equalsIgnoreCase(method)) {
if (HttpMethod.GET.toString().toUpperCase(Locale.ROOT).equalsIgnoreCase(method)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NIT] These two lines contain pre-existing redundancy that's worth cleaning up while we're here:

  • Line 529: HttpMethod.GET.toString() is always the literal "GET" (uppercase); .toUpperCase(Locale.ROOT) is a no-op; and .equalsIgnoreCase(method) already folds case. The chain collapses to "GET".equalsIgnoreCase(method).
  • Line 532: same deal — HttpMethod.POST.toString().toUpperCase(Locale.ROOT).equals(method) reduces to "POST".equals(method) (if case-sensitivity is intended) or "POST".equalsIgnoreCase(method) (if symmetric with GET).

The locale fix is correct in isolation, but the toUpperCase call here isn't really doing anything — better to drop it than keep it guarded by Locale.ROOT.

}

// Phase 3: delete old Turkish keys
for (byte[] key : keysToDelete) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD]The put and delete operations can be performed directly in the first loop; there's no need for two loops.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants