[EN Currency] ISO 4217 codes not recognised as currency prefixes; crore/lakh multipliers missing from currency context #3212

@nikitabuxy

Is your feature request related to a problem? Please describe.

Two gaps in English currency recognition that affect financial document
processing (SEC filings, annual reports, earnings releases):

  1. ISO 4217 codes are recognised as currency suffixes (27 million GBP
    works) but not as currency prefixes (GBP 27 million returns no
    result). Leading ISO codes are the dominant notation in financial
    documents from the UK, EU, Scandinavia, Southeast Asia, and most
    non-US markets.

  2. crore and lakh are already correctly defined with numeric values
    in English-Numbers.yaml (crore: 10000000, lakh: 100000) but
    are absent from MultiplierRegex in English-NumbersWithUnit.yaml,
    so they are never applied when parsing currency amounts. crore is
    the standard unit in Indian financial reporting and appears in SEC
    filings from Indian subsidiaries of multinational companies.

To Reproduce

from recognizers_text import Culture
from recognizers_number_with_unit import NumberWithUnitRecognizer

model = NumberWithUnitRecognizer(Culture.English).get_currency_model()

# Gap 1 — ISO prefix not recognised
model.parse("GBP 27 million")    # []  — no result
model.parse("USD 20 million")    # []  — no result
model.parse("SEK 60,500,000")    # []  — no result
model.parse("JPY 50 billion")    # []  — no result
model.parse("CAD$1,700,000")     # []  — no result

# Gap 2 — crore/lakh multiplier not applied
model.parse("Rs 660 crore")      # [{'value': '660', 'unit': 'Rupee'}]
# Expected: {'value': '6600000000', 'unit': 'Rupee'}

Describe the solution you'd like

Gap 1: ISO codes as currency prefixes

Add ISO 4217 codes to CurrencyPrefixList in
Patterns/English/English-NumbersWithUnit.yaml. For currencies that
already have a prefix entry (symbol-based), merge the ISO code alongside
the existing patterns. For currencies with no prefix entry, add a new
entry.

Expected behaviour after fix:

model.parse("GBP 27 million")
# [{'value': '27000000', 'unit': 'British pound', 'isoCurrency': 'GBP'}]

model.parse("SEK 60,500,000")
# [{'value': '60500000', 'unit': 'Swedish krona'}]

model.parse("CAD$1,700,000")
# [{'value': '1700000', 'unit': 'Canadian dollar', 'isoCurrency': 'CAD'}]

Codes to cover (commonly seen as prefixes in financial documents):
GBP, USD, EUR, JPY, CAD, AUD, CHF, HKD, SGD, KRW, INR, MXN, BRL, ZAR,
NOK, SEK, DKK, VND, CNY, plus RMB (a common non-ISO alias for CNY) and
the short-form variants A$ (AUD) and SG$ (SGD).
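
As a rough illustration of the surface forms the new prefix entries would need to cover, here is a plain-Python regex stand-in. It does not use the recogniser's own pattern machinery; `CODES`, `PREFIX_AMOUNT`, and `find_prefixed_amounts` are hypothetical names, and the multiplier words are the four in the existing MultiplierRegex.

```python
import re

# Codes from the list above (RMB is an alias for CNY, not an ISO 4217 code).
CODES = ["GBP", "USD", "EUR", "JPY", "CAD", "AUD", "CHF", "HKD", "SGD", "KRW",
         "INR", "MXN", "BRL", "ZAR", "NOK", "SEK", "DKK", "VND", "CNY", "RMB"]

# Leading code, optional "$" (as in "CAD$"), optional space, a number with
# optional thousands separators, then an optional multiplier word.
PREFIX_AMOUNT = re.compile(
    r"\b(?P<code>" + "|".join(CODES) + r")\$?\s*"
    r"(?P<number>\d+(?:,\d{3})*(?:\.\d+)?)"
    r"(?:\s+(?P<multiplier>thousand|million|billion|trillion)\b)?"
)

def find_prefixed_amounts(text):
    """Return (code, number, multiplier) tuples for prefix-style amounts."""
    return [(m.group("code"), m.group("number"), m.group("multiplier"))
            for m in PREFIX_AMOUNT.finditer(text)]

for sample in ("GBP 27 million", "SEK 60,500,000", "CAD$1,700,000"):
    print(sample, "->", find_prefixed_amounts(sample))
```

The suffix form ("27 million GBP") deliberately yields no match here, since that notation is already handled.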

Gap 2: crore/lakh in currency context

Add lakh and crore to MultiplierRegex in
Patterns/English/English-NumbersWithUnit.yaml:

# before
MultiplierRegex: !simpleRegex
  def: \s*\b(thousand|million|billion|trillion)s?\b

# after
MultiplierRegex: !simpleRegex
  def: \s*\b(thousand|million|billion|trillion|lakh|crore)s?\b

This is a one-line change: the numeric values are already defined in
English-Numbers.yaml and only need to be included in the unit-context
multiplier pattern.
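
A quick way to sanity-check the pattern change in isolation is to run both versions through Python's `re` module. This is only a sketch: the YAML pattern is compiled by the library with its own flags, which are not reproduced here, and `BEFORE` / `AFTER` are names made up for this example.

```python
import re

# The two patterns from the YAML snippet above, copied verbatim.
BEFORE = re.compile(r"\s*\b(thousand|million|billion|trillion)s?\b")
AFTER = re.compile(r"\s*\b(thousand|million|billion|trillion|lakh|crore)s?\b")

for text in ("660 crore", "5 lakhs", "27 million"):
    before = BEFORE.search(text)
    after = AFTER.search(text)
    print(f"{text!r}: before={before.group(1) if before else None}, "
          f"after={after.group(1) if after else None}")
```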

Expected behaviour after fix:

model.parse("Rs 660 crore")
# [{'value': '6600000000', 'unit': 'Rupee'}]

model.parse("Rs 5 lakh")
# [{'value': '500000', 'unit': 'Rupee'}]
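
The expected values above follow directly from the multipliers quoted from English-Numbers.yaml (crore: 10000000, lakh: 100000). A minimal arithmetic check, using a hypothetical `scale` helper that is not part of the library:

```python
# Multiplier values as quoted in this issue from English-Numbers.yaml.
MULTIPLIERS = {"lakh": 100_000, "crore": 10_000_000}

def scale(number: str, multiplier: str) -> str:
    """Apply a multiplier word to an integer literal and return the value as a string."""
    return str(int(number.replace(",", "")) * MULTIPLIERS[multiplier])

print(scale("660", "crore"))  # 660 * 10,000,000
print(scale("5", "lakh"))     # 5 * 100,000
```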

Describe alternatives you've considered

The normalisation for ISO prefixes could alternatively be applied in
QueryProcessor.preprocess() (shared across all recogniser types), but
that risks unintended side-effects on date, dimension, and other models.
Scoping it to CurrencyModel.parse() is safer.
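
For reference, a currency-scoped normalisation could look roughly like the sketch below. It assumes the normalisation means moving a leading code behind the amount, which is the form that already parses. `move_iso_prefix_to_suffix` is a hypothetical helper, not an existing library function, and the code list is shortened for brevity.

```python
import re

# Shortened, illustrative code list; a real version would use the full list.
CODES = ["GBP", "USD", "EUR", "JPY", "CAD", "SEK"]

# Group 1: leading code. Group 2: the amount, with an optional multiplier word.
_PREFIXED = re.compile(
    r"\b(" + "|".join(CODES) + r")\$?\s*"
    r"(\d+(?:,\d{3})*(?:\.\d+)?"
    r"(?:\s+(?:thousand|million|billion|trillion)s?\b)?)"
)

def move_iso_prefix_to_suffix(text: str) -> str:
    """Rewrite 'GBP 27 million' as '27 million GBP'; leave other text untouched."""
    return _PREFIXED.sub(r"\2 \1", text)

print(move_iso_prefix_to_suffix("Paid GBP 27 million in 2023"))
print(move_iso_prefix_to_suffix("CAD$1,700,000"))
```

Such a helper would be called on the input only immediately before the currency model parses it, so other models never see the rewritten text. One drawback is that the character offsets in results would then refer to the rewritten string rather than the original.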

Additional context

- The related bug (wrong values when an ISO code is concatenated to a
digit, e.g. USD34.6 million → 6000000) is filed as issue #3211.
Both issues share the same root area but have different fixes: the
bug requires a normalisation step in CurrencyModel.parse(); this
enhancement requires pattern additions to the YAML resource.
