Is your feature request related to a problem? Please describe.
Two gaps in English currency recognition that affect financial document
processing (SEC filings, annual reports, earnings releases):
- ISO 4217 codes are recognised as currency suffixes ("27 million GBP"
  works) but not as currency prefixes ("GBP 27 million" returns no
  result). Leading ISO codes are the dominant notation in financial
  documents from the UK, EU, Scandinavia, Southeast Asia, and most
  non-US markets.
- crore and lakh are already correctly defined with numeric values
  in English-Numbers.yaml (crore: 10000000, lakh: 100000) but
  are absent from MultiplierRegex in English-NumbersWithUnit.yaml,
  so they are never applied when parsing currency amounts. crore is
  the standard unit in Indian financial reporting and appears in SEC
  filings from Indian subsidiaries of multinational companies.
To Reproduce
from recognizers_text import Culture
from recognizers_number_with_unit import NumberWithUnitRecognizer
model = NumberWithUnitRecognizer(Culture.English).get_currency_model()
# Gap 1 — ISO prefix not recognised
model.parse("GBP 27 million") # [] — no result
model.parse("USD 20 million") # [] — no result
model.parse("SEK 60,500,000") # [] — no result
model.parse("JPY 50 billion") # [] — no result
model.parse("CAD$1,700,000") # [] — no result
# Gap 2 — crore/lakh multiplier not applied
model.parse("Rs 660 crore") # [{'value': '660', 'unit': 'Rupee'}]
# Expected: [{'value': '6600000000', 'unit': 'Rupee'}]
Describe the solution you'd like
Gap 1 — ISO codes as currency prefixes
Add ISO 4217 codes to CurrencyPrefixList in
Patterns/English/English-NumbersWithUnit.yaml. For currencies that
already have a prefix entry (symbol-based), merge the ISO code alongside
the existing patterns. For currencies with no prefix entry, add a new
entry.
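The merge rule above can be sketched in Python. This is only an illustration: it assumes each CurrencyPrefixList entry maps a unit name to a "|"-separated pattern string, and the existing patterns shown for each currency are placeholders rather than the real contents of the YAML file. The helper name merge_iso_prefixes is hypothetical.

```python
# Hypothetical excerpt: the existing patterns below are placeholders,
# not the actual contents of English-NumbersWithUnit.yaml.
existing_prefixes = {
    "British pound": "£",
    "Canadian dollar": "can$|c$",
}

# ISO 4217 codes to add, keyed by the unit names used in this issue.
iso_codes = {
    "British pound": "gbp",
    "Canadian dollar": "cad",
    "Swedish krona": "sek",  # no prefix entry yet, so one is created
}


def merge_iso_prefixes(prefixes, codes):
    """Return a new mapping with each ISO code appended to its currency's
    '|'-separated pattern string, creating the entry if it is missing."""
    merged = dict(prefixes)
    for unit, code in codes.items():
        patterns = merged[unit].split("|") if unit in merged else []
        if code not in patterns:  # avoid duplicating an existing pattern
            patterns.append(code)
        merged[unit] = "|".join(patterns)
    return merged


merged = merge_iso_prefixes(existing_prefixes, iso_codes)
for unit, patterns in merged.items():
    print(f"{unit}: {patterns}")
```

Appending rather than replacing keeps the existing symbol-based prefixes working unchanged.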
Expected behaviour after fix:
model.parse("GBP 27 million")
# [{'value': '27000000', 'unit': 'British pound', 'isoCurrency': 'GBP'}]
model.parse("SEK 60,500,000")
# [{'value': '60500000', 'unit': 'Swedish krona'}]
model.parse("CAD$1,700,000")
# [{'value': '1700000', 'unit': 'Canadian dollar', 'isoCurrency': 'CAD'}]
ISO codes to cover (commonly seen as prefixes in financial documents):
GBP, USD, EUR, JPY, CAD, AUD, CHF, HKD, SGD, KRW, INR, MXN, BRL, ZAR,
NOK, SEK, DKK, VND, CNY, plus RMB (a common non-ISO abbreviation for
CNY) and the short-form variants A$ (AUD) and SG$ (SGD).
Gap 2 — crore/lakh in currency context
Add lakh and crore to MultiplierRegex in
Patterns/English/English-NumbersWithUnit.yaml:
# before
MultiplierRegex: !simpleRegex
  def: \s*\b(thousand|million|billion|trillion)s?\b
# after
MultiplierRegex: !simpleRegex
  def: \s*\b(thousand|million|billion|trillion|lakh|crore)s?\b
This is a one-line change — the numeric values are already defined in
English-Numbers.yaml and only need to be included in the unit-context
multiplier pattern.
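The effect of the pattern change can be checked in isolation with Python's re module. The sketch below is a toy stand-in, not the library's extraction pipeline: the scaled_amount helper and the MULTIPLIERS table are hypothetical, with the crore and lakh values taken from the figures quoted in this issue.

```python
import re

BEFORE = r"\s*\b(thousand|million|billion|trillion)s?\b"
AFTER = r"\s*\b(thousand|million|billion|trillion|lakh|crore)s?\b"

# crore and lakh values as quoted above from English-Numbers.yaml.
MULTIPLIERS = {
    "thousand": 10**3,
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
    "lakh": 100_000,
    "crore": 10_000_000,
}


def scaled_amount(text, multiplier_pattern):
    """Parse '<integer> <multiplier>' and return the scaled value.

    If the multiplier word is not matched by the pattern, the bare
    integer is returned, which mirrors the behaviour reported above.
    """
    number = re.match(r"\d+", text)
    if number is None:
        return None
    value = int(number.group())
    multiplier = re.match(multiplier_pattern, text[number.end():], re.IGNORECASE)
    if multiplier:
        value *= MULTIPLIERS[multiplier.group(1).lower()]
    return value


print(scaled_amount("660 crore", BEFORE))  # 660
print(scaled_amount("660 crore", AFTER))   # 6600000000
print(scaled_amount("5 lakh", AFTER))      # 500000
```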
Expected behaviour after fix:
model.parse("Rs 660 crore")
# [{'value': '6600000000', 'unit': 'Rupee'}]
model.parse("Rs 5 lakh")
# [{'value': '500000', 'unit': 'Rupee'}]
Describe alternatives you've considered
The normalisation for ISO prefixes could alternatively be applied in
QueryProcessor.preprocess() (shared across all recogniser types), but
that risks unintended side-effects on date, dimension, and other models.
Scoping it to CurrencyModel.parse() is safer.
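For illustration, one possible shape for such a normalisation step is sketched below. The issue does not specify the exact transformation, so this is an assumption: the sketch rewrites a leading ISO code into the suffix form that is already recognised. The function name, the regular expression, and the short code list are all hypothetical, and nothing here calls the library.

```python
import re

# Subset of the codes listed in this issue, for illustration only.
ISO_CODES = ("GBP", "USD", "EUR", "JPY", "SEK")

# Hypothetical pattern: an ISO code, optional whitespace, a number with
# optional , or . groups, and an optional multiplier word.
_LEADING_ISO = re.compile(
    r"\b(?P<code>" + "|".join(ISO_CODES) + r")\s*"
    r"(?P<amount>\d+(?:[.,]\d+)*"
    r"(?:\s+(?:thousand|million|billion|trillion)s?\b)?)"
)


def move_iso_code_to_suffix(text):
    """Rewrite 'GBP 27 million' as '27 million GBP'; other text is unchanged."""
    return _LEADING_ISO.sub(
        lambda m: f"{m.group('amount')} {m.group('code')}", text
    )


print(move_iso_code_to_suffix("GBP 27 million"))   # 27 million GBP
print(move_iso_code_to_suffix("SEK 60,500,000"))   # 60,500,000 SEK
print(move_iso_code_to_suffix("USD34.6 million"))  # 34.6 million USD
```

A rewrite like this shifts character positions, so any start and end offsets in the results would refer to the normalised text rather than the original input. That is one more reason to keep it out of a preprocessing step shared by other models.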
Additional context
- The related bug (wrong values when an ISO code is concatenated to a
  digit, e.g. USD34.6 million → 6000000) is filed as issue #3211.
  Both issues share the same root area but have different fixes: the
  bug requires a normalisation step in CurrencyModel.parse(), whereas
  this enhancement requires pattern additions to the YAML resource.