Acronym with a number as a name #1251

@leky40

I was trying to annotate some names used in Thai texts that consist of an English acronym followed by a number. The names are written without whitespace between the acronym and the number, as in N3 and L6. These are names of lottery types in Thailand.

I remember that #1136 discussed a similar subject, and I checked how the guidelines say this kind of name should be treated.

The acronym and the number in these two names are not delimited. This is somewhat confusing, because I have been dealing with this challenge in Thai all along. In addition, these English names were created in a Thai context.

So should each of these names be treated as one token, as written?

I went through the English treebank to see how similar names are treated: they are kept as one token. I am not sure whether this is specific to English or whether it should be done in all languages in UD.

If each name is made one token, it will be tagged PROPN. This is very simple.

If they are split into separate tokens, the number should be tagged PROPN, not NUM, according to the guideline mentioned above, and the relation between the acronym and the number should be flat. Right?
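To make the two options concrete, here is a minimal Python sketch that prints both analyses as CoNLL-U rows. The helper names (`conllu_row`, `as_one_token`, `as_two_tokens`) are hypothetical, the lemma is simply assumed to equal the form, and the name is shown in isolation as the root, which is only for illustration. `SpaceAfter=No` is the usual MISC value for tokens written without a following space.

```python
import re


def conllu_row(idx, form, upos, head, deprel, misc="_"):
    """Build one CoNLL-U row: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC."""
    # Assumption: the lemma is set equal to the surface form.
    return "\t".join(
        [str(idx), form, form, upos, "_", "_", str(head), deprel, "_", misc]
    )


def as_one_token(name, idx=1, head=0, deprel="root"):
    """Option 1: keep the whole name (e.g. N3) as a single PROPN token."""
    return [conllu_row(idx, name, "PROPN", head, deprel)]


def as_two_tokens(name, idx=1, head=0, deprel="root"):
    """Option 2: split acronym and number; both PROPN, number attached with flat."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not an acronym+number name: {name!r}")
    acronym, number = match.groups()
    return [
        # The acronym is the head of the flat structure (first word of the name).
        conllu_row(idx, acronym, "PROPN", head, deprel, misc="SpaceAfter=No"),
        # The number depends on the acronym via flat and is tagged PROPN, not NUM.
        conllu_row(idx + 1, number, "PROPN", idx, "flat"),
    ]


if __name__ == "__main__":
    for name in ("N3", "L6"):
        print("# one token")
        print("\n".join(as_one_token(name)))
        print("# two tokens")
        print("\n".join(as_two_tokens(name)))
        print()
```

In the two-token variant the acronym heads the structure because flat is annotated head-first in UD; whether the number should instead be NUM with a different relation is exactly the open question.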

The flat relation is understandable for names. But Thai distinguishes cardinal from ordinal numbers by position: a number placed before a noun expresses a quantity, while a number placed after a noun mostly, but not always, expresses a sequence. No inflectional markers are used on numbers, whether written as digits or spelled out.

My question is whether this relation decision should be followed in all languages, or whether each language should make its own decision.

As is well known, Thai names for people and places are long because they are formed by compounding words. I treat a person's name as one word tagged PROPN, no matter how long it is.

I annotate place names in Thai in two ways:

First, the names of well-known, officially designated places in Thailand are treated as one token tagged PROPN. This facilitates some language processing tasks.

Second, place names that are created arbitrarily are split into individual words. Each word is tagged according to its own syntax, and the head is annotated with ExtPos=PROPN.

So if these two lottery names, or any other names containing a number in Thai, are split into separate tokens, should the relation be flat or nmod? And should the number be tagged NUM or PROPN? As mentioned, place names are created in a Thai context (in both meaning and syntactic structure). I am therefore not sure whether I should follow what I have set up for Thai or the UD guidelines cited above.
