Change pattern for tCode and tGenCode

The current patterns for codes in the codelists are:

InfrastructureManagers: any xs:string
Registers and TrainProtectionSystems: tCode = (\w|[-]){2,}
TrainClearanceGauges: tGenCode = (\w|\S){1,} = \S+

Having three different patterns for the different codelists can be confusing.

The pattern for tCode is (\w|[-]){2,}, which means at least two characters, allowing "word characters" and hyphens. See the description of \w in version3#630 (closed), where it was replaced by [A-Za-z0-9\-_] for tOtherEnumerationValue. tCode notably does not allow _, which seems like an unnecessary restriction.

The definition of tGenCode is partially redundant, since every character matched by \w is also matched by \S. The pattern reduces to \S+: one or more characters, none of which may be whitespace.
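To make the difference between the two patterns concrete, here is a minimal Python sketch that emulates them. It assumes the XSD 1.0 regex semantics, where \w is every character outside the Unicode categories P (punctuation), Z (separators) and C (other), \s is only space, tab, line feed and carriage return, and a pattern must match the whole value. Python's own \w is deliberately not used, because it also accepts the underscore. The function names are hypothetical and not part of any railML tooling.

```python
import unicodedata


def is_xsd_word_char(ch: str) -> bool:
    """Approximate XSD \\w: any character not in Unicode categories P*, Z* or C*."""
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def matches_tcode(code: str) -> bool:
    """Emulate tCode = (\\w|[-]){2,} as a whole-string match."""
    return len(code) >= 2 and all(ch == "-" or is_xsd_word_char(ch) for ch in code)


def matches_tgencode(code: str) -> bool:
    """Emulate tGenCode = \\S+, treating only space, tab, LF and CR as whitespace."""
    return len(code) >= 1 and not any(ch in " \t\n\r" for ch in code)


# Made-up sample values, chosen only to exercise the character classes.
for sample in ["AB-1", "A_B", "A.B", "A+B", "A B"]:
    print(repr(sample), matches_tcode(sample), matches_tgencode(sample))
```

Under these assumptions, a value with an underscore or a period fails tCode but passes tGenCode, while a value with a space fails both.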

Looking at the current values in the codelists, the following characters are used in addition to the basic Latin letters (A-Z, a-z) and the digits 0-9:

InfrastructureManagers: Two codes have spaces and two codes use the letter Ö
Registers: One code uses a hyphen (-)
TrainProtectionSystems: Eight codes use a hyphen (-)
TrainClearanceGauges: Two codes use a hyphen (-), one code uses a period (.), one code uses a plus sign (+) and one code uses square brackets ([])
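A survey like the one above could be reproduced with a short script. The following is only a sketch: the function name and the sample codes are made up for illustration, and loading the real codelist files is left out.

```python
import re
from collections import Counter

# Characters considered "basic": Latin letters A-Z, a-z and digits 0-9.
BASIC = re.compile(r"[A-Za-z0-9]")


def extra_characters(codes):
    """Count, per character, how many codes use a character outside the basic set."""
    counts = Counter()
    for code in codes:
        # set() ensures a code is counted once per character, even if it repeats.
        for ch in set(code):
            if not BASIC.fullmatch(ch):
                counts[ch] += 1
    return counts


# Hypothetical codes, not taken from the actual codelists.
sample_codes = ["AB-1", "X Y", "ÖX", "ABC", "G+", "Q[1]"]
for ch, n in sorted(extra_characters(sample_codes).items()):
    print(repr(ch), n)
```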

As explained in version3#630 (closed), \w includes a very broad set of letters, diacritical marks, numbers and symbols, while it excludes common connecting punctuation such as hyphen and underscore. The upside of such an inclusive pattern is that codes can be equal to the native names or abbreviations for the items they represent, without requiring latinisation. The downside is that some systems may not expect such a wide set of characters.
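The trade-off can be illustrated by comparing the inclusive class with the ASCII-only class [A-Za-z0-9\-_] mentioned earlier. This sketch again assumes the XSD 1.0 definition of \w (everything outside Unicode categories P, Z and C); the helper names and sample values are hypothetical.

```python
import re
import unicodedata

# ASCII-only class: Latin letters, digits, hyphen and underscore.
ASCII_CODE = re.compile(r"[A-Za-z0-9\-_]+")


def is_xsd_word_char(ch: str) -> bool:
    """Approximate XSD \\w: any character not in Unicode categories P*, Z* or C*."""
    return unicodedata.category(ch)[0] not in "PZC"


def needs_latinisation(code: str) -> bool:
    """True if the code contains anything beyond the ASCII-only class."""
    return ASCII_CODE.fullmatch(code) is None


# Individual characters: which class accepts them?
for ch in ["Ö", "Å", "-", "_"]:
    print(repr(ch), is_xsd_word_char(ch), not needs_latinisation(ch))
```

Native letters such as Ö and Å are accepted by the inclusive class only, whereas hyphen and underscore are accepted by the ASCII-only class only.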

Questions:

Do we want a common pattern for the four codelists?
Do we want all codes to be latinised, or allow native characters such as Ö and Å?
Do we need any restricting patterns at all? The codelists are maintained by railML.org, so we still control which codes are added. The attributes in the railML schemas where the codes are used are already unrestricted xs:string.


Edited Dec 02, 2025 by CO Coordination