railML.org / Shared resources / Issues / #8
Issue created Jul 25, 2025 by CO Coordination (@coordination.CO, Maintainer)

Change pattern for tCode and tGenCode

The current patterns for codes in the codelists are:

  • InfrastructureManagers: any xs:string
  • Registers and TrainProtectionSystems: tCode = (\w|[-]){2,}
  • TrainClearanceGauges: tGenCode = (\w|\S){1,}, which is equivalent to \S+

Having three different patterns for the different codelists can be confusing.

The pattern for tCode is (\w|[-]){2,}, which means at least two characters, allowing "word characters" and hyphens. See the description of \w in version3#630 (closed), where it was replaced by [A-Za-z0-9\-_] for tOtherEnumerationValue. tCode notably does not allow _, which seems like an unnecessary restriction.

The definition of tGenCode is partially redundant, since every character in \w is also in \S. The pattern boils down to \S+: at least one character, where any character except whitespace is allowed.
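The difference between the two patterns can be checked with a small sketch. Python's own \w and \S do not follow the XSD definitions (Python's \w includes _, for example), so the sketch emulates the XSD character classes through Unicode general categories instead of using the re module. The function names and sample codes are hypothetical and are not taken from the codelists.

```python
import unicodedata


def is_xsd_word_char(ch: str) -> bool:
    # XSD \w: every character except the Unicode categories
    # P (punctuation), Z (separators) and C (other/control).
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def is_xsd_space(ch: str) -> bool:
    # XSD \s: space, tab, newline, carriage return.
    return ch in " \t\n\r"


def matches_tcode(code: str) -> bool:
    # Emulates (\w|[-]){2,}; XSD patterns always match the whole value.
    return len(code) >= 2 and all(is_xsd_word_char(c) or c == "-" for c in code)


def matches_tgencode(code: str) -> bool:
    # Emulates (\w|\S){1,}, i.e. \S+.
    return len(code) >= 1 and not any(is_xsd_space(c) for c in code)


# Hypothetical sample codes, for illustration only.
for code in ["AB-1", "A_B", "ÖX", "G[1]+", "A B"]:
    print(repr(code), matches_tcode(code), matches_tgencode(code))
```

Under these assumptions, "A_B" fails tCode because the underscore is connector punctuation, while "ÖX" passes because Ö is a letter. Every sample except the one containing a space passes tGenCode.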

Looking at the current values in the codelists, the following characters are used in addition to the basic Latin letters (A-Z, a-z) and the digits 0-9:

  • InfrastructureManagers: Two codes have spaces and two codes use the letter Ö
  • Registers: One code uses a hyphen (-)
  • TrainProtectionSystems: Eight codes use a hyphen (-)
  • TrainClearanceGauges: Two codes use a hyphen (-), one code uses a period (.), one code uses a plus sign (+) and one code uses square brackets ([])
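A survey like the one above could be repeated whenever a codelist changes. The following is a minimal sketch, assuming the codes have already been extracted into a plain list of strings; the helper name and the sample values are hypothetical.

```python
import string
from collections import Counter

# Characters considered unremarkable: A-Z, a-z, 0-9.
BASIC = set(string.ascii_letters + string.digits)


def extra_characters(codes):
    """For each character outside A-Z, a-z, 0-9, count the codes that use it."""
    counts = Counter()
    for code in codes:
        # set() ensures a code is counted once per character,
        # even if the character occurs several times in it.
        for ch in set(code) - BASIC:
            counts[ch] += 1
    return counts


# Hypothetical sample values, not real codelist entries.
sample = ["AB-1", "C-2", "X Y", "ÖX", "G[1]"]
for ch, n in sorted(extra_characters(sample).items()):
    print(repr(ch), n)
```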

As explained in version3#630 (closed), \w includes a very broad set of letters, diacritical marks, numbers and symbols, while it excludes common connecting punctuation such as hyphen and underscore. The upside of such an inclusive pattern is that codes can be equal to the native names or abbreviations for the items they represent, without requiring latinisation. On the other hand, not all systems may expect such a wide set of characters to be used.
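To see what a restrictive pattern would cost, the sketch below lists the characters in a code that fall outside the class [A-Za-z0-9\-_] mentioned in version3#630. The check is per character, so it makes no assumption about length limits. The function name and the sample codes are hypothetical.

```python
import re

# The ASCII-only character class from version3#630. An explicit ASCII
# class behaves the same way in Python's re module as in XSD.
ASCII_CLASS = re.compile(r"[A-Za-z0-9\-_]")


def offending_characters(code: str) -> list[str]:
    """Return the characters of `code` that the ASCII-only class rejects."""
    return [ch for ch in code if ASCII_CLASS.fullmatch(ch) is None]


# Hypothetical sample codes, for illustration only.
for code in ["AB-1_x", "ÖX", "X Y", "G[1]+"]:
    print(repr(code), offending_characters(code))
```

Codes that return an empty list would be accepted unchanged. Any other code would have to be latinised or otherwise rewritten if such a pattern were adopted.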

Questions:

  1. Do we want a common pattern for the four codelists?
  2. Do we want all codes to be latinised, or allow native characters such as Ö and Å?
  3. Do we need any restricting patterns at all? The codelists are maintained by railML.org, so we still control what we add and what we do not. The attributes in the railML schemas where the codes are used are already unrestricted xs:string.

Links:

  • Forum: railml.common » [Codelists] Change pattern for tCode in codelists
  • version3#630 (closed)
Edited Dec 02, 2025 by CO Coordination

railML.org e.V. (Registry of Associations: VR 5750) Phone: +49 351 47582911 Altplauen 19h; 01187 Dresden; Germany