  • #630
Closed
Open
Issue created Feb 11, 2025 by CO Coordination@coordination.COMaintainer

Change pattern for tOtherEnumerationValue

The current pattern for tOtherEnumerationValue is other:\w{2,}, which means other: followed by at least two characters from the class \w. XML Schema defines \w as "all characters except the set of 'punctuation', 'separator' and 'other' characters". These character categories are Unicode General Categories, assigned to each character in the Unicode Character Database (UnicodeData.txt lists every character together with its General Category).
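The rule can be illustrated with a minimal Python sketch. It assumes that Python's unicodedata module (whose Unicode version depends on the Python release) is an acceptable stand-in for the Unicode Character Database used by a schema validator; the function names are hypothetical. Python's own re module is deliberately not used here: its \w covers alphanumeric characters plus the underscore, so it would accept other:a_b and reject other:€™, the opposite of XML Schema's \w in both cases.

```python
import unicodedata

def is_xsd_word_char(ch):
    """True if ch is matched by XML Schema's \\w, i.e. its Unicode General
    Category is not Punctuation (P*), Separator (Z*) or Other (C*)."""
    return unicodedata.category(ch)[0] not in "PZC"

def matches_current_pattern(value):
    """Emulate the XSD pattern other:\\w{2,}. XSD patterns are implicitly
    anchored, so the whole value must match, not just a substring."""
    prefix = "other:"
    if not value.startswith(prefix):
        return False
    rest = value[len(prefix):]
    return len(rest) >= 2 and all(is_xsd_word_char(c) for c in rest)

for v in ["other:ab", "other:a", "other:a-b", "other:a_b", "other:€™"]:
    print(v, matches_current_pattern(v))
```

With this sketch, other:a-b and other:a_b are rejected (hyphen and underscore are punctuation), while other:€™ is accepted (both are symbols).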

If we look at the characters of the commonly used Windows-1252 character set, the following characters are included in \w:

  • Letters: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŒœŠšŸŽžƒˆ
  • Numbers: 0123456789²³¹¼½¾
  • Symbols: $+<=>^`|~¢£¤¥¦¨©¬®¯°±´¸×÷˜€™

In the same character set, the following characters are excluded from \w:

  • Punctuation: !"#%&'()*,-./:;?@[\]_{}¡§«¶·»¿–—‘’‚“”„†‡•…‰‹›
  • Separators: space and no-break space
  • Other: control characters and soft hyphen
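Both lists can be reproduced with a short Python sketch, assuming Python's cp1252 codec and unicodedata module as stand-ins for the Windows-1252 code page and the Unicode Character Database. It decodes every byte and groups the resulting characters by the first letter of their General Category; the helper name windows1252_chars is hypothetical.

```python
import unicodedata

def windows1252_chars():
    """Return every character of Windows-1252 by decoding each byte
    0x00-0xFF, skipping the five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    that Python's cp1252 codec treats as undefined."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            pass
    return chars

# Group by major General Category: L, M, N, S are matched by XSD \w;
# P, Z, C are not.
groups = {}
for ch in windows1252_chars():
    groups.setdefault(unicodedata.category(ch)[0], []).append(ch)

for major in "LNSPZC":
    status = "excluded" if major in "PZC" else "included"
    print(status, major, repr("".join(groups.get(major, []))))
```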

\w also matches every other Unicode character that is not categorised as "punctuation", "separator" or "other". Some further examples of Unicode characters included in \w:

  • Letters: ČĐIJɱΘИفफ़นኯ駱𝕱𝛘
  • Marks: diacritical marks that combine with other characters (e.g. U+0301 COMBINING ACUTE ACCENT)
  • Numbers: ٠١٢٣٤٥٦٧٨٩௦௧௨௩௪௫௬௭௮௯௰௱௲
  • Symbols: ˂˃˄˅℃⇐⇑⇒⇓√∛∜🀉🅰🅱🎡🎢🎣🎤💯💰💱
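The General Category of any character can be looked up with Python's unicodedata module. The following sketch checks a few of the examples above, plus U+0301 as a sample combining mark, assuming the installed Python's Unicode data agrees with the Unicode Character Database for these characters.

```python
import unicodedata

# Some of the examples above, plus U+0301 COMBINING ACUTE ACCENT as a mark.
examples = ["Θ", "\u0301", "٣", "௰", "√", "💯"]

for ch in examples:
    cat = unicodedata.category(ch)
    # XSD \w accepts everything outside the P, Z and C categories.
    in_xsd_w = cat[0] not in "PZC"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {cat} {in_xsd_w}")
```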

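It seems somewhat arbitrary to allow other:🥳🎉 but not values with hyphens or underscores. The railML documentation says "minimum two characters, white space not allowed", which would suggest the pattern other:\S{2,} (excluding the XML whitespace characters space, tab, carriage return and line feed) or, alternatively, other:\P{Z}{2,} (excluding all Unicode separator characters, such as no-break space, but not tab or line feed).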
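The three candidate character classes can be compared with a Python sketch. Python's re module supports neither XSD's \w semantics nor \p{Z}, so each class is emulated with unicodedata; the emulation, the helper name matches and the sample values are assumptions for illustration only.

```python
import unicodedata

XSD_WHITESPACE = {" ", "\t", "\n", "\r"}  # XSD \s; \S is every other character

CHAR_CLASSES = {
    "\\w": lambda c: unicodedata.category(c)[0] not in "PZC",
    "\\S": lambda c: c not in XSD_WHITESPACE,
    "\\P{Z}": lambda c: unicodedata.category(c)[0] != "Z",
}

def matches(value, char_class):
    """Emulate the anchored XSD pattern other:<char_class>{2,}."""
    prefix = "other:"
    rest = value[len(prefix):]
    return (value.startswith(prefix) and len(rest) >= 2
            and all(map(CHAR_CLASSES[char_class], rest)))

samples = ["other:🥳🎉", "other:my-value", "other:my_value",
           "other:a b", "other:a\u00a0b"]
for s in samples:
    print(repr(s), {cc: matches(s, cc) for cc in CHAR_CLASSES})
```

In this sketch, hyphens and underscores fail only under \w, a plain space fails under \S and \P{Z}, and a no-break space is rejected by \P{Z} but accepted by \S.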

Recommended solution

  • Change the pattern of tOtherEnumerationValue to other:[A-Za-z0-9\-_]{2,}
  • Update the documentation of previous versions (see related issues) to recommend using only the letters A-Z and a-z and the digits 0-9, since hyphens and underscores are not matched by \w.
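The proposed pattern can be tried out in Python, assuming re.fullmatch as the equivalent of the implicit anchoring of XSD patterns. The class [A-Za-z0-9\-_] uses only ASCII ranges and an escaped hyphen, which both Python's re and XML Schema regular expressions accept. The sketch also suggests why the documentation recommendation for earlier versions is safe: values restricted to A-Z, a-z and 0-9 validate under both the current and the proposed pattern. The function names are hypothetical.

```python
import re
import unicodedata

# Proposed pattern; XSD patterns are implicitly anchored, so use fullmatch.
NEW_PATTERN = re.compile(r"other:[A-Za-z0-9\-_]{2,}")

def matches_new(value):
    return NEW_PATTERN.fullmatch(value) is not None

def matches_current(value):
    """Emulate the current XSD pattern other:\\w{2,}: no character after the
    prefix may be in General Category P, Z or C."""
    rest = value[len("other:"):]
    return (value.startswith("other:") and len(rest) >= 2
            and all(unicodedata.category(c)[0] not in "PZC" for c in rest))

for v in ["other:abc123", "other:my-value", "other:my_value",
          "other:🥳🎉", "other:Čas", "other:a"]:
    print(v, "current:", matches_current(v), "new:", matches_new(v))
```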

Related issues

Documentation updates in previous versions:

  • #651 (closed) railML 3.1
  • version2#485 (closed) railML 2.5
  • #626 (closed) railML 3.2
  • #625 (closed) railML 3.3
Edited Dec 02, 2025 by Coordination

railML.org e.V. (Registry of Associations: VR 5750) Phone: +49 351 47582911 Altplauen 19h; 01187 Dresden; Germany