From https://www.w3.org/International/questions/qa-bidi-controls.en.html
Unicode control codes are not useful for bidi formatting when working with structural or paragraph-level markup.
For inline content we recommend that, wherever possible, you use markup in HTML and XML, rather than the Unicode control characters.
One practical reason is that these control characters end up in text that users copy and paste. Most users are unaware of the invisible characters, which then break string-matching logic, e.g. T27277.
Bidi controls include many different characters (LRI, RLI, FSI, LRE, RLE, LRO, RLO, PDI, PDF), all of which should be avoided. The main places in MediaWiki that generate bidi control characters are:
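To illustrate the string-matching problem, here is a minimal Python sketch (not MediaWiki code). The helper name `strip_bidi_controls` and the sample strings are hypothetical; the character class covers the explicit bidi formatting characters plus the implicit marks LRM, RLM and ALM.

```python
import re

# LRM, RLM, ALM, the embeddings/overrides (LRE, RLE, PDF, LRO, RLO)
# and the isolates (LRI, RLI, FSI, PDI).
BIDI_CONTROLS = re.compile("[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def strip_bidi_controls(text: str) -> str:
    """Return text with invisible bidi control characters removed."""
    return BIDI_CONTROLS.sub("", text)


typed = "Example"
pasted = "Example\u200e"  # looks identical, but carries a trailing LRM

print(typed == pasted)                       # False
print(typed == strip_bidi_controls(pasted))  # True
```

Stripping on input only treats the symptom; not emitting the characters into copyable text in the first place avoids the problem.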
- https://gerrit.wikimedia.org/g/mediawiki/core/+/0b58fa96021b833e172fd44016751819c09e9588/includes/language/Language.php#2978 https://codesearch.wmcloud.org/deployed/?q=getDirMarkEntity
- https://gerrit.wikimedia.org/g/mediawiki/core/+/0b58fa96021b833e172fd44016751819c09e9588/includes/language/Language.php#2996 https://codesearch.wmcloud.org/deployed/?q=getDirMark
We should reduce the uses of these methods and eventually, if no plain-text uses of them remain, deprecate them and remove them from MediaWiki, or at least note in their documentation that they should be avoided most of the time.
To help developers who are not familiar with bidi: what is the purpose of such things in MediaWiki in the first place? Consider the following, where $title can be a username or a page title:
abc $title (123)
If $title = 'title'; this will be turned into,
abc title (123)
but if $title = '1شسی'; the very same code results in
abc 1شسی (123)
Note where 123 has gone: between the parts of the title string! Logically it is still after $title, but it is displayed in the wrong place because the bidi algorithm is misled. To solve this in plain text, one can add an LRM, which results in:
abc ۱شسی (123)
This contains the hidden character, so anyone who copies the title also copies that character, which is undesirable. A better solution is to use <bdi>, which results in:
abc <bdi>1شسی</bdi> (123)
This is good, but the placement of the 1 has changed, so a better solution is to add a dir attribute. The value, LTR or RTL, should match the site or user language direction depending on the context. For page titles on monolingual wikis it is better to use the wiki's content direction. For usernames, however, it can be better not to enforce any direction and to let the default direction apply. With dir="ltr" the result is:
abc <bdi dir="ltr">1شسی</bdi> (123)
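To see why the bidi algorithm reorders the example above, it can help to look at the bidi class of each character. The following Python sketch uses the standard `unicodedata` module; the helper name `bidi_classes` is hypothetical.

```python
import unicodedata


def bidi_classes(text):
    """Return (code point, bidi class) pairs for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


for code_point, bidi_class in bidi_classes("1شسی (123)"):
    print(code_point, bidi_class)
# U+0031 EN   digit: weak
# U+0634 AL   Arabic letter: strong right-to-left
# U+0633 AL
# U+06CC AL
# U+0020 WS   space: neutral
# U+0028 ON   parenthesis: neutral
# U+0031 EN
# U+0032 EN
# U+0033 EN
# U+0029 ON

print(unicodedata.bidirectional("\u200e"))  # L (LRM is a strong left-to-right mark)
```

Digits, spaces and parentheses have no strong direction of their own, so "(123" takes its direction from the Arabic letters before it. An LRM inserts a strong left-to-right character in between, while <bdi> isolates the title so it cannot affect the surrounding text at all.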
For example, in https://gerrit.wikimedia.org/g/mediawiki/extensions/ProofreadPage/+/c30b1e384ab694161fa10665636eb3f2f4c4349a/includes/Special/SpecialProofreadPages.php#357 just wrapping $plink with <bdi> should be enough.
As a rule of thumb when writing and reviewing these changes: whenever user-generated content such as a username, page title, summary or content is used on the same line as text and messages that are part of the MediaWiki software UI, that user-generated content should be wrapped with <bdi>. It is like XSS and SQL injection, but for bidi, and the antidote is appropriate wrapping with the HTML <bdi> tag. (This is a simplification that leaves some details out.)
Also related are the W3C suggestions to prefer HTML tags and attributes over CSS styles, from https://drafts.csswg.org/css-writing-modes-3/
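The rule of thumb can be sketched as a small helper. This is an illustrative Python sketch, not MediaWiki code: the function name `wrap_bdi` and its `direction` parameter are hypothetical. It escapes the user-supplied text (the XSS half of the analogy) and isolates it with <bdi> (the bidi half).

```python
from html import escape
from typing import Optional


def wrap_bdi(user_text: str, direction: Optional[str] = None) -> str:
    """Escape user-generated text and isolate it in a <bdi> element.

    With direction=None the dir attribute is omitted, so the browser
    picks the direction from the content itself.
    """
    if direction not in (None, "ltr", "rtl"):
        raise ValueError("direction must be None, 'ltr' or 'rtl'")
    attr = f' dir="{direction}"' if direction else ""
    return f"<bdi{attr}>{escape(user_text)}</bdi>"


title = "1شسی"
print(f"abc {wrap_bdi(title)} (123)")
# abc <bdi>1شسی</bdi> (123)
print(f"abc {wrap_bdi(title, 'ltr')} (123)")
# abc <bdi dir="ltr">1شسی</bdi> (123)
```

Keeping the wrapping in one place makes it harder to forget either the escaping or the isolation when a message interpolates user-generated content.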
Because HTML UAs can turn off CSS styling, we recommend HTML authors to use the HTML dir attribute and <bdo> element to ensure correct bidirectional layout in the absence of a style sheet. Authors should not use direction in HTML documents.
Because HTML UAs can turn off CSS styling, we recommend HTML authors to use the HTML dir attribute, <bdo> element, and appropriate distinction of text-level vs. grouping-level HTML element types to ensure correct bidirectional layout in the absence of a style sheet. Authors should not use unicode-bidi in HTML documents.