Tuesday, October 29, 2024

Script Encoding and Cultural Identity: Navigating Digital Exclusion

By Maroua Bezzaoui, SILICON Intern

During the summer of 2024, Unicode’s internship program included interns from Stanford University, Northeastern University, and Google’s Summer of Code. Several of the interns have shared their experiences. The second featured piece is from Maroua Bezzaoui at Stanford University.


At 14, I was already fluent in three languages, arguably four. While it could have been something to be proud of, it was incredibly humbling to find out that none of them had any significant value in the digital world. As an ambitious high schooler living in Africa, eager to be part of the digital landscape and contribute to it, I found myself disappointingly still excluded from it, facing yet another barrier added to the many my identities already posed.

We often hear that “programming is empowerment”, but how can a non-English speaker be empowered, then, when Python, C, Java, etc. are all in English? Eventually, I did find a way to learn English, but only then could I explore coding, and pursue my education and career in Computer Science. There is no denying the ongoing exclusivity of technology; coding aside, a lot of languages are still underrepresented digitally even for something as simple as the date on a smartphone or texting a friend. This underrepresentation does little to help preserve these languages, many of which belong to indigenous communities.

The preservation and recognition of minority languages depend heavily on their inclusion in digital platforms. One crucial aspect of this process is script encoding, which ensures that languages can be represented accurately in digital formats. However, the path to script encoding is fraught with technical, cultural, and political challenges, particularly in multilingual contexts. In my work on the Unicode-Begin project, I got to hear many stories of the encoding of digitally disadvantaged languages. This post delves into these challenges through the lens of the Sunuwar script in Nepal, drawing on insights from experts and practitioners involved in the process.

The Sunuwar Alphabet

Script encoding is not merely a technical task; it is a cultural and political endeavor that carries significant implications for the communities involved. In Nepal, where 16 different scripts are used, the inclusion of these scripts in digital formats is essential for preserving linguistic diversity. The Sunuwar script, one of these scripts, was included in The Unicode® Standard, Version 16.0, this year. This milestone represents a significant achievement for the Sunuwar-speaking community, who have long sought to have their language digitally recognized.


However, the journey to this point has been anything but straightforward. As Dev Kumar Sunuwar, a journalist and law practitioner from the Koits-Sunuwar Indigenous Peoples, explains, the process of encoding the Sunuwar script has required extensive technical effort, particularly in the creation of fonts and keyboards that can accommodate the script. Issues with translation and “the need for technical support” have been persistent challenges, highlighting the need for a comprehensive approach that goes beyond mere encoding.


One of the primary challenges in script encoding is ensuring that the digital tools developed for these scripts are both accurate and culturally appropriate. Dev’s experience underscores the importance of this, as the Sunuwar script requires specialized fonts and keyboards to be used effectively. Without these tools, the script’s digital presence remains limited, hindering its use in both personal and professional contexts.
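To make this gap between "encoded" and "usable" a little more concrete, here is a minimal Python sketch. It assumes the Sunuwar block sits at U+11BC0 through U+11BFF, which is my understanding of the Unicode 16.0 allocation and should be checked against the official code charts. The helper names `is_in_sunuwar_block` and `describe` are hypothetical, made up for this illustration.

```python
import unicodedata

# Assumption: Sunuwar occupies U+11BC0..U+11BFF in Unicode 16.0.
# Verify this range against the official code charts before relying on it.
SUNUWAR_BLOCK = range(0x11BC0, 0x11C00)


def is_in_sunuwar_block(ch: str) -> bool:
    """Return True if the single character falls inside the assumed block."""
    return ord(ch) in SUNUWAR_BLOCK


def describe(ch: str) -> str:
    """Report the character's name, or say that this runtime does not know it."""
    name = unicodedata.name(ch, None)
    if name is None:
        return (
            f"U+{ord(ch):04X}: unknown to Unicode data "
            f"{unicodedata.unidata_version}"
        )
    return f"U+{ord(ch):04X}: {name}"


first = chr(0x11BC0)  # first code point of the assumed block

print(is_in_sunuwar_block(first))     # True
print(first.encode("utf-8").hex(" "))  # f0 91 af 80
print(describe(first))                # depends on the runtime's Unicode version
```

The bytes can be stored and transmitted by any UTF-8 aware software, but the last line only prints a character name if the runtime ships Unicode data that is recent enough. Even then, nothing appears on screen without a font, and nothing can be typed without a keyboard layout, which is exactly the work Dev describes.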



Moreover, the cultural sensitivity required in script encoding cannot be overstated. The inclusion of a script in Unicode or any other digital standard can have significant political implications, particularly in multilingual societies where language is often tied to identity and power. This also applies to the Amazigh community. While it has only the standard Tifinagh script (and the Ancient Berber script from which it derived, which has yet to be included in Unicode), it encompasses several Amazigh dialects: Tashelhit, Tamazight, Tarifit, and Kabyle. Some of them have words that are written in the Arabic abjad, and I recently learned that there are also some Berber texts written in the Hebrew abjad by Berber Jews.

In the case of the Sunuwar script, there are concerns about how its digital recognition might affect the broader linguistic landscape in Nepal. “Political sensitivity in adding characters can occur,” Dr. Debbie Anderson notes, adding that Unicode and ISO must navigate these sensitivities carefully to avoid exacerbating tensions within and between communities.


Case Study: The Sunuwar Script


The story of the Sunuwar script offers a compelling case study of the complexities of script encoding. Drafted by Dev Sunuwar and his team in 2011, the script represents a crucial aspect of the Koits-Sunuwar people’s cultural heritage. Now that the script has been included in Unicode, it is essential to reflect on the challenges that have been overcome and the work that remains to be done.


One of the key lessons from the Sunuwar case is the importance of providing ongoing technical support. While the script’s inclusion in Unicode is a significant step, it is only the beginning of a longer journey. The creation of digital tools, such as fonts and keyboards, must be prioritized to ensure that the script can be used effectively in various digital contexts. Additionally, the community’s relationship with Unicode and other relevant organizations must be nurtured over time to address any emerging issues and to support the script’s continued use. This brings us to what fellow SILICON Intern Sam Minev-Benzecry discusses in his post on time and trust: The Unicode Blog: Time and Trust



As the digital world continues to expand, the need for inclusive and culturally sensitive script encoding becomes ever more urgent. The case of the Sunuwar script illustrates the complexities involved in this process and underscores the importance of long-term relationships and trust in achieving meaningful outcomes. For those involved in script encoding, the lessons from Nepal and other multilingual contexts are clear: the work does not end with the inclusion of a script in a digital standard. Ongoing support, cultural sensitivity, and a commitment to the communities involved are essential for ensuring that script encoding truly benefits those it is meant to serve.


Awareness of these challenges is the first step towards achieving the digital inclusion of underrepresented languages and cultures. I wonder if there could ever be a time when people can build applications to be used universally, programmed in their first language, allowing them to focus more on exploring new ideas and deepening their knowledge and skills. This possibility is already being explored by Ramsey Nasser, the founder of Qalb, a programming language entirely in Arabic. On his website, Nasser states: 

“All modern programming tools are based on the ASCII character set, which encodes Latin characters and was originally based on the English language. As a result, programming has become tied to a single written culture. It carries with it a cultural bias that favors those who grew up reading and writing in that culture.” 

Nasser’s work challenges the status quo and opens the door to a future where programming is not confined to a single language or cultural framework. It’s a vision that resonates deeply with the idea of a more inclusive digital world, where diverse linguistic and cultural backgrounds are not just acknowledged but celebrated and integrated into the very tools that shape our technological future.
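As a small illustration of how far mainstream tools have and have not come, here is a hedged Python sketch. Python 3 accepts identifiers written in many scripts, so the function and parameter below are named in Tifinagh. The names are hypothetical and chosen only for this example (my intended readings are "azul", a greeting, and "isem", name). The keywords `def` and `return`, however, remain English, which is the gap that a language like Qalb sets out to close.

```python
import keyword


# Hypothetical Tifinagh names, used only for illustration:
# the function is meant to read "azul" and the parameter "isem".
def ⴰⵣⵓⵍ(ⵉⵙⵎ: str) -> str:
    # The identifiers are Tifinagh, but `def` and `return` are still English.
    return f"Azul, {ⵉⵙⵎ}!"


print(ⴰⵣⵓⵍ("world"))            # Azul, world!
print("ⴰⵣⵓⵍ".isidentifier())    # True: a valid Python identifier
print(keyword.iskeyword("def"))  # True: keywords are fixed by the language
```

So a learner can name their own variables and functions in their first language, but the structure of the program still has to be read and written in English.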


By approaching script encoding with both technical competence and cultural empathy, we can help preserve the linguistic diversity that is so vital to our shared global heritage.


Maroua is a junior at Stanford University majoring in Computer Science with a focus on computer systems and products. She was born and raised in Casablanca, Morocco, and she is passionate about leveraging technology to drive inclusivity and create positive social impact! In her free time, she enjoys acting and working on theatre projects, with a special interest in French Dramatic Arts.