On disappearing scripts

Medium is a new online forum, or format, where I’ve been seeing more and more writing of note. Quinn Norton’s essay collection is an example of some of the most interesting online writing at the moment: smart, savvy, independent, thoughtful, nuanced.

This week I stumbled across another piece for language nerds, about the potential demise of the Urdu script nastaliq, one of the notable Persian scripts, still found in parts of Afghanistan, Western China, Pakistan and India:

…Urdu, a South Asian language spoken by anywhere between 100–125 million people in Pakistan and India, and one of Pakistan’s two official languages. Urdu is traditionally written in a Perso-Arabic script called nastaliq, a flowy and ornate and hanging script. But when rendered on the web and on smartphones and the entire gamut of digital devices at our disposal, Urdu is getting depicted in naskh, an angular and rather stodgy script that comes from Arabic. And those that don’t like it can go write in Western letters.

Here’s a visual comparison taken from Wikipedia.

Nastaliq v. Naskh. Courtesy Wikipedia.

Looking at the picture, the discerning eye may immediately realize why naskh trumps nastaliq on digital devices. With its straightness and angularity, naskh is simply easier to code, because unlike nastaliq, it doesn’t move vertically and doesn’t have dots adhering to a strict pattern. And we all know how techies opt for functionality.

I’m glad the writer goes further, finding the fascination of a language Romanized (the romance of a language romanized), although they make the following claim, which I found odd (emphasis mine):

Writing in Roman letters also makes it easier to switch in and out of English. As an example, take a recent Tweet by the human rights activist Sana Saleem: “If you’ve read my tweets, or my work, I hardly ever cuss. Sorry about that, par bus boat hogaya, buss kardo bass.”

To me, as a writer, that is an astonishing piece of text. Not only are we looking at two languages collapsed into one, but the Romanized part is a language that has not yet been formalized; it is literally under construction due to the pressure exerted by the exigencies of the internet.

The implication that the English language is somehow fully formalized and protected from the vagaries of the internet is just incorrect. It has been three years since Superlinguo dropped “I can has language play” on us, and beyond that, English is still being contested offline. Online is just giving younger people greater sway in that contest.

It’s also not that surprising or astonishing a concept to almost anyone who speaks a second or third language, I presume anyway. As someone who speaks small amounts of three or four other languages, I have always found interlingual wordplay a source of humour, power and poetry.

Despite this minor quibble, it’s a fascinating insight into the deep search that humans go on when confronted with so much knowledge, leading the author unsuccessfully to the doors of Apple and then, surprisingly successfully, to the doors of Microsoft.

It’s a great reminder of how fragile a language or culture can be - despite the ubiquity of information and knowledge online.

A font for all occasions

Google have announced Noto:

Noto is Google’s font family that aims to support all the world’s languages. Its design goal is to achieve visual harmonization across languages. Noto fonts are under Apache License 2.0.

The comprehensive set of fonts and some of the tools used in our development are available at noto.googlecode.com.


New! Noto Sans CJK is released with full support for Simplified Chinese, Traditional Chinese, Japanese, and Korean.

The page has a neat map to click for languages, where you can choose a subset per nation – for Kenya you can choose from 20 languages; for Australia, a mere three – English, Italian and Traditional Chinese. I’m not surprised there are so few – we barely see any Greek or Vietnamese writing on the streets, but I guess we don’t see much Italian either. Pity there aren’t more South East Asian languages in our curricula.

Localised Malware

Trend Micro are reporting localised malware seen in the wild.

The malware strain known as VOBFUS works by copying itself onto removable media like USB sticks with names like porn.exe or sexy.exe. 

This variant also uses file names written in these languages:

  • Arabic
  • Bosnian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hungarian
  • Italian
  • Korean
  • Persian
  • Polish
  • Portuguese
  • Romanian
  • Slovak
  • Spanish
  • Thai
  • Turkish
  • Vietnamese

While the languages may differ, they all translate to I love you, Naked, Password, and Webcam.

At times I’m surprised that malware is still a thing, but then I remember that the whole world is online these days – as this development shows.

Google Translate reaching further

Somehow I missed it at the end of last year, but Google Translate has added nine new languages – five from the African continent, three from Asia, and Maori.

In Africa, we’re adding Somali, Zulu, and the 3 major languages of Nigeria.
  • Hausa (Harshen Hausa), spoken in Nigeria and neighboring countries with 35 million native speakers
  • Igbo (Asụsụ Igbo) spoken in Nigeria with 25 million native speakers
  • Yoruba (èdè Yorùbá) spoken in Nigeria and neighboring countries with 28 million native speakers
  • Somali (Af-Soomaali) spoken in Somalia and other countries around the Horn of Africa with 17 million native speakers
  • Zulu (isiZulu) spoken in South Africa and other south-western African countries with 10 million native speakers
Throughout Asia, we’re launching languages spoken in Mongolia and South Asia.
  • Mongolian (Монгол хэл), official language in Mongolia and also spoken in parts of China with 6 million native speakers
  • Nepali (नेपाली), spoken in Nepal and India with 17 million native speakers
  • Punjabi language (ਪੰਜਾਬੀ) (Gurmukhi script), spoken in India and Pakistan with 100 million native speakers
Thanks to the volunteer effort of passionate native speakers in New Zealand, we’re adding the language of the Maori people.
  • Maori (Te Reo Māori), spoken in New Zealand with 160 thousand speakers

Unbabel – Translation as a Service

How quickly things change. It’s been a while since I’ve had a chance to look at the state of translation and translation tech, and now it seems that all the latest trends have come together.

Unbabel has the brash young entrepreneur, and the youth in turn brings something akin to ignoratio elenchi. The byline is “Translation as a Service”

Human corrected machine translation service that enables businesses to communicate globally

dutifully adhering to the modern “X as a Service” line so necessary for venture capital funding, without understanding the nature of translation (it has always been a service) and, as happens with this style of disruptive tech, with poorly paid contractors making management rich.

Despite my reservations about the motivations of Unbabel’s direction and management, and my knowledge of what this will do to the translation industry, this is not unexpected. I’ve written many times before about the coming changes and the shake-up the industry should by now be expecting. I would suggest that this is the final ramping up of this process; the next step will be the collapse of the industry. This will lead to two distinct results – a massive increase in the number of translated texts, and a dramatic shrinkage of employment prospects, though with an increase in financial returns for those translators who stick at it long enough.

TechCrunch manages to say a lot

Unbabel’s secret sauce leverages artificial intelligence software and its stable of over 3,100 editors (or translators) to translate a website’s content from one language into its customer’s language of choice. First, its machine learning technology translates the text from source into the target language, at which point it uses its Mechanical Turk-style distribution system to assign editing tasks to the right translators, who then check the translation for errors and for stylistic inconsistencies.

Unbabel editors work remotely, via their laptops or mobile phones, on translations, which co-founder Vasco Pedro says provides the key to faster translations. This, combined with the efficiency of its task distribution and administration algorithms, provides a level of efficiency that allows editors to earn up to $10/hour working for Unbabel.

but without much analysis – the technology sector and its loyal heralds have never been good at analysis that doesn’t revolve around profit and where it’s coming from.

Human translation is really the gold standard as far as online translation goes, but for most companies, paying real, live humans to translate their content is an expensive proposition. In most cases, it’s either pony up the funds to pay for humans, or make due with machines (like publicly available tools akin to the unreliable Google Translate) and automated services. By combining both machine translation and human curation, the Unbabel founders not only believe they’ve created a novel solution to a persistent problem, but that they can offer a product that’s on par with pure human translation, faster, and at a fraction of the cost.

Note that the only things mentioned here are an “expensive proposition” and a “fraction of the cost”. This was to be expected, and I lectured the translation industry that they should expect it. I did not expect the young turks to dismiss the expensive past without even an acknowledgement of the history, theory or purveyors of that industry. I guess that’s why they call them the blues.