Coded messages to China in Lorem Ipsum?

acb has pointed me to an interesting tale about Google Translate seemingly hiding coded signals in Lorem Ipsum translations. While it all seems a little far-fetched, even conspiratorial, this story is from Krebs on Security, a well-respected blog on all things crypto and security.

I recently discovered that the simple Lorem has been transformed into a whole post-internet family of variants: Vegan Ipsum (Veggie Ipsum), Lorem Bacon, Hipster Ipsum, Lorizzle (Gangster Ipsum), Beer Ipsum, Samuel L Ipsum and the best of the lot, Picksomeipsum, in which you can pick Clint Eastwood, Michael Caine, Jim Carrey or Morgan Freeman as a source, or pit two of the actors against each other:

…ing is about respect. getting it for yourself, and taking it away from the other guy. you want a guarantee, buy a toaster. what you have to ask yourself is, do i feel lucky. well do ya’ punk? cities fall but they are rebuilt. heroes die but they are remembered. multiply your anger by about a hundred, kate, that’s how much he thinks he loves you. no, this is mount everest. you should flip on the discovery channel from time to time. but i guess you can’t now, being dead and all. cities fall but they are rebuilt. heroes die but they are remembered. are you feeling lucky punk don’t p!ss down my back and tell me it’s raining. here. put that in your report!” and “i may have found a way out of here. that tall drink of water with the silver spoon up his ass. ever notice how sometimes you come across somebody you shouldn’t have f**ked with? well, i’m that guy.

Rehabilitated? well, now let me see. you know, i don’t have any idea what that means. it only took me six days. same time it took the lord to make the world. ever notice how sometimes you come across somebody you shouldn’t have f**ked with? well, i’m that guy. man’s gotta know his limitations. mister wayne, if you don’t want to tell me exactly what you’re doing, when i’m asked, i don’t have to lie. but don’t think of me as an idiot. cities fall but they are rebuilt. heroes die but they are remembered. circumstances have taught me that a man’s ethics are the only possessions he will take beyond the grave. this is my gun, clyde! this is the ak-47 assault rifle, the preferred weapon of your enemy; and it makes a distinctive sound when fired at you, so remember it. well, do you have anything to say for yourself? the man likes to play chess; let’s get him some rocks. that tall drink of water with the silver spoon up his ass…

There is, of course, a fairly comprehensive list of Lorem Ipsum generators, as well as generators for German, Chinese, Russian and Spanish Ipsum.

Localised Malware

Trend Micro is reporting localised malware seen in the wild.

The malware strain known as VOBFUS spreads by copying itself onto removable media, such as USB sticks, under names like porn.exe or sexy.exe.

This variant also uses file names written in these languages:

  • Arabic
  • Bosnian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hungarian
  • Italian
  • Korean
  • Persian
  • Polish
  • Portuguese
  • Romanian
  • Slovak
  • Spanish
  • Thai
  • Turkish
  • Vietnamese

While the languages may differ, the file names all translate to “I love you”, “Naked”, “Password” and “Webcam”.

At times I’m surprised that malware is still a thing, but then I remember that the whole world is online these days, as this development shows.

Google Translate reaching further

Somehow I missed it at the end of last year, but Google Translate has added nine new languages: five from the African continent, three from Asia, and Maori.

In Africa, we’re adding Somali, Zulu, and the 3 major languages of Nigeria.
  • Hausa (Harshen Hausa), spoken in Nigeria and neighboring countries with 35 million native speakers
  • Igbo (Asụsụ Igbo) spoken in Nigeria with 25 million native speakers
  • Yoruba (èdè Yorùbá) spoken in Nigeria and neighboring countries with 28 million native speakers
  • Somali (Af-Soomaali) spoken in Somalia and other countries around the Horn of Africa with 17 million native speakers
  • Zulu (isiZulu) spoken in South Africa and other south-western African countries with 10 million native speakers
Throughout Asia, we’re launching languages spoken in Mongolia and South Asia.
  • Mongolian (Монгол хэл), official language in Mongolia and also spoken in parts of China with 6 million native speakers
  • Nepali (नेपाली), spoken in Nepal and India with 17 million native speakers
  • Punjabi language (ਪੰਜਾਬੀ) (Gurmukhi script), spoken in India and Pakistan with 100 million native speakers
Thanks to the volunteer effort of passionate native speakers in New Zealand, we’re adding the language of the Maori people.
  • Maori (Te Reo Māori), spoken in New Zealand with 160 thousand speakers

Unbabel – Translation as a Service

How quickly things change. It’s been a while since I’ve had a chance to look at the state of translation and translation tech, and now it seems that all the latest trends have come together.

Unbabel combines these trends with the brash young entrepreneur, whose youth in turn brings something akin to ignoratio elenchi (an argument that misses the point). The byline is “Translation as a Service”:

Human corrected machine translation service that enables businesses to communicate globally

This dutifully adheres to the modern “X as a Service” line so necessary for venture capital funding, without understanding the nature of translation (it’s always been a service), and, as happens with this style of disruptive tech, relies on poorly paid contractors to make management rich.

Despite my reservations about the motivations behind Unbabel’s direction and management, and my knowledge of what this will do to the translation industry, this is not unexpected. I’ve written many times before about the coming changes and the shake-up the industry should by now be expecting. I would suggest that this is the final ramping up of that process; the next step will be the collapse of the industry as we know it. This will lead to two distinct results: a massive increase in the number of translated texts, and a dramatic shrinkage in employment prospects, but an increase in financial returns for those translators who stick at it long enough.

TechCrunch manages to say a lot

Unbabel’s secret sauce leverages artificial intelligence software and its stable of over 3,100 editors (or translators) to translate a website’s content from one language into its customer’s language of choice. First, its machine learning technology translates the text from source into the target language, at which point it uses its Mechanical Turk-style distribution system to assign editing tasks to the right translators, who then check the translation for errors and for stylistic inconsistencies.

Unbabel editors work remotely, via their laptops or mobile phones, on translations, which co-founder Vasco Pedro says provides the key to faster translations. This, combined with the efficiency of its task distribution and administration algorithms, provides a level of efficiency that allows editors to earn up to $10/hour working for Unbabel.

but without much analysis; the technology sector and its loyal heralds have never been good at analysis that doesn’t revolve around profit and where it’s coming from.

Human translation is really the gold standard as far as online translation goes, but for most companies, paying real, live humans to translate their content is an expensive proposition. In most cases, it’s either pony up the funds to pay for humans, or make due with machines (like publicly available tools akin to the unreliable Google Translate) and automated services. By combining both machine translation and human curation, the Unbabel founders not only believe they’ve created a novel solution to a persistent problem, but that they can offer a product that’s on par with pure human translation, faster, and at a fraction of the cost.

Note that human translation is mentioned only as an “expensive proposition” to be replaced at a “fraction of the cost”. This was to be expected, and I lectured the translation industry that they should expect it. What I did not expect was for the young Turks to dismiss the expensive past without even an acknowledgement of the history, theory or purveyors of that industry. I guess that’s why they call them the blues.
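TechCrunch’s description of the workflow (machine translation first, then editing tasks handed out to human editors) boils down to a simple queueing problem. Below is a minimal Python sketch of that two-stage idea, assuming a lot: it is not Unbabel’s system, `Editor`, `machine_translate` and `distribute` are hypothetical names, the machine-translation step is only a placeholder, and “give each draft to the least-loaded editor who covers the target language” is my stand-in for whatever matching Unbabel actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Editor:
    """A human post-editor (hypothetical model, not Unbabel's)."""
    name: str
    languages: set                              # target languages this editor checks
    queue: list = field(default_factory=list)   # machine drafts awaiting review


def machine_translate(sentence: str, target: str) -> str:
    """Placeholder for the machine-translation step; a real pipeline
    would call an MT engine here instead of tagging the text."""
    return f"[{target}] {sentence}"


def distribute(sentences, target, editors):
    """Machine-translate each sentence, then hand each draft to the
    least-loaded editor who works in the target language."""
    qualified = [e for e in editors if target in e.languages]
    if not qualified:
        raise ValueError(f"no editor available for {target!r}")
    for sentence in sentences:
        draft = machine_translate(sentence, target)
        # min() picks the shortest queue; ties go to the first editor listed
        editor = min(qualified, key=lambda e: len(e.queue))
        editor.queue.append(draft)
    return qualified
```

With editors “ana” (Portuguese and Spanish), “ben” (Portuguese) and “chen” (Chinese), three Portuguese sentences end up split two to one between ana and ben, and chen gets nothing. Even in this toy version, the human is simply the last stage of a queue, which is rather the point made above.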

Instruments of the orchestra

Recently a lovely page was brought to my attention: The Names of Instruments and Voices in English, French, German, Italian, Russian, and Spanish. Hosted by Yale (presumably giving it some longevity), it’s not 100% complete: computer (under electronic instruments) only appears in French (ordinateur) and German (Computerklänge), and cowbells only in French (cloches à vache), but tubular bells comes in a number of languages: French (cloches tubulaires), German (Rohrenglocke), Italian (campane tubolari) and Spanish (campanas tubulares).

Not being a native speaker of any of those languages, I’m not completely sure of the translations; the page looks old (pre-Google Translate at least) and may not be as accurate as we’d all like.

Nonetheless, it’s great to see someone has put in the effort for the international orchestral scene!

Khmer goes alpha on Google Translate

Google Translate has added Khmer as an alpha-status language, meaning that its translations are good but not great.

Today’s Khmer launch comes with these useful features: virtual keyboard (in case you want to type in Khmer but do not have Khmer keyboard handy) and ability to read Khmer text phonetically for users who don’t read Khmer alphabet.
Khmer is a challenging language for translation systems for two reasons: There isn’t a lot of Khmer data on the web and words are not usually separated by spaces; so in addition to teaching our translation system a new language, it also has to learn how to separate words (what we call segmentation).
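Google doesn’t say how its segmenter works. A common textbook approach is dictionary-based segmentation scored with a unigram language model and solved by dynamic programming (a Viterbi-style search), so here is a minimal Python sketch under that assumption. To keep it readable, an English phrase with its spaces removed stands in for unspaced Khmer, and the lexicon and its counts are invented for illustration; a naive greedy longest-match segmenter is included for contrast.

```python
import math

# Hypothetical toy lexicon with made-up counts; a real system would learn
# word frequencies from a large corpus (exactly what Khmer lacks on the web).
LEXICON = {
    "the": 50, "them": 10, "man": 20, "an": 15,
    "likes": 5, "to": 40, "play": 8, "chess": 3,
}
TOTAL = sum(LEXICON.values())
MAX_LEN = max(len(w) for w in LEXICON)


def word_cost(word):
    """Negative log-probability of a word under the unigram model."""
    return -math.log(LEXICON[word] / TOTAL)


def segment(text):
    """Lowest-cost split of `text` into lexicon words, or None if impossible."""
    n = len(text)
    best = [math.inf] * (n + 1)   # best[i]: lowest cost for text[:i]
    back = [0] * (n + 1)          # back[i]: start of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            word = text[start:end]
            if word in LEXICON and best[start] + word_cost(word) < best[end]:
                best[end] = best[start] + word_cost(word)
                back[end] = start
    if best[n] == math.inf:
        return None
    words, end = [], n
    while end > 0:                # walk the back-pointers to recover words
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


def greedy_segment(text):
    """Left-to-right longest-match segmentation with no lookahead."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + length] in LEXICON:
                words.append(text[i:i + length])
                i += length
                break
        else:
            return None           # no lexicon word starts at position i
    return words
```

On “themanlikestoplaychess” the greedy version grabs “them” first and ends up with “them an likes to play chess”, while the scored search prefers “the man likes to play chess” because “the” and “man” are more frequent in the toy lexicon. Making that choice well depends on having good word-frequency data, which is the “not a lot of Khmer data on the web” problem Google describes.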