Swedish Idioms

Humour seems to be a natural by-product of cultural difference – there is just no other way to explain the number of posts I’ve made to this end. In my youth there was Japlish – one of the earliest humour themes on the internet iirc – and before that, books were written about it. And for the record, never let it be said that the English language isn’t on the receiving end too.

In this vein I recently stumbled upon Swedish idioms in painfully literal translation – I present the editor’s top ten below – click through for the Swedish translations:

It shrieks to clear one self out of the road with the waist intact
The important thing [here] is to get away alive.

Hello jump in the blueberry forest
A cheerful expression to be used when you are a bit surprised.

Now the boiled pork is fried
Equal to american phrase “now you’re in deep shit”

Is it possible over the head taken
“Is it possible at all?” would be the correct phrase.

We are visible
See you later

Hear you you you! What are you holding on with?
Hey you! What are you doing?!

Now you have shit in the blue cupboard
When you really have made a fool out of yourself.

How much yawns the cracker
Stockholm slang for “what time is it?”

That was like the cat
Idiomatic expression – that’s amazing!

With his beard in the mailbox
Caught by surprise; caught with his pants down.

Lictionary: shipping is everything

Yesterday I wrote about Lictionary, “a localization dictionary that presents information repository which is constituted by free softwares” and noted that one of its strengths was that it had shipped. What does that mean? In software and technology, it’s generally understood that “release early, release often” gives distinct advantages – failure comes earlier and easier, feedback loops with users are smaller and quicker, the software project will look “alive” to developers and users, and no one uses software or sites that never made it out the door. There are plenty of articles about the advantages – here’s one by Matt Mullenweg, a founding developer of the software this site runs, WordPress.

Anyway – after the write-up I gave them yesterday I popped over to their site to let them know what I thought, and to point out the most glaring errors – the CDATA and the incorrect language attribution. Pleasingly there was a response in my inbox almost immediately from Türker at TSDesigns. The language attribution error has been fixed already (at least for Indonesian, I’ve not done any further testing), and they certainly are shipping – from what they have said, they only started collecting data last week:

We have started to collect our data last week. We are choosing and indexing a lot of repositories first time. After this first phase completed, we will identify problematic issues and eliminate these problems. CDATA problem is a sample of these situations. We are discussing about this. We can parse CDATA or skip them.

Manually choosing best translation is so hard. There are too many entries in system and there are several “best” translation for some strings in different contexts. So we added voting system. Translations are sorted by vote count. And we hide translations which have many negative votes. We will show best translations at the top with support of our users in the future. And also there is a trick in voting system. We add a positive vote to translations for each file. So mostly used translations have a head start.
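
For the programmers reading, the voting scheme Türker describes can be sketched in a few lines of Python. This is only my guess at the logic, not Lictionary’s actual code: the class and function names are made up, the score is assumed to be net votes, the cut-off for hiding is an arbitrary number (the email only says “many negative votes”), and the sample strings are purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the email only says "many negative votes".
HIDE_THRESHOLD = 3


@dataclass
class Translation:
    text: str
    file_count: int       # how many translation files contain this string
    up_votes: int = 0     # positive votes cast by users
    down_votes: int = 0   # negative votes cast by users

    @property
    def score(self) -> int:
        # Assumption: one automatic positive vote per file, so widely
        # used translations get a head start; score is net votes.
        return self.file_count + self.up_votes - self.down_votes


def rank(translations):
    """Hide heavily down-voted entries, then sort the rest best-first."""
    visible = [t for t in translations if t.down_votes < HIDE_THRESHOLD]
    return sorted(visible, key=lambda t: t.score, reverse=True)


candidates = [
    Translation("Masukkan", file_count=2, up_votes=6, down_votes=4),
    Translation("Enter", file_count=1, up_votes=1),
    Translation("Masuk", file_count=5),
]
for t in rank(candidates):
    print(t.score, t.text)
```

The point of the head start is that a string appearing in many projects floats to the top before anyone has voted at all, which seems a sensible default for a brand new site with few users.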

Nothing makes me more excited than responsive developers. Can’t wait to see where this goes.

Translation and cryptography: linguistic cousins

Translation techniques like word frequency analysis were used to crack an 18th-century code:

…a team of Swedish and American linguists has applied statistics-based translation techniques to crack one of the most stubborn of codes: the Copiale Cipher, a hand-lettered 105-page manuscript that appears to date from the late 18th century. They described their work at a meeting of the Association for Computational Linguistics in Portland, Ore.

Discovered in an academic archive in the former East Germany, the elaborately bound volume of gold and green brocade paper holds 75,000 characters, a perplexing mix of mysterious symbols and Roman letters. The name comes from one of only two non-coded inscriptions in the document.

Kevin Knight, a computer scientist at the Information Sciences Institute at the University of Southern California, collaborated with Beata Megyesi and Christiane Schaefer of Uppsala University in Sweden to decipher the first 16 pages. They turn out to be a detailed description of a ritual from a secret society that apparently had a fascination with eye surgery and ophthalmology.

Without a doubt the greatest part of being interested in language and science involves watching two of the things you love – cryptography and translation in this case – come together in a way that gives you a “why didn’t I think of that” moment. Eureka!
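
To give a flavour of the statistical idea, here is a toy frequency analysis in Python. It is emphatically not the researchers’ method, just the simplest possible relative of it: it assumes the message is lowercase English hidden with a plain letter shift, and that the most common letter in the ciphertext stands for “e”. The function names and the sample sentence are my own inventions.

```python
from collections import Counter
import string


def shift_text(text, shift):
    """Shift each lowercase letter by `shift` places, leaving the rest alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(ciphertext, most_common_plain="e"):
    """Guess the shift by assuming the top ciphertext letter encodes 'e'."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_lowercase)
    top_cipher = counts.most_common(1)[0][0]
    return (ord(top_cipher) - ord(most_common_plain)) % 26


# Encrypt a sample sentence, then recover it using letter counts alone.
secret = shift_text("meet me here where we were seen", 3)
shift = guess_shift(secret)
print(secret)
print(shift_text(secret, -shift))
```

On short or unusual texts the “most common letter is e” guess fails easily, which is part of why real codes need far more sophisticated statistics than this.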

Ask/Tell: A carnival of things

My friend Richard is frankly amazing – his breadth of reading and understanding, thoughtfulness and imagination are quite simply unparalleled by anyone else I know, or know of. Having lived together, on and off, for the best part of the last decade we know each other, and each other’s likes and dislikes, well. Inevitably, this leads to a couple of emails a week between us, with links of note, YouTube videos, memetic images and the like. Occasionally they all come together in one big pastafarian mess – late last week he pointed me to a blogpost I’ve only just got around to reading, an interview with the metaphysical philosopher Graham Harman (Harman’s Wikipedia entry) on a blog called Ask/Tell. The interview is quite long, but worth the effort – it covers ideas like language, translation, understanding and the philosophical/political interface with deft finesse:

TB: The world is not made of propositions. Yet any person’s experience can be conceived as being made of language. What is your sense of the limits of language in terms of your practice as a philosopher?

GH: The only limits of language in philosophy are the same limits found everywhere else– language cannot make the things directly present. The things cannot be transmuted directly into language. The attempt to set up rules for how to use language logically to refer to the real world rather than referring to mere illusions is hopeless. We need to be as inventive in our language as Picasso in his depiction of solid objects.

Different personality types dominate philosophy in different eras, as new needs come to the fore. The dominant personality type of recent decades has been the precise and assertive arguer who speaks clearly and likes to call people out on “nonsense.” It’s a personality that holds itself not to believe in very much, but to undercut the gullibility of other people’s beliefs.

My view is that the era of this personality has now run its course, and has become a pestilence of sorts. What we need now is something more like the artist type, given to new ways of staging problems. We need to find the equivalent of “philosophy installations,” whatever that might be.

There are too many calls in philosophy for clear writing, but rarely any calls for vivid writing. I agree that writing should be clear, but if this is your first priority, it means that you think the real problem with most philosophy is obfuscation, muddiness, evasiveness, and so forth. But the real problem with much philosophy is that it simply takes a position in some pre-existing trench war without innovating as to the terms of the problem. The result is an increasing supply of rational but boring assertions, not a fresh rethinking of the problem.

Philosophical language should be primarily vivid, and only secondarily clear. We should be clear when things are clear, but when we reach the edge of what is known, why pretend to know more than we do? I like a philosopher with a sense of when to use chiaroscuro. There are shadows in the world, and good writing should contain corners of shadow as well.

While this isn’t easy reading, there is encouragement from within the text:

TB: I tend to read a difficult long poem—Pound’s Cantos, say, Zukofsky’s “A” , or Stein’s Stanzas in Meditation—in the same way that I read a challenging philosophy text. I suspend any pretence of total understanding and forge ahead. I’m “studiously unprepared” to borrow a phrase from William Carlos Williams. I’m most engaged when I’m at least somewhat textually uncertain. I like having room for improvisational thought. But, you’re right, what sustains me as reader in such situations is vividness.

And this text has vividness in spades:

(GH) To answer your second question, the reason to focus on objects rather than on “language, social change, sexuality or animals” is because philosophy is obliged to be global in scope. If philosophy were to give one of these other entities a starring role, it would have to reduce the rest of the universe to them. “Language is the root of everything.” Here, you are choosing one specific kind of entity to be the root of all others, and there is no basis for this. Sociology tends to view all reality in terms of its emergence from human societies and belief-systems. Psychology treats all reality as made up primarily of mental phenomena. Physics deals with tiny physical objects and says that everything is made out of them, except that physics is useless when trying to explain things like metaphors, the Italian Renaissance, the meaning of dreams, and so forth.

All these other disciplines focus on one kind of object as the root of all else in the world. Only philosophy can be a general theory of objects, describing Symbolist poetry and the interaction of cartoon characters just as easily as the slamming together of two comets in distant space.

TB: … I find myself stuck on your idea of “philosophy installations” and imagining a room full of simultaneous translators amidst a giddy “carnival of things.” …

(GH) … language may still loom large in object-oriented philosophy even though it must be stripped of its transcendental-ontological constitutive power for everything else that exists…

But at the end of the interview, the philosophy is still grounded in the here and now, despite, it could be said, demanding an end to “the trench warfare” of the concept of “here and now” in some ways – a refreshing, self-deprecating perspective:

(GH) There is undeniably a certain banality to the world in our time, a demoralizing commercial hustle. But I’m extremely suspicious of the near-unanimity that prevails in political views in world intellectual circles right now. The price of admission to these circles is a series of expected denunciations that reassure everyone that you’re on their team. This is why I don’t respond immediately to demands to provide a politics of OOO, because I suspect that I’m just being asked to provide the usual, predictable denunciations, just as if I were being ordered to wear a flannel shirt and beard stubble at a grunge music party. That’s not intellectual debate, that’s just group solidarity, and I don’t care how good you think your group is– group solidarity is not a form of thinking.

For example, I’m writing this response from Istanbul, where I saw the 2011 Biennale yesterday. The theme was art and politics, and I was disappointed to find that all the political messages were exactly the same! Everything is America’s fault, Israel’s fault, capitalism’s fault. So, is the answer really that easy, and all we need to do is join forces to fight all the stupid and greedy corporate interests that prevent the truth from prevailing? Maybe, but this smells too much like trench war to me. It looks too much like the very “failure of imagination” of which everyone is so quick to accuse the current system.

There’s a wise old saying: don’t become worse than what you’re fighting. I would put a twist on that and say: don’t become less imaginative than what you’re fighting. This is the big danger for the political Left right now. I’m not interested in its moralistic self-congratulation, but only in what it can build. This is why I loved Žižek’s speech at the Occupy Wall Street protest; he hit the spot and said exactly what needed to be said. Maybe this Left will be able to build quite a lot. We will soon find out, because they are probably on the verge of seizing the upper hand. What is now called neo-liberalism is a little over thirty years old: the California property tax revolt in 1978, Thatcher in 1979, Reagan in 1980. Like any way of looking at the world, it has turned into a robotic application of clichés and no longer seems to be up to the challenge. We are about to undergo a big Leftward swing. When that happens, let’s see what people can do other than critique and oppose. They’ll have about thirty years of leeway before they start to become completely banal themselves, and then we’ll swing in the other direction again in about 2045, just as my own life is coming to a close.

Lictionary: a localisation catalogue

I was thinking about different ways to aggregate all the GPL’d localisation data available online just last week when an email landed in my inbox via the Django localisation email list informing me that Lictionary was now live.

For today, Lictionary.in contains ~160.000 unique strings and ~2.4 million translations in dozens of different languages and grows day by day.

The front page has the now ubiquitous search box, in which you enter a string, choose the language you wish it translated to in the drop-down box next to the search bar, and hit go. I started with the simple “Enter” into Indonesian, and soon noted a couple of errors – large chunks of CDATA and a line noting that “201 result(s) found for ‘Enter’ in Bengali”.

Is it perfect? No. Has it shipped? Yes – and that’s the most important thing, presuming they can keep it up. I’d be very interested to see if it would be possible to integrate with Tatoeba since they are delivering a similar product. Lictionary has the advantage of the thousands of translation files available with FLOSS software, but Tatoeba has a nicer interface.

Another concern is that Lictionary depends upon the correctness of the underlying files – any mistakes now need to go through Lictionary, then onto the software project from which they came. The FAQ briefly touches on this, but not enough to fill me with confidence just yet:

– Some translations seems wrong, what can i do?

You can give negative vote for this translation. We inform translators or translation teams periodically about negative voted translations. If you want to inform translator immediately, you can contact the translator or translation team directly or may be file a bug in bufg tracker of related project.

Also, the next most useful thing would certainly be to submit a selection of strings and have the best translations returned – localising software one string at a time would be tiresome for the monolingual software engineer. This is also addressed in the FAQ:

– Is there any other way except webpage to use Lictionary?

Unfortunately, no. You can only use our website to search in our database. We are developing web service interfaces for developers. Soon, we will publish technical details and documentation about these.

I look forward to seeing how this project develops and will be sure to report in as it improves.

Preserving the Balinese language

I was recently contacted by Alissa from BasaBali.org about that organisation’s attempts to preserve the Balinese spoken language using some interesting multimedia resources:

Although Balinese is not an endangered language, it is on sharp decline in the increasing shadow of English and Indonesian.  It is an incredibly rich language (something akin to 13th century Yiddush or Shakespearean English) but with only a million speakers left out of a population of 3-4 million, it is quickly losing traction.

Balinese script, as was brought to much acclaim by Tim Brookes’ Endangered Script Project (which I have written about before, Ed.) , is already endangered. We will have a chapter to teach the script using animation (sample on my website at http://basabali.org/balinese-language-preservation/).

We started a kickstarter campaign (http://kck.st/rpeM26) to try to raise fund to pay the Balinese linguists, videographers, animators, and anthropologists who are working with us.

Why is this important? Well, I think any endeavor to preserve knowledge is important – and this one is particularly so due to the fact that, as noted on the site, “(e)xcept for a few print books, there are almost no language materials for Balinese, anywhere in the world.”

As someone who’s had the pleasure of spending time in Indonesia, I think the need is probably widespread – my time on Java showed me that the Javanese language changed from city to city, if only in increments – and again, this is not the Indonesian that is the official language of the archipelago.

If you can spare a few dollars, I think this is a great cause. Given the technological advances of the last (insert timeframe here) there is no reason why any language should disappear, in any way. There may no longer be any native speakers or writers of a script – but the ability to record this information is now available at a reasonable price – almost free – and the relevant institutions should be doing all they can to preserve them.

Google updates Translate for Android

Google have announced an update to the Google Translate app for Android, including an expansion of Conversation mode, released earlier this year with only English<->Spanish translations:

We began with just English and Spanish, but today we’re expanding to 14 languages, adding Brazilian Portuguese, Czech, Dutch, French, German, Italian, Japanese, Korean, Mandarin Chinese, Polish, Russian and Turkish.

We’ve also added some other features to make it easier to speak and read as you translate. For example, if you wanted to say “Where is the train?” but Google Translate recognizes your speech as “Where is the rain?”, you can now correct the text before you translate it. You can also add unrecognized words to your personal dictionary.

The application doesn’t lose its previous translation functions:

text translation among 63 languages, voice input in 17 of those languages, and text-to-speech in 24 of them.

Apple announced Siri last week to much adoration from fans and mocking from those with Android phones that have had this ability for at least a year. Siri only works in three languages – English, German, French – and has a slightly different focus, but convergence between the two functions can’t be far off.


YouTube’s Audio Transcription

Given the subject of my last post was Google and voice recognition, I thought this video was timely. Titled Caption Fail 2 (the original Caption Fail is also available) it uses YouTube’s Auto Transcription mechanism to “play Telephone” (aka Chinese Whispers) – the game in which a message is passed from person to person in serial, and is transformed into something completely new – and often surreal – just like a bad machine translation.

Personally I believe that this game is always affected by the fact that the subjects’ knowledge of the study or outcome causes them to alter their behaviour – in this case, to deliberately change what they have heard (like the Hawthorne Effect, but not quite). While YouTube isn’t doing this to the performers, Rhett and Link, they have obviously chosen scripts that are a mouthful in order to trip the software up. I wonder how many takes and script rewrites it took to get a sufficiently entertaining result?