Mostly about language

I don’t like blogging like this, but it’s hard to find the time with an intermittent Internet. I find titbits, but I rarely follow links – I’ve not watched an online video in almost a year and my inbox has an email thread containing 276 emails with over 400 links to “revisit” once I return to the land of faster bandwidth. As though anyone on the Internet has time for 400+ old links.

However, as someone who is interested in language, it behooves me to relay what I’ve found.

I don’t know why I have a low opinion of Will Self, but I do. As a self-important anarchist I think that I rub up against other self-important *ists. Despite this I found his latest piece for the BBC, In defence of obscure words, a rollicking good skewering of the stupid, the vapid, the empty. Be it expressing a love of words and language and using them:

I’d point out that my texts were as full of resolutely Anglo-Saxon slang as they were the flowery and the Latinate. I’d observe that English, being a mishmash of several different languages, had a large and exciting vocabulary, and that it seemed a shame not to use it – especially given that it went on growing all the time, spawning argot and specialist terminology as freely as an oyster does its milt.

or the end result of a culture built by the risk averse:

But now that all formerly difficult subject matter is, if not exactly permitted, readily accessible, cultural artificers have no need to aim high. The displacement of aesthetically and intellectually difficult art as the zenith has resulted in all sorts of sad and interrelated phenomena.

In the literary world, books intended for child readers are repackaged and sold to kidult ones, while even notionally highbrow arbiters – such as Booker judges – are obsessed by that nauseous confection “a jolly good read”. That Shakespeare remains our national writer is, frankly, bizarre, given that with his recondite vocabulary, myriad historical references, and convoluted metaphorical language, were he to be seeking publication in the current milieu, his sonnets and plays would undoubtedly also be branded as ‘too difficult’.

As for visual arts, the current Damien Hirst retrospective at Tate Modern is a perfect opportunity to see what becomes of an artificer whose impulse towards difficult subject matter was unsupported by any capacity for hard cogitation or challenging artistry. The early works – the stuffed animals and fly-bedizened carcasses – retain a certain – albeit recherché – shock value, while the subsequent ones degenerate steadily to the condition of knocked-off merchandise, making the barrier between the gift shop and the exhibition space evaporate in a puff of consumerism.

But the most disturbing result of this retreat from the difficult is to be found in arts and humanities education, where the traditional set texts are now chopped up into boneless nuggets of McKnowledge, and students are encouraged to do their research – such as it is – on the web.

I quite enjoyed the brief moment of intellectual challenge that he poses.

Which is why I now turn to a phenomenon that really only exists because of the Internet but grew from the old-style newsprint tropes “Word of the day”, maybe combined with “What in the world” – the longer form list of obscure, obtuse, unused, hard to translate or extinct words. Usually in groups of five, eight or ten. I’m not immune to posting links to these lists here on Pineapple Donut, but it’s not often that it’s done anew – as an infographic and without the pronunciation of the words. And to stick it to Mr Self, I found it through the most internet of ways – in RSS from a tumblr called this isn’t happiness, via mentalfloss, and then PopSci, to the original artist’s site, 21 Emotions with No English Word Equivalents.

At first I was put off by the filter of emotive words, but I came around as I thought about it – not only was Pei-Ying’s choice considered in that it provided a focus that’s easy to explain, empathise with and understand, but it gave her the opportunity to explore feelings that don’t have words in English, or any other language presumably, but are unique and identifiable to the (ahem, current) internet age. Unfortunately the artist’s site was so popular after the various postings that their broadband limit has been blown, or 509′d in tech speak.

I didn’t know that the Talkly awards even existed, but the Crikey language blog, FullySic, noted that last year’s was given to Ingrid Piller. It goes to the individual who has done the most to increase public knowledge about language, and she sounds like the person we would most like to be sitting next to on the 6am flight from Nadi to Tarawa.

Cory Doctorow fires up more passion in people than I’d expect – I find him interesting, intelligent and sometimes even enthralling, but the argy bargy that follows him is hard for me to comprehend. He writes for the Guardian on the difference between value and price in the internet era, largely focusing on positive externalities and their exploitation. Most interesting to me is his use of Google and its approach to translating.

A positive externality arises when you do something you want to do that also makes life better for someone else. For example, if you drive your car slowly and carefully to avoid a wreck, a positive externality is that other users of the road have a safer time of it, too. If you keep up your front garden because it pleases you, your neighbours get the positive externality of slightly buoyed-up property values from living on a nicely kept street.

Positive externalities — virtuous cycles — are all around us. Your kid learns to speak because of all the people around her who carry on conversations and because of the TV shows and radio programmes where speaking occurs (as do immigrants like my grandmother, whose English fluency owes much to daytime TV after she came to Canada from Russia).

Google is a case-study in harvesting positive externalities. It offered a free, voice-based directory assistance number, and used the interactions users had with its software to build a corpus of common phrases, expressed in multiple accents and under a wide range of field conditions. Then it used this to train the voice-recognition software that powers its Android-based phone-search. Likewise, it mined all the publicly available translations on the web – EU documents that appeared in multiple languages, fan-based translations for subtitles on cult cartoons, and everything else it could find – and used this to train its automated translation engine, providing it with the context that it needed to figure out the nuance and sense of ambiguous phrases.

He contends that the defining mania of the internet era is

resentment over positive externalities. Many people and companies have concluded that if someone, somewhere, is getting value from their labour, that they should get a cut of that value… Many people have accused Google of “ripping off” the public by indexing content, or analysing it, or both. Jaron Lanier recently accused Google of misappropriating translators’ labour by using online translated documents as a training set for its machine-translation engine – an extreme version of many labour-oriented critiques of online business.

leading to

the infectious idea of internalising externalities turns its victims into grasping, would-be rentiers. You translate a document because you need it in two languages. I come along and use those translations to teach a computer something about context. You tell me I owe you a slice of all the revenue my software generates. That’s just crazy. It’s like saying that someone who figures out how to recycle the rubbish you set out at the kerb should give you a piece of their earnings. Harvesting positive externalities involves collecting billions of minute shreds of residual value – snippets of discarded string –and balling them up into something big and useful.

While I enjoy his take, either he or Lanier has missed the mark. If Lanier’s critique was purely about the Google Translation Toolkit it would be understandable, but as is pointed out in the comments – the EU have made the translations available for exactly that purpose. Similarly, all the Free and Open Source software translation files have been there in the public domain waiting to be harvested since the movement started in the early 1990s – it was just a matter of someone thinking to harvest the files, and having the hardware and technical expertise to do so. And indeed, those files remain open source – someone else is welcome to harvest the same files. Google hasn’t locked them up. The translation service, on the other hand, asking for translators’ Translation Memories and storing them – that is taking other people’s work. I guess the question then becomes: can Google guarantee that they haven’t used those TMs in their translation service?

Finally, for the real language nerds, Matt Might’s The language of languages is a healthy, if slight, refresher on context-free grammars:

Languages form the terrain of computing.

Programming languages, protocol specifications, query languages, file formats, pattern languages, memory layouts, formal languages, config files, mark-up languages, formatting languages and meta-languages shape the way we compute.

So, what shapes languages?

Grammars do.

Grammars are the language of languages.

Behind every language, there is a grammar that determines its structure.

This article explains grammars and common notations for grammars, such as Backus-Naur Form (BNF), Extended Backus-Naur Form (EBNF) and regular extensions to BNF.

To my mind the discussion of context-sensitive grammars and parsing is poorly explained, and the article in general could be more interesting to the non-computer-scientist with a little more work. A primer only, really.
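For anyone who hasn’t met a grammar in the wild, here is a rough sketch of what one looks like and how a program might use it. The toy grammar (single digits, plus signs and brackets) and the function names are my own invention for illustration, not something taken from Might’s article:

```python
# A toy grammar in BNF-style notation (hypothetical, for illustration only):
#
#   <expr>  ::= <term> | <term> "+" <expr>
#   <term>  ::= <digit> | "(" <expr> ")"
#   <digit> ::= "0" | "1" | ... | "9"
#
# Each rule becomes one function. A function returns the position just
# after the piece of text it recognised, or None if nothing matched.

DIGITS = "0123456789"


def parse_expr(text, pos=0):
    """Recognise one <expr> starting at pos."""
    pos = parse_term(text, pos)
    if pos is not None and pos < len(text) and text[pos] == "+":
        # Second alternative of the rule: <term> "+" <expr>
        return parse_expr(text, pos + 1)
    return pos


def parse_term(text, pos):
    """Recognise one <term> starting at pos."""
    if pos < len(text) and text[pos] in DIGITS:
        return pos + 1
    if pos < len(text) and text[pos] == "(":
        inner = parse_expr(text, pos + 1)
        if inner is not None and inner < len(text) and text[inner] == ")":
            return inner + 1
    return None


def matches(text):
    """True if the whole string is a sentence of the toy grammar."""
    return parse_expr(text) == len(text)


if __name__ == "__main__":
    for sample in ["1+(2+3)", "(1", "1+"]:
        print(sample, matches(sample))
```

The point is only that the shape of the code follows the shape of the grammar: one rule, one function. Real parsers are considerably more involved.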

Google Translate: the written word

Over at Google, the New Year’s present for 2012 is titled Sometimes it’s just easier to write. An update to the Google Translate app for Android in which one can enter characters via the touch screen:

Our goal is to break down the language barrier, all the time, everywhere. By adding handwriting input directly into our Android app we hope to help you get translation done even more quickly and easily. Sometimes you don’t know how to say what you want translated, sometimes you can’t type it, and sometimes it’s easier just to write it. We think of handwriting on the touchscreen as another natural input…

This is still an experimental feature. It’s available in Chinese and Japanese, and you can enable it for English, French, Italian, German, and Spanish if you like. (We currently only support single-character input for Chinese and Japanese.) Just as with speech recognition and our translations themselves, our handwriting recognition happens in the cloud, allowing us to continually improve accuracy without requiring you to download new versions of the app.

A list of almost every writing system

Boingboing recently posted a link to Omniglot, “an online encyclopedia of writing systems and languages” that looks very interesting. There’s the page of translated phrases (eg My hovercraft is full of eels, One language is never enough, It’s all Greek to me), the list of language names, an index of languages by writing system (who knew the Canadian aboriginals had a writing system, or that it was strikingly sharp), and a long list of curated articles about language.

On the fascism of Grammar

I don’t know who put me onto this two-part essay on grammar yesterday (I feel like it was Superlinguo, but I could be wrong), but I’ve enjoyed reading/chewing on it. It starts as a piece on why grammar purism is annoying, distracting and misplaced:

When my father is interacting with people who find out he is a doctor, he often hears, “I have a medical question for you.” My sister, an accountant gets, “I have a tax question for you.” I feel particularly bad for my brother-in-law, who is both an accountant and a lawyer and who probably not only has to field general tax and legal questions but the questions of people who are in legal trouble because of their taxes. But when people find out I’m an English teacher, they often say, “I have a grammar question for you…

A big part of the problem, in my estimation is that we as a society–even the most overeducated among us–have a poor grasp of what grammar actually is and what role it plays in writing. So here it is: grammar is a set of standards that we as a linguistic group have agreed upon to help us understand one another. Those rules tend to be culturally and regionally specific and change over time. No one descended from a mountain with two stone tablets reading, “Though shalt not use a preposition at the end of a sentence.” Adhering to grammar guidelines is about making sure that you are understood. It’s also about self-presentation, but it’s not about adhering to some sort of moral code.

Grammar too often gets confused with what it is designed to produce, which is fluency. Fluency here is defined not just by your ability to speak or write in a particular language but by a certain facility with that language, the ability to make words do exactly what you want them to do, to make them sparkle and titillate and inspire, to not just say the right thing but to sound good doing it. And that may or may not include utilizing proper grammar. Often fluency means learning precisely when to follow the rules and when to break them, to tune the correctness of your usage to the expectations of your audience (idiom!). Or to use non-standard constructions for effect (Iseewhatyoudidthere). Fluency is the ability to say exactly what you mean exactly how you want, which is harder than it sounds.

I’ve written previously on language mutability in the case of Indonesian punk rock band Punkasila and why I think it’s important. In Punkasila’s case we see language and art sitting side by side – and while we see language moving, when the art doesn’t move, it loses all power to effect change. This piece attributed to Mark Twain, and Valerie Yule’s long career as an educator have been my two go-to references; this will be my third.

As I write this, the music of artist Dual Core has come on and I realise that hip hop threw grammar out the window over twenty years ago and hasn’t seen a reduction in popularity as a result. Criticisms of the genre have never been “that was poorly articulated”, quite the opposite in fact – when an MC can “make the words flow”, or express meaning in a clever and unique way, they are lauded.

While the headline I’ve chosen is overblown, my essential concern is one of conservative thought versus progressive thought. If we don’t sculpt our language in such a way that we can express new ideas, or old ideas and beauty in new ways, we run the risk of stagnation. A rusting on of ideas, an increasing boredom with beauty and difference. And that’s not the world I want to live in.

Part two of this essay is less rant, more literature – but has its own beauty. In particular, it addresses the idea of language formation moving between languages, in relation to Rushdie’s The Satanic Verses, and the richness that it provides:

However, you also have to account for the fact that Rushdie often uses the speech patterns of Central Asian English speakers in his prose, and that is part of what de-familiarizes it, though in an intriguing way, I think. There is an aural quality to his writing that makes for great out-loud reading. As an Indian man who grew up in the wake of the British Raj and inhabits a globalizing society, he is interested in how linguistic groups from the former colonies have adapted the language of their colonizers. But he isn’t exactly doing dialect, which has historically been used as a kind of literary black-face. He isn’t trying to convey a character’s accent through non-standard spelling. Instead, he reproduces the idiom and cadence of those speech patterns, which is really effing cool.

It is for this reason that I don’t believe that translators and interpreters need worry about their working futures – computing has a long way to go before it can weave this magic.

Superlinguo – all things linguistic-y

I’ve been reading the Superlinguo blog recently, as one of those responsible, Georgia, also appears on Triple R‘s (my local indie radio station) tech show, Byte Into It.

I’ve been meaning to bring them up for a while – there’s a post about the World Oral Literature Project that I thought was very interesting – they have helped fund a dictionary for Lamjung Yolmo, a Nepalese language or dialect.

The World Oral Literature Project are working with communities and linguists all over the world to try and make accessible records of languages that are dying out at a rapid rate. Their particular interest is in capturing those stories, poems and bits of cultural lore that are often lost when a community no longer speak their ancestral language. They have small grants to help people work towards recording these stories and tales, but they also do public lectures and workshops in developed countries to show people who might have never contended with the loss of a language exactly what is as stake.

Then of course, they forced my hand today by posting a cornucopia of linkage to all sorts of language and linguistic goodness that will no doubt make it hard for me to get any work done today. As you can see, a lot of our (well, my) favourite topics are covered – X number of words for Y, untranslatable phrases, crowd sourced translations and endangered languages:

Lynneguist spent a month looking at words that don’t really translate very well between American and British English, as an Australian I’m unsurprised that we share so many commonalities with both – but also amused at how many words from either language haven’t found their way to Australia. Johnson also investigated the great Atlantic linguistic divide, looking at just how Brits living in the USA have adapted to local pronunciation. Results come in colourful pie charts.

Fritinancy reintroduced us to some tech jargon already lost to history, and some that still survives. Stan Carey gives us an introduction to how Klingon was invented, and while still on something of a Scifi theme introduced us to the Spaceage Portal of Sentence Discovery. And while we are looking into the future, the folk at MacMillan reported on the future of dictionaries from the 2011 eLEX conference.

While the internet is having affects on the way dictionaries are being used, Piers Kelly at Fully (Sic) also showed us that crowd-sourcing can be great, with a project currently underway to translate ancient Greek texts. You don’t even need to know any Greek to help out. And on the topic of the internet making research more wonderful, the Australian Society for Indigenous Linguistics have made a large segment of their collection publicly accessible – Thanks to Jane Simpson at the PARADISEC Endangered Languages and Cultures blog for letting us know the good news.

Some quick links – Language Hat asked about the history of movie pidgins, Arnold Zwicky puzzles over some tricky alphabetising and that guy over at Dialect blog talks about guy, as do those guys at Lingua Franca. Ben Zimmer discovers that Kate Bush shows remarkable creativity in her list of 50 words for snow but as Geoff Pullum, over at Lingua Franca, discovers not everyone is as well educated when it comes to knowing how many words Eskimos have for snow (clue: it’s not fifty).

It’s now a solid part of my morning reading ritual through my RSS reader. Recommended reading.

Artatak: Remapping words

While I was living in Yogyakarta in 2008 I had the pleasure of sharing the space at the now defunct (sads!) Mes56. Katerina Valdivia was also staying there at the time. I will always remember the Argentinian/German New Year’s eve feast she cooked on my first night there – I was just recovering from Dengue fever, having spent Xmas in a fevered stupor – and it was one of the greatest feasts I’d ever tasted.

I got an email from Katerina last week advertising her latest show, titled Remapping Words. Instant attention grabbing headline in my book:

Sometimes words become the staging of symbolic spaces, that attempt to change reality. This is one of the aims of the piece Resignation by Lisha, a work that follows a strategy of redirecting a meaning by altering or adding words. The artist intervenes in the public space subverting the rules that organise it.

With the work Investir, Valeria Schwarz created a participatory and dialogic piece based on three month of Facebook and online chats with people from North African countries. Taking some of the  phrases of their conversations, the artist inserted them in daily life situations in the city of Murcia, Spain. With this, these sentences acquired another meaning through the new geographical context in which they were presented.

Using subtitles, Stine Eriksen creates in the video Choreography #1 a tension between the word and its display, showing the impossibility of words to fill the absence on which language is based.

I can’t make it (wrong side of the planet), but if you are near Berlin – check it out and let me know!

Translating large numbers

Is trillion the new billion? in the BBC news magazine looks at various aspects of large numbers. Historically, they were first documented in French:

The words billion and trillion, or variations on them, were first documented by French mathematicians in the 15th Century.

Then brought to England via John Locke in 1690:

as a useful term for avoiding “the often repeating of millions of millions of millions etc”. The French had purposely coined “billion” a 100 or so years earlier to denote the second power of a million (“bi” being the standard prefix for two)

But its usage was morphed separately by the British and Americans to mean the second power of a million and one thousand million respectively until

in 1974, Harold Wilson pledged that the British government would adopt the “short scale” naming system used in the US to avoid ambiguity. As a result, the value of billion is now generally understood to mean a thousand millions.

Most probably because it’s hard to imagine even needing a number as large as a million million. But of course, with the advent of advances in computing in particular, things that were once easily measured in the thousands – like MBs of storage available for instance – are now measured in much much larger numbers.

One way to enhance understanding is to divide a big number by the number of people affected, he says, so if the population of the eurozone is about 330m, then a trillion shared represents about 3,000 euros for each person. Another way is to count the numbers one at a time, one per second. A million seconds is 11 days, a billion seconds is about 32 years and a trillion seconds is 32,000 years.
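Those figures are easy enough to check. A quick sketch of the arithmetic, assuming a 365.25-day year (my choice, the article doesn’t say) and the article’s 330m eurozone population; the function names are mine:

```python
# Rough check of the "count one per second" and "share it out" figures.
SECONDS_PER_DAY = 60 * 60 * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365.25  # assumption: average year length

MILLION = 10**6
BILLION = 10**9    # short scale: a thousand million
TRILLION = 10**12  # short scale: a million million


def seconds_to_days(seconds):
    """Convert a count of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a count of seconds to years."""
    return seconds / SECONDS_PER_YEAR


def share_per_person(total, population):
    """Split a total evenly across a population."""
    return total / population


if __name__ == "__main__":
    print(f"A million seconds:  {seconds_to_days(MILLION):.1f} days")
    print(f"A billion seconds:  {seconds_to_years(BILLION):.1f} years")
    print(f"A trillion seconds: {seconds_to_years(TRILLION):,.0f} years")
    print(f"A trillion euros across 330m people: "
          f"{share_per_person(TRILLION, 330 * MILLION):,.0f} euros each")
```

By this reckoning a million seconds is a bit under 12 days, a billion seconds a bit under 32 years, and a trillion seconds a bit under 32,000 years, so the quoted figures are fair roundings.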

Traditionally trillion was used as a euphemism for “a shockingly large number”, but that usage no longer has resonance given that it can be used in regular conversation without batting an eye – as seems to happen when discussing trade deficits or governmental budgets these days.

As well as the mathematical reality that numbers really are getting bigger, there is also a wilful repetition of words like trillion, says lexicographer Susie Dent.

“The use of ‘trillions’ in our general conversation is part of a trend towards linguistic inflation or ‘bigging up’.

“Some words are used to the point of exhaustion and need replacing with others in order to maintain the strength of expression. So ‘heroes’ are now ‘superheroes’, we’re not just angry any more, we are ‘incandescent with rage’, and ‘tragedy’ is losing its power because it’s used for less than tragic events.

And words which previously had sufficient power in themselves are attracting prefixes such as uber- or mega- in order to re-energise them, she adds.

For reference, the short scale represents numbers as follows:

The new trillions

  • Trillion – 1 + 12 zeros (1 000 000 000 000, the “long scale” “billion”)
  • Quadrillion – 1 + 15 zeros (1 000 000 000 000 000)
  • Quintillion – 1 + 18 zeros (1 000 000 000 000 000 000)
  • Sextillion – 1 + 21 zeros (1 000 000 000 000 000 000 000)
  • Septillion – 1 + 24 zeros (1 000 000 000 000 000 000 000 000)
  • Octillion – 1 + 27 zeros (1 000 000 000 000 000 000 000 000 000)
  • Nonillion – 1 + 30 zeros (1 000 000 000 000 000 000 000 000 000 000)
  • Decillion – 1 + 33 zeros (1 000 000 000 000 000 000 000 000 000 000 000)
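The pattern behind the list can be written down in a couple of lines. This is a sketch only, with helper names of my own: in the short scale the n-th “-illion” is 10 to the power 3n + 3, while in the long scale it is 10 to the power 6n (so a long scale billion is a million million).

```python
# Short scale vs long scale for the "-illion" names.
NAMES = ["million", "billion", "trillion", "quadrillion", "quintillion",
         "sextillion", "septillion", "octillion", "nonillion", "decillion"]


def short_scale(name):
    """Value of a name in the short scale: 10 ** (3n + 3)."""
    n = NAMES.index(name) + 1  # million -> 1, billion -> 2, ...
    return 10 ** (3 * n + 3)


def long_scale(name):
    """Value of a name in the long scale: 10 ** (6n)."""
    n = NAMES.index(name) + 1
    return 10 ** (6 * n)


def zeros(value):
    """Number of zeros after the leading 1 in a power of ten."""
    return len(str(value)) - 1


if __name__ == "__main__":
    for name in NAMES:
        print(f"{name:<12} short: 1 + {zeros(short_scale(name)):>2} zeros   "
              f"long: 1 + {zeros(long_scale(name)):>2} zeros")
```

Running it reproduces the zero counts in the list above for the short scale, and shows how quickly the two systems drift apart.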