Words and language – the world has changed

It’s been a while between posts, but I have recently come across a number of language-related links that I think are worth sharing.

For a while, lists of hot new words were all over the internet. That’s slowed down significantly, but I found two recently that are worth sharing. I particularly liked the list of 216 non-English words “referring to emotional states from the world’s languages that have no correlate in English”. What I like most about this list – hell, it’s the reason anyone finds it interesting – is that the emotional states referred to are ones we can all empathize with and, because of that lack of correlate, the translations sound like poetry.

* Aware (哀れ) (Japanese): the bittersweetness of a brief, fading moment of transcendent beauty.
* Sabi (侘寂) (Japanese): aged beauty.
* Mono no aware (物の哀れ) (Japanese): pathos of understanding the transiency of the world and its beauty.

哀れ, pronounced a-wa-rey (I’m not a phonetician, sorry), is something that I feel regularly – reflections in puddles, the graceful elderly (侘寂), my partner singing quietly while cooking. I’m also fascinated by the racial profiling I give these words – do I find these words interesting

* Dadirri (Australian Aboriginal): a deep, spiritual act of reflective and respectful listening.
* Koselig (Norwegian): cosy, warm, intimate, enjoyable.
* Mbuki-mvuki (Bantu): to shed clothes to dance uninhibited.
* On (恩) (Japanese): a feeling of moral indebtedness, relating to a favour or blessing given by others.
* Peiskos (Norwegian): sitting in front of a crackling fireplace enjoying the warmth.

because of what they convey or because they seem to be culturally perfect in a stereotypical understanding of their respective cultures?

* Cafune (Portuguese): tenderly running one’s fingers through a loved one’s hair.
* Desenrascanço (Portuguese): to artfully disentangle oneself from a troublesome situation.
* Estrenar (Spanish): to use or wear something for the first time.
* Fernweh (German): the ‘call of faraway places,’ homesickness for the unknown.
* Fingerspitzengefühl (German): ‘fingertip feeling,’ the ability to act with tact and sensitivity.
* Gjensynsglede (Norwegian): (noun) The joy of meeting someone you haven’t seen in a long time.
* Guān xì (關係) (Chinese): building up good social karma.
* Janteloven (Norwegian/Danish): a set of rules which discourages individualism in communities.
* Jugaad (जुगाड) (Hindi): the ability to ‘make do’ or ‘get by’.
* Kvell (Yiddish): to feel pride and joy in someone else’s accomplishment.
* Tîeow (เที่ยว) (Thai): to roam around in a carefree way.
* Ubuntu (Nguni Bantu): being kind to others on account of one’s common humanity.

The Dictionary of Fantastic Vocabulary, on the other hand, is a list of completely made-up words. By the look of it, the words have been created programmatically (ok, proof: look at the end of the section for the letter E/e, just up from H) and meanings have been applied later. The beauty here is the recognition that the words don’t exist for a reason – very few of them are easy to say, and they *look* clunky. But you could imagine, over time, being able to introduce some into everyday usage. The idea is greater than the execution, but I think it’s a noble failure. I really should do some analysis on the distribution across the alphabet… dammit, I just went and did it. As you can see, only 15 letters are represented at all, and over half of the words start with E, A or S. I presume this is a combination of prevalence in English, prevalence of prefixes starting with those letters, and the author’s internalized biases.

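| Letter | Count | Percent of total |
| --- | --- | --- |
| E | 334 | 22.03 |
| A | 284 | 18.73 |
| S | 148 | 9.76 |
| D | 142 | 9.37 |
| I | 127 | 8.38 |
| O | 109 | 7.19 |
| C | 108 | 7.12 |
| P | 78 | 5.15 |
| U | 53 | 3.50 |
| H | 52 | 3.43 |
| R | 20 | 1.32 |
| M | 19 | 1.25 |
| T | 16 | 1.06 |
| B | 15 | 0.99 |
| N | 11 | 0.73 |
| Total | 1516 | 100.00 |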
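For anyone who wants to repeat the tally above, here is a minimal Python sketch of one way to do it. It is not necessarily how I produced the numbers: the function names are made up for illustration, and it assumes you already have the dictionary's headwords as a plain list of strings. The counts in `table_counts` are copied straight from the table.

```python
from collections import Counter


def first_letter_counts(words):
    """Count how many words start with each letter (case-insensitive)."""
    # Empty strings are skipped so word[0] never fails.
    return Counter(word[0].upper() for word in words if word)


def percentages(counts):
    """Convert raw counts to percent of total, rounded to two decimals."""
    total = sum(counts.values())
    return {letter: round(100 * n / total, 2) for letter, n in counts.items()}


# Counts copied from the table above (not re-scraped from the site).
table_counts = {
    "E": 334, "A": 284, "S": 148, "D": 142, "I": 127,
    "O": 109, "C": 108, "P": 78, "U": 53, "H": 52,
    "R": 20, "M": 19, "T": 16, "B": 15, "N": 11,
}

shares = percentages(table_counts)

# Share of all words that start with E, A or S, as a fraction of the total.
top_three_share = sum(table_counts[c] for c in "EAS") / sum(table_counts.values())

for letter, n in sorted(table_counts.items(), key=lambda kv: -kv[1]):
    print(f"{letter} {n} {shares[letter]:.2f}")
```

With a real word list you would call `percentages(first_letter_counts(words))` instead of typing the counts in by hand.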

The final explicitly word-based interest is Helen Zaltzman’s The Allusionist podcast – not only is it a great short podcast about words, the latest episode is actually about dictionary.com’s word of the day, a mailout with 13 million subscribers. People really do love words.

The final post is an interesting linguistic article I stumbled across titled Your Ability to Can Even: A Defense of Internet Linguistics, which starts with “I can’t even” and a friend’s recent claim, “I have lost all ability to can”, as a riff on the former:

Loose translation: “This link is so amazing that I have lost my ability to express my appreciation for it in fully formed sentences. All speech has been reduced to this ill-formed sentence. Thus is the depth of my excitement about this. Click on it. Click on it if you too would like to experience this level of incoherent excitement.”

While the article doesn’t address the contemporary obsession with communication via emoji (and the occasional miscommunication, depending on OS), it does address the new field of “internet linguistics” and a new wave of conservative backlash from those who would have language stagnate. There is also some great gender analysis of the roles played in language creation:

In short, this dialect results when people who already share a language are given new tools. The result isn’t a butchering of English language but a creative experiment with it. Am I claiming that the Internet as a whole is operating on a level of postmodernism that would make Joseph Heller, Kurt Vonnegut and Thomas Pynchon seem like novices? maybe i am maybe im not u punk wut of it like who r u to tell me otherwise

Dr. Tannen does the interesting work of examining gender and tech language. In studying sample text messages, she found that women were much more likely to use enthusiasm markers like exclamation points and add emphasis via capitalization. Most linguists emphasize the lack of understanding that can take place between men and women as a result of the different value that each gender places on conveying emotions. Supposedly, women perceive men’s lack of enthusiasm markers and capitalization as coldness and men perceive women’s use of them to be unnecessary.

However, what I find most fascinating about the Internet Language is that it is making language less, not more, gendered. Men and women on the Internet use many of the same tropes, enthusiasm markers and emphasizers in order to communicate. In the world of blogging and Internet writing, women are the creators of language. It is a realm in which women are not being socialized with already existing language but are doing the work of socializing and creating a community. Women dominate every important social media platform. Women outnumber men on Facebook, Twitter and Pinterest and account for 72% of all social media users. On Tumblr, where the number of men and women is roughly equal, women dominate the conversation.

The Whorf Hypothesis all wrong

I was put onto the podcast Lexicon Valley by pal Fiona Tweedie. One of the first that I listened to was No, Your Language Doesn’t Influence How You Experience the World in which they talk to the linguist John McWhorter about the Whorf Hypothesis (aka linguistic relativity):

the notion that the language you speak affects the way you think, and even influences how you experience reality itself. It’s an attractive idea, and one that makes some visceral sense. English, with its unique structure and grammar and vocabulary, will necessarily bestow a particular worldview that is different from that of Russian, say, or any of the other roughly 6,000 languages still spoken on Earth, right?

McWhorter makes a strong case – interesting in light of The Drama and the Invented Language in which neo-fascists hijacked John Quijada’s conlang Ithkuil in order to “think different” – like a/the master race.

Non-Human Languages?

According to the surprisingly still active Slashdot, Researchers Discover New Plant “Language”:

Westwood examined the plants’ mRNA, the molecule in cells that instructs organisms how to code certain proteins that are key to functioning. MRNA helps to regulate plant development and can control when plants eventually flowers. He found that the parasitic and the host plants were exchanging thousands of mRNA molecules between each other, thus creating a conversation.

Ah! Clarity. For a loose definition of language and conversation. I’m ok with loose definitions, a good analogy can help open the mind to new possibilities and potentialities. But I’m still happy to mock a little.



Coded messages to China in Lorem Ipsum?

acb has pointed me to an interesting tale about Google Translate seemingly hiding coded signals in Lorem Ipsum translations. While it all seems a little far-fetched or conspiratorial, this story is from Krebs on Security, a well-respected blog on all things crypto and security.

I recently discovered that the simple Lorem has been transformed into what can only be described as a post-internet phenomenon: Vegan Ipsum (veggie ipsum), Lorem Bacon, Hipster Ipsum, Lorizzle (Gangster Ipsum), Beer Ipsum, Samuel L Ipsum and the best of the lot – Picksomeipsum, in which you can pick one of the actors Clint Eastwood, Michael Caine, Jim Carrey or Morgan Freeman as a source, or pit two of the actors against each other:

…ing is about respect. getting it for yourself, and taking it away from the other guy. you want a guarantee, buy a toaster. what you have to ask yourself is, do i feel lucky. well do ya’ punk? cities fall but they are rebuilt. heroes die but they are remembered. multiply your anger by about a hundred, kate, that’s how much he thinks he loves you. no, this is mount everest. you should flip on the discovery channel from time to time. but i guess you can’t now, being dead and all. cities fall but they are rebuilt. heroes die but they are remembered. are you feeling lucky punk don’t p!ss down my back and tell me it’s raining. here. put that in your report!” and “i may have found a way out of here. that tall drink of water with the silver spoon up his ass. ever notice how sometimes you come across somebody you shouldn’t have f**ked with? well, i’m that guy.

Rehabilitated? well, now let me see. you know, i don’t have any idea what that means. it only took me six days. same time it took the lord to make the world. ever notice how sometimes you come across somebody you shouldn’t have f**ked with? well, i’m that guy. man’s gotta know his limitations. mister wayne, if you don’t want to tell me exactly what you’re doing, when i’m asked, i don’t have to lie. but don’t think of me as an idiot. cities fall but they are rebuilt. heroes die but they are remembered. circumstances have taught me that a man’s ethics are the only possessions he will take beyond the grave. this is my gun, clyde! this is the ak-47 assault rifle, the preferred weapon of your enemy; and it makes a distinctive sound when fired at you, so remember it. well, do you have anything to say for yourself? the man likes to play chess; let’s get him some rocks. that tall drink of water with the silver spoon up his ass…

There is, of course, a fairly comprehensive list of Lorem Ipsum generators, including generators in German, Chinese, Russian and Spanish.

The drama and the invented language

Fascinating read in the New Yorker about invented languages – most of which fail – and the other dramas surrounding them. The main focus is Ithkuil, a language invented by John Quijada, but the article also broadly describes conlangs (constructed languages) and their inventors and adherents, sprinkled with interesting linguistic or language facts (George Soros is a native speaker of Esperanto!).

Unlike earlier philosophers and idealists, who believed that their languages could perfect humanity, modern conlangers tend to create their languages primarily as a hobby and a form of self-expression. Jim Henry, a retired software developer from Stockbridge, Georgia, keeps a diary and prays in his constructed language, gjâ-zym-byn. If there is a god paying attention, he is the language’s only other speaker.

Many conlanging projects begin with a simple premise that violates the inherited conventions of linguistics in some new way. Aeo uses only vowels. Kēlen has no verbs. Toki Pona, a language inspired by Taoist ideals, was designed to test how simple a language could be. It has just a hundred and twenty-three words and fourteen basic sound units. Brithenig is an answer to the question of what English might have sounded like as a Romance language, if vulgar Latin had taken root on the British Isles. Láadan, a feminist language developed in the early nineteen-eighties, includes words like radíidin, defined as a “non-holiday, a time allegedly a holiday but actually so much a burden because of work and preparations that it is a dreaded occasion; especially when there are too many guests and none of them help.”

The underlying structure of the language is largely glossed over, although the broad brush strokes are compelling. Most languages have cool tools, little aspects that make them more interesting than other languages, be they situational, grammatical or lexical. In Ithkuil, Quijada attempted to bring together all of these linguistic wonders into a single language – and then, having read the cognitive linguists George Lakoff and Mark Johnson’s “Metaphors We Live By,” attempted to make the language precise enough to remove the need for metaphor.

Quijada opened his presentation the next morning by showing an image of Marcel Duchamp’s “Nude Descending a Staircase, No. 2,” a seminal work of Cubist painting, which captures a figure in motion with abstract lines and planes. It’s not an easy work to describe in any language, but Quijada wanted to demonstrate how one would attempt the task in Ithkuil.

He began with several of the language’s root words: -QV- for person, -GV- for clothing, -TN- for an implement that counters gravity, and -GW- for ambulation, and showed how to transform those roots through each of the language’s twenty-two grammatical categories to arrive at the six-word sentence “Aukkras êqutta ogvëuļa tnou’elkwa pal-lši augwaikštülnàmbu,” which translates roughly to “An imaginary representation of a nude woman in the midst of descending a staircase in a step-by-step series of tightly integrated ambulatory bodily movements which combine into a three-dimensional wake behind her, forming a timeless, emergent whole to be considered intellectually, emotionally, and aesthetically.”

When Quijada is invited to the conference “Creative Technology: Perspectives and Means of Development” to speak on Ithkuil, he discovers that it is now being used by an odd sect of quasi-intellectuals based in a Buddhist state, who are influential on anti-Semitic Ukrainian terrorists and are using Ithkuil to literally think different.

“We think that when a person learns Ithkuil his brain works faster,” Vishneva told him, in Russian. She spoke through a translator, as neither she nor Quijada was yet fluent in their shared language. “With Ithkuil, you always have to be reflecting on yourself. Using Ithkuil, we can see things that exist but don’t have names, in the same way that Mendeleyev’s periodic table showed gaps where we knew elements should be that had yet to be discovered.”

Really makes Esperanto seem so run of the mill, doesn’t it?

You can read Quijada’s text online, Ithkuil: A Philosophical Design for a Hypothetical Language, or purchase the 450-page book from the same site.

A font for all occasions

Google have announced Noto:

Noto is Google’s font family that aims to support all the world’s languages. Its design goal is to achieve visual harmonization across languages. Noto fonts are under Apache License 2.0.

The comprehensive set of fonts and some of the tools used in our development are available at noto.googlecode.com.

Noto Sans CJK is released with full support for Simplified Chinese, Traditional Chinese, Japanese, and Korean.

The page has a neat map to click for languages, where you can choose a subset per nation – for Kenya you can choose from 20 languages; for Australia, a mere three – English, Italian and Traditional Chinese. I’m not surprised there are so few – we barely see any Greek or Vietnamese writing on the streets, but I guess we don’t see much Italian either. Pity there’s not more South East Asian languages in our curricula.

Localised Malware

Trend Micro are reporting localised malware seen in the wild.

The malware strain known as VOBFUS works by copying itself onto removable media like USB sticks with names like porn.exe or sexy.exe. 

This variant also uses file names written in these languages:

  • Arabic
  • Bosnian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hungarian
  • Italian
  • Korean
  • Persian
  • Polish
  • Portuguese
  • Romanian
  • Slovak
  • Spanish
  • Thai
  • Turkish
  • Vietnamese

While the languages may differ, they all translate to I love you, Naked, Password, and Webcam.

At times I’m surprised that malware is still a thing, but then I remember that the whole world is online these days – as this development shows.

Google Translate reaching further

Somehow I missed it at the end of last year, but Google Translate has added nine new languages – five from the African continent, three from Asia, and Maori.

In Africa, we’re adding Somali, Zulu, and the 3 major languages of Nigeria.
  • Hausa (Harshen Hausa), spoken in Nigeria and neighboring countries with 35 million native speakers
  • Igbo (Asụsụ Igbo) spoken in Nigeria with 25 million native speakers
  • Yoruba (èdè Yorùbá) spoken in Nigeria and neighboring countries with 28 million native speakers
  • Somali (Af-Soomaali) spoken in Somalia and other countries around the Horn of Africa with 17 million native speakers
  • Zulu (isiZulu) spoken in South Africa and other south-western African countries with 10 million native speakers
Throughout Asia, we’re launching languages spoken in Mongolia and South Asia.
  • Mongolian (Монгол хэл), official language in Mongolia and also spoken in parts of China with 6 million native speakers
  • Nepali (नेपाली), spoken in Nepal and India with 17 million native speakers
  • Punjabi language (ਪੰਜਾਬੀ) (Gurmukhi script), spoken in India and Pakistan with 100 million native speakers
Thanks to the volunteer effort of passionate native speakers in New Zealand, we’re adding the language of the Maori people.
  • Maori (Te Reo Māori), spoken in New Zealand with 160 thousand speakers