<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jungle light speed</title>
	<atom:link href="http://www.junglelightspeed.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.junglelightspeed.com</link>
	<description>language and the desire to connect</description>
	<lastBuildDate>Thu, 22 Mar 2012 15:23:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Take English, add awesome</title>
		<link>http://www.junglelightspeed.com/take-english-add-awesome/</link>
		<comments>http://www.junglelightspeed.com/take-english-add-awesome/#comments</comments>
		<pubDate>Thu, 22 Mar 2012 15:23:11 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Sociolinguistics]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=659</guid>
		<description><![CDATA[We may not get many more languages in the world. New languages usually form in linguistic isolation, and in a connected world this possibility is greatly reduced. All languages constantly evolve, so change is inevitable, but a language branching into two will be unlikely. This has mostly been true for English. While the Roman empire [...]]]></description>
			<content:encoded><![CDATA[<p>We may not get many more languages in the world. New languages usually form in linguistic isolation, and in a connected world this possibility is greatly reduced. All languages constantly evolve, so change is inevitable, but a language branching into two will be unlikely. This has mostly been true for English. The Roman empire had been gone long enough for Latin to branch into French, Italian, Spanish, etc., but there wasn&#8217;t enough time between the dismantling of the British empire and the proliferation of telecommunications for the same to happen to English. It was only dispersed long enough to develop a few minor variations across the Americas, Asia, Africa and the Pacific, perhaps not even enough to be considered different dialects.</p>
<p><div id="attachment_663" class="wp-caption alignleft" style="width: 193px"><a href="http://www.junglelightspeed.com/files/curious_pigeon.jpg"><img src="http://www.junglelightspeed.com/files/curious_pigeon.jpg" alt="" title="curious_pigeon" width="183" height="221" class="size-full wp-image-663" /></a><p class="wp-caption-text">No, not you.</p></div>An exception to the absence of new languages can be found in Creoles. When speakers of different languages come together and need to communicate, they will often create a very simple vocabulary and grammar for interacting, known as a <a href="http://en.wikipedia.org/wiki/Pidgin" target="_new">pidgin</a>. Sometimes it will be based on one language, and sometimes on a blend of two, like &#8216;<a href="http://en.wikipedia.org/wiki/Russenorsk" target="_new">Russenorsk</a>&#8217;, a pidgin drawing on both Russian and Norwegian vocabulary to enable trade in the fishing industry, with about 400 words, mostly fish-related (there were few famous Russenorsk poets). When speakers maintain separate primary languages, the trade language often remains a pidgin. When the speakers adopt the pidgin as a primary language, it quickly develops into a Creole &#8211; a full mixed language &#8211; and achieves independent language status in just a generation or two, often retaining the name &#8216;Creole&#8217; while continually evolving.</p>
<p><div id="attachment_662" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/russenorsk.jpg"><img src="http://www.junglelightspeed.com/files/russenorsk-300x247.jpg" alt="" title="russenorsk" width="300" height="247" class="size-medium wp-image-662" /></a><p class="wp-caption-text">The renowned Russenorsk poet, Radost Severormorsk, wooed his wife with his ballad &quot;I enjoy you like a herring enjoys krill&quot;. Her joy is clear.</p></div> This has happened to English a number of times. I&#8217;ve just returned from Sierra Leone, where the lingua franca is <a href="http://en.wikipedia.org/wiki/Sierra_Leone_Krio_language" target="_new">Krio</a>. As a linguist, I have always found it fascinating to listen to such a familiar-sounding language, especially when you listen closely and realize that you understand almost nothing that is said. When I <a href="http://www.junglelightspeed.com/the-smallest-signal/">last blogged about communication in Sierra Leone I spoke of drop-dialing</a>, or &#8216;flashing&#8217;, which is still alive and well. But being back there, it seemed unfair to write only about non-verbal communication, so I thought I&#8217;d add an article about Krio.</p>
<p>In North America in the 17th century, slaves from West and Central Africa began learning an English-based pidgin that quickly evolved into a Creole used throughout the region. It can still be found today in <a href="http://en.wikipedia.org/wiki/Jamaican_Patois">Jamaican Patois</a> and, as I discovered by accident a few years ago, in the islands off Belize (although the standard greetings are different, which I also stumbled over, to everyone&#8217;s confusion). There are even a few speakers left in the USA (now all elderly) in South Carolina and Georgia, where it is called <a href="http://en.wikipedia.org/wiki/Gullah_language">Gullah</a>. For years people thought that Gullah was simply an impoverished English, not realizing that it was a full, rich language, one of the few ever to have been derived from English, and a unique window into one of the most important periods in America&#8217;s history.</p>
<p><div id="attachment_670" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/IMG_2185.jpg"><img src="http://www.junglelightspeed.com/files/IMG_2185-300x224.jpg" alt="" title="IMG_2185" width="300" height="224" class="size-medium wp-image-670" /></a><p class="wp-caption-text">A Krio house in York on the Freetown Peninsula</p></div>Freetown, the capital of Sierra Leone, was primarily settled by freed slaves from the Americas, also known as &#8216;Krios&#8217;. Their language draws on influences from dozens of languages, including many West African and European languages and (to confuse things a little) a separate English-based trade pidgin that was already in use in West Africa. To this day, the Krios are recognized as both an ethnic and a linguistic group in their own right. There is even a distinct style of &#8216;<a href="http://www.chronicleworld.org/archive/krioarch.htm">Krio architecture</a>&#8217;. The language is widely spoken across the region, helped greatly by being the language of choice for much of the music broadcast across the nation (see below).</p>
<p>That new languages can spring up so quickly from pidgins, and that they independently develop grammars so similar to those of existing languages, has been the source of much debate among linguists, especially as possible evidence for universal qualities of human language and its cognitive roots. While the jury is still out on the grammar, all linguists agree on one universal: the most interesting idioms in every language are to do with sex, relationships and insults (or all three). So I&#8217;m taking the chance to add 10 favorites to my previous report on &#8216;flashing&#8217; (from memory, so excuse the spellings):</p>
<p>1. <em>id de han bag.</em> A little handbag (one of those tiny just-under-the-armpit ones). It took me a while to work out the insult: if you go out carrying only a very small handbag, then you are expecting a man to pay for everything. In other words, you use this phrase to accuse someone of being a bit prostitutey. Sounds odd, I know, but I was able to use the fallout from someone using this phrase to negotiate a better rental deal for our NGO there, <a href="http://www.energyforopportunity.org/en/home/">Energy for Opportunity</a> (long story).</p>
<p><div id="attachment_671" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/IMG_2055.jpg"><img src="http://www.junglelightspeed.com/files/IMG_2055-300x224.jpg" alt="" title="IMG_2055" width="300" height="224" class="size-medium wp-image-671" /></a><p class="wp-caption-text">The view from my balcony office in Aberdeen, Freetown. On the bottom right is my neighbor singing loud covers of Elton John hits in his underwear. He sang mainly in English and was half Russian ... I should have asked if he knew Russenorsk.</p></div>2. <em>e geh blue tooth</em>. He/she has blue tooth (&#8216;e&#8217; means &#8216;he&#8217; or &#8216;she&#8217;). The internet is not widely used, so data is transferred by SMS, memory stick, or (most openly and dangerously) via Bluetooth. So if you accuse someone of having blue tooth (man or woman), you are accusing them of being too &#8216;open&#8217; to others. In other words, a slut. (I love that this term is based on a data-transfer analogy.)</p>
<p>3. <em>keep a stick behind door</em>. As it sounds. Wild (or wild-ish) dogs can be a problem, so it&#8217;s a good idea to keep a stick handy by the door when you go out, just in case. The implication has nothing to do with dogs, of course &#8211; it means that a backup boyfriend or girlfriend should be kept at the ready.</p>
<p>4. <em>take more than one man to fill box</em>. Similar to the previous one, but specifically about keeping a spare (or extra) man around. Don&#8217;t make me spell it out&#8230;</p>
<p>5-8. <em>dry, super slim, straight cut, old stock</em>. Female figure types, meaning (in order): too thin/leathery, attractive, elegant, and (a friend carefully explained) someone who has &#8216;earned their body&#8217;. </p>
<p>9. <em>boxer</em>. Someone who is tight with their cash &#8211; the closed fist of a boxer not letting go of money.</p>
<p>10. <em>greased</em>. Damaged. I can&#8217;t work out the history of why &#8216;greased it&#8217; means &#8216;damaged it&#8217;, but it&#8217;s just so much fun to use: &#8220;I greased the car&#8221;, &#8220;I greased the test&#8221;, etc.</p>
<p>I&#8217;m sure I&#8217;m barely scratching the surface. Beyond the idioms, the grammar patterns differently too. One thing that stood out was the use of &#8216;for&#8217; as an infinitive marker. So instead of saying &#8220;I want to go&#8221; you would say &#8220;I want for go&#8221;. The &#8216;to&#8217;/&#8216;for&#8217; alternation is pretty arbitrary, but for some reason the little changes in the actual grammar stood out more than the changes in the words themselves. You can hear this in (Lady) Laurish&#8217;s &#8216;Lose your love&#8217; video clip, where she says &#8220;so try for understand I don&#8217;t want for lose your love&#8221;.</p>
<p><iframe width="420" height="315" src="http://www.youtube.com/embed/NeDZ5-BQ2fs" frameborder="0" allowfullscreen></iframe></p>
<p>It&#8217;s a pretty good intro to Krio, although she speaks more on the English side of things.<br />
For deeper Krio, check out the Krio figures of speech:</p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/_wW2oCHFjsk" frameborder="0" allowfullscreen></iframe></p>
<p>Did you notice the (cleaner) variation on the &#8216;stick behind door&#8217; idiom? Don&#8217;t believe for a second that&#8217;s the first thing people will think of <img src='http://www.junglelightspeed.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /></p>
<p>And just to balance out the languages, here is Sorie Kondi singing &#8220;Without Money, No Family&#8221;</p>
<p><iframe width="420" height="315" src="http://www.youtube.com/embed/5o8dl-E9K_I" frameborder="0" allowfullscreen></iframe></p>
<p>The first verse is sung in Loko, the second in Krio, and the third in Temne. There are at least a dozen commonly spoken languages across the country, each with its own ways to insinuate and insult &#8211; I wish I could share more about them all!</p>
<p>Rob<br />
22 Mar 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/take-english-add-awesome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Published!</title>
		<link>http://www.junglelightspeed.com/published/</link>
		<comments>http://www.junglelightspeed.com/published/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 21:12:26 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Fieldwork]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=632</guid>
		<description><![CDATA[It&#8217;s out! After ten years of publishing only at conferences (which is more standard in computer science) I have an actual journal article: Robert Munro, Rainer Ludwig, Uli Sauerland and David Fleck. 2012. Reported Speech in Matses: perspective persistence and evidential narratives. International Journal of American Linguistics. 78:1, 41-75. (http://www.jstor.org/pss/10.1086/662637) I wrote a previous blog [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s out! After ten years of publishing only at conferences (which is more standard in computer science) I have an actual journal article:</p>
<p>Robert Munro, Rainer Ludwig, Uli Sauerland and David Fleck. 2012. Reported Speech in Matses: perspective persistence and evidential narratives. <em>International Journal of American Linguistics</em>. 78:1, 41-75. (<a href="http://www.jstor.org/pss/10.1086/662637">http://www.jstor.org/pss/10.1086/662637</a>)</p>
<p><div id="attachment_639" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/amazon_dugout.jpg"><img src="http://www.junglelightspeed.com/files/amazon_dugout-300x224.jpg" alt="" title="amazon_dugout" width="300" height="224" class="size-medium wp-image-639" /></a><p class="wp-caption-text">On the way to work</p></div>I wrote a <a href="http://www.junglelightspeed.com/fieldworking/" title="Fieldworking">previous blog entry about one day of fieldwork</a> so I&#8217;m going to devote this one to the combination of linguistic theory and travel that got me there. How exactly did I end up spending eight days traveling to study just one aspect of such a remote language? </p>
<p>Noam Chomsky. </p>
<p>Of course.</p>
<p>Along with two other language researchers he published <a href="http://www.sciencemag.org/content/298/5598/1569.short">an article in <em>Science</em> that suggested that every human language (and no non-human language) contained recursion</a>. It is an issue that goes right to the heart of human existence &#8211; what mental capacities separate us from other animals (if any) and how is this manifested in language, the most complex interaction of cognitive and social action?</p>
<p>For Chomsky, the primary distinguishing feature is recursion. Recursion, put simply, means any process or structure that self-embeds. The idea that human languages have recursion is easy to demonstrate. In fact, I just did. The last sentence had another one embedded in it. That is, &#8220;human languages have recursion&#8221; was embedded within &#8220;the idea is easy to demonstrate&#8221; (I know, very meta). The theory can be disproved in two ways: finding an animal-language with recursion or finding a human language without recursion.</p>
<p><div id="attachment_643" class="wp-caption alignleft" style="width: 252px"><a href="http://www.junglelightspeed.com/files/images.jpeg"><img src="http://www.junglelightspeed.com/files/images.jpeg" alt="" title="images" width="242" height="208" class="size-full wp-image-643" /></a><p class="wp-caption-text">Unlike the European Starling, no-one doubts the recursion of birds in turducken.</p></div>Proving that only humans have recursion in their language has led to a lot of monkey-time, with people coercing poor apes into admitting &#8220;the idea that bananas are very tasty is correct&#8221; (or something along those lines). There have also been some very heated debates, with <a href="http://www.nature.com/nature/journal/v440/n7088/abs/nature04675.html">a counterargument in <em>Nature</em> that claimed that the European Starling has a recursive song</a>. Yes, grown men and women have argued about the recursive nature of bird-songs.</p>
<p>Proving that <em>all</em> human languages have recursion means studying every possible outlier. Not every language embeds sentences like those above; some may be limited to sequential ordering. For example: &#8220;human languages have recursion; this idea is easy to demonstrate&#8221;. You could say (arguably) that this chain is purely sequential and not recursively embedded. However, it was thought that you could always embed other people&#8217;s utterances within your own. For example, you could say, in any language, something like &#8220;Rob said that bananas are tasty yesterday&#8221;. It is often on the existence of this type of &#8216;reported speech&#8217; that the recursion claim is tested.</p>
<p>Far from the European Starling, the Matses have been living in the Peruvian and Brazilian Amazon for &#8230; well, no one really knows how long. They were living their traditional lives until the 1960s. As they mainly kept to non-navigable headwater regions deep in the forest, they were largely left alone, but throughout the 1900s they were increasingly fighting the governments of both countries. In the late 60s and early 70s they took amnesty, with most of the Matses population taking refuge at two missions, one in Peru and one in Brazil. The rest of the Matses are still &#8216;uncontacted&#8217;, although it&#8217;s not a very accurate word, as they regularly bump into the &#8216;contacted&#8217; people of the region as each group goes about its hunting trips (<a href="http://newswatch.nationalgeographic.com/2011/04/01/uncontacted-tribes-the-last-free-people-on-earth/">the well-known <em>National Geographic</em> photos might be of Matses, as it is the same valley</a>). In the 1980s the Matses left the missions and resettled along rivers in the region. Their language was studied for the first time during this period.</p>
<p>Cue David Fleck (one of my coauthors). He moved to the region and in 2003 he wrote a paper that mentioned, in passing, that Matses only had direct speech. Direct speech is simply when you quote someone (near) verbatim. Imagine that yesterday I told you: &#8220;I will go there tomorrow&#8221;. You could then quote me directly:</p>
<dl>
<dd>Rob said &#8220;I will go there tomorrow&#8221;</dd>
</dl>
<p>In English (and, people have theorized, in every language) you can also use &#8216;indirect speech&#8217; and rephrase what I said from your own point of view:</p>
<dl>
<dd>Rob said he is coming here today.</dd>
</dl>
<p><div id="attachment_649" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/recording.jpg"><img src="http://www.junglelightspeed.com/files/recording-300x224.jpg" alt="Working" title="recording" width="300" height="224" class="size-medium wp-image-649" /></a><p class="wp-caption-text">Working with David (left) and Abraham (center) in our &#039;office&#039;. Note my collared shirt - I don&#039;t care where we are, people, this is still a work day!</p></div>With indirect speech you can shift everything to your own personal, temporal, spatial and directional point of view: &#8220;I&#8221;->&#8220;he&#8221;, &#8220;will _&#8221;->&#8220;-ing&#8221;, &#8220;tomorrow&#8221;->&#8220;today&#8221;, &#8220;there&#8221;->&#8220;here&#8221;, and &#8220;go&#8221;->&#8220;come&#8221;. In this example, not a single word is actually repeated in the report. You do this all the time without even thinking about it. Indirect speech is considered to be a form of recursion, while direct speech is not. This is because in direct speech, the part you are repeating is like a single unalterable unit. We can see this by manipulating the sentences a little. For example, you can paraphrase only in indirect speech:</p>
<dl>
<dd>Rob said that he would be rocking up around now</dd>
</dl>
<p>You can say this without people thinking that I really use silly expressions like &#8220;rock up&#8221;, but you could not paraphrase like this in direct speech. With indirect speech you can also &#8216;extract&#8217; parts of the embedded sentence in ways that are not possible with direct speech. This is most obvious in questions:</p>
<dl>
<dd>Where did Rob say he was going tomorrow?</dd>
</dl>
<p>In linguistics, we usually treat the &#8216;where&#8217; as being extracted from the embedded sentence. This is not possible with direct speech:</p>
<dl>
<dd>Where did Rob say &#8220;I am going tomorrow&#8221;?</dd>
</dl>
<p>This doesn&#8217;t sound like it is from Rob&#8217;s point of view. It actually makes it sound like indirect speech that is talking about where <em>you</em> are going. In other contexts/configurations, it would sound completely ungrammatical. There are a few other well-documented differences, too, but they are more technical/linguistic, so I won&#8217;t go into them here.</p>
<p>Matses is one of the languages that does not generally allow one sentence to be embedded within another, so the test of whether it lacked recursion came down to whether it really did have only direct speech.</p>
<p><div id="attachment_650" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/matses_football.jpg"><img src="http://www.junglelightspeed.com/files/matses_football-300x225.jpg" alt="Estiron vs San Roce. Yes, it really is the world sport." title="matses_football" width="300" height="225" class="size-medium wp-image-650" /></a><p class="wp-caption-text">Estiron vs San Roce. Yes, it really is the world sport.</p></div>As a quick aside, it should be pointed out that the lack of sentence embedding has nothing to do with overall complexity. For example, Matses has an obligatory evidential system that marks each sentence for how the speaker came to know about an event. In English, we obligatorily mark verbs for tense: &#8220;I kick-ed&#8221;, &#8220;I am kick-ing&#8221;, &#8220;I will kick&#8221;. Not all languages require this; in Chinese, for example, time is encoded separately from the verb. In English, we code evidentiality separately. For example, you might say &#8220;I saw him kick the ball&#8221;, &#8220;I inferred from seeing the ball move that he kicked it&#8221;, &#8220;I guess he kicked the ball&#8221;, or &#8220;someone told me he kicked the ball&#8221;. In Matses, these form part of the verb suffix. There is an obligatory suffix that expresses whether the speaker knows something from direct observation, by inference, or as conjecture (if you know because someone told you, you have to quote them). Put simply, you can&#8217;t create a grammatical sentence in Matses without first considering how you came to know the information you are sharing. It&#8217;s actually a little more complicated than that, as there are different suffixes for different evidential/tense combinations, and in this respect Matses is also more complicated than English, as it has separate past tenses for recent past, distant past and remote past events.
To borrow the suffix into English, you might say something like &#8220;he kick-ond the ball&#8221; (I saw him kick the ball some time ago) or &#8220;he kick-ak the ball&#8221; (I infer that he recently kicked the ball). In fact, it is even <em>more</em> complicated than that, but it took 10 pages to explain in the paper, so I&#8217;m going to leave it out here. It is a unique and beautiful language, just one that doesn&#8217;t allow productive embedding.</p>
<p>Back to recursion and how it took me to the Amazon.</p>
<p>A researcher in Europe looks up from his paper (a fascinating treatise on bird-song structure) to see an email from David Fleck, inviting him to study the Matses. The researcher couldn&#8217;t make it, but the invitation bounced through other people and found its way to Uli Sauerland. Uli was a visiting scholar at Stanford at the time, and while he could not make it himself due to other fieldwork commitments, his student back in Germany, Rainer Ludwig, was keen to go (the four of us rounding out the authors on the final paper).</p>
<p><div id="attachment_654" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/rainer.jpg"><img src="http://www.junglelightspeed.com/files/rainer-300x225.jpg" alt="Rainer and Daniel" title="rainer" width="300" height="225" class="size-medium wp-image-654" /></a><p class="wp-caption-text">Rainer (left) with Daniel, who was our most active language consultant and helped us run many of the experiments.</p></div>Uli sent out an email to Stanford&#8217;s Linguistics Department asking if anyone wanted to go in his place. I focussed carefully on writing a reply as quickly as possible, balancing the need to make a solid case for being accepted with wanting to be the first to reply (assuming this would count in a tie-breaker). It was the most nerve-racking part of the whole project. After frantically rereading and making many tiny edits to the email, I tentatively hit send, fingers crossed that I would be invited. I followed this up by making excuses to lurk around Uli&#8217;s office over the next few days. Eventually we talked and he agreed that I could go (to this day, I&#8217;m not sure if anyone else even expressed interest).</p>
<p>After stashing my bicycle in Lima (I followed my fieldwork by <a href="http://www.robertmunro.com/bike/peru/">cycling across the Andes</a>), I met Rainer in Iquitos, clutching a print-out of David&#8217;s instructions. David&#8217;s email was all in capital letters and was a long list of instructions for how to get to Estiron, the village he was based in: negotiating passage on a cargo plane from Iquitos, the nearest city, to Angamos, the nearest village where a small craft could land; finding someone with a dugout canoe; obtaining enough fuel for an attached motor; and contacting him via short-wave radio when close. (I keep meaning to ask David why the email was all-caps &#8211; he&#8217;s out of contact again right now.) I kept the print-out close for the week it took us to get there. Almost everything had changed since he sent it, but we were able to piece everything together.</p>
<p><div id="attachment_651" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/estiron1.jpg"><img src="http://www.junglelightspeed.com/files/estiron1-300x225.jpg" alt="Estiron" title="estiron" width="300" height="225" class="size-medium wp-image-651" /></a><p class="wp-caption-text">Estiron</p></div>Iquitos is the world&#8217;s largest city that cannot be reached by land. The city sits where two rivers join to form the Amazon, and the only way in or out is by plane or boat. Some 4,000 km from the ocean, the river is still almost too wide to see across. We spent several days here negotiating our way onto a boat-plane (including two days just sitting beside it waiting for the weather to clear). Short-wave radio contact with the village was complicated. The Matses only turned on the radio for an hour a day, and because of the storms (I think) there was a lot of static. I can&#8217;t remember if it was in Iquitos or when we tried again in Angamos, but after several attempts and a lot of slow-talk-shouting that received only static in reply, we heard one sentence come through clearly: &#8220;we are expecting you&#8221;. It had been more than a year since David had first sent the email, so with this small warning of our arrival and invitation we were on our way.</p>
<p>Angamos was more developed than I had anticipated &#8211; it had a long paved walking path along a line of well-built houses. It was a border town, although I never saw its sister village, a little way downriver on the Brazilian side. After a few more days of collecting enough fuel from different villages, and finding a guide/boatman in Alleandro, we set off in our dugout for the 10-hour trip to Estiron. It was the rainy season and the rivers were flooded, so we kept near the shore to avoid the stronger currents. One small hole in the bow let water lap into the boat and we sat low, but it was sturdy enough. I was comfortable enough to nap, safe in the knowledge that Rainer was keeping everything afloat through sheer willpower.</p>
<p><div id="attachment_657" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/kid_parrot_monkey.jpg"><img src="http://www.junglelightspeed.com/files/kid_parrot_monkey-300x225.jpg" alt="Kid, parrot and monkey" title="kid_parrot_monkey" width="300" height="225" class="size-medium wp-image-657" /></a><p class="wp-caption-text">Kid, parrot and monkey in a hut&#8217;s doorway.</p></div>Eight days after I left San Francisco, and six after I arrived in the Amazon, we finally rocked up in Estiron, our main fieldsite. The village is nestled beside two bends of the Chobayacu Creek, with two rows of thatch huts near the water and several more scattered back into the forest. The area is surrounded by farms, but they are not visible from the water or village, and sometimes not even when you are in them &#8211; they are partially cleared pockets of forest planted with a scattering of fruit and yams. Between the farms, fish, and a regular supply of bush-meat, no-one had to work very hard to stay healthy. A few found objects aside, the huts were made according to traditional practices, with wood and thatch from the forest. Most of the huts were large airy buildings low to the ground (we set up home and office in one), but a handful of smaller ones were on longer stilts several meters in the air. It was explained to us that this was a recent fashion among some of the younger Matses (I recently built a raised platform in my own sunroom &#8211; I think I get it). The football field and volleyball net were also more recent additions.</p>
<p>So did we discover whether or not Matses contained recursion? I think we did find an answer, but our paper leaves it to the reader, simply laying out our observations, examples and experimental methods. You are welcome to read it and decide for yourself!</p>
<p>Rob<br />
December 20, 2011</p>
<p>Acknowledgments: Thanks to the Characterizing Human Language by Structural Complexity (CHLaSC) Project, who funded much of the research, and <a href="http://people.ucsc.edu/~ardeal/Deal-indexicals.pdf">Amy Rose Deal of Harvard</a> for citing us (already!)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/published/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Open datanarchy</title>
		<link>http://www.junglelightspeed.com/open-datanarchy/</link>
		<comments>http://www.junglelightspeed.com/open-datanarchy/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 03:39:55 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Crisis Response]]></category>
		<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=605</guid>
		<description><![CDATA[Open data is increasingly claimed to be &#8220;democratizing&#8221;. It is not clear to me where the &#8220;democracy&#8221; part is. If 99% of people decide to keep information private but 1% disagree, that 1% can still make that information publicly available. This is more like anarchy. There is a place for anarchy in the world [...]]]></description>
			<content:encoded><![CDATA[<p>Open data is increasingly claimed to be &#8220;democratizing&#8221;. It is not clear to me where the &#8220;democracy&#8221; part is. If 99% of people decide to keep information private but 1% disagree, that 1% can still make that information publicly available. This is more like anarchy. There is a place for anarchy in the world &#8211; it is freedom at its most extreme. I am a wholehearted proponent of open data, but the price of data freedom is data vigilance.</p>
<p>The phrase &#8220;democratizing data&#8221; came up more than once at the recent &#8220;<a href="http://blog.peoplebrowsr.com/blog/?p=1514" title="Big Open Data">Big Open Data</a>&#8221; panel at PeopleBrowsr Labs, on the &#8220;<a href="http://www.crowdconf.com/">Philanthropy Panel</a>&#8221; at CrowdConf, where I was a panelist, and at the &#8220;<a href="https://www.rightscon.org/">Silicon Valley Human Rights Conference</a>&#8221;, where I was a regular participant. By &#8220;democratizing&#8221;, people simply meant &#8220;publishing online&#8221;, but calling it &#8220;democratizing&#8221; carries the implication of inherent good (and, more divisively, that anyone opposed to publishing data is somehow undemocratic). I question whether many people calling for open data really have the resources to also support the needed vigilance, or whether they simply use the &#8220;democratizing&#8221; tag to absolve themselves from the consequences of publishing or republishing information.</p>
<p><div id="attachment_618" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/occupy_oakland.png"><img src="http://www.junglelightspeed.com/files/occupy_oakland-300x169.png" alt="Occupy Oakland" title="Occupy Oakland" width="300" height="169" class="size-medium wp-image-618" /></a><p class="wp-caption-text">Occupy Oakland - this is what democracy looks like? Or this is what it looks like when people are no longer sure they are in a democracy</p></div>Returning to the 1% who make data public when the 99% disagree: that 1% can go on sharing data among themselves, and the 99% can simply opt out of that channel of information. I joined the Occupy Wall Street protests in Oakland following <a href="http://articles.sfgate.com/2011-10-28/news/30335416_1_protesters-canister-police-and-hundreds">the shooting of a former marine by Oakland Police</a>. I hadn&#8217;t been closely involved with Occupy Wall Street until that point, but the decision to turn weapons on a peaceful protest (especially against someone who had served two tours of duty for their country) was too much to ignore. While I was there, I wondered how long ago the 99% had opted out of financial channels. Did things get so far gone (at least in part) because the 1% had become so egregious and removed that the 99% let them operate unchecked for too long?</p>
<p>Perhaps the same is happening with open data online. Cisco predicts that <a href="http://socialtimes.com/cisco-predicts-that-90-of-all-internet-traffic-will-be-video-in-the-next-three-years_b82819">90% of all web traffic will be video in the next three years</a>. Let&#8217;s see who is democratizing it:</p>
<dl>
<dd><a href="http://blog.radvision.com/videooverenterprise/2010/08/03/voting-for-video-with-our-webcams/">&#8220;the democratization process of video is ChatRoulette&#8221;</a> Radvision.
<dd><a href="http://www.thefastertimes.com/theweb/2010/03/12/chatroulette-breaks-fifth-wall-content-production/">&#8220;ChatRoulette represents a true breakdown and symbolic revolution of the relationship between content producers and consumers&#8221;</a> Faster Times.
</dl>
<p>ChatRoulette, for those who don&#8217;t know it, is a very simple idea: it randomly and anonymously connects you to other users via a web-cam and instant messaging. Have you seen ChatRoulette lately? This is not what democracy looks like. If you have not seen it before, check out this <a href="http://www.youtube.com/watch?v=JTwJetox_tU">old(ish) video of Merton in a hooded top singing and playing piano to random people on ChatRoulette</a> and take my word for it that the current user-base is not about partially concealed pianists &#8211; a very specific 1% has taken control of this channel. ChatRoulette, and internet video as a whole, did not &#8216;democratize&#8217; video &#8211; it became &#8216;voyeur takes all&#8217;.</p>
<p>The only people to openly admit to me recently that they used &#8220;democratizing data&#8221; in a less than noble way were advertisers, confessing that &#8220;democratizing data&#8221; (to them) mostly meant trying to coerce Facebook into making it easier for their start-up to scrape and sell data. The advertising community has been capitalizing on big data for some time (more than <a href="http://www.quora.com/What-percent-of-Googles-revenue-comes-from-search-advertising">95% of Google&#8217;s revenue</a> is advertising targeted via big-data analytics) and they seem to be ahead of the curve. For them, it is not about democracy but simple capitalism &#8211; personal gain through someone else&#8217;s data. Respect my privacy, and more power to you. </p>
<p>The more serious problems arise when data can be used to harm an individual. I have lost count of how many &#8220;open data&#8221; or &#8220;information sharing&#8221; technologies have been enthusiastically called a &#8220;Swiss Army Knife&#8221;, followed by a list of many positive use cases. A Swiss Army Knife can be used to harm you in many more ways than it can be used to help you &#8211; it is a weapon. With the proliferation of cellphones and information sharing tools like Drupal, Twitter, Ushahidi and WordPress, anybody with a little technical knowledge can share masses of data. But a little knowledge is a dangerous thing, and the ease of use of many of these platforms means that we are sending people off to battle with inadequate weapons training.</p>
<p>I have also lost count of how many people have come to me over the last year asking for help with a real or planned map to document a crisis that they were passionate about. They have come from all over the world, but not as a democratic mix. People who launch crisis-maps are overwhelmingly the same demographic as those on ChatRoulette: excited young men with an internet connection. The deployments might serve and connect some of the least resourced people in the world, but they are not being curated by them. I try to give the same advice in all cases where there is a real element of physical danger: constantly review all your data in light of changing conditions; remove anything that is dangerous or irrelevant; and if you do not have the resources to constantly monitor and reevaluate what you have already published, discontinue your service. The overwhelming majority listen, and most of them decide that they cannot meet this requirement, instead serving their communities in more direct ways.</p>
<p><div id="attachment_619" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/bigdata_revolution.jpg"><img src="http://www.junglelightspeed.com/files/bigdata_revolution-300x164.jpg" alt="Big Data Revolution" title="Big Data Revolution" width="300" height="164" class="size-medium wp-image-619" /></a><p class="wp-caption-text">Left: the 1% of the revolution - celebrating victory on a tank. Right: the 99% of the revolution - a family huddled in the dark, trying to determine if the gunshots are coming closer. The only guarantee that the 99% have from open data is to shine a spotlight on them - would you?</p></div>It is common for someone to be taken from their home and killed in a conflict and for the cause never to be known. The two bloggers recently killed in Mexico, whose bodies were deliberately displayed with their social media handles, are the exception. We have to assume that contributing to social media leads to targeted deaths far more often than we ever learn about. </p>
<p>The victims in Mexico knew that they were taking calculated risks. Open data means that someone could contribute to an open platform without even realizing it &#8211; someone else could take their words/reports and add them to an open platform, making them oblivious collaborators. Connecting with open data is uncertain &#8211; it can bring help or it can bring enemies. There is only one guarantee in publishing your information to open, social media in a conflict situation: it shines a spotlight on you. If you choose to publish information from/about someone in a conflict zone, you are shining a spotlight on them too. Republishing simply makes that spotlight brighter. The 1% of the revolution is a celebration on a now-still tank. The 99% of the revolution is huddling in the dark with your family close, trying to determine if the gunshots are coming closer. </p>
<p>Oblivious collaboration exists everywhere. There is no doubt that I am an oblivious collaborator with the advertisers mentioned above, who look to increase their market share by scraping information from my Twitter account or this site. In this context, I don&#8217;t much care.</p>
<p>I am leading the construction of the largest humanitarian open data project to date &#8211; <a href="http://strataconf.com/stratany2011/public/schedule/detail/21499">EpidemicIQ is currently processing about 1 billion data points per day</a>, almost all of which are from open data. We do not yet republish open data &#8211; it is the struggle of coming to terms with the complexities of open humanitarian data at this scale that led me to write this article. </p>
<p>Take one example report: &#8220;a young girl from village X was treated for Y&#8221;. It is anonymous to me. If it is published openly, but only in medical circles, then she remains anonymous. If we republish this somewhere that people in village X will read, she might not remain anonymous &#8211; perhaps only one young girl from the village was hospitalized at that time, so they will know who she is. Should we republish? What if the people in village X have been known to harm people with disease Y, because of a mixture of fear of disease and traditional beliefs? I have seen all these factors line up more than once. At the recent <a href="http://strataconf.com/stratany2011">Strata big data conference in New York</a>, a wealthy CEO insinuated that people were cowards for not republishing aggregated open data for fear of the legal implications. I don&#8217;t fear lawyers. I don&#8217;t fear billions of data points. I consistently worry about balancing the need to share information with the privacy and well-being of this girl, and many like her who are now oblivious collaborators in a global outbreak monitoring system.</p>
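<p>The &#8220;only one young girl from the village&#8221; problem is what privacy researchers call a k-anonymity failure: a record is only as anonymous as the number of other records sharing its combination of quasi-identifiers (here village, age group, sex and week). The sketch below is a hypothetical illustration, not part of EpidemicIQ; the record fields, the sample data and the threshold <code>k</code> are all assumptions. It flags every report whose quasi-identifier combination is shared by fewer than <code>k</code> reports in the set about to be republished.</p>

```python
from collections import Counter

# Hypothetical report records; field names and values are illustrative only.
REPORTS = [
    {"village": "X", "age_group": "child", "sex": "F", "week": "2011-44", "condition": "Y"},
    {"village": "X", "age_group": "adult", "sex": "M", "week": "2011-44", "condition": "Y"},
    {"village": "X", "age_group": "adult", "sex": "M", "week": "2011-44", "condition": "Y"},
    {"village": "Z", "age_group": "child", "sex": "F", "week": "2011-44", "condition": "Y"},
    {"village": "Z", "age_group": "child", "sex": "F", "week": "2011-44", "condition": "Y"},
]

# Attributes that neighbours could combine to recognise a person.
QUASI_IDENTIFIERS = ("village", "age_group", "sex", "week")


def key(report):
    """Return the tuple of quasi-identifier values for one report."""
    return tuple(report[field] for field in QUASI_IDENTIFIERS)


def reidentifiable(reports, k=2):
    """Return the reports whose quasi-identifier combination is shared by
    fewer than k reports, i.e. the ones that break k-anonymity."""
    counts = Counter(key(r) for r in reports)
    return [r for r in reports if k > counts[key(r)]]


for r in reidentifiable(REPORTS, k=2):
    print(r["village"], r["age_group"], r["sex"])  # prints: X child F
```

<p>A mechanical check like this only catches part of the problem: it says nothing about whether village X is a safe audience for news of disease Y at all, which still needs human judgment.</p>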
<p>Oblivious collaborators in conflict situations are a greater concern. This is not a fringe problem &#8211; 30% of the world (about 2 billion people) live in a conflict zone or a transitional situation. For obvious reasons, these populations are among the most recent to join the connected world, meaning that the <i>least</i> experienced populations now accessing social media are also the most vulnerable. We saw this with the recent <a href="http://blog.standbytaskforce.com/libya-crisis-map-report/">Libya Crisis-Map</a> that was commissioned by the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA) and initially implemented by the Stand By Task Force (SBTF), of which I&#8217;m a co-founder (full disclosure: I am also a co-author of the <a href="https://docs.google.com/document/d/12meslH-Bo1WTnP3Y9dye-rsNJmHC6BzqRIFlLrzc3cE/edit?hl=en_US">SBTF Libya report</a>, which I&#8217;ll be quoting). </p>
<p>The Libya Crisis Map aggregated information, primarily from traditional and social media, about the (then) mounting crisis in Libya, in order to support intelligence gathering by the UN in the lead-up to their deployment. The feedback was positive:</p>
<dl>
<dd>&#8220;If you go back a couple of years, all of this information probably would have been available, but it would have been seen as noise coming at you in multiple formats &#8230; Libya Crisis Map has done an extraordinary job to aggregate all of this information.&#8221; Brendan McDonald, UN OCHA
</dl>
<p>But part-way through the deployment, UN OCHA decided to make the map public. This was a case of the 1% making a decision without the 99%. The people who submitted and structured the reports were not asked if they wanted to make the map public. The majority were not even informed. A compromise was reached where only partial and/or obfuscated data was published on the public-facing map. Because of security fears, the public map still drove away the most important volunteers &#8211; those with knowledge of Libya. In their rush to show the world that they were using crowdsourcing technologies, the UN excluded and endangered the crowd.</p>
<p>The UN OCHA response to this in the <a href="https://docs.google.com/document/d/12meslH-Bo1WTnP3Y9dye-rsNJmHC6BzqRIFlLrzc3cE/edit?hl=en_US&#038;pli=1">Libya Crisis Map Report</a> was unrepentant:</p>
<dl>
<dd>&#8220;why not allow full text of tweets already available? &#8230; if it is already fully available on the web&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
<dd>(re withholding/obfuscating information) &#8220;Bad instruction. All this became available on the web very quickly &#8230; belligerents know where camps and exit routes are, there is no security risk from this appearing on one more site on the web.&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
</dl>
<p>I don&#8217;t think it is productive to be so absolutist about something we know so little about, especially in big data&#8217;s <i>first</i> public use in a conflict setting. There are two clear reasons why publishing all information is dangerous:</p>
<p>1) You are showing your hand. Let&#8217;s say the bad guys know all the details that you do, and many more. If you have missed something, they now know that you don&#8217;t know: they know where to target. </p>
<p>2) You are creating oblivious collaborators. It is one thing for someone to tweet &#8220;there are many gunshots here, I wonder why&#8221;, but it is another for someone to aggregate this with reports of violence in the same area, use the tweet as further evidence, and publish both together next to the logo of an organization considered to be an enemy. (This actually happened, but I&#8217;ve deliberately changed the wording). From the analogy above, you are turning the spotlight on that person up to 100. (Unless they don&#8217;t know about it, which is more like keeping them in the dark while giving all soldiers night-vision goggles).</p>
<p>There are two more reasons, both of which come from being in a newly connected world:</p>
<p>1) Not all bad guys will otherwise be resourced to collect data from disparate sources. Even if the information is open, if it is spread across dozens or hundreds of information points across the web, it takes a sizable operation to collect it. Some bad guys belong to large, complex networks that might be able to scrape and parse all this information. Most are just opportunistic, but they might now have an internet connection. Publishing aggregate, structured data weaponizes everybody.</p>
<p>2) Information can be open and describe an entire region in fine detail before <i>any</i> one person on the ground knows the full extent. Previously, conditions on the ground would change much faster than reports could make it through to aid agencies and, yes, the bad guys on the ground very often knew about the changes before the aid agencies. But big open data can be, and often is, ahead of the curve of any one individual, or any one organization. In the global disease outbreaks that we track, this is the norm, not the exception. You can get ahead of the bad guys on the ground for the first time. This is one of the most positive aspects of big open data (in parsing, if not republishing) &#8211; do not give away your advantage so quickly.</p>
<p>The most frequent response to these kinds of arguments made it into the report:</p>
<dl>
<dd>&#8220;If we can&#8217;t handle the info publicly, it&#8217;s off, we lack adequate security to handle confidential info reported&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
</dl>
<p>It is impossible to predict what information will become sensitive. A report that obliquely mentions doctors in a secure refugee camp is harmless, right up until that camp is later raided and the most educated witnesses are deliberately targeted (this has also happened, but again I&#8217;ve deliberately changed the details a little). To avoid any possibility of security implications is to collect no information. Any information that is held, whether privately or publicly, needs to be constantly reviewed according to a changing environment. There is no way around this. You need to collect data. You need to have the resources to continually review it.</p>
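<p>One way to make &#8220;constantly reviewed&#8221; concrete is to re-run every stored record against the <i>current</i> list of sensitive places whenever that list changes, and withdraw anything that now matches, even if it passed every earlier review. The sketch below is a hypothetical illustration; the <code>Record</code> structure, the place name &#8220;Camp North&#8221; and the idea of a single watch-list are assumptions, and a real review would need human judgment rather than string matching alone.</p>

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical stored report; the fields are illustrative assumptions."""
    record_id: int
    text: str
    published: bool = True


def review(records, sensitive_places):
    """Re-evaluate every stored record against the current watch-list.

    Returns the ids of records withdrawn in this pass. A record is never
    exempt just because it passed an earlier review.
    """
    withdrawn = []
    for record in records:
        lowered = record.text.lower()
        if record.published and any(p.lower() in lowered for p in sensitive_places):
            record.published = False
            withdrawn.append(record.record_id)
    return withdrawn


records = [
    Record(1, "Doctors are treating patients at Camp North"),
    Record(2, "Road to the coast is open"),
]

# Before conditions change, nothing on the watch-list matches.
print(review(records, sensitive_places=[]))              # []
# After the camp becomes a target, the same harmless report must come down.
print(review(records, sensitive_places=["Camp North"]))  # [1]
```

<p>The important design choice is that the review runs over everything already held, not just new submissions; that is exactly the recurring cost the post argues you must be resourced to pay.</p>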
<p>The second most frequent response to these kinds of arguments also made it into the report:</p>
<dl>
<dd>&#8220;the personal responsibility [is] incumbent on the info sender.&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
</dl>
<p>In conflict situations, I think it is rare that someone caught up in the middle has a complete picture of their security situation. If you choose to publish aggregated information (regardless of your organization) then your act of publication is asserting a position of power and knowledge. That places at least some responsibility onto you. </p>
<p>If the security <i>is</i> wholly on the reporter, then it falls on the reporter to remove/edit any reports that they have contributed. If they lose communications (or are unreachable for unknown reasons), or did not know that security was their responsibility, then the responsibility must still fall back on the publishers &#8211; the exact same situation.</p>
<p>Where open data creates oblivious collaborators, this also limits how much data you can store, as you will need to maintain the manpower and/or technology to continually review all existing data. So just what are the limits on how much already-open data can be stored? I can give one answer:</p>
<dl>
<dd>29.
</dl>
<p><div id="attachment_622" class="wp-caption alignright" style="width: 170px"><a href="http://www.junglelightspeed.com/files/twitter_29.jpg"><img src="http://www.junglelightspeed.com/files/twitter_29.jpg" alt="Twitter 29" title="twitter_29" width="160" height="160" class="size-full wp-image-622" /></a><p class="wp-caption-text">29 tweets: your ethical upper limit on the number of tweets to republish from free open data.</p></div>That&#8217;s the maximum number of tweets that anyone should ever republish from free open data if security is the responsibility of the reporter. It is not exactly &#8220;big data&#8221;. The math is simple. Let&#8217;s assume that the person who tweeted &#8220;there are a lot of gunshots here&#8221; decided to delete their tweet &#8211; if security is their responsibility alone, then the republishers have the responsibility to also remove it. Let&#8217;s also say that an acceptable latency in deletion is five minutes and that you have an <a href="https://dev.twitter.com/docs/rate-limiting">OAuth key that allows you the maximum 350 free API calls per hour on Twitter</a>. You will need to check every existing tweet for deletion via the API, one call per tweet, every five minutes, and you only have (350/60) &#215; 5 &#8776; 29.17 calls per five-minute window. As soon as you have stored your 30th tweet, you will no longer be able to check for deletions every 5 minutes without hitting the API limit.</p>
<p>You could pay to increase the limits on Twitter, but this is no longer free open data. Or you could simply not honor people&#8217;s wish for their tweet to be deleted, possibly endangering them (for reasons that may only be apparent to them), but this is falling short in data vigilance. So if you want to put the security in the hands of the reporters, leveraging only free open data, then that is your ethical upper-limit for Twitter: 29 data points.</p>
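<p>The arithmetic above generalizes to any rate limit and any acceptable deletion latency. The sketch below assumes, as the argument above does, one API call per stored tweet per check; the 350-calls-per-hour figure is the free Twitter limit cited above, and the function names are hypothetical.</p>

```python
import math


def max_reviewable_items(calls_per_hour, latency_minutes, calls_per_item=1):
    """Largest number of stored items that can each be re-checked once
    within latency_minutes without exceeding the hourly API budget."""
    calls_per_window = calls_per_hour * latency_minutes / 60
    return math.floor(calls_per_window / calls_per_item)


def calls_per_hour_needed(items, latency_minutes, calls_per_item=1):
    """Hourly API budget required to re-check every item within the latency."""
    windows_per_hour = 60 / latency_minutes
    return math.ceil(items * calls_per_item * windows_per_hour)


print(max_reviewable_items(350, 5))     # 29
print(calls_per_hour_needed(30, 5))     # 360, already over the free limit
print(calls_per_hour_needed(1000, 5))   # 12000
```

<p>Relaxing the acceptable latency to a full hour only raises the ceiling to <code>max_reviewable_items(350, 60)</code>, or 350 tweets, which is still nowhere near &#8220;big data&#8221;.</p>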
<p>I don&#8217;t want to be too harsh on the individuals in UN OCHA (or anyone entering big data for the first time &#8211; we are all new), and I greatly appreciate the willingness to discuss these points publicly. But we need to be critical of the idea that there are simple rules for collecting and publishing data that absolve us of responsibility once the data is out there. </p>
<p>By living most of my adult life outside my homeland, I have helped monitor elections more times than I have been permitted a vote. I would love to say that being at the forefront of big open data means taking part in democracy, but this simply isn&#8217;t the case. In big open data, I am the excited 1% trying to meet my obligations to the 99%. Open data is a form of freedom that can help liberate us from disease and oppression, but it is not a democratic freedom &#8211; it is extreme and potentially dangerous &#8211; we need to always keep watch.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/open-datanarchy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Squanto</title>
		<link>http://www.junglelightspeed.com/squanto/</link>
		<comments>http://www.junglelightspeed.com/squanto/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 06:14:07 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Translation]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=539</guid>
		<description><![CDATA[It always feels odd to take a long trip only to find English-speakers in the remotest places. It is relatively easy to do, of course, with the recent spread of the language, to the point that it is sometimes too easy to be lazy and expect it. Imagine if people had always done this? Imagine [...]]]></description>
			<content:encoded><![CDATA[<p>It always feels odd to take a long trip only to find English-speakers in the remotest places. It is relatively easy to do, of course, with the recent spread of the language, to the point that it is sometimes too easy to be lazy and expect it. Imagine if people had always done this? Imagine if the pilgrims had sailed the Mayflower across the Atlantic to land in what is now the USA, and expected to find a waiting village that was empty of all but one English-speaker, ready to translate.</p>
<p>They did. Squanto, the last member of the Patuxet tribe at what is now Plymouth. It is not that he idly picked up English from a few random visitors &#8211; he had lived in London. Twice. He was first taken from his home in 1614 by slave traders but freed by friars in Southern Spain. From Spain, he traveled across Europe at the time that Galileo was first demonstrating his telescope. He arrived in London, living there during the height of the Scientific Revolution. He was not the only Native American in London at the time, or even the most well-known: Pocahontas was in London at that time too, and was also trying to find passage home, only to die of smallpox as she was preparing her trip back (her son would make it). He would have also been in London during the final years of Shakespeare&#8217;s life. The world seemed bigger back then, but perhaps not as big as we sometimes imagine.</p>
<p>Squanto&#8217;s first trip back to America took him to Newfoundland (in what is now Canada), but it proved impossible to organize a trip home down the coast. He returned all the way to London. He eventually made it back in 1619, just over 5 years after he was kidnapped. He is thought to have missed the decimation of his village by disease by just one year. </p>
<p>When the pilgrims arrived a year later it must have been strange for him to see his empty village repopulated by people from the land that he had worked so hard to leave, and then to watch half of them quickly die of disease. Perhaps it was watching them suffer in the same place as his own people must have suffered, and in much the same way, that gave him the compassion to help. He taught the pilgrims to farm and fish, and acted as translator. </p>
<p><div id="attachment_602" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/shipbuilding.jpg"><img src="http://www.junglelightspeed.com/files/shipbuilding-300x194.jpg" alt="Shipbuilding" title="shipbuilding" width="300" height="194" class="size-medium wp-image-602" /></a><p class="wp-caption-text">The only pictures I could find of Squanto were of him planting maize for others, translating, or otherwise serving someone. Here is a picture of him (or someone like him) building the most modern technology of its day.</p></div>To complete his mastery of the skills the pilgrims needed, Squanto probably knew more about the structure of the Mayflower itself than anybody else among them &#8211; he had funded his own passage back by working as a ship-builder. I don&#8217;t want to take away from the importance of pursuing religious freedom (which isn&#8217;t exactly what the pilgrims were doing, but close enough). It just seems far rarer that there was a fortuitous person with the right linguistic and agricultural knowledge who had also traveled across Galileo&#8217;s Europe and lived in Shakespeare&#8217;s London. Thanksgiving toasts each year should be thanking him above all others.</p>
<p>Is it a coincidence that the most famous early European settlement in America relied on an English-speaker who knew the land and people? No. The majority of the European settlements in America at that time did not survive. People mostly starved, died of disease or returned. The reason we know about this one at all is that he was there. It is not that there were only a few settlements &#8211; there were many. We only remember this surviving one <em>because</em> they had the English speaker that allowed them to survive and tell their story. </p>
<p>It is the same when I look back and marvel at how many English-speakers I have met in remote places. It is not that there were so many &#8211; just that I remembered the people I spoke to more than I remembered the countless others I passed who did not. I don&#8217;t think people will start toasting Squanto at Thanksgiving, but I&#8217;ll remember my &#8216;Squanto fallacy&#8217; the next time that I am getting away with speaking some language far from where it started.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/squanto/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Start of the Art</title>
		<link>http://www.junglelightspeed.com/the-start-of-the-art/</link>
		<comments>http://www.junglelightspeed.com/the-start-of-the-art/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 17:41:06 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Microtasking]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=585</guid>
		<description><![CDATA[Workshop Summary: Crowdsourcing Technologies for Language and Cognition Studies (This post is also published as the introductory paper to the Workshop on Crowdsourcing Technologies for Language and Cognition Studies that Hal Tily and I organized.) More than a million workers currently log in to crowdsourcing/microtasking platforms to complete short tasks for pay-per-task compensation. The platforms were [...]]]></description>
			<content:encoded><![CDATA[<h3>Workshop Summary: Crowdsourcing Technologies for Language and Cognition Studies</h3>
<p><em>(This post is also published as the <A href="http://www.crowdscientist.com/wp-content/uploads/2011/08/start_of_the_art.pdf" target="_blank">introductory paper</a> to the <A href="http://www.crowdscientist.com/workshop">Workshop on Crowdsourcing Technologies for Language and Cognition Studies</A> that Hal Tily and I organized.)</em></p>
<p>More than a million workers currently log in to crowdsourcing/microtasking platforms to complete short tasks for pay-per-task compensation. The platforms were originally developed to allow companies to outsource work but are now being productively used for research. On July 27th, 2011, language and cognition researchers came together for a workshop devoted to crowdsourcing technologies for language and cognition studies. While language and cognition researchers have been running some of the most varied and sophisticated crowdsourcing tasks since the earliest days of the platforms, this was the first time that researchers had come together for a workshop dedicated wholly to crowdsourcing technologies as a tool for empirical studies.</p>
<div id="attachment_587" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/language_learning.jpg"><img class="size-medium wp-image-587" title="language_learning" src="http://www.junglelightspeed.com/files/language_learning-300x209.jpg" alt="Language Learning" width="300" height="209" /></a><p class="wp-caption-text">Screenshot from an artificial language learning task, where the participants view an action via the video and hear/see the sentence describing that action (Jaeger et al., 2011).</p></div>
<p>The workshop was run in conjunction with the 2011 LSA Institute at the University of Colorado Boulder and it combined presentations by researchers using crowdsourcing technologies with tutorials for those wanting to learn more about them. This paper summarizes the outcomes of the workshop. The tutorials themselves are not covered here, but the tutorial participants were as active as the presenters in the broader discussions, so this paper draws from all participants, with thanks to everyone who attended the workshop and contributed to its success.</p>
<h3>Discussions</h3>
<p>Language processing was one of the first large-scale uses of crowdsourcing technologies (Biewald, 2011). Shortly after Amazon Mechanical Turk (AMT) started to allow third parties to post tasks in 2007, a tech start-up in San Francisco, Powerset, began using AMT to create training data for semantic indexing and relevancy judgments for its natural language search system. Spearheaded by Biewald, these natural language evaluation and annotation tasks made Powerset the single biggest requester on AMT for more than a year. Innovation in crowdsourcing for language processing has moved in several directions since then. Computational linguists were soon using crowdsourcing technologies for natural language processing (Snow et al., 2008) and research (Munro et al., 2010), followed by innovative work in annotation (Hsueh et al., 2009), translation (Callison-Burch, 2009), transcription (Marge et al., 2010) and direct experiments (Gibson &amp; Fedorenko, to appear; Schnoebelen &amp; Kuperman, 2010).</p>
<p>In 2009, AMT overhauled its online interface to allow batch processing from CSV files (previously, batch processing was only supported through the command-line tools). This was a turning point for research, opening up the potential for non-programming researchers to conduct large-scale studies. While AMT is still the preferred platform for researchers, many participants at the workshop were surprised to learn that it is a very small part of the overall crowdsourcing market (perhaps less than 10%). Currently, the biggest platforms are those where people work for virtual currency inside games. Rather than being paid a few cents per task on AMT, someone is just as likely to be paid right now in virtual seeds within an online farming game.</p>
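<p>For readers who have not used the CSV interface: a batch is an input file whose header row names variables, and each row fills in one copy of a task template that refers to those variables as <code>${name}</code>. The sketch below writes such a file with Python&#8217;s standard <code>csv</code> module and previews a rendered task with <code>string.Template</code>, which happens to use the same <code>${name}</code> syntax. The column names, sentences and template text are hypothetical, and the local preview only imitates the substitution the platform performs when a batch is published.</p>

```python
import csv
import io
from string import Template

# Hypothetical stimuli for an acceptability-judgment study.
ITEMS = [
    {"item_id": "1", "sentence": "The cat that the dog chased ran away."},
    {"item_id": "2", "sentence": "The cat the dog the rat bit chased ran away."},
]

# Hypothetical task template: the ${...} names must match the CSV header.
TASK_TEMPLATE = Template(
    "Item ${item_id}: How natural is this sentence? \"${sentence}\""
)


def batch_csv(items):
    """Return the batch input file as a string, one row per task."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["item_id", "sentence"])
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


def preview(items):
    """Render each row locally to check the template before uploading."""
    return [TASK_TEMPLATE.substitute(row) for row in items]


print(batch_csv(ITEMS))
print(preview(ITEMS)[0])
```

<p>Because the researcher only edits a spreadsheet of stimuli and a template, no programming is needed to scale from two items to thousands, which is the shift the paragraph above describes.</p>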
<div id="attachment_588" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/maze.jpg"><img class="size-medium wp-image-588" title="maze" src="http://www.junglelightspeed.com/files/maze-300x184.jpg" alt="Maze" width="300" height="184" /></a><p class="wp-caption-text">Screenshot from an interactive maze-like game, where participants coordinated with each other via an online chat to complete a card-collection task (Clausen &amp; Potts, 2011).</p></div>
<p>Crowdsourcing/microtasking technologies are often known as ‘human computing’ or ‘artificial artificial intelligence’. This is because the distributed online workforces are accessed much like an online computer service: data is passed out to a distributed queue, processed, and returned. It was clear from the discussions that this description does not apply to experiments eliciting linguistic judgments and language performance. To be more precise, neither the ‘computing’ nor either ‘artificial’ applies, as we are eliciting the actual human intelligence of the crowdsourced participants. Research has the capacity to achieve something much more exciting than fast, affordable information processing: it can give us insight into the very nature of human communication, and by extension into our neurolinguistic and sociolinguistic systems (Munro and Tily, 2011). Much of the discussion in the introduction and keynote focused on the differences between experimental research and large-scale information processing, and the implications for experimental design. Large-scale crowdsourcing has consistently found that breaking tasks up into small subtasks is needed to optimize accuracy, to the point that this strategy is now more often assumed than tested (Kittur et al. 2008, Ledlie et al. 2010, Munro et al. 2010, Lawson et al. 2010, Paolacci et al. 2010). This was confirmed by the professional experience of the keynote speaker (Biewald, 2011). In fact, recent work is exploring metrics to indicate where simple tasks can be embedded within more complex, dynamic workflows (Kittur et al. 2011), without yet addressing the simpler question of where elements can be combined into single, larger tasks. Many of the workshop participants, and one of the presentations (de Marneffe and Potts, 2011), argued the opposite for language research, finding that workers remained engaged through extended sets of questions/tasks and produced higher quality responses as a result.
Unlike the scam click-throughs and robots that plague commercial crowdsourcing tasks, the researchers also noted the generally high quality of their results. It was clear that people were engaging with the research-focused tasks in a way that they were not engaging with commercial tasks.</p>
<p>Several explanations were offered for why researchers were not seeing the number of scammers that industry sees. Biewald suggested that the amount of scamming is a step function: there is no scamming at all until a certain volume of tasks is available, because it is simply not worth a potential scammer's effort to write programs that automatically complete a low-volume task (researchers rarely seek more than hundreds of responses, and often far fewer, while hundreds of thousands are common for commercial tasks). This effectively puts researchers under the radar of this one type of scamming strategy. A second suggestion was that linguistic tasks are harder to fake. It is more difficult to automatically detect aberrant responses to the open-ended questions or interaction tasks common in linguistic experiments, as there is no ‘right’ answer to gauge someone’s performance against; the flip side is that it is much harder to disguise fake responses when a task requires writing a sentence rather than selecting an answer to a multiple-choice question. For an interaction task, faking ‘being human’ is almost impossible, and so this might also discourage people from trying to scam these kinds of tasks. A third reason was more straightforward: linguistic experiments are fun. The motivations for why people undertake work on microtasking platforms are varied and complex (and studies of them are largely limited to AMT) (Kaufmann et al., 2011). While money ranks highest on AMT, no single reason is given by a majority, and ‘fun’ is also very common. Experiments are often framed as the type of games and puzzles that people might play for free online, and it is easy to imagine that this is a motivator in itself. For people receiving virtual payment as part of a game, we can assume that money is even less of a motivation. Another motivation might be that people like to contribute to science, rather than simply cutting the costs of some large business.
Finally, the fact that some researchers pay above market wages will no doubt also motivate workers to pay attention when responding.</p>
<div id="attachment_589" class="wp-caption alignright" style="width: 245px"><a href="http://www.junglelightspeed.com/files/spatial.jpg"><img class="size-full wp-image-589" title="spatial" src="http://www.junglelightspeed.com/files/spatial.jpg" alt="Spatial" width="235" height="278" /></a><p class="wp-caption-text">Screenshot from a task that exploited the ‘requester’ and ‘worker’ roles on AMT, to see whether people’s interpretation of spatial indices like ‘left’ differed according to the assumed social roles of participants (Duran &amp; Dale, 2011).</p></div>
<p>The complexities of payment (ethics in particular) were discussed throughout the workshop. Many labs pay workers above market wages (which are otherwise often only a few dollars an hour at best), either by choice or to meet IRB requirements, and it was especially interesting to compare notes on this. Relative to the cost of hosting a lab experiment, paying higher wages to online workers is often still a very big saving, especially for shorter tasks, and if anything leads to quicker response times. The most common payment adjustment method within AMT was to calculate the actual time spent from the returned metadata, and then pay the difference in wage through the built-in ‘bonus’ system. There were no dissenting voices to this approach, but participants remained concerned that the anonymity of workers on many platforms could still conceal an exploitative working environment. For example, a worker, even when ostensibly getting a fair wage, could still be a minor or someone coerced into giving their payment to a third party. Recruiting online gamers and recruiting workers from within reputable organizations were both seen as positive future directions in this regard.</p>
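<p>The bonus arithmetic behind this adjustment is simple enough to sketch. The Python below is a hypothetical illustration, not any lab's actual script and not a platform API. It assumes the platform reports the time a worker spent on a task in seconds and that the lab has chosen a target hourly wage. The helper name <code>wage_top_up</code> and the example figures are invented for illustration.</p>

```python
from decimal import Decimal, ROUND_HALF_UP


def wage_top_up(seconds_worked: int, base_reward: Decimal, target_hourly: Decimal) -> Decimal:
    """Hypothetical helper: bonus needed so that total pay meets target_hourly.

    seconds_worked: time spent on the task, assumed to come from the
    platform's returned metadata. base_reward: the per-task payment already made.
    """
    # What the worker should have earned at the target hourly rate.
    owed = target_hourly * Decimal(seconds_worked) / Decimal(3600)
    # Never a negative bonus: fast workers simply keep the full base reward.
    bonus = max(Decimal("0"), owed - base_reward)
    # Payments are made in whole cents.
    return bonus.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A 10-minute task posted at $0.50, topped up to a $10/hour target.
    print(wage_top_up(600, Decimal("0.50"), Decimal("10.00")))  # prints 1.17
```

<p>Under these assumptions, a 10-minute task posted at $0.50 against a $10/hour target would need a $1.17 bonus. <code>Decimal</code> is used instead of floats so that rounding to cents behaves predictably.</p>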
<p>Overall, what seemed to impress people the most (conference organizers included) was the great breadth of research that is now being carried out on crowdsourcing platforms. The variety of linguistics within the workshop presentations was among the greatest that we have seen at any language or cognition workshop this year, ranging from fine-grained distinctions in logical metonymy (Zarcone &amp; Padó, 2011) to the interaction of human and machine topic-identification workflows (Satinoff &amp; Boyd-Graber, 2011). The sheer inventiveness of the task designs was equally impressive, including images, sound, videos generated with artificial languages (Jaeger et al., 2011) and, at the most complex, fully interactive games with instant-message chats (Clausen &amp; Potts, 2011). The nature of microtasking platforms themselves was explored in a number of the presentations, including bonus-payment strategies to ensure a high retention rate of workers between tasks (Watts &amp; Jaeger, 2011). The inherent paradigmatic biases of AMT as an experimental platform were part of many presentations too, especially the need to model and test for potential biases in the experimental design (Anand, Andrews &amp; Wagers, 2011). In one interesting case, the researchers deliberately exploited the ‘requester’/‘worker’ roles to simulate specific social conditions, taking advantage of the perceived power bias for a deliberate experimental effect (Duran and Dale, 2011).</p>
<h3>Conclusions</h3>
<p>The tasks and evaluation methods that researchers are employing on crowdsourcing platforms are already an order of magnitude more sophisticated than those run by commercial organizations, which simply focus on throughput and ‘gold’ accuracy. The use of crowdsourcing platforms is also increasing at such a rate that crowdsourcing will soon become the single most common tool for empirical language and cognition studies; from the discussions, it was clear that in some institutions it already has.</p>
<p>Despite the rapid increase in sophistication and scale, perhaps the greatest change we are seeing is in the number and nature of the researchers who are running experiments with very little overhead. Until now, a typical researcher would be about 10 years into their career before they could receive a grant to be the principal investigator for an empirical study with 100 or so participants. In this workshop, many participants learned about crowdsourcing in the morning and were able to generate experimental results by the close of day (one even presented a first analysis (Harcroft, 2011)). With any researcher now able to run experiments quickly and cheaply, anybody can be a principal investigator. The lowered barrier has also resulted in novel empirical research from fields like formal semantics and theoretical syntax, subfields with very little prior experimental research (experiments from both were presented in this workshop). Just as all researchers currently learn how to analyze language introspectively to test and generate hypotheses, it looks like an increasing number of researchers will soon be doing the same through direct experimentation. This makes for a very bright future for empirical language and cognition studies, and for crowdsourcing technologies as a whole.</p>
<p>Rob Munro and Hal Tily, August 2011</p>
<h3>References</h3>
<p>Anand, Pranav, Caroline Andrews and Matt Wagers. (2011). Assessing the pragmatics of experiments with crowdsourcing: The case of scalar implicature. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Biewald, Lukas. (2011). Keynote. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Callison-Burch, Chris. (2009). Fast, cheap, and creative: evaluating translation quality using Amazon’s Mechanical Turk. In <em>EMNLP ’09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing</em>.</p>
<p>Clausen, David and Chris Potts. (2011). Collecting task-oriented dialogues. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Duran, Nicholas and Rick Dale. (2011). Creating illusory social connectivity in Amazon Mechanical Turk. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Gibson, Edward and Evelina Fedorenko. (to appear). The need for quantitative methods in syntax. <em>Language and Cognitive Processes</em>.</p>
<p>Harcroft, David. (2011). French Semantic Role Labeling: a pilot pilot study. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Hsueh, Pei-Yun, Prem Melville and Vikas Sindhwani. (2009). Data quality from crowdsourcing: a study of annotation selection criteria. In <em>Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing</em>.</p>
<p>Jaeger, T. Florian, Harry Tily, Michael C. Frank, Jacqueline Gutman and Andrew Watts. (2011). A web-based (iterated) language learning paradigm with human participants. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Kaufmann, Nicolas, Thimo Schulze and Daniel Veit. (2011). More than fun and money. Worker Motivation in Crowdsourcing – A Study on Mechanical Turk. <em>Proceedings of the Seventeenth Americas Conference on Information Systems</em>. Detroit.</p>
<p>Kittur, Aniket, Boris Smus and Robert E. Kraut. (2011). CrowdForge: Crowdsourcing Complex Work. <em>Technical Report</em>, School of Computer Science, Carnegie Mellon University. Pittsburgh, PA.</p>
<p>Kittur, Aniket, Ed H. Chi, and Bongwon Suh. (2008). Crowdsourcing user studies with Mechanical Turk. In <em>Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems</em> (CHI &#8217;08). ACM, New York, 453-456.</p>
<p>Lawson, Nolan, Kevin Eustice, Mike Perkowitz, and Meliha Yetisgen-Yildiz. (2010). Annotating large email datasets for named entity recognition with Mechanical Turk. In <em>Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon&#8217;s Mechanical Turk</em>. Los Angeles, CA.</p>
<p>Ledlie, Jonathan, Billy Odero, Einat Minkov, Imre Kiss, and Joseph Polifroni. (2010). Crowd translator: on building localized speech recognizers through micropayments. <em>SIGOPS Operating Systems Review</em> 43:4, 84-89.</p>
<p>Marge, Matthew, Satanjeev Banerjee, and Alexander I. Rudnicky. (2010). Using the Amazon Mechanical Turk for transcription of spoken language. In <em>Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing</em>.</p>
<p>de Marneffe, Marie-Catherine and Chris Potts. (2011). A case study in effectively crowdsourcing long tasks with novel categories. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Munro, Robert, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen and Harry Tily. (2010). Crowdsourcing and language studies: the new generation of linguistic data. In <em>Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk</em>.</p>
<p>Munro, Robert and Hal Tily. (2011). The Start of the Art: An Introduction to Crowdsourcing Technologies for Language and Cognition Studies. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Paolacci, Gabriele, Jesse Chandler and Panagiotis G. Ipeirotis. (2010). Running Experiments on Amazon Mechanical Turk. <em>Judgment and Decision Making</em>, 5:5, 411-419.</p>
<p>Satinoff, Brianna, and Jordan Boyd-Graber. (2011). Trivial Classification: What features do humans use for classification? <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Schnoebelen, Tyler, and Victor Kuperman. (2010). Using Amazon Mechanical Turk for linguistic research: Fast, cheap, easy, and reliable. <em>PSIHOLOGIJA</em>, 43 (4), 441-464.</p>
<p>Snow, Rion, Brendan O’Connor, Dan Jurafsky, and Andrew Ng. (2008). Cheap and fast &#8211; but is it good?: evaluating non-expert annotations for natural language tasks. In <em>Proceedings of the Conference on Empirical Methods in Natural Language Processing</em>.</p>
<p>Sprouse, Jon. (2011). A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. <em>Behavior Research Methods</em>, 1-13. Springer.</p>
<p>Watts, Andrew and T. Florian Jaeger. (2011). Balancing experimental lists without sacrificing voluntary participation. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
<p>Zarcone, Alessandra and Sebastian Padó. (2011). A crowdsourcing study of logical metonymy. <em>Workshop on Crowdsourcing Technologies for Language and Cognition Studies</em>. Boulder, Colorado.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/the-start-of-the-art/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Alphabetical order</title>
		<link>http://www.junglelightspeed.com/alphabetical-order/</link>
		<comments>http://www.junglelightspeed.com/alphabetical-order/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 06:38:45 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Writing Systems]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=541</guid>
		<description><![CDATA[There has only ever been one alphabetical order. In any language. I only just learned that fact this week, after also only just realizing that &#8216;alphabet&#8217; comes from its first two letters &#8216;alpha&#8217; and &#8216;beta&#8217;.* I decided to learn more about it. Not all writing systems are alphabetic, Chinese being the most obvious with characters [...]]]></description>
			<content:encoded><![CDATA[<p>There has only ever been one alphabetical order. In any language. I only just learned that fact this week, after also only just realizing that &#8216;alphabet&#8217; comes from its first two letters &#8216;alpha&#8217; and &#8216;beta&#8217;.* I decided to learn more about it.</p>
<p>Not all writing systems are alphabetic, Chinese being the most obvious example with characters that represent words or morphemes rather than individual sounds, and most languages do not have a writing system at all, but I had always assumed that most alphabets had their own canonical order of letters and that these orders would naturally have varied greatly over time. </p>
<div id="attachment_559" class="wp-caption alignleft" style="width: 610px"><a href="http://www.junglelightspeed.com/alphabetical-order/alphabetical_order_evolution/" rel="attachment wp-att-559"><img src="http://www.junglelightspeed.com/files/alphabetical_order_evolution.gif" alt="Evolution of the alphabet" title="alphabetical_order_evolution" width="600" class="size-full wp-image-559" /></a><p class="wp-caption-text">Evolution of the alphabet</p></div>
<p>Apparently not. It turns out that the order most likely comes from a combination of Phoenician and Ancient Hebrew, and survived to become part of completely unrelated languages, like English, that would not exist for at least another millennium.</p>
<p>As the animation shows,** the order hasn&#8217;t changed much at all. It also shows that &#8216;U&#8217; and &#8216;W&#8217; were clearly late, hasty additions, adapted from &#8216;V&#8217; with little imagination.</p>
<p>I still can&#8217;t decide whether having only one alphabetical order for all of humanity is profound or mundane. A canonical order doesn&#8217;t make speaking any easier, nor, for that matter, does it particularly affect writing. But then again, so many things beyond speech and writing rely on ordering: text from dictionaries to databases; the &#8216;Adam&#8217;s and &#8216;Anna&#8217;s that I&#8217;m always reminded of at the top of chat windows; the reason my pre-school teacher never let me leave first for lunch; and the complete arbitrariness of my students striving for &#8216;A&#8217;s.*** All thanks to some choices by scholars in the eastern Mediterranean some 3000 years ago.</p>
<p>Rob, 10 June 2011.</p>
<p>* With thanks to the solidarity from other linguists who confessed also not knowing this.<br />
** I couldn&#8217;t find the original creator of this animation &#8211; will give full credit if/when I do.<br />
*** Aardvarks are genuinely strange and unique animals that deserve their alphabetical prominence.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/alphabetical-order/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Power of One Beard</title>
		<link>http://www.junglelightspeed.com/the-power-of-one-beard/</link>
		<comments>http://www.junglelightspeed.com/the-power-of-one-beard/#comments</comments>
		<pubDate>Mon, 18 Apr 2011 09:40:46 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Sociolinguistics]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=507</guid>
		<description><![CDATA[On October 24, 2010 &#8220;beard&#8221; suddenly more than doubled in Google Trends popularity, after maintaining a relatively stable prior baseline. There are probably a billion or so beards in the world but this was due to just one, belonging to the SF Giant&#8217;s closer Brian Wilson. If you don&#8217;t follow baseball, a closer is the [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_511" class="wp-caption alignright" style="width: 410px"><a rel="attachment wp-att-511" href="http://www.junglelightspeed.com/files/beard_graph.jpg"><img class="size-full wp-image-511" title="Comparing Google Trends for San Francisco Giants and Beard" src="http://www.junglelightspeed.com/files/beard_graph.jpg" alt="" width="400" /></a><p class="wp-caption-text">Google Trends for &quot;San Francisco Giants&quot; and &quot;Beard&quot; in 2010, log scale. The two orange peaks are San Francisco qualifying for and winning the World Series.</p></div>
<p>On October 24, 2010 &#8220;beard&#8221; suddenly more than doubled in <a href="http://www.google.com/trends?q=%22San+Francisco+giants%22%2C+beard&amp;ctab=0&amp;geo=us&amp;geor=all&amp;date=2010-10&amp;sort=0">Google Trends</a> popularity, after maintaining a relatively stable prior baseline. There are probably a billion or so beards in the world but this was due to just one, belonging to the SF Giants&#8217; closer <a href="http://brianwilson38.com/">Brian Wilson</a>.</p>
<p>If you don&#8217;t follow baseball, a closer is the <a href="http://en.wikipedia.org/wiki/Closer_%28baseball%29">closing pitcher</a> who rarely comes in until the last (9th) inning, and even then not in every game. Yet despite his limited field time, and the fact that having a beard is not particularly odd, this one beard pretty much dominated the trending of the term leading up to and during the World Series (see graph). As for exactly why this coal-miner-like beard grabbed everyone&#8217;s attention? Your guess is as good as mine. But it highlights nicely that a change in language does <i>not</i> necessarily map directly to a change in culture: the increase in global beards in October 2010 was infinitesimal, while online reporting and discussion more than doubled. </p>
<p><div id="attachment_512" class="wp-caption alignleft" style="width: 310px"><a rel="attachment wp-att-512" href="http://www.junglelightspeed.com/the-power-of-one-beard/brian_wilson_beard/"><img class="size-medium wp-image-512" title="brian_wilson_beard" src="http://www.junglelightspeed.com/files/brian_wilson_beard-300x280.jpg" alt="" width="270" /></a><p class="wp-caption-text">The power of one beard</p></div> In this case, the graph makes it easy to see the correlation. The tricky part of linguistics is when the external evidence is not so easy to identify: how easy would it be to look at the &#8220;beard&#8221; graph alone and mistakenly infer some greater cultural significance, unaware of the actual factors? We simply can&#8217;t count the frequency of word usages in isolation and directly infer the nature of any cultural trend, as one real-world instance among millions can easily account for half of all reported instances.</p>
<p>Of course, discussion alone can spark a trend even when it does not reflect one. I know one prominent Bay Area computational linguist <a href="http://conspiracyofbeards.com/">conspiring to grow a beard</a> and I first thought to look at the trends after noticing so many at a game last week. But this post isn&#8217;t about beards, it&#8217;s about language and you could probably argue that the trend <i>does</i> reflect a cultural trend. It&#8217;s just that it is a trend of celebrity rather than social change.</p>
<p>Rob</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/the-power-of-one-beard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microtasking as real-time relief work</title>
		<link>http://www.junglelightspeed.com/disaster_relief_microtasking/</link>
		<comments>http://www.junglelightspeed.com/disaster_relief_microtasking/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 23:39:13 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=491</guid>
		<description><![CDATA[This is a concurrent post of an invited article I wrote for UN Dispatch:http://www.undispatch.com/disaster-relief-2-0-microtasking-as-a-real-time-support-to-relief-workers-and-real-time-engagement-for-communities Relief workers frequently arrive at crisis-affected locations with little prior knowledge of the language, geography or community structures. How can communications be understood? Where are the locations of existing services? Who are the trusted community leaders? Many first-responders have expressed the [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is a concurrent post of an invited article I wrote for UN Dispatch:<a href="http://www.undispatch.com/disaster-relief-2-0-microtasking-as-a-real-time-support-to-relief-workers-and-real-time-engagement-for-communities">http://www.undispatch.com/disaster-relief-2-0-microtasking-as-a-real-time-support-to-relief-workers-and-real-time-engagement-for-communities</a></em></p>
<p>Relief workers frequently arrive at crisis-affected  locations with  little prior knowledge of the language, geography or  community  structures. How can communications be understood? Where are  the  locations of existing services? Who are the  trusted community leaders?  Many first-responders have expressed the  same dream solution: an  ‘artificial intelligence’ device that can  translate any language and  answer specific queries.</p>
<p>While this sounds like fiction it is not that far away, but it is called ‘<em>artificial</em> artificial intelligence’ or more commonly ‘microtasking’. To a relief   worker the device might feel like artificial intelligence, but the   digital  answers and translations are really performed by scalable  workforces  collaborating on statistically disassembled and distributed   ‘micro-tasks’, and delivered in near-real-time.</p>
<p><div id="attachment_492" class="wp-caption alignright" style="width: 266px"><a rel="attachment wp-att-492" href="http://www.junglelightspeed.com/disaster_relief_microtasking/haiti_volunteer_locations/"><img class="size-large wp-image-492" title="Locations of Mission 4636 volunteers globally" src="http://www.junglelightspeed.com/files/haiti_volunteer_locations-1024x492.jpg" alt="" width="256" height="123" /></a><p class="wp-caption-text">Locations of Mission 4636 volunteers globally</p></div><br />
But where to find the workforce? If you have been away from home at the time of a tragedy then you understand the desire to help immediately. Simply donating to relief organizations is incomparable to plugging directly into the relief effort, helping your community in real time. These are the right people to mobilize.</p>
<p>In the wake of the 2010 earthquake in Haiti, I managed a microtasking initiative called Mission 4636, through which 2,000 Kreyol speakers from 49 countries translated, mapped and categorized 80,000 emergency communications (about the length of 10 novels) in real time. The majority of relief workers on the ground could not understand a sentence like “<em>Ti ekipman lopital fokal genyen yo paka minm fè 24 è</em>”, but any of our volunteers could immediately translate this to “<em>Fokal Hospital has less than 24 hours of supplies remaining</em>”. Crucially, they also knew that “fokal” was slang for “Fort California”, identified it immediately on an unlabelled map, and knew who to call to verify the information. We took this information-processing burden off the relief workers within Haiti and injected crucial local knowledge where it was needed most.</p>
<p>There are still hurdles to overcome before we make this a larger, sustainable practice (connectivity, workflows, security), but we hope to see microtasking become a mainstream strategy for large-scale information processing in relief work. Above all, distributed strategies like microtasking expand the ways in which crisis-affected communities can help themselves: a broader goal of all relief work.</p>
<p><em>With thanks to the other contributors to <a href="http://www.unfoundation.org/global-issues/technology/disaster-report.html">Disaster Relief 2.0: The Future of Information Sharing in Humanitarian Emergencies</a> and the organizations responsible for the report: the UN Office for the Coordination of Humanitarian Affairs (OCHA), the United Nations Foundation, Vodafone and the Harvard Humanitarian Initiative.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/disaster_relief_microtasking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The smallest signal</title>
		<link>http://www.junglelightspeed.com/the-smallest-signal/</link>
		<comments>http://www.junglelightspeed.com/the-smallest-signal/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 03:21:08 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Social Networks]]></category>
		<category><![CDATA[Sociolinguistics]]></category>
		<category><![CDATA[Translation]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=455</guid>
		<description><![CDATA[What is the smallest meaningful signal of communication? In Egypt over the last few weeks we saw the protesters speak to the world thanks to a collaboration by Twitter, Google and SayNow, real-time transcriptions and translations by organizations like Meedan, and the uptime of certain phone networks, ISPs and social media like Facebook, all while [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_456" class="wp-caption alignright" style="width: 436px"><a rel="attachment wp-att-456" href="http://www.junglelightspeed.com/the-smallest-signal/missed_call/"><img class="size-full wp-image-456" title="missed_call" src="http://www.junglelightspeed.com/files/missed_call.jpg" alt="" width="426" height="299" /></a><p class="wp-caption-text">&quot;I am ok&quot;</p></div>
<p>What is the smallest meaningful signal of communication?</p>
<p>In Egypt over the last few weeks we saw the protesters speak to the world thanks to a collaboration by <a href="http://googleblog.blogspot.com/2011/01/some-weekend-work-that-will-hopefully.html">Twitter, Google and SayNow</a>, real-time transcriptions and translations by organizations like <a href="http://blog.meedan.net/2011/02/14/egyptian-hopes-for-a-better-tomorrow/">Meedan</a>, and the uptime of certain phone networks, ISPs and social media like <a href="http://www.huffingtonpost.com/2011/02/04/egypt-protesters-thank-you-facebook_n_818745.html">Facebook</a>, all while the government was doing its best to block all local access.</p>
<p>But one of the most important modes of digital communication was given no press at all: drop-dialing. That is, phoning someone and immediately hanging up once you hear their phone ringing. Protesters were drop-dialing their relatives elsewhere as a pre-arranged message that simply meant &#8220;<em>I am ok</em>&#8220;. Unlike a text-message its content cannot incriminate you, it continues to function even when an actual call would be dropped due to an overloaded network, and it doesn&#8217;t cost anything. Noone was more worried for people protesting in Tahrir Square than their own close family and friends. For them, a simple drop-dialed call was a more frequent assurance of their loved one&#8217;s well-being than communication via social-networking sites.</p>
<p>It is a form of remote communication that has a lot of names: &#8220;<em>drop-call&#8221;</em>, &#8220;<em>one-bell&#8221;</em>, <em>&#8220;buzz</em>&#8220;. Czech even has a dedicated word for it, &#8220;<em>prozvonit</em>&#8220;. In Sierra Leone, the one place I&#8217;ve used it regularly, it is called &#8220;<em>flashing</em>&#8220;. I learned it the day I arrived when a colleague complained that a man at her office kept &#8220;flashing her all day&#8221;, and I was left shocked and confused. During the 2007 elections we had the same prearranged signal as in Egypt &#8211; a &#8216;flash&#8217; to let our friends and families know we were ok &#8211; but fortunately neither the predicted violence nor phone network congestion came to near the feared levels.</p>
<p>Beyond this one narrow meaning there is an incredibly wide range of interpretations. Depending on the relationship between the two people, the drop-dial can be interpreted as: <em>&#8220;call me&#8221;</em> or <em>&#8220;call me, because I have no credit&#8221;, &#8220;where are you?&#8221;, &#8220;how are you?&#8221;, &#8220;I am interested in you&#8221;, &#8220;I am thinking of you&#8221;, &#8220;I have arrived&#8221;, &#8220;I am home&#8221;, &#8220;I have your (item)&#8221;</em> and more. The meaning <em>“call me”</em> is widely established. For example, if one of two friends is currently employed and the other is not, the employed person is generally expected to make the calls between the two. The other meanings are more subtle. The signal itself is the same in all cases, but its use and meaning are determined by the interlocutors’ relationship(s) within existing social networks and by recent interactions within those networks. The minimum requirement, of course, is that the two people have each other&#8217;s phone numbers, so that the caller&#8217;s name is displayed. Beyond that, the exact meaning can be an extension of any type of conversation within your extended social network, easily crossing literacy and language barriers. But while it is a free method of remote communication, it seems to be used <em>only</em> by people who are also regularly in face-to-face contact. You could just as easily drop-dial someone across the country as someone in the next office block to say &#8220;<em>I am thinking of you</em>&#8221;, but in all the places where drop-dialing is common I have never seen it used this way. Perhaps the ambiguity is too great if the signal cannot be interpreted within the shared conversational history of both people &#8211; it relies on whatever our current strong network links happen to be.</p>
<p>I never fully mastered it in Sierra Leone. Girls I didn&#8217;t know very well would flash me and I would call back, only for them to say <em>&#8220;oh, you didn&#8217;t need to call, I was just saying hi&#8221;</em>, or I would ignore the same flash and later be chided for being rude. Work colleagues would flash me and I would call back (it usually meant they had no phone credit), only to find them standing behind me with a puzzled expression: &#8220;<em>I was just telling you that I was arriving</em>&#8221;. Clearly, I did not grasp this one type of technology-aided communication very well.</p>
<p>Drop-dialing may well be one of the most widespread free methods of remote communication that have ever existed, but as far as I can tell it has not been studied as part of our social networks &#8211; it continually slips beneath our radar. Telecommunication companies already allow free &#8220;call me&#8221; messages between phones, and the changing nature of cellphone data plans will eradicate it completely before too long, especially as free communication through services like <a href="http://techcrunch.com/2010/02/16/facebook-launches-zero-a-text-only-mobile-site-for-carriers/">Facebook Zero</a> becomes popular across Africa. The newer communications will be richer, but none will have so minimal and diverse a signal.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/the-smallest-signal/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Culturomics</title>
		<link>http://www.junglelightspeed.com/culturomics/</link>
		<comments>http://www.junglelightspeed.com/culturomics/#comments</comments>
		<pubDate>Fri, 21 Jan 2011 08:14:43 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Sociolinguistics]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=443</guid>
		<description><![CDATA[On a recent trip to Australia I was reminded of when the field of culturomics was born there some half a century ago. From 1954-1955 the seminal sociologist John Fischer and his wife were famously engaged in the study of child-rearing in a New England village (Fischer 1964). It is not so widely-known that the [...]]]></description>
			<content:encoded><![CDATA[<p>On a recent trip to Australia I was reminded of when the field of culturomics was born there some half a century ago. From 1954 to 1955 the seminal sociologist John Fischer and his wife were famously engaged in the study of child-rearing in a New England village (Fischer 1964). It is not so widely known that the sociolinguist Martin Joos also accompanied them on this trip and that their study also took them to Australia and Great Britain. The team at Google Books recently discovered transcripts from this period in which the two were talking (talkin&#8217;) about the relationship between language variation and linguistic universals:</p>
<h3>December 1955, Sydney, AUSTRALIA:</h3>
<p><em>Fischer:</em> Just as I observed in New England, here in Australia the ‘-ing’ (/ŋ/) variant is the prestige variant and the ‘-in’ (/n/) variant the more stigmatized one. In addition, the ‘-ing’ variant is also more common among female speakers. This seems to arise from the division between Standard Australian English (Standard AusE), which favors ‘-ing’ more than the stigmatized Broad Australian English (Broad AusE) does. This looks like the same case of ‘phonetic drift’ (Joos 1952) we noted in New England. Surely, this variation should be part of our linguistic models and not shunned from our science (Joos 1950).</p>
<p><em>Joos: </em>It is more than a coincidence that /ŋ/ is considered the standard form, and /n/ the non-standard variant, in independently evolved English speech communities. As many researchers have no doubt noted, /ŋ/ is more marked than /n/ in English, as evidenced by the fact that only /n/ may be word-initial. Cross-linguistically, there is evidence that this results from linguistic universals. To the best of my knowledge, there are no languages that contain /ŋ/ but not /n/ in their phonetic inventory, but many that contain /n/ but not /ŋ/. It is straightforward to claim, therefore, that a linguistic universal exists whereby /ŋ/ is more marked than /n/. It seems that English speakers evaluate the alternation of /ŋ/ with /n/ as non-standard (even though neither is inherently right or wrong), which leads to the characterization of the /n/ variant as the stigmatized ‘lazy’ form in all cases. This accounts for why we find the same variation (and, importantly, the same evaluation of the variation) in independent English speech communities. It may pattern probabilistically, but the key part of linguistics is the invariable markedness constraint.</p>
<p><em>Fischer: </em>But this universal falls short of a full explanation when we compare it to variation that we cannot similarly explain, such as variation involving a strictly local feature. For example, in New England the ‘ey’ variant of the article ‘a’ indexes formality more strongly than the ‘-ing’ variant, which is a subtle but important observation: not all variation is equal. Unlike the ‘-ing’/‘-in’ alternation, the use of ‘ey’ does not seem to be related to any phonetic or phonological universals, and it is not a variation commonly found in English more broadly. Here in Australian English the ‘ey’ variant seems simply not to exist, and if produced it would most likely be evaluated as the product of a (non-English) accent, if noticed at all. Therefore, while the ‘ey’ variant more strongly indexes formality in New England English, the ‘-ing’ variant more strongly indexes formality in Australian English. This leaves very little room to appeal to linguistic universals in explaining the variants, as variation motivated by linguistic universals can be either stronger or weaker than the local variant.</p>
<p><em>Joos:</em> It is not important that the local variant can be stronger than the one derived from universals. It simply means that knowledge of linguistic universals is difficult to derive from observing everyday speech, which is evidence against using this particular methodology. I believe a promising young linguist named Chomsky is currently working on a manuscript making arguments to this effect…</p>
<h3>February 1956, London, GREAT BRITAIN:</h3>
<p><em>Fischer: </em>To return to our earlier conversation, we cannot consider only universal factors when local variation can reverse the universal trends. Here in Great Britain, the ‘-in’ form seems to carry more prestige among the educated upper classes in words that express the social activities of that class, such as ‘huntin’, ‘fishin’ and ‘shootin’. This may well be the result of the upper class seeking to distance itself from the lower classes through innovation, and is therefore still ‘phonetic drift’ (Joos 1950), but it stands as a counter-example to the markedness constraint. The result is that even if /ŋ/ is universally more marked than /n/ and the ranking is absolute according to universal linguistic constraints, it is nonetheless employed probabilistically. Do we banish the constraint from linguistics for having an observable but non-absolute effect among a minority of speech communities? It seems that speakers are free to take ‘universals’ and employ them like any other variant for constructing social meaning.</p>
<p><em>Joos:</em> It is likely that the /ŋ/-favoring British middle classes provide enough of a buffer to prevent /n/ from indexing the lower classes, and this allows the /n/ variant to be evaluated very differently in upper-class British (Received) English and in lower-class Cockney English, where it is also prevalent. No such buffer exists in Australian English. The upper-class men talking about ‘shootin’ and ‘huntin’ are an exception only because they are deliberately flouting what they instinctively know to be a universal markedness constraint (and only for a restricted vocabulary relating to upper-class activities), so this can be explained as an exception that proves the rule.</p>
<p><em>Fischer: </em>Certainly, but this fails to account for why we find the /n/ variant among both the British and the Australian upper classes, or for the social meaning of employing the variant in each speech community. Among the upper classes of both Australia and Great Britain, men are much more likely than women to use the /n/ variant rather than the /ŋ/ variant, but for opposing reasons. Upper-class Australian men are more likely to use the /n/ variant to express egalitarianism with Australian men of other socioeconomic groups, while upper-class British men are more likely to use the /n/ variant to distance themselves from British men of the lowest socioeconomic groups. For both, only men are able to use the /n/ variant without stigma, but for very different reasons. If an upper-class Australian woman used the /n/ variant, she would be indexing lower socioeconomic status or being nouveau riche. However, if an upper-class British woman used the /n/ variant in ‘shootin’ or ‘huntin’, she would be indexing masculinity, but upper-class masculinity. Therefore, even when we find the same pattern of usage, the social meaning may differ.</p>
<p><em>Joos: </em>So are you claiming that there is no room for universals in such an analysis?</p>
<p><em>Fischer:</em> No, there is room for identifying linguistic universals, but identifying universals in language use is applying a label, not providing an explanation.</p>
<h3>References</h3>
<p>Fischer, John. 1964. Social influence in the choice of a linguistic variant. Word 14.<br />
Joos, Martin. 1950. Description of language design. Journal of the Acoustical Society of America 22.<br />
Joos, Martin. 1952. The Five Clocks. New York: Harcourt.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/culturomics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

