<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jungle light speed</title>
	<atom:link href="http://www.junglelightspeed.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.junglelightspeed.com</link>
	<description>language and the desire to connect</description>
	<lastBuildDate>Tue, 16 Apr 2013 18:51:45 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
		<item>
		<title>New funding for Idibon</title>
		<link>http://www.junglelightspeed.com/idibon_funding/</link>
		<comments>http://www.junglelightspeed.com/idibon_funding/#comments</comments>
		<pubDate>Tue, 16 Apr 2013 18:48:44 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=894</guid>
		<description><![CDATA[Cross-post from http://idibon.com/new-funding-for-idibon/ We’ve seen Idibon grow in leaps and bounds over the last six months, supporting language processing systems in more than a dozen languages. We have applied our expertise to everything from finance to health to education and we are grateful to everyone who has helped us bring more sophisticated language technologies to [...]]]></description>
				<content:encoded><![CDATA[<p><i>Cross-post from <a href="http://idibon.com/new-funding-for-idibon/">http://idibon.com/new-funding-for-idibon/</a></i></p>
<p>We’ve seen Idibon grow in leaps and bounds over the last six months, supporting language processing systems in more than a dozen languages. We have applied our expertise to everything from finance to health to education and we are grateful to everyone who has helped us bring more sophisticated language technologies to the connected world.</p>
<p>Today, I’m really happy to announce that Khosla Ventures are sharing in our vision by investing $1.4 million in Idibon. We extend our thanks to everyone at Khosla Ventures and are especially excited to be working with a firm that supports growing companies while making a sustained positive impact on the world. We’re also excited to add their CTO, Sven Strohband, to our board of directors. Like a number of us, Sven is a graduate of Stanford where he was the lead engineer for the Stanford Racing team’s self-driving “Stanley” vehicle, one of the few pieces of artificial intelligence now on display in the Smithsonian Museum. We are fortunate to draw on his machine-learning expertise in addition to his business acumen.</p>
<p>For our clients, this investment means that we will be bringing you even more accurate and efficient services. We greatly appreciate how you were willing to work with us when we were still getting going and we have enjoyed having the opportunity to develop our technology with your continued feedback. Whether we’re helping you manage your business or social media communications, our predictive analytics will continue to help increase your business intelligence.</p>
<p>There’s a lot on the horizon for Idibon. Language technologies are at the cutting edge of intelligent computing. It’s going to take some long hours for us to get there, but we are enthusiastic about meeting our objective of creating accurate, scalable language technologies that help people understand the world and each other.</p>
<p>- Robert Munro</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/idibon_funding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Website update for Idibon</title>
		<link>http://www.junglelightspeed.com/new-website-for-idibon/</link>
		<comments>http://www.junglelightspeed.com/new-website-for-idibon/#comments</comments>
		<pubDate>Tue, 16 Apr 2013 01:35:43 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=890</guid>
		<description><![CDATA[I&#8217;m happy to announce a new website for my company, Idibon, which now lists my three talented cofounders and amazing advisory team: http://www.idibon.com/ More on who created the website at: http://idibon.com/launching-our-new-website/ Special shout out to Tyler Schnoebelen for pulling the new site together &#8211; we really like the result! More good news about Idibon coming [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m happy to announce a new website for my company, <a href="http://www.idibon.com">Idibon</a>, which now lists my three talented cofounders and amazing advisory team:</p>
<ul>
<li> <a href="http://www.idibon.com">http://www.idibon.com/</a>
   </ul>
<p>More on who created the website at:</p>
<ul>
<li> <a href="http://idibon.com/launching-our-new-website/">http://idibon.com/launching-our-new-website/</a>
   </ul>
<p>Special shout out to Tyler Schnoebelen for pulling the new site together &#8211; we really like the result!</p>
<p>More good news about Idibon coming soon!</p>
<p>Rob<br />
April 15, 2013</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/new-website-for-idibon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adventures in online education</title>
		<link>http://www.junglelightspeed.com/adventures-in-online-education/</link>
		<comments>http://www.junglelightspeed.com/adventures-in-online-education/#comments</comments>
		<pubDate>Mon, 31 Dec 2012 20:42:46 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=855</guid>
		<description><![CDATA[Over the past year I&#8217;ve helped create and run a few online courses. Some have reached tens of thousands of people, and some were aimed at only a dozen. Of everything I achieved in 2012 (finishing a PhD, starting a business, cycling across Alaska and Haiti), it was the standout for me as being the [...]]]></description>
				<content:encoded><![CDATA[<p>Over the past year I&#8217;ve helped create and run a few online courses. Some have reached tens of thousands of people, and some were aimed at only a dozen. Of everything I achieved in 2012 (finishing a PhD, starting a business, cycling across Alaska and Haiti), it was the standout for me as being the <em>least</em> likely source of a sense of achievement, so I&#8217;m sharing my thoughts on the experience.</p>
<p><iframe width="420" height="315" src="http://www.youtube.com/embed/FcUi6UEQh00" frameborder="0" allowfullscreen></iframe><br />
<em>Figure 1: Interpretive dance predicting the future of education.</em></p>
<p>In 2010 I ran into Stanford Professor Andrew Ng at a tech camp. He was already a famous academic in artificial intelligence circles, but was talking about a new focus on online education. He asked if I was interested in helping. I said yes, but I never followed up on the invitation. To be honest, I didn&#8217;t see it: why would the leading researcher in their field completely change their direction? Wouldn&#8217;t we have solved online education by now if it was a viable game changer? One year and <a href="http://engineering.stanford.edu/news/stanford-engineering-new-online-classes-hugely-popular-and-bursting-activity">more than 70,000 students registered</a> later, Andrew&#8217;s inaugural online Machine Learning class had proved the demand.</p>
<p>So when the opportunity came again in late 2011 I didn&#8217;t let it pass. By then, the production of <em>Massive Open Online Courses</em> (MOOCs) had spread (to neighboring offices at least) and my then PhD advisors Chris Manning and Dan Jurafsky were creating <a href="https://www.coursera.org/course/nlp">online courses on Natural Language Processing</a> and looking for help. I volunteered to help out my colleagues and to share our knowledge, but I also just wanted a better understanding of what this movement in education was all about (Note: I was among the least involved. The main creators were Chris and Dan themselves, Jane Manning, Adam Vogel, John Lyman, and probably many others!).</p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/nfoudtpBV68" frameborder="0" allowfullscreen></iframe><br />
<em>Figure 2: The first lecture! Introduction to Natural Language Processing</em></p>
<p>I also helped launch a new course at Tulane&#8217;s <a href="http://www.drlatulane.org/">Disaster Resilience Leadership Academy</a>: &#8216;Crisis Informatics and Analytics&#8217;. I ran the first month, filling in for Jeannie Stamberger in a program coordinated by Ky Luu. It was at the smaller end of the spectrum: I flew to New Orleans to run the first week in person and then delivered the following weeks using a combination of online lectures and a direct stream with the class.</p>
<p>Here are a few observations from both experiences: </p>
<h3>1. It&#8217;s not that different to existing education</h3>
<p>The NLP courses were more or less the same as the Stanford-internal courses that I had taken or taught. The inclusion of quizzes was a nice addition. For a real-time lecture, it&#8217;s not really feasible to have pop-quizzes when there are a large number of people, and certainly not in a way to make sure that everyone has understood the material so far. </p>
<p>It is obvious that there is an advantage to self-paced learning via online courses, but I think the ability for the <em>educators</em> to pace the learning has been under-appreciated. You can more easily enforce that someone doesn&#8217;t skip ahead than you could in person.</p>
<h3>2. YouTube will win where Flash failed</h3>
<p>For a long time, Flash was predicted to begin taking over from HTML as the preferred format for many websites, primarily because of the greater richness in interaction and possibility for animation. This never really happened. Flash pages tended to be poorly indexed by search engines (mainly for technical reasons) which slowed the uptake, with the final blow being the iPhone browser&#8217;s deliberate lack of support. Our pages remained largely static.</p>
<p>It looks like we <em>are</em> finally getting this promise fulfilled, but that video will be the medium for online animation and interaction. Where Flash failed to add dynamic content to webpages, YouTube is now successfully adding static and interactive content to video. </p>
<p>At the moment it hasn&#8217;t really progressed beyond the quizzes mentioned above and links to further content, but it is likely to become more sophisticated as we become more comfortable with this new approach.</p>
<h3>3. Recruitment</h3>
<p>As someone currently looking to recruit top engineers, I like the potential to find people through these platforms. This may be a good revenue model for the education systems and some of the online education platforms are already exploring the option of <a href="http://chronicle.com/article/Providers-of-Free-MOOCs-Now/136117/">charging employers for access to student data</a>.</p>
<p>If I was considering hiring an engineer that did not have a background in Natural Language Processing, I would now expect them to take this course in preparation, to demonstrate both their willingness and competence. For engineers themselves, it is a convenient way to begin to re-skill in new areas.</p>
<h3>4. Specialization</h3>
<p>Online education systems don&#8217;t need to be massive. Reaching tens of thousands of people is a big step, but so is allowing for more specialized courses. </p>
<p>The &#8216;Crisis Informatics and Analytics&#8217; course at Tulane was a great example. I was teaching only a dozen students, most of them completing a Masters in disaster management. There aren&#8217;t many people who could teach these courses: I am the only person in crowdsourcing and Natural Language Processing circles who has worked in all three of academia, industry and disaster response. As a result there are very few quality papers on either of these topics that are useful for students, and many misleading ones clearly aimed at funding agencies. While I couldn&#8217;t physically relocate to teach the course, I really enjoyed the opportunity to connect with the incoming generation of disaster response professionals and to teach them how to objectively evaluate humanitarian technologies.</p>
<p>In general, the internet has been a great place for small, distributed groups of like-minded people. It looks like this will be true for education, too.</p>
<h3>5. Reference materials</h3>
<p>I found myself using one of the Stanford NLP videos recently: </p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/XdjCCkFUBKU" frameborder="0" allowfullscreen></iframe><br />
<em>Figure 3: Good-Turing smoothing</em></p>
<p>I was implementing smoothing algorithms and wanted to double-check that I was getting one of them correct. The Stanford NLP video had everything I needed: the formal definition of the functions and an audio explanation. Running this in a screen alongside my code was more convenient than online text or an offline textbook. </p>
<p>As with the course at Tulane, this topic is pretty specialized (the video only has 600 views at this time), but it will also remain current as the algorithm itself will remain useful for a long time. </p>
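<p>For readers who have not met the algorithm, here is a minimal Python sketch of basic Good-Turing re-estimation, where a count c is replaced by c* = (c + 1) N<sub>c+1</sub> / N<sub>c</sub> and N<sub>c</sub> is the number of items seen exactly c times. It is an illustration only, not the implementation mentioned above: the function names and the toy counts are made up, and it simply falls back to the raw count when N<sub>c+1</sub> is zero, where practical implementations usually smooth the N<sub>c</sub> values first.</p>

```python
from collections import Counter


def good_turing_counts(counts):
    """Return adjusted counts c* = (c + 1) * N_{c+1} / N_c for each item.

    Simplification (an assumption of this sketch): when no item has been
    seen c + 1 times, keep the raw count instead of smoothing N_c.
    """
    # N_c: how many distinct items were observed exactly c times.
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for item, c in counts.items():
        n_c = freq_of_freq[c]
        n_next = freq_of_freq.get(c + 1, 0)
        adjusted[item] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted


def unseen_mass(counts):
    """Probability mass reserved for unseen items: N_1 / N."""
    total = sum(counts.values())
    n_1 = sum(1 for c in counts.values() if c == 1)
    return n_1 / total


# Toy counts, purely illustrative.
toy_counts = {"carp": 10, "perch": 3, "whitefish": 2,
              "trout": 1, "salmon": 1, "eel": 1}

adjusted = good_turing_counts(toy_counts)
print("adjusted count for 'trout':", round(adjusted["trout"], 3))
print("mass reserved for unseen items:", round(unseen_mass(toy_counts), 3))
```

<p>In the toy data three items occur once and one item occurs twice, so each singleton is discounted from 1 to 2 &#215; 1/3, and 3 of the 18 observations&#8217; worth of probability is set aside for items never seen.</p>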
<h3>One prediction for the future</h3>
<p>It wouldn&#8217;t be fun to write about this without making at least one prediction about the future of online education, so here is mine:</p>
<dl>
<dd>Our future educators will be robots</dd>
</dl>
<p>Or perhaps more precisely, artificial intelligence will play a central role in online education.</p>
<p>Along with Andrew Ng, Coursera was launched by Daphne Koller who is also a leading academic in machine learning. Coursera&#8217;s main competitor from within Stanford is Udacity, launched by another artificial intelligence researcher Sebastian Thrun (and first co-taught with Terry Winograd who also began his career in Natural Language Processing). </p>
<p>Outside of Natural Language Processing, I have also worked in artificial intelligence applied to geographic information systems (my 2003 paper, &#8216;Complex Spatial Relationships&#8217; now forms the basis of the entry with the same name in <a href="http://books.google.com/books/about/Encyclopedia_of_GIS.html?id=6q2lOfLnwkAC">Encyclopedia of GIS</a>). Our work stemmed from <a href="http://scholar.google.com/citations?view_op=view_citation&#038;hl=en&#038;citation_for_view=XPdhXUUAAAAJ:u5HHmVD_uO8C">Rakesh Agrawal&#8217;s seminal &#8216;Fast algorithms for mining association rules&#8217;</a>. </p>
<p>I was delighted to finally meet Rakesh this year, but not in a strictly artificial intelligence context. He is also now working in education, researching how digital education can be augmented by automated information retrieval (among other education-focussed projects at Microsoft Research). </p>
<p>We are at a tipping point where it will become cheaper to give every student a reading tablet than to ship physical textbooks to them. Once we are working from purely digital media, we all of a sudden have the ability to automatically extend the content through video, further text, or interactive environments with other students and smart systems. It is exciting to see so many of the best minds in artificial intelligence now applying themselves to this problem. </p>
<p>In short, the people leading the way in online education are also the world&#8217;s leaders in artificial intelligence. This makes for a nice symmetry. The Natural Language Processing courses help bring these technologies to a wider audience. In turn, the same technologies are now being applied to online education itself. So the intersection of artificial intelligence and education is where I will be watching (and continuing to help) with the most interest. </p>
<p>Rob Munro<br />
December 31, 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/adventures-in-online-education/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Volunteerism vs Professionalism for remote humanitarian work</title>
		<link>http://www.junglelightspeed.com/digital_volunteers/</link>
		<comments>http://www.junglelightspeed.com/digital_volunteers/#comments</comments>
		<pubDate>Mon, 12 Nov 2012 14:43:13 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Crisis Response]]></category>
		<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Microtasking]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=821</guid>
		<description><![CDATA[There has been a lot of media recently about digital humanitarian volunteers, especially those who are working remotely. Much of it is pure sensationalism, like this quote from Fast Company about Hurricane Sandy last week: &#8220;The proof is in the tweets: More than 20 million about the storm were sent between October 27 and Nov [...]]]></description>
				<content:encoded><![CDATA[<p>There has been a lot of media recently about digital humanitarian volunteers, especially those who are working remotely. Much of it is pure sensationalism, like this quote from Fast Company about Hurricane Sandy last week:</p>
<dl>
<dd>
&#8220;The proof is in the tweets: More than 20 million about the storm were sent between October 27 and Nov 1.&#8221; <a href="http://www.fastcompany.com/3002837/sandy-became-sandy-emergency-services-got-social">Fast Company</a>
</dl>
<p>If you work in information technology, even if not in disaster-response circles, this no doubt makes you shudder. How is the raw volume of communications proof of anything? What is the baseline for the same people tweeting about their lives before Sandy? The value is in what could be processed and utilized, and for this the real numbers come later in the article:</p>
<dl>
<dd>&#8220;All together, the city [of NY] has more than 200 staff doing social in English and Spanish. They sent out more than 2,000 tweets between Oct. 26 and Nov 7.&#8221;
</dl>
<p>It sounds impressive until you realize that&#8217;s only about 150 tweets per day, among the 200 staff. In other words, <em>New York City sent less than 1 tweet per day per social media staff member</em>. How is this even news? This tells us that engaging in social media in this way was <em>not</em> a significantly large task. It also tells us that Fast Company cannot read their own numbers.</p>
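<p>The arithmetic is easy to check. Here is a minimal Python sketch, assuming both end dates in the quote (Oct. 26 and Nov. 7) are counted as full days and taking the quoted &#8220;more than&#8221; figures at their lower bounds; the variable names are only illustrative.</p>

```python
from datetime import date

tweets_sent = 2000  # "more than 2,000 tweets" (lower bound from the quote)
staff = 200         # "more than 200 staff" (lower bound from the quote)

# Assumption: both Oct. 26 and Nov. 7, 2012 count as full days.
days = (date(2012, 11, 7) - date(2012, 10, 26)).days + 1

tweets_per_day = tweets_sent / days
tweets_per_staff_per_day = tweets_per_day / staff

print(f"{days} days, {tweets_per_day:.0f} tweets/day, "
      f"{tweets_per_staff_per_day:.2f} tweets per staff member per day")
# prints: 13 days, 154 tweets/day, 0.77 tweets per staff member per day
```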
<p>The truth is that despite the press and hundreds of thousands, if not millions, of dollars of investment, remote digital volunteerism has generally failed to have a significant impact on disaster response, with only a very few exceptions. Of the amount of money that has gone into digital volunteer initiatives, the return has typically been less than 10%. </p>
<p>Crowdsourced information processing is a predominantly paid task, with at least 99% of crowdsourced workers (microtaskers) receiving compensation for their work. The work is characterized by being cost-effective and efficient. Unless you are manually processing 10,000s of data points per day, it will cost <em>more</em> to manage volunteers than to pay professional microtaskers. Even worse, well-meaning remote digital volunteerism can disrupt the burgeoning information economies within a region and ultimately result in a net-deficit by limiting the potential for ongoing digital work.</p>
<p>This is something that I talk about regularly in humanitarian circles. This article is a summary of talks that I have given at the <a href="https://www.understandrisk.org/">World Bank&#8217;s &#8216;Understanding Risk&#8217;</a> conference earlier this year in Cape Town, the recent <a href="http://wilsoncenter.net/event/webcast-day-1-connecting-grassroots-to-government-for-disaster-management-policy-roundtable-0">Wilson Center Roundtable on Connecting Grassroots to Government for Disaster Management</a> and at Tulane University&#8217;s <a href="http://www.drlatulane.org/">Disaster Resilience Leadership Academy</a>, where I helped establish the &#8216;Crisis Informatics and Analytics&#8217; course in their Masters program. I owe thanks to discussions with a number of people, especially John Crowley and Jennifer Chan at the Harvard Humanitarian Initiative, Jeannie Stamberger and Jessica Ports of Tulane, Shadrock Roberts of USAID, Schuyler Erle and Kate Chapman of the Humanitarian OpenStreetMap Team, and Jeffrey Villaveces of the United Nations Office for the Coordination of Humanitarian Affairs.</p>
<p>Contrary to what media organizations like Fast Company tell you, there have not yet been any large crowdsourcing deployments in disaster-response contexts, and some of the largest to date <em>have</em> paid the workers. These past paid deployments give us a convenient baseline to work out the appropriate cost. Translation comes to about $0.30 per sentence, categorization to about $0.05 per report, and geolocation to $0.20. These numbers can be found in <a href="http://www.mission4636.org/report">&#8216;Crowdsourcing and the Crisis-affected Community&#8217;</a> in the Journal of Information Retrieval, and could be applied to processing communications/reports in any scenario.</p>
<p>We can take these numbers and apply them to past remote volunteer initiatives in order to calculate the amount saved by not paying the workers. For those that also used paid crowdsourced workers, we can also estimate the economic impact of the deployment by the total amount of wages that have subsequently been paid to these workers:  </p>
<style>
<!--
.figures td{
text-align:right;
border-width: 1px;
border-color:black;
border-style:solid;
padding: 5px;
}
td.l{
text-align:left;
}
-->
</style>
<table class="figures">
<tr>
<th class="l">Volunteer initiative</th>
<th>Volume</th>
<th>Gross Value</th>
<th>Economic impact</th>
</tr>
<tr>
<td class="l">Mission 4636</td>
<td>80,000</td>
<td>$23,000</td>
<td>$250,000+</td>
</tr>
<tr>
<td class="l">Ushahidi Haiti</td>
<td>3,400</td>
<td>$850</td>
<td>-</td>
</tr>
<tr>
<td class="l">Chile Earthquake Map</td>
<td>1,200</td>
<td>$300</td>
<td>-</td>
</tr>
<tr>
<td class="l">Pakreport</td>
<td>1,500</td>
<td>$600</td>
<td>$100,000+</td>
</tr>
<tr>
<td class="l">Alabama Recovery Map</td>
<td>355</td>
<td>$89</td>
<td>-</td>
</tr>
<tr>
<td class="l">Oil Spill Crisis Map</td>
<td>3,400</td>
<td>$850</td>
<td>-</td>
</tr>
<tr>
<td class="l">Saskatchewan Flood Map</td>
<td>240</td>
<td>$60</td>
<td>-</td>
</tr>
<tr>
<td class="l">Queensland Flood Map</td>
<td>500</td>
<td>$125</td>
<td>-</td>
</tr>
<tr>
<td class="l">Libya Crisis Map</td>
<td>2,500</td>
<td>$600</td>
<td>-</td>
</tr>
<tr>
<td class="l">Sinsai (Japan)</td>
<td>11,500</td>
<td>$3,020</td>
<td>-</td>
</tr>
<tr>
<td class="l" colspan=4><em>Money saved by digital volunteers / economic impact</em></td>
</tr>
</table>
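<p>The table can be sanity-checked with a short script. This is only a sketch in Python: it copies the volume and gross value columns as listed, sums the gross values, and derives the implied value per item for each deployment. For comparison, categorization plus geolocation at the rates quoted earlier comes to $0.25 per report; treating that as the applicable rate for a given deployment is an assumption, not something the table states.</p>

```python
# (volume, gross value in USD), copied from the table above.
deployments = {
    "Mission 4636": (80000, 23000),
    "Ushahidi Haiti": (3400, 850),
    "Chile Earthquake Map": (1200, 300),
    "Pakreport": (1500, 600),
    "Alabama Recovery Map": (355, 89),
    "Oil Spill Crisis Map": (3400, 850),
    "Saskatchewan Flood Map": (240, 60),
    "Queensland Flood Map": (500, 125),
    "Libya Crisis Map": (2500, 600),
    "Sinsai (Japan)": (11500, 3020),
}

# Total value of the unpaid information processing across all deployments.
total = sum(value for _, value in deployments.values())
print(f"Total gross value: ${total:,}")

# Implied value per processed item, derived purely from the two columns.
implied_rate = {name: value / volume
                for name, (volume, value) in deployments.items()}
for name, rate in implied_rate.items():
    print(f"{name:<24} ${rate:.2f} per item")
```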
<p>The deployments above are listed in more or less chronological order from the beginning of 2010. In three years, this totals about $30,000 worth of unpaid information processing. These numbers should be taken independently of the less quantitative impacts of the humanitarian work itself. For example, Mission 4636 was primarily a translation platform between Haitian Kreyol and English at a time when translators were few and far between. I remember connecting translators in Canada with relief workers inside logistic bases in Port-au-Prince, where it was easier for the relief workers to connect with a translator over the internet than to engage potential translators just a few metres away outside the gates of the base. It is hard to put a price on the importance of enabling communication in these contexts. However, the remote translators could just as easily have been paid, enabling the same communication and impact, so the comparison holds. This is ultimately what happened with Mission 4636, transferring to paid workers within Haiti at an organization called FATEM. </p>
<p>As with FATEM in Haiti, Pakreport resulted in an injection of money into the information economy. Pakreport was deployed following the floods in Pakistan in 2010 and resulted in the permanent employment of workers through an organization called BrightSpyre. For FATEM and BrightSpyre, the workers that were originally employed in humanitarian contexts now work as professional microtaskers, bringing in aggregate salaries that far exceed the value of information processing during the disasters.</p>
<p>Of the volunteer components of the deployments above, only Mission 4636 and possibly Sinsai created more value than they cost to run. Some were particularly expensive. For example, the Libya Crisis Map was run by the UN and relied on volunteers with a full-time manager supervising. After four weeks the volunteers were dwindling, so the UN posted three people on burn-out eight-hour back-to-back shifts to ensure that there was always someone staffing the incoming information. It is likely that the UN paid more than $5,000 to coordinate a $600 volunteer effort. Had they used paid, professional microtaskers, the UN would have had more funds to apply to other aspects of the response. </p>
<p><a href="http://www.smbc-comics.com/index.php?db=comics&#038;id=2797" target="_new"><br />
<img src="http://www.smbc-comics.com/comics/20121117.gif" width=300 style="float:right"></a></p>
<h3>The lesson not learned</h3>
<p>Given the numbers above, I estimate that for every $1 spent on remote digital volunteers for humanitarian work, only $0.10 goes into the actual work: donations to the above initiatives total about $300,000, against about $30,000 worth of information processing. The rest goes on internal expenses, primarily for management. Wyclef&#8217;s recently bankrupt charity, Yele, has been heavily criticized for spending 50% of its money on internal expenses and is currently being investigated for corruption. If 90% of the money spent on digital volunteerism is going on internal expenses, why aren&#8217;t digital volunteer networks being investigated for corruption? For deployments only worth a few hundred dollars, it simply doesn&#8217;t matter. It is not wrong to run a small inefficient deployment using crowdsourcing &#8211; an information processing strategy that would have been new to many people. It would only be unethical if the inefficiency of remote volunteerism was scaled to larger numbers with no place for paid workers. </p>
<p>Since 2010, there have not been any digital humanitarian initiatives that have resulted in paid ongoing employment for crowdsourced workers. There are worrying steps in the direction of expanding remote volunteer networks (sometimes known as the &#8216;volunteer and technical community&#8217; in humanitarian circles). We have seen enough deployments now to safely say that this model of volunteer information processing does not work, and that humanitarian organizations should utilize more conventional paid crowdsourced workers.</p>
<h3>Remote volunteers and the textile industry</h3>
<p>So what should a remote volunteer do to avoid wasting 90% of their efforts, but still contribute? The answer is to work on someone else&#8217;s crowdsourced task.</p>
<p>There is an analogous situation in the textile industry. When you donate clothing to charity, it does not always end up in the less resourced parts of the world. Many charities have realized that they are having a long-term negative impact on the textile industries of countries when they dump large amounts of donated clothes on the market. Even a temporary disruption can drive people out of the industry on a permanent basis. So when you donate your clothes, the charity sells them in your own country and it is the money from the sale that makes it to the less resourced parts of the world.</p>
<p>The same is true for information processing. If someone within a crisis-affected region can be paid for information processing, <em>and</em> it will cost less than a remote volunteer, then there is no need for a remote worker to disrupt the local information economy. Such disruption can prevent efforts like FATEM and BrightSpyre from emerging. </p>
<p>If you are remote to a disaster and have no special local knowledge or connections, the economics come out in favor of you undertaking completely unrelated work for $5-$6 per hour on crowdsourcing/microtasking platforms like Amazon&#8217;s Mechanical Turk and simply sending this on to organizations within the region. </p>
<h3>Willing workers</h3>
<p>I wrote this article while completing some calculations on the volunteer response to Hurricane Sandy. In this case, the volunteers were completing damage assessments from aerial photographs. The actual report is not yet public, which is why I did not include Sandy among the examples above, but I wanted to conclude with words from the potential workforce. I surveyed 20 professional microtaskers on their willingness to perform disaster-assessment tasks for compensation. All were interested and here are a couple of the responses. </p>
<dl>
<dd>&#8220;I think $2.00 is fair for moderating 100 pictures with a single criterion. I am taking into consideration the nature of the task and the organization making the request because I would price it higher for a normal business requester.&#8221;</dd>
<dd>&#8220;You are being fair with the given price range. As long as task is easy enough to complete several in a timeframe to equal 5-6 dollars an hour &#8230; I do this work as a necessary PT job, so pay does matter!&#8221;</dd>
</dl>
<p>If we can create employment for these people and save the humanitarian community money, then I think that there is a strong argument to reverse the recent trend towards non-professional volunteers and return to systems that aim to support professional information services within the disaster-affected communities.  </p>
<p>Rob Munro<br />
November 2012</p>
<p>EDIT: added the comic, HT to George Chamales </p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/digital_volunteers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;Processing short-message communications in low-resource languages&#8221;</title>
		<link>http://www.junglelightspeed.com/processing-short-message-communications-in-low-resource-languages/</link>
		<comments>http://www.junglelightspeed.com/processing-short-message-communications-in-low-resource-languages/#comments</comments>
		<pubDate>Thu, 09 Aug 2012 21:32:15 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Crisis Response]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=783</guid>
		<description><![CDATA[I completed my PhD at Stanford University earlier this year. It is published by Stanford with Creative Commons Attribution license: &#8220;Processing short-message communications in low-resource languages.&#8221; The work explores the nature of the variation inherent to short message communications in the majority of the world&#8217;s languages, and the extent to which modeling this variation can [...]]]></description>
				<content:encoded><![CDATA[<p>I completed my PhD at Stanford University earlier this year. It is published by Stanford with Creative Commons Attribution license:</p>
<dl>
<dd><a href="http://purl.stanford.edu/cg721hb0673">&#8220;Processing short-message communications in low-resource languages.&#8221;</a></dd>
</dl>
<p>The work explores the nature of the variation inherent to short message communications in the majority of the world&#8217;s languages, and the extent to which modeling this variation can improve natural language processing systems. It is safe to say that text messaging is used by a more linguistically diverse set of people than any prior digital communication technology. Many languages that were only ever spoken languages are now being written for the first time in short bursts of one or two sentences.</p>
<p><div id="attachment_784" class="wp-caption alignright" style="width: 460px"><a href="http://www.junglelightspeed.com/files/cellphone_use.png"><img class="size-full wp-image-784" title="cellphone_use" src="http://www.junglelightspeed.com/files/cellphone_use.png" alt="" width="450" /></a><p class="wp-caption-text">The shift to a diverse texting world. The size of the graphics indicates the number of text messages sent, and the segments are the number of cellphone subscribers defined by the International Telecommunication Union (ITU) as being in the ‘developing’ or ‘developed’ world.</p></div> In the last 10 years the spread of cellphone technology has put most of the world&#8217;s people in near-immediate contact with one another, with most of the growth coming from regions defined by the International Telecommunication Union (ITU) as &#8216;developing&#8217;, which will correlate with greater linguistic diversity and lack of access to resources (see figure to the right). </p>
<p>The spread of technology has not been matched by a similar increase in the capacity to process and understand this information. Ten years ago, to hear most low resource languages meant days of travel. Today, most of the world&#8217;s 7,000 languages can be found on the other end of your phone, but we have no speech-recognition, machine-translation, search-technologies or even spam-filters for 99% of them. Natural language processing is one way to leverage our limited resources across large amounts of data and the work here shows that it is possible to build accurate systems for most languages, despite the ubiquitous written variation and lack of existing resources. This should be encouraging for anyone looking to leverage digital technologies to support linguistically diverse populations, but it opens as many questions as it solves. In a way, putting a phone in the hands of everybody on the planet is the easy part. Understanding everybody is going to be more complicated.</p>
<p>The objective of my dissertation was to explore the nature of the variation inherent to short message communications in the majority of the world&#8217;s languages, and the extent to which modeling this variation can improve natural language processing systems. Text messaging may be the most linguistically diverse form of digital communication that has ever existed, but it is almost completely unstudied. Three sets of short messages were studied, in the Haitian Kreyol, Chichewa, and Urdu languages.</p>
<p><div id="attachment_797" class="wp-caption alignright" style="width: 460px"><a href="http://www.junglelightspeed.com/files/chichewa_odwala4.png"><img src="http://www.junglelightspeed.com/files/chichewa_odwala4-1024x380.png" alt="" title="chichewa_odwala" width="450"  class="size-large wp-image-797" /></a><p class="wp-caption-text">The complication for natural language processing: the number of spellings for &#039;odwala&#039; (&#039;patient&#039;) in text messages between Chichewa-speaking health workers, compared to the English translations of those messages.</p></div> All three contain the substantial spelling variations that result from a productive use of affixes/compounds, from phonological/orthographic variation, and from the typographic errors that arise from speakers with varying literacy. For example, the 600 Chichewa messages have more than 40 spellings for the word <em>odwala</em> (&#8216;patient&#8217;), with most appearing just once. This is problematic for many current approaches to natural language processing, which assume the level of standardization that is found in formal written English. However, as the variation is linguistically predictable it follows patterns that can be modeled. </p>
<p>The dissertation first looked at automated methods for modeling the subword variation, finding that language independent methods can perform as accurately as language specific methods, indicating a broad deployment potential. Turning to categorization, it was shown that by generalizing across the spelling variations, we can, for example, implement classification systems that can more accurately distinguish emergency messages from those that are less time critical, even when incoming messages contain a large number of previously unknown spellings of words. Looking across languages, the words that vary the least in translation are named entities, meaning that it is possible to leverage loosely aligned translations to automatically extract the names of people, places and organizations. Taken together, it is hoped that the results will lead to more accurate natural language processing systems for low-resource languages and, in turn, lead to greater services for their speakers. I will expand on each of these a little below. The full accounts can be found in the dissertation itself, with much of it also in <a href="http://www.robertmunro.com/research/">the publications arising from this work</a>.</p>
<h3>Modeling subword variation</h3>
<p>This part of my dissertation evaluated methods for modeling subword variation. It looked at segmentation strategies and compared language specific and language independent methods to distinguish stems from affixes, such as distinguishing &#8216;go&#8217; and &#8216;-ing&#8217; in the English verb &#8216;going&#8217;. It then compared normalization strategies for spelling alternations that arise from phonological or orthographic variation, such as the &#8216;recogni<em>z</em>e&#8217; and &#8216;recogni<em>s</em>e&#8217; variants in English or more-or-less phonetic spellings like the &#8216;z&#8217; in &#8216;cat<em>s</em> and dog<em>z</em>.&#8217; The language specific segmentation methods were created from Sam Mchombo&#8217;s <a href="http://books.google.com/books/about/The_Syntax_of_Chichewa.html?id=SRCFoDp88oUC">Syntax of Chichewa</a>. The language independent methods for segmentation built on Sharon Goldwater, Thomas L. Griffiths and Mark Johnson&#8217;s <a href="http://www.sciencedirect.com/science/article/pii/S0010027709000675">A Bayesian framework for word segmentation: Exploring the effects of context</a>, adapted to segmenting word-internal morpheme boundaries. The normalization methods use language specific methods created from Steven Paas&#8217; <a href="http://books.google.com/books/about/English_Chichewa_Chinyanja_dictionary.html?id=_PA_AQAAIAAJ">English-Chichewa/Chinyanja dictionary</a>. They are compared to normalization methods that are applicable to any language utilizing Roman script, and to language independent noise-reduction algorithms.</p>
<p>For morphological and normalization strategies alike, the language independent, unsupervised methods performed almost as accurately as the language specific, hand-crafted methods, and often just as well, indicating a broad deployment potential. When performance is evaluated in terms of deployment within a supervised classification system to identify medical labels like &#8216;patient-related&#8217; and &#8216;malaria&#8217;, the average gain in accuracy when introducing subword models was F=0.206, with an error reduction of up to 63.8% for specific labels. In conclusion, while subword variation is a persistent problem in natural language processing, especially with the prevalence of word-as-feature systems, this variation can itself be modeled in robust, language independent ways that can be deployed with few or no prior resources for a given language.</p>
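<p>To give a feel for what a script-level normalization can look like, here is a minimal Python sketch. It is not the method evaluated in the dissertation: the two rules (collapsing repeated letters and treating &#8216;z&#8217; as a variant of &#8216;s&#8217;) and the function names are assumptions chosen to match the English examples above.</p>

```python
import re
from collections import defaultdict


def normalize(word):
    """Map a spelling to a crude normalized key (illustrative rules only)."""
    word = word.lower()
    # Hypothetical rule 1: collapse runs of a repeated letter ("sooo" becomes "so").
    word = re.sub(r"(.)\1+", r"\1", word)
    # Hypothetical rule 2: treat 'z' as a spelling variant of 's'.
    return word.replace("z", "s")


def group_variants(words):
    """Group the observed spellings that share a normalized key."""
    groups = defaultdict(set)
    for word in words:
        groups[normalize(word)].add(word)
    return dict(groups)


if __name__ == "__main__":
    print(group_variants(["recognize", "recognise", "dogz", "dogs"]))
```

<p>A classifier can then use the normalized key as a feature alongside the original word, so that a spelling seen only once still shares evidence with its variants.</p>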
<p><div id="attachment_807" class="wp-caption alignright" style="width: 460px"><a href="http://www.junglelightspeed.com/files/learning_rate_chichewa2.png"><img src="http://www.junglelightspeed.com/files/learning_rate_chichewa2-1024x585.png" alt="" title="learning_rate_chichewa" width="450" class="size-large wp-image-807" /></a><p class="wp-caption-text">Comparing subword models for Chichewa and English, showing why they are needed for Chichewa, with more cross-linguistically typical spelling variation.</p></div> The cross-linguistic component of the research shows that when applying subword models to the Chichewa messages, there is a substantial F=0.091 gain when compared to a system trained only on words and phrases. By contrast, for the English translations of the same messages there is only an F=0.009 gain in accuracy, which is not significant. Similar results were found for Kreyol and Urdu. This highlights why subword modeling has not been a core component of the mostly English-centric research in classification to date, while also emphasizing its importance here.</p>
<h3>Classification of short message communications</h3>
<p>This part of my research extended the work on broad-coverage subword modeling to classification. Document classification is a very diverse field, but the literature review showed that past research into the classification of short messages is relatively rare, despite the prevalence of text messaging as a global form of communication. Several aspects of the classification process were investigated here, each shedding light on subword variation, cross-linguistic applicability, or the potential to actually deploy a system.</p>
<p><div id="attachment_805" class="wp-caption alignright" style="width: 460px"><a href="http://www.junglelightspeed.com/files/activelearning_kreyol1.png"><img src="http://www.junglelightspeed.com/files/activelearning_kreyol1-1024x842.png" alt="" title="activelearning_kreyol" width="450" class="size-large wp-image-805" /></a><p class="wp-caption-text">Active-learning to prioritize time-critical messages: by deliberately sampling the time-critical &#039;actionable&#039; messages, the system can incrementally prioritize new messages with near the same accuracy with only 1/10th the (manual) workforce.</p></div>Looking at architectures suited to actual deployment scenarios, the research investigated streaming models and active learning. These are constrained but realistic approaches to learning, where a classifier updates its model(s) dynamically as new messages stream in, with a bounded capacity to manually assign new labels to the incoming messages. A new approach to combining linguistic and nonlinguistic data was proposed, using hierarchical streaming models applied to the particular task of identifying messages with an &#8216;actionable&#8217; label among the Kreyol data. From a baseline of F=0.207, the inclusion of subword models and nonlinguistic features raises the accuracy to F=0.855, a substantial gain. It is also found that accurate models can be built when only a subset of the incoming data receives a manual label, by explicitly targeting actionable-looking messages for inspection and manual classification, finding that when as little as 5% of the incoming data receives a label it is still possible to classify incoming messages with F=0.756 accuracy. It is concluded that accurate classification is possible in a realistic deployment scenario, especially for prioritization tasks.</p>
<p>This is especially important for the information used here, which was taken from <a href="http://www.mission4636.org">Mission 4636</a>, an initiative that I coordinated following the 2010 earthquake in Haiti, which processed emergency text messages from the Haitian population using crowdsourced workers to translate and geolocate the information for the responders. My biggest fear when running this as a crowdsourced effort was that the volume would overwhelm the workforce, delaying the processing of important messages. This proved not to be the case for Mission 4636, with the (predominantly Haitian) workers and volunteers processing the messages with a median turnaround of only 5 minutes. It is easy to imagine a situation with an even higher volume of messages where human-processing could not keep pace, with even an hour&#8217;s delay meaning life-or-death. Information coming out of crisis-affected regions is only likely to increase in the future, so these kinds of technologies will be key to information processing.</p>
<p>Domain dependence, a related deployment hurdle, was also investigated here, comparing cross-domain accuracy when applying text message-trained data to Twitter and vice-versa. Despite being short message systems about the same events, the cross-domain accuracy is poor, supporting an analysis about the difference in the usage of the platforms. However, it is also shown that some of the accuracy can be reclaimed by modeling the prior probability of labels per message-source in certain contexts.</p>
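<p>One standard way to account for different label priors per source is to re-weight a classifier&#8217;s output probability by the ratio of the source prior to the training prior. The Python sketch below shows that re-weighting for a binary label. It is an assumption about how such a correction could be written, not the model from the dissertation, and the function names are hypothetical.</p>

```python
def label_prior(labels, target, smoothing=1.0):
    """Smoothed estimate of P(target) from a list of observed labels."""
    hits = sum(1 for label in labels if label == target)
    return (hits + smoothing) / (len(labels) + 2.0 * smoothing)


def adjust_for_source(p_label, train_prior, source_prior):
    """Re-weight P(label | message) from the training prior to a source prior.

    All three arguments are probabilities strictly between 0 and 1.
    """
    positive = p_label * source_prior / train_prior
    negative = (1.0 - p_label) * (1.0 - source_prior) / (1.0 - train_prior)
    return positive / (positive + negative)
```

<p>When the two priors are equal the probability is unchanged. When the label is more common in the new source than in the training data, borderline messages are pushed towards that label.</p>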
<h3>Information Extraction</h3>
<p>The final part of the dissertation looked at information extraction, with a particular focus on Named Entity Recognition. The research leverages the observation that the names of people, locations and organizations are the least likely to change form across translations, drawing on this observation to build systems that can accurately identify instances of named entities in loosely aligned translations of short message communications. Named Entity Recognition is the cornerstone of a number of critical Information Extraction tasks. In actual deployment scenarios, it could support the geo-location of events and the identification of information about missing people, two functions that were performed manually on this data. This approach, novel for Named Entity Recognition, has three steps.</p>
<p>First, candidate named entities are generated from the messages and their translations (the parallel texts). It is found that the best method to do this is to calculate the local deviation in normalized edit distance. For all cross-language pairs of messages, the word/phrase pairs across the languages that are the most similar according to edit-distance are extracted, and the deviation is calculated relative to the average similarity of all words/phrases across the translation. In other words, this step attempts to find highly similar phrases across the languages in translations that are otherwise very different. This method alone gives a little over 60% accuracy for both languages.</p>
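<p>The idea behind this first step can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: it compares single whitespace-separated tokens rather than phrases, it takes only the single best pair per message, and the function names and example message are made up for this sketch rather than taken from the data.</p>

```python
from statistics import mean


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """1.0 for identical strings, 0.0 for strings with nothing aligned."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_candidate(source_tokens, target_tokens):
    """Return the most similar cross-language pair and its local deviation.

    The deviation is the pair's similarity minus the average similarity of
    all token pairs in this message and its translation.
    """
    scores = [(similarity(s, t), s, t)
              for s in source_tokens for t in target_tokens]
    average = mean(score for score, _, _ in scores)
    best_score, source, target = max(scores, key=lambda item: item[0])
    return source, target, best_score - average


if __name__ == "__main__":
    # Made-up message and translation, for illustration only.
    kreyol = "nou bezwen dlo nan Delmas".split()
    english = "we need water in Delmas".split()
    print(best_candidate(kreyol, english))
```

<p>A pair with a high deviation is one that stayed almost unchanged in a translation where everything else changed, which is what makes it a good candidate for a name.</p>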
<p>In the second step, the set of candidate named entity seeds is used to bootstrap predictive models. The context, subword, word-shape and alignment features of the entity candidates are used in a model that is applied to all candidate pairs, predicting the existence of named entities across all translated messages. This raises the accuracy to F=0.781 for Kreyol and F=0.840 for English. </p>
<p>As there are candidate alignments between the languages, it is also possible to jointly learn to identify named entities in the second step by extending the feature space across both languages for each aligned candidate entity, drawing on the context and word-shape features in the parallel text. This raises the accuracy to F=0.846 for Kreyol and F=0.861 for English.</p>
<p>In the third and final step, a supervised model is trained on existing annotated data in English, the high-resource language. This model is used to tag the English translations of the messages, and the resulting tags are in turn leveraged across the candidate alignments of entities. This raises the accuracy to F=0.904 for both Kreyol and English. The figures for Kreyol are approaching accuracies that would be competitive with purely supervised approaches trained on tens of thousands of examples. </p>
<p>It is concluded that this novel approach to Named Entity Recognition is a promising new direction in information extraction for low-resource languages. Most low-resource languages will never have large manually annotated training corpora for Named Entity Recognition, but many will have a few thousand loosely aligned translated sentences, which might allow a system to perform comparably, if not better.</p>
<h3>Concluding remarks</h3>
<p>While the analysis of the data sheds enough light on the nature of short message communications to undertake the work presented here, there is still much that remains unknown. Which languages are currently being used to send text messages? We simply don&#8217;t know. There may be dozens or even hundreds of languages that are being written for the first time, in short bursts of one or two sentences. If text messaging really is the most linguistically diverse form of written communication that has ever existed, then there is still much to learn.</p>
<p>
Rob Munro, August 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/processing-short-message-communications-in-low-resource-languages/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing and the crisis-affected population</title>
		<link>http://www.junglelightspeed.com/crowdsourcing-and-the-crisis-affected-population/</link>
		<comments>http://www.junglelightspeed.com/crowdsourcing-and-the-crisis-affected-population/#comments</comments>
		<pubDate>Fri, 01 Jun 2012 20:12:07 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=745</guid>
		<description><![CDATA[Edit: This is cross-posted with minor changes at http://www.mission4636.org/report/ Just over two years ago, Haiti was hit by one of the worst natural disasters in living memory. Despite the scale of the earthquake, most of the communication infrastructure remained intact. The Haitian community came together via radio and sms to share information about the quickly [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Edit:</strong> This is cross-posted with minor changes at <a href="http://www.mission4636.org/report/">http://www.mission4636.org/report/</a></p>
<p>Just over two years ago, Haiti was hit by one of the worst natural disasters in living memory. Despite the scale of the earthquake, most of the communication infrastructure remained intact. The Haitian community came together via radio and SMS to share information about the quickly changing conditions: the locations of operational clinics and hospitals, information about missing people, the status of the international relief efforts that were arriving in the country.</p>
<p><div id="attachment_746" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/crowdsourcing_deployments.jpg"><img src="http://www.junglelightspeed.com/files/crowdsourcing_deployments-300x189.jpg" alt="" title="crowdsourcing_deployments" width="300" height="189" class="size-medium wp-image-746" /></a><p class="wp-caption-text">Mission 4636 processed more information than the next ten humanitarian crowdsourced deployments combined.</p></div>Most of the international relief workers arriving in the country did not speak Haitian Kreyol or know the geography of Haiti. I had the privilege to support an effort to bridge the gap between the Haitian community and the international relief efforts. Haitian engineers established a number in Haiti, &#8216;4636&#8217;, that anybody could send a text message to for free. In an effort called &#8216;Mission 4636&#8217;, Kreyol and French speakers worked on crowdsourcing platforms to translate, categorize, geolocate and extract missing person information from the text messages. The structured data, now in English, was streamed directly back to the relief efforts in Haiti, with a typical turnaround of just 5 minutes. </p>
<p>The majority of the people working on Mission 4636 were members of the Haitian diaspora, working from at least 49 different countries and collaborating via a simple online chat. I coordinated this initiative, which ran for several months, and for the latter half we transitioned from international volunteers to paid crowdsourced workers within Haiti, creating jobs where they were needed most. </p>
<p>This post is to accompany the first full report about Mission 4636. It was the first time that crowdsourcing had been used for disaster response, and is still the largest deployment of its kind to date. The report has been accepted to the Journal of Information Retrieval and will be published there soon. This manuscript is released in advance of the publication:</p>
<dl>
<dd><a href="http://bit.ly/m4636" target="_new">http://bit.ly/m4636</a>
</dl>
<p>In summary, the report has the following findings:</p>
<ol>
<li>1. The greatest volume, speed and accuracy in information processing was by Haitian nationals and those working most closely with them.
<li>2. Previous reports about Mission 4636 have incorrectly credited international organizations with the majority of the work. Only 5% of messages to 4636 went through the software run by international not-for-profits, but reports like the <a href="http://www.globalproblems-globalsolutions-files.org/gpgs_files/pdf/2011/DisasterResponse.pdf" target="_new">Disaster Relief 2.0 Report</a> inflated this 5% to appear to be the whole effort, sidelining the 95% that was Haitian run.
<li>3. No new technologies played a significant role in Mission 4636, which is again contrary to most reports to date.
<li>4. Crowdsourcing (microtasking) was an effective strategy to structure and translate information into reports that the responders could act on.
<li>5. The online chat was vital for information sharing, as no one person could know all the possible locations and translations, but someone among the collaborating volunteers often did.
<li>6. Among social media platforms, Facebook was by far the most important.
<li>7. Translation was the largest and most important information processing task, followed by categorization and then geolocation and structuring information about missing people.
<li>8. The use of a public-facing &#8216;crisis map&#8217; for the messages was opposed by the majority of people within Mission 4636 and exposed the identities of at-risk individuals.
<li>9. The majority of volunteers came together through social media and strong social ties.
<li>10. A quarter of all crowdsourced information processing was by paid workers within Haiti, who were one of the most vital workforces but have also been excluded from most other reports to date.
<li>11. The most important connections to the country were through the volunteers themselves, with direct relationships to people managing the clinics, radio stations, and individual people that we were supporting.
</ol>
<p>From the findings in the report, the following recommendations are made for organizations or individuals considering the use of crowdsourcing in response to future disasters:</p>
<ol>
<li>1. Find and manage volunteers via strong social ties.
<li>2. Maintain a ten-to-one local-to-international workforce.
<li>3. Default to private data practices.
<li>4. Publish in the language of the crisis-affected community.
<li>5. Do not elicit information for which there is not the capacity to respond.
<li>6. Do not elicit emergency response communications.
<li>7. Use social media to encourage the centralization of information.
<li>8. Establish partnerships with technology companies.
<li>9. Avoid partnerships with media organizations and citizen journalists.
<li>10. Integrate, don’t innovate or disrupt.
<li>11. Employ people with close ties to the crisis-affected region.
</ol>
<p>As I hope the report makes clear, those of us who work in technology for social good owe an overwhelming debt to the Haitians among the diaspora who put aside their personal grief to come together online and work tirelessly to help those in the greatest need. There are many non-Haitians who have also played an important role, and continue to do so. For example, Mark Belinsky helped establish some of the workflows and has continued to serve Haiti, helping to establish <a href="http://kofaviv.org/" target="_new">KOFAVIV</a>, an organization in Haiti that supports the victims of gender-based violence. Chrissy Martin helped manage a group of non-Haitians at Tufts University and has also helped Haitian telecommunication companies to establish phone banking. Ronny Hoffman was one of the most tireless volunteers, bootstrapping his knowledge of Kreyol from his work as a French teacher. </p>
<p>However, the greatest respect, credit and admiration should go to Haitians themselves. Many helped compile the report and it was an honor to invite them to give the final words in this article:</p>
<dl>
<dd>
<div style="border-style:dashed; border-width:1px; border-color:grey; display:block; padding:5px;">
&#8220;I just want to thank you and the incredible team of superb talent and generosity that you have organized and kept bonded 24/7..., for making one of the best collective efforts that I ever experienced or contributed to in my life.</p>
<p>I wish you the best. I am still very grateful.&#8221;</p>
<p>Ronald Beliard, Mission 4636 Volunteer
</p></div>
</dd>
<dd>
<div style="border-style:dashed; border-width:1px; border-color:grey; display:block; padding:5px;">
&#8220;The January 2010 earthquake woke-up the entire world, especially the people in Haiti, for a catastrophe of this proportion had not occurred in this region for more than 150 years. Those familiar with Haitian history had known that such earthquakes had plagued Haiti in prior centuries. For example, the May 7, 1842 quake that damaged the northern region of Haiti was estimated to be an 8.1 on the Richter Scale. The June 3, 1770 quake destroyed Port-au-Prince and was estimated to be a 7.5. These previous disasters, by the way, were both estimated to be of greater magnitude than the most current Jan. 12, 2010 earthquake, which registered 7.0 on the Richter Scale. This particular quake was most devastating however, for Port-au-Prince is much more heavily populated than in prior periods. While Haitians near ground zero suffered greatly, fellow Haitians in unaffected parts of the world were also traumatized for they could not assist loved ones in need.</p>
<p>Mission 4636 was a comforting way for me to help the Haitians people at home while I was all the way in California. We received their requests of assistance, translated them, and located the places for emergency personal to attend to. My involvement in these activities brought Haiti and my people closer to my heart, and I felt as if I was helping on the scene. Our relief effort became so large that we invited friends who were not native of Haiti, but who knew Haitian Creole to join us. Extremely touching were the enthusiasm, love, passion and dedication of every single Mission 4636 volunteer. We came together from all over the world at all hours of the day for the same, wonderful purpose – to help Haitians in crisis situation. We volunteers comforted each other in this difficult time of despair by sharing our experiences and expressing our feelings at any time we entered the chat room. The sharing was in fact a form of therapy for us. Most importantly, together we achieved an incredible feat, which in return filled us with the satisfaction and joy of helping others in need. I felt tremendously delighted that I was a part of this mission, and I hope that Mission 4636 could be a model to follow for future disasters.&#8221;</p>
<p>Jacqueline Oriscar, Mission 4636 Volunteer
</p></div>
</dd>
<dd>
<div style="border-style:dashed; border-width:1px; border-color:grey; display:block; padding:5px;">
&#8220;I have close family members who live in Haiti, and my first reaction was &#8216;Is this really happening?&#8217; I found out about the 4636 effort through a friend of mine. I got online, staying up late after I put the kids to bed and trying to translate as many text messages as I could. There was this energy, with people from all over the world creating this support system. It made it feel like I was almost on the ground helping.&#8221;</p>
<p>Jean-Robert Durocher, Mission 4636 Volunteer
</p></div>
</dd>
<dd>
<div style="border-style:dashed; border-width:1px; border-color:grey; display:block; padding:5px;">
&#8220;Like many other Haitian immigrants in the U.S., I watched with horror, despair, and helplessness the media coverage of the earthquake that devastated my home country and destroyed so many lives in Haiti. Having lived in the US for nearly two decades, I have never felt so disconnected from my home country and people than at that moment. I was not there to help or even be there, to pay witness to their suffering, in their greatest moment of need. Grief ridden, overwhelmed with guilt and sadness, I sought ways to help by providing assistance to loved ones in Haiti and making financial contributions to various aid agencies in Haiti. But, it still did not feel enough. Translating as a volunteer in Mission 4636 helped me to do something more active and emotionally engaging. It helped me to connect emotionally, though far away, with the people of Haiti. It was satisfying to know that I was making tangible contributions and that I was helping to save lives. By helping others, I also helped myself. I felt useful once again and able to participate in and sit with the pain of other fellow Haitians. It brought me closer to home and closer to my people. Indeed, it was painful at times to read traumatic stories over and over again and to read about peoples desperate cries for help and not being able to help. But, this was pain that I welcomed, a pain that made me feel Haitian.</p>
<p>The other volunteers also provided much needed psychological support at the time. I was able to connect with others all over the world and share stories, experiences, and reactions to the earthquake. I quickly became part of a community where I was able to speak my native tongue, Haitian Creole and relate to other Haitians and friends of Haiti. Feeling isolated in a predominantly white community in Northern California, those connections were significant sources of support.&#8221;</p>
<p>Johanne Eliacin, Mission 4636 Volunteer
</p></div>
</dd>
</dl>
<p>Rob Munro<br />
June 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/crowdsourcing-and-the-crisis-affected-population/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Doubly-named places</title>
		<link>http://www.junglelightspeed.com/doubly-named-places/</link>
		<comments>http://www.junglelightspeed.com/doubly-named-places/#comments</comments>
		<pubDate>Tue, 29 May 2012 17:13:44 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=685</guid>
		<description><![CDATA[The names of locations often survive in the original language well beyond that language&#8217;s influence in the location, meaning that the original meanings (and sometimes the entire language) are often forgotten. My commute often takes me past the tree from which the city of Palo Alto gets its name, meaning &#8216;High Branch&#8217;, in Spanish. The [...]]]></description>
				<content:encoded><![CDATA[<p>The names of locations often survive in the original language well beyond that language&#8217;s influence in the location, meaning that the original meanings (and sometimes the entire language) are often forgotten.</p>
<p><div id="attachment_696" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/palo_alto_high_branch.jpg"><img src="http://www.junglelightspeed.com/files/palo_alto_high_branch-300x219.jpg" alt="" title="palo_alto_high_branch" width="300" height="219" class="size-medium wp-image-696" /></a><p class="wp-caption-text">The tree from which Palo Alto (&#039;High Branch&#039;) gets its name, today and more than a century ago</p></div> My commute often takes me past the tree from which the city of Palo Alto gets its name, meaning &#8216;High Branch&#8217;, in Spanish. The giant redwood tree containing the high branch sits by Menlo Park creek, between the railway and cycle/walkway bridges. It is still there, tall and healthy, having survived the transition from Spanish to English, and from Ramaytush to Spanish before that.</p>
<p>The Spanish naming influence remains, with much of the San Francisco peninsula taking its current names from when California was part of Mexico. This includes the road alongside the Palo Alto redwood, which runs the length of the peninsula and is widely known as &#8216;the El Camino road&#8217;. The redundancy, obvious to Spanish speakers, is that &#8216;El Camino&#8217; more or less means &#8216;the road&#8217;, so &#8216;the El Camino road&#8217; literally translates to &#8216;the the road road&#8217;. </p>
<p>As it turns out, the world is full of places that are doubly-named. I have been working on computational methods for identifying locations in cross-linguistic text (don&#8217;t worry, I&#8217;m not going into the details of it) and came across many as an interesting aside. So I thought I would share some of my favorites (with thanks to <a href="http://www.stanford.edu/~cgpotts/">Chris Potts</a> for pointing me towards a number of them):</p>
<dl><div id="attachment_721" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/gobi_desert_desert.jpg"><img src="http://www.junglelightspeed.com/files/gobi_desert_desert-300x200.jpg" alt="" title="gobi_desert_desert" width="300" height="200" class="size-medium wp-image-721" /></a><p class="wp-caption-text">The Gobi Desert (&#039;Desert Desert&#039;) Once ruled by King Genghis Khan</p></div></p>
<dd> Sahara Desert (Large-Desert Desert), Arabic</p>
<dd> Lake Michigan (Lake Large-Lake), Chippewa</p>
<dd> Cuyahoga River (Crooked-River River), multiple Native American languages</p>
<dd> Himalayan Mountains (Mountains Mountains), Sanskrit</p>
<dd> Orkney Islands (Boar-Islands Islands), Gaelic</p>
<dd> La Brea Tar Pits (The The-Tar Tar Pits), Spanish</p>
<dd> The Rock of Gibraltar (The Rock of the-Rock-of-Tariq), Spanish/Arabic</p>
<dd> Minnehaha Falls (Waterfall Falls), Dakota </p>
<dd> Gobi Desert (Desert Desert), Mongolian</p>
<dd> Jirisan Mountain (Jiri-Mountain Mountain), Korean  </p>
<dd> Mississippi River (Great-River River), Chippewa   </p>
<dd> Rio Grande River (Big River River), Spanish </p>
<dd> Tiwai Island (Island Island), Mende</p>
<dd> Lake Tahoe (Lake Lake), Washo      </dl>
<p><div id="attachment_704" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/river_river_avon.jpg"><img src="http://www.junglelightspeed.com/files/river_river_avon-300x225.jpg" alt="" title="river_river_avon" width="300" height="225" class="size-medium wp-image-704" /></a><p class="wp-caption-text">Shakespeare&#039;s the River Avon, literally meaning &#039;River River&#039; (No bards were consulted)</p></div>Sometimes, the name survives beyond the language itself. Shakespeare&#8217;s &#8216;River Avon&#8217; literally translates to &#8216;River River&#8217;, as &#8216;Avon&#8217; literally means &#8216;river&#8217; in a pre-English Celtic language now called <a href="http://en.wikipedia.org/wiki/British_language_(Celtic)">Old Brythonic</a>: </p>
<dl>
<dd> River Avon (River River)
</dl>
<p>Like so many languages, there are no surviving written documents in Old Brythonic, with place names providing the greatest insight into the structure and nature of the language. Other places with Old Brythonic names include &#8216;Kent&#8217; meaning &#8216;border&#8217;, &#8216;Thames&#8217; meaning &#8216;dark&#8217;, and &#8216;Britain&#8217; translating as the mysterious &#8216;People of the Forms&#8217;.</p>
<p>But none of the names above compare to the magnificent <a href="http://en.wikipedia.org/wiki/Bredon_Hill">Bredon Hill</a>:</p>
<dl>
<dd> Bredon Hill (Hill-Hill Hill)
</dl>
<p>Yes, it trumps the others by being a <em>triply</em> named place. &#8216;Bre&#8217; is Celtic, possibly also Old Brythonic, and &#8216;Don&#8217; is <a href="http://en.wikipedia.org/wiki/Old_English_language">Old English</a>, which meant that sometime around 1000AD people were referring to this place as &#8216;Bre Don&#8217;. Despite modern English developing from Old English (no, really) somewhere along the way people forgot the &#8216;don&#8217; part and decided that people needed to be aware that this &#8216;Bredon&#8217; place was on a hill. </p>
<p><div id="attachment_705" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/bredon_hill_hill_hill.jpg"><img src="http://www.junglelightspeed.com/files/bredon_hill_hill_hill-300x225.jpg" alt="" title="bredon_hill_hill_hill" width="300" height="225" class="size-medium wp-image-705" /></a><p class="wp-caption-text">Bredon Hill, literally &#039;Hill-Hill Hill&#039; (Known to Romans as &#039;Collis Hill&#039;)</p></div> After some research on my part (ok, a few Wikipedia articles) I found that Bredon Hill is 981ft high. It used to be that 1000ft was the cut-off between a &#8216;hill&#8217; and a &#8216;mountain&#8217;. Do you remember the scene in <a href="http://en.wikipedia.org/wiki/The_Englishman_Who_Went_Up_a_Hill_But_Came_Down_a_Mountain">&#8216;The Englishman Who Went Up a Hill But Came Down a Mountain&#8217;</a> where Hugh Grant plays a cartographer who discovers, much to the dismay of the folksy villagers, that their &#8216;mountain&#8217; was a few feet short of 1000ft and was therefore really a hill? And the later scene where they built up the hill to make it exactly 1000ft so that Hugh Grant would have to declare it a mountain? I don&#8217;t, because I haven&#8217;t seen the film &#8212; it looks terrible &#8212; but I&#8217;m sure that is what happened. </p>
<p>Now, see that tower behind the annoying dogs in the photograph on the right? It is called Parsons Folly and was built by John Parsons in the mid 1800&#8242;s. The tower is 19ft high, bringing the total height of Bredon Hill to exactly 1000ft. </p>
<p>Which means that &#8216;Hill-Hill Hill&#8217;, after more than 1000 years of insistent, repeated naming, stopped being a &#8216;hill&#8217; and became a &#8216;mountain&#8217; &#8230; making it the most wrongly named place on earth.</p>
<p><a href="http://www.robertmunro.com">Rob Munro</a><br />
May 28, 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/doubly-named-places/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Take English, add awesome</title>
		<link>http://www.junglelightspeed.com/take-english-add-awesome/</link>
		<comments>http://www.junglelightspeed.com/take-english-add-awesome/#comments</comments>
		<pubDate>Thu, 22 Mar 2012 15:23:11 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Sociolinguistics]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=659</guid>
		<description><![CDATA[We may not get many more languages in the world. New languages usually form in linguistic isolation and in a connected world this possibility is greatly reduced. All languages constantly evolve, so change is inevitable, but a language branching into two will be unlikely. This has mostly been true for English. While the Roman empire [...]]]></description>
				<content:encoded><![CDATA[<p>We may not get many more languages in the world. New languages usually form in linguistic isolation and in a connected world this possibility is greatly reduced. All languages constantly evolve, so change is inevitable, but a language branching into two will be unlikely. This has mostly been true for English. While the Roman empire had collapsed for long enough that Latin branched to become French, Italian, Spanish, etc., there wasn&#8217;t enough time between the dismantling of the British empire and the proliferation of telecommunications. English was dispersed only long enough for a few minor variations to emerge across the Americas, Asia, Africa and the Pacific; perhaps not even enough to be considered different dialects.  </p>
<p><div id="attachment_663" class="wp-caption alignleft" style="width: 193px"><a href="http://www.junglelightspeed.com/files/curious_pigeon.jpg"><img src="http://www.junglelightspeed.com/files/curious_pigeon.jpg" alt="" title="curious_pigeon" width="183" height="221" class="size-full wp-image-663" /></a><p class="wp-caption-text">No, not you.</p></div>An exception to the absence of new languages can be found in Creoles. When speakers of different languages come together and need to communicate they will often create a very simple vocabulary and grammar for interacting, known as a <a href="http://en.wikipedia.org/wiki/Pidgin" target="_new">Pidgin</a>. Sometimes it will be based on one language, or sometimes a blend of two like &#8216;<a href="http://en.wikipedia.org/wiki/Russenorsk" target="_new">Russenorsk</a>&#8216;, a pidgin drawing from both Russian and Norwegian vocabulary to enable trade in the fishing industry, with about 400 words; mostly fish-related words (there were few famous Russenorsk poets). When speakers maintain separate primary languages then the trade languages often remain a pidgin. When the speakers adopt the pidgin as a primary language, it quickly develops into a Creole &#8211; a full mixed-language &#8211; and achieves independent language status in just a generation or two, often retaining the name &#8216;Creole&#8217; while continually evolving. </p>
<p><div id="attachment_662" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/russenorsk.jpg"><img src="http://www.junglelightspeed.com/files/russenorsk-300x247.jpg" alt="" title="russenorsk" width="300" height="247" class="size-medium wp-image-662" /></a><p class="wp-caption-text">The renowned Russenorsk Poet, Radost Severormorsk, wooed his wife with his ballad &quot;I enjoy you like a herring enjoys krill&quot;. Her joy is clear.</p></div> This has happened to English a number of times. I&#8217;ve just returned from Sierra Leone where the lingua franca is <a href="http://en.wikipedia.org/wiki/Sierra_Leone_Krio_language" target="_new">Krio</a>. As a linguist it has always been fascinating to listen to such a familiar sounding language, especially when you listen closely and realize that you understand almost nothing that is said. When I <a href="http://www.junglelightspeed.com/the-smallest-signal/">last blogged about communication in Sierra Leone I spoke of drop-dialing</a>, or &#8216;flashing&#8217;, which is still alive and well. But when back there it seemed unfair to be writing only about non-verbal communication, so I thought I&#8217;d add an article about Krio.</p>
<p>In North America in the 17th century, slaves from West and Central Africa began learning an English-based pidgin that quickly evolved into a Creole used throughout the region. It can still be found today in <a href="http://en.wikipedia.org/wiki/Jamaican_Patois">Jamaican Patois</a> and, as I stumbled upon by accident a few years ago, in the islands off Belize (although the standard greetings are different, which I also stumbled over accidentally, to everyone&#8217;s confusion). There are even a few speakers left in the USA (now all elderly) in South Carolina and Georgia where it is called <a href="http://en.wikipedia.org/wiki/Gullah_language">Gullah</a>. For years people thought that Gullah was simply an impoverished English, not realizing that it was a full, rich language, one of the few to ever have been derived from English, and a unique window into one of the most important times in America&#8217;s history.</p>
<p><div id="attachment_670" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/IMG_2185.jpg"><img src="http://www.junglelightspeed.com/files/IMG_2185-300x224.jpg" alt="" title="IMG_2185" width="300" height="224" class="size-medium wp-image-670" /></a><p class="wp-caption-text">A Krio house in York on the Freetown Peninsula</p></div>Freetown, the capital of Sierra Leone, was primarily settled by freed slaves from the Americas, also known as &#8216;Krios&#8217;. Their language, Krio, draws on influences from dozens of languages, including many West African and European languages and (to confuse things a little) a separate English-based trade pidgin that was already in use in West Africa. To this day, the Krios are recognized as both an ethnic and linguistic group in their own right. There is even a certain style of &#8216;<a href="http://www.chronicleworld.org/archive/krioarch.htm">Krio architecture</a>&#8216;. The language is widely spoken across the region, helped greatly by being the language of choice for much of the music broadcast across the nation (see below).</p>
<p>That new languages can spring up so quickly from pidgins, and that they come to independently develop grammars so similar to existing languages, has been the source of much debate among linguists, especially as evidence for the possible universal qualities of human language and its cognitive roots. While the jury is still out on the grammar, all linguists agree on the universal that the most interesting idioms in every language are to do with sex, relationships and insults (or all three). So I&#8217;m taking the chance to add 10 favorites to my previous report on &#8216;flashing&#8217; (from memory so excuse the spellings):</p>
<p>1. <em>id de han bag.</em> A little hand-bag (one of those tiny just-under-the-armpit ones). It took me a while to work out the insult: if you go out only carrying a very small handbag then you are expecting a man to pay for everything. In other words, you use this phrase to accuse someone of being a bit prostitutey. Sounds odd, I know, but I was able to use the fallout from someone using this phrase to negotiate a better rental deal for our NGO there, <a href="http://www.energyforopportunity.org/en/home/">Energy for Opportunity</a> (long story).</p>
<p><div id="attachment_671" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/IMG_2055.jpg"><img src="http://www.junglelightspeed.com/files/IMG_2055-300x224.jpg" alt="" title="IMG_2055" width="300" height="224" class="size-medium wp-image-671" /></a><p class="wp-caption-text">The view from my balcony office in Aberdeen, Freetown. On the bottom right is my neighbor singing loud covers of Elton John hits in his underwear. He sung mainly in English and was half Russian ... I should have asked if he knew Russenorsk</p></div>2. <em>e geh blue tooth</em>. He/she has blue tooth (&#8216;e&#8217; means &#8216;he&#8217; or &#8216;she&#8217;). Internet is not widely used, so data is transferred by sms, memory stick, or (most openly and dangerously) via blue-tooth. So if you accuse someone of having blue-tooth (man or woman) you are accusing them of being too &#8216;open&#8217; to others. In other words, a slut. (I love that this term is based on a data-transfer analogy).</p>
<p>3. <em>keep a stick behind door</em>. As it sounds. Wild (or wild-ish) dogs can be a problem, so it&#8217;s a good idea to keep a stick handy by the door when you go out, just in case. The implication has nothing to do with dogs, of course &#8211; it means that a backup boyfriend or girlfriend should be kept at the ready.</p>
<p>4. <em>take more than one man to fill box</em>. Similar to the previous but just about keeping a spare (or extra) man around. Don&#8217;t make me spell it out&#8230;</p>
<p>5-8. <em>dry, super slim, straight cut, old stock</em>. Female figure types, meaning (in order): too thin/leathery, attractive, elegant, and (a friend carefully explained) someone who has &#8216;earned their body&#8217;. </p>
<p>9. <em>boxer</em>. Someone who is tight with their cash &#8211; the closed fist of a boxer not letting go of money.</p>
<p>10. <em>greased</em>. Damaged. I can&#8217;t work out the history of why &#8216;greased it&#8217; means &#8216;damaged it&#8217;, but it&#8217;s just so much fun to use: &#8220;I greased the car&#8221;, &#8220;I greased the test&#8221;, etc.</p>
<p>I&#8217;m sure I&#8217;m barely scratching the surface. Beyond the idioms, the grammar patterns differently too. One thing that stood out was the use of &#8216;for&#8217; as an infinitive. So instead of saying &#8220;I want to go&#8221; you would say &#8220;I want for go&#8221;. The &#8216;to&#8217; / &#8216;for&#8217; alternation is pretty arbitrary, but for some reason the little changes in the actual grammar stood out more than the changes in the words themselves. You can see this in (Lady) Laurish&#8217;s &#8216;Lose your love&#8217; video clip<br />
where she says &#8220;so try for understand I don&#8217;t want for lose your love&#8221;</p>
<p><iframe width="420" height="315" src="http://www.youtube.com/embed/NeDZ5-BQ2fs" frameborder="0" allowfullscreen></iframe></p>
<p>It&#8217;s a pretty good intro to Krio, although she speaks more on the English side of things.<br />
For a deeper Krio, check out the Krio figures of speech:</p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/_wW2oCHFjsk" frameborder="0" allowfullscreen></iframe></p>
<p>Did you notice the (cleaner) variation on the &#8216;stick behind door&#8217; idiom? Don&#8217;t believe for a second that&#8217;s the first thing that people will think <img src='http://www.junglelightspeed.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>And just to balance out the languages, here is Sorie Kondi singing &#8220;Without Money, No Family&#8221;</p>
<p><iframe width="420" height="315" src="http://www.youtube.com/embed/5o8dl-E9K_I" frameborder="0" allowfullscreen></iframe></p>
<p>The first verse is sung in Loko, the second in Krio, and the third in Temne. There are at least a dozen commonly spoken languages across the country, all with their unique ways to insinuate and insult &#8211; wish I could share more about them all!</p>
<p>Rob<br />
22 Mar 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/take-english-add-awesome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Published!</title>
		<link>http://www.junglelightspeed.com/published/</link>
		<comments>http://www.junglelightspeed.com/published/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 21:12:26 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Fieldwork]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Linguistics]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=632</guid>
		<description><![CDATA[It&#8217;s out! After ten years of publishing only at conferences (which is more standard in computer science) I have an actual journal article: Robert Munro, Rainer Ludwig, Uli Sauerland and David Fleck. 2012. Reported Speech in Matses: perspective persistence and evidential narratives. International Journal of American Linguistics. 78:1, 41-75. (http://www.jstor.org/pss/10.1086/662637) I wrote a previous blog [...]]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s out! After ten years of publishing only at conferences (which is more standard in computer science) I have an actual journal article:</p>
<p>Robert Munro, Rainer Ludwig, Uli Sauerland and David Fleck. 2012. Reported Speech in Matses: perspective persistence and evidential narratives. <em>International Journal of American Linguistics</em>. 78:1, 41-75. (<a href="http://www.jstor.org/pss/10.1086/662637">http://www.jstor.org/pss/10.1086/662637</a>)</p>
<p><div id="attachment_639" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/amazon_dugout.jpg"><img src="http://www.junglelightspeed.com/files/amazon_dugout-300x224.jpg" alt="" title="amazon_dugout" width="300" height="224" class="size-medium wp-image-639" /></a><p class="wp-caption-text">On the way to work</p></div>I wrote a <a href="http://www.junglelightspeed.com/fieldworking/" title="Fieldworking">previous blog entry about one day of fieldwork</a> so I&#8217;m going to devote this one to the combination of linguistic theory and travel that got me there. How exactly did I end up spending eight days traveling to study just one aspect of such a remote language? </p>
<p>Noam Chomsky. </p>
<p>Of course.</p>
<p>Along with two other language researchers he published <a href="http://www.sciencemag.org/content/298/5598/1569.short">an article in <em>Science</em> that suggested that every human language (and no non-human language) contained recursion</a>. It is an issue that goes right to the heart of human existence &#8211; what mental capacities separate us from other animals (if any) and how is this manifested in language, the most complex interaction of cognitive and social action?</p>
<p>For Chomsky, the primary distinguishing feature is recursion. Recursion, put simply, means any process or structure that self-embeds. The idea that human languages have recursion is easy to demonstrate. In fact, I just did. The last sentence had another one embedded in it. That is, &#8220;human languages have recursion&#8221; was embedded within &#8220;the idea is easy to demonstrate&#8221; (I know, very meta). The theory can be disproved in two ways: finding an animal-language with recursion or finding a human language without recursion.</p>
<p><div id="attachment_643" class="wp-caption alignleft" style="width: 252px"><a href="http://www.junglelightspeed.com/files/images.jpeg"><img src="http://www.junglelightspeed.com/files/images.jpeg" alt="" title="images" width="242" height="208" class="size-full wp-image-643" /></a><p class="wp-caption-text">Unlike the European Starling, no-one doubts the recursion of birds in turducken.</p></div>Proving that only humans have recursion in their language has led to a lot of monkey-time, with people coercing poor apes into admitting &#8220;the idea that bananas are very tasty is correct&#8221; (or something along those lines). There have also been some very heated debates with <a href="http://www.nature.com/nature/journal/v440/n7088/abs/nature04675.html">a counterargument in <em>Nature</em> that claimed that the European Starling has a recursive song</a>. Yes, grown men and women have argued about the recursive nature of bird-songs.</p>
<p>Proving that <em>all</em> human languages have recursion means studying every possible outlier. Not every language embeds sentences like those above; some are limited to sequential ordering. For example: &#8220;human languages have recursion; this idea is easy to demonstrate&#8221;. You could say (arguably) that this chain is purely sequential and not recursively embedded. However, it was thought that you could always embed other people&#8217;s utterances within your own. For example, you could say, in any language, something like &#8220;Rob said that bananas are tasty yesterday&#8221;. It is often on the existence of this type of &#8216;reported speech&#8217; that the recursion claim is tested.</p>
<p>Far from the European Starling, the Matses have been living in the Peruvian and Brazilian Amazon for &#8230; well, no one really knows how long. They were living their traditional lives until the 1960s. As they mainly kept to non-navigable headwater regions deep in the forest, they were largely left alone, but throughout the 1900s they had increasingly been fighting the governments of each country. In the late 60s and early 70s they took amnesty, with most of the Matses population taking refuge at two missions, one each in Peru and Brazil. The rest of the Matses are still &#8216;uncontacted&#8217; although it&#8217;s not a very accurate word as they regularly bump into the &#8216;contacted&#8217; people of the region as each go about hunting trips (<a href="http://newswatch.nationalgeographic.com/2011/04/01/uncontacted-tribes-the-last-free-people-on-earth/">the well-known <em>National Geographic</em> photos might be Matses as it is the same valley</a>). In the 1980s the Matses left the missions and resettled along rivers in the region. Their language was studied for the first time during this period.</p>
<p>Cue David Fleck (one of my coauthors). He moved to the region and in 2003 he wrote a paper that mentioned, in passing, that Matses only had direct speech. Direct speech is simply when you quote someone (near) verbatim. Imagine that yesterday I told you: &#8220;I will go there tomorrow&#8221;. You could then quote me directly:</p>
<dl>
<dd>Rob said &#8220;I will go there tomorrow&#8221;</dd>
</dl>
<p>In English (and people have theorized, every language) you can also use &#8216;indirect speech&#8217;, and rephrase what I said from your own point of view:</p>
<dl>
<dd>Rob said he is coming here today.</dd>
</dl>
<p><div id="attachment_649" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/recording.jpg"><img src="http://www.junglelightspeed.com/files/recording-300x224.jpg" alt="Working" title="recording" width="300" height="224" class="size-medium wp-image-649" /></a><p class="wp-caption-text">Working with David (left) and Abraham (center) in our &#039;office&#039;. Note my collared shirt - I don&#039;t care where we are, people, this is still a work day!</p></div>With indirect speech you can alternate everything to your own personal, temporal, spatial and directional point of view. &#8220;I&#8221;->&#8221;he&#8221;, &#8220;will _&#8221;->&#8221;-ing&#8221;, &#8220;tomorrow&#8221;->&#8221;today&#8221;, &#8220;there&#8221;->&#8221;here&#8221;, and &#8220;go&#8221;->&#8221;come&#8221;. In this example, not a single word is actually repeated in the report. You do this all the time without even thinking about it. Indirect speech is considered to be a form of recursion, while direct speech is not. This is because for direct speech, the part you are repeating is like a single unalterable unit. We can see this by manipulating the sentences a little. For example, you can paraphrase only in indirect speech:</p>
<dl>
<dd>Rob said that he would be rocking up around now</dd>
</dl>
<p>You can say this without people thinking that I really say silly expressions like &#8220;rock up&#8221;, but you could not rephrase it in the case of direct speech. For indirect speech, too, you can also &#8216;extract&#8217; some of the embedded sentence in ways that are not possible for direct speech. This is most obvious in questions:</p>
<dl>
<dd>Where did Rob say he was going tomorrow?</dd>
</dl>
<p>In linguistics, we usually treat the &#8216;where&#8217; as being extracted from the embedded sentence. This is not possible with direct speech:</p>
<dl>
<dd>Where did Rob say &#8220;I am going tomorrow?&#8221;</dd>
</dl>
<p>This doesn&#8217;t sound like it is from Rob&#8217;s point of view. It actually makes it sound like indirect speech that is talking about where <em>you</em> are going. In other contexts/configurations, it would sound completely ungrammatical. There are a few other well-documented differences, too, but they are more technical/linguistic, so I won&#8217;t go into them here.</p>
<p>Matses is one of the languages that does not generally allow one sentence to be embedded within another, so the test for whether it did not contain recursion relied on whether it really did only possess direct speech.</p>
<p><div id="attachment_650" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/matses_football.jpg"><img src="http://www.junglelightspeed.com/files/matses_football-300x225.jpg" alt="Estiron vs San Roce. Yes, it really is the world sport." title="matses_football" width="300" height="225" class="size-medium wp-image-650" /></a><p class="wp-caption-text">Estiron vs San Roce. Yes, it really is the world sport.</p></div>As a quick aside, it needs to be pointed out that the lack of sentence embedding has nothing to do with overall complexity. For example, Matses has an obligatory evidential system that marks each sentence for how the speaker came to know about an event. In English, we obligatorily mark verbs for tense: &#8220;I kick-ed&#8221;, &#8220;I am kick-ing&#8221;, &#8220;I will kick&#8221;. Not all languages require this, like Chinese, where you encode the time separate from the verb. In English, we code evidentiality separately. For example, you might say &#8220;I saw him kick the ball&#8221;, &#8220;I inferred from seeing the ball move that he kicked it&#8221;, &#8220;I guess he kicked the ball&#8221;, &#8220;someone told me he kicked the ball&#8221;. In Matses, these form part of the verb suffix. There is an obligatory suffix that expresses that the speaker either knows something from direct observation, by inference, or that it is conjecture (if you know because someone told you, you have to quote them). Put simply, you can&#8217;t create a grammatical sentence in Matses without first considering how you came to know about the information you are sharing. It&#8217;s actually a little more complicated than just that, as there are different suffixes for different evidential/tense combinations, and more complicated than English in this, as there are also different past tenses for recent past, distant past and remote past events. 
To borrow the suffix into English, you might say something like &#8220;he kick-ond the ball&#8221; (I saw him kick the ball some time ago) or &#8220;he kick-ak the ball&#8221; (I infer that he recently kicked the ball). In fact, it is even <em>more</em> complicated than that, too, but it took 10 pages to explain in the paper so I&#8217;m going to leave it out here. It is a unique and beautiful language, just one that doesn&#8217;t allow productive embedding.</p>
<p>Back to recursion and how it took me to the Amazon.</p>
<p>A researcher in Europe looks up from his paper (a fascinating treatise on bird-song structure) to see an email from David Fleck, inviting him to study the Matses. The researcher couldn&#8217;t make it, but it bounced through other people and the invitation found its way to Uli Sauerland. Uli was a visiting scholar at Stanford at the time and while he could not himself make it, due to other fieldwork commitments, his student back in Germany, Rainer Ludwig, was keen to go (the four of us rounding out the authors on the final paper). </p>
<p><div id="attachment_654" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/rainer.jpg"><img src="http://www.junglelightspeed.com/files/rainer-300x225.jpg" alt="Rainer and Daniel" title="rainer" width="300" height="225" class="size-medium wp-image-654" /></a><p class="wp-caption-text">Rainer (left) with Daniel, who was our most active language consultant and helped us run many of the experiments.</p></div>Uli sent out an email to Stanford&#8217;s Linguistics Department asking if anyone wanted to go in his place. I focussed carefully on writing a reply as quickly as possible, balancing out the need to make a solid case to be accepted, with wanting to be the first to reply (assuming this would count in a tie-breaker). It was my most nervous part of the whole project. After frantically reading and making many tiny edits to the email, I tentatively hit send, fingers crossed that I would be invited. I followed this up by making excuses to lurk around Uli&#8217;s office over the next few days. Eventually we talked and he agreed that I could go (to this day, I&#8217;m not sure if anyone else even expressed interest).</p>
<p>After stashing my bicycle in Lima (I followed my fieldwork by <a href="http://www.robertmunro.com/bike/peru/">cycling across the Andes</a>) I met Rainer in Iquitos, clutching a print-out of David&#8217;s instructions. David&#8217;s email was all in capital letters and was a long list of instructions for how to get to Estiron, the village he was based in: negotiating passage on a cargo plane from Iquitos, the nearest city, to Angamos, the nearest village where a small craft could land; finding someone with a dugout canoe; obtaining enough fuel for an attached motor; and how to first contact him via short-wave radio when close. (I keep meaning to ask David why the email was all-caps &#8211; he&#8217;s out of contact again right now.) I kept the print-out close for the week it took us to get there. Almost everything had changed in the time since he sent it, but we were able to piece everything together.</p>
<p><div id="attachment_651" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/estiron1.jpg"><img src="http://www.junglelightspeed.com/files/estiron1-300x225.jpg" alt="Estiron" title="estiron" width="300" height="225" class="size-medium wp-image-651" /></a><p class="wp-caption-text">Estiron</p></div>Iquitos is the world&#8217;s largest city that cannot be reached by land. Sitting where two rivers join to form the Amazon, the only way in or out is by plane or boat. Some 4,000 km from the ocean, the river is still almost too wide to see across. We spent several days here negotiating our way onto a boat-plane (including two days just sitting beside it waiting for the weather to clear). Short-wave radio contact with the village was complicated. The Matses only turned on the radio for an hour a day and because of the storms (I think) there was a lot of static. I can&#8217;t remember if it was in Iquitos or when we tried again in Angamos, but after several attempts and a lot of slow-talk-shouting that only received static in reply, we heard one sentence come through clearly: &#8220;we are expecting you&#8221;. It had been more than a year since David had first sent the email, so with this small warning of our arrival and invitation we were on our way.</p>
<p>Angamos was more developed than I anticipated &#8211; it had a long paved walking path along a line of well-built houses. It was a border town, although I never saw the sister village which was down river a little way on the Brazilian side. After a few more days of collecting enough fuel from different villages, and a guide/boatman in Alleandro, we set off in our dugout for the 10 hours to Estiron. It was the rainy season and the rivers were flooded, so we kept near the shore to avoid the stronger currents. One small hole in the bow lapped water into the boat and we sat low, but it was sturdy enough. I was comfortable enough to nap, safe in the knowledge that Rainer was keeping everything afloat through sheer willpower. </p>
<p><div id="attachment_657" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/kid_parrot_monkey.jpg"><img src="http://www.junglelightspeed.com/files/kid_parrot_monkey-300x225.jpg" alt="Kid, parrot and monkey" title="kid_parrot_monkey" width="300" height="225" class="size-medium wp-image-657" /></a><p class="wp-caption-text">Kid, parrot and monkey in a hut&#8217;s doorway.</p></div>Eight days from when I left San Francisco, and six from when I arrived in the Amazon, we finally rocked up in Estiron, our main fieldsite. The village is nestled beside two bends of the Chobayacu Creek, with two rows of thatch huts near the water and several more scattered back into the forest. The area is surrounded by farms, but they are not visible from the water or village, and sometimes not even when you are in them &#8211; they are partially cleared pockets of forest planted with a scattering of fruit and yams. Between the farms, fish, and a regular supply of bush-meat, no-one had to work very hard to stay healthy. A few found-objects aside, the huts were made according to traditional practices &#8211; wood and thatch from the forest. Most of the huts were large airy buildings low to the ground (we set up home and office in one) but a handful of smaller ones were on longer stilts several meters in the air. It was explained to us that this was a recent fashion for some of the younger Matses (I recently built a raised platform in my own sunroom &#8211; I think I get it). The football field and volleyball net were also more recent additions. </p>
<p>So did we discover whether or not Matses contained recursion? I think we did find an answer, but our paper leaves it to the reader, simply laying out our observations, examples and experimental methods. You are welcome to read it and decide for yourself!</p>
<p>Rob<br />
December 20, 2011</p>
<p>Acknowledgments: Thanks to the Characterizing Human Language by Structural Complexity (CHLaSC) Project, who funded much of the research, and <a href="http://people.ucsc.edu/~ardeal/Deal-indexicals.pdf">Amy Rose Deal of Harvard</a> for citing us (already!)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/published/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Open datanarchy</title>
		<link>http://www.junglelightspeed.com/open-datanarchy/</link>
		<comments>http://www.junglelightspeed.com/open-datanarchy/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 03:39:55 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Crisis Response]]></category>
		<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.junglelightspeed.com/?p=605</guid>
		<description><![CDATA[Open data is increasingly claimed to be &#8220;democratizing&#8221;. It is not clear to me where the &#8220;democracy&#8221; part is. If 99% of people decide to keep information private but 1% disagree, that 1% can still make that information publicly available. This is more like anarchy. There is a place for anarchy in the world [...]]]></description>
				<content:encoded><![CDATA[<p>Open data is increasingly claimed to be &#8220;democratizing&#8221;. It is not clear to me where the &#8220;democracy&#8221; part is. If 99% of people decide to keep information private but 1% disagree, that 1% can still make that information publicly available. This is more like anarchy. There is a place for anarchy in the world &#8211; it is freedom at its most extreme. I am an overwhelming proponent of open data, but the price of data freedom is data vigilance.</p>
<p>The phrase &#8220;democratizing data&#8221; came up more than once at the recent &#8220;<a href="http://blog.peoplebrowsr.com/blog/?p=1514" title="Big Open Data">Big Open Data Panel</a>&#8221; at PeopleBrowsr Labs, on the &#8220;<a href="http://www.crowdconf.com/">Philanthropy Panel</a>&#8221; at CrowdConf, where I was a panelist, and at the &#8220;<a href="https://www.rightscon.org/">Silicon Valley Human Rights Conference</a>&#8221;, where I was a regular participant. By &#8220;democratizing&#8221;, people simply meant &#8220;publishing online&#8221;, but calling it &#8220;democratizing&#8221; carries the implication of inherent good (and, more divisively, that any opponent of publishing data is somehow non-democratic). I question whether many people calling for open data really have the resources to also support the needed vigilance, or simply use the &#8220;democratizing&#8221; tag to absolve themselves from the consequences of publishing or republishing information.</p>
<p><div id="attachment_618" class="wp-caption alignright" style="width: 310px"><a href="http://www.junglelightspeed.com/files/occupy_oakland.png"><img src="http://www.junglelightspeed.com/files/occupy_oakland-300x169.png" alt="Occupy Oakland" title="Occupy Oakland" width="300" height="169" class="size-medium wp-image-618" /></a><p class="wp-caption-text">Occupy Oakland - this is what democracy looks like? Or this is what it looks like when people are no longer sure they are in a democracy</p></div>Returning to the 1% who make data public when the 99% disagree, that 1%-person can take part in further open data sharing with the others among the 1%, and the 99% can simply opt out of that channel of information. I joined the Occupy Wall Street protests in Oakland following <a href="http://articles.sfgate.com/2011-10-28/news/30335416_1_protesters-canister-police-and-hundreds">the shooting of a former marine by Oakland Police</a>. I hadn&#8217;t been closely involved with Occupy Wall Street until that point, but the decision to turn weapons upon a peaceful protest (especially against someone who had served two tours of duty for their country) was too much to ignore. When I was there, I wondered how far the 99% had already opted out of financial channels long ago. Did everything get too far gone (at least in part) because the 1% had become so egregious and removed that the 99% had let them operate unchecked for too long?</p>
<p>Perhaps the same is happening with open data online. Cisco predicts that <a href="http://socialtimes.com/cisco-predicts-that-90-of-all-internet-traffic-will-be-video-in-the-next-three-years_b82819">90% of all web traffic will be video in the next three years</a>. Let&#8217;s see who is democratizing it:</p>
<dl>
<dd><a href="http://blog.radvision.com/videooverenterprise/2010/08/03/voting-for-video-with-our-webcams/">&#8220;the democratization process of video is ChatRoulette&#8221;</a> Radvision.
<dd><a href="http://www.thefastertimes.com/theweb/2010/03/12/chatroulette-breaks-fifth-wall-content-production/">&#8220;ChatRoulette represents a true breakdown and symbolic revolution of the relationship between content producers and consumers&#8221;</a> Faster Times.
</dl>
<p>ChatRoulette, for those who don&#8217;t know it, is a very simple idea: it randomly and anonymously connects you to other users via a web-cam and instant messaging. Have you seen ChatRoulette lately? This is not what democracy looks like. If you have not seen it before, check out this <a href="http://www.youtube.com/watch?v=JTwJetox_tU">old(ish) video of Merton in a hooded top singing and playing piano to random people on ChatRoulette</a> and take my word for it that the current user-base is not about partially concealed pianists &#8211; a very specific 1% has taken control of this channel. ChatRoulette, and internet video as a whole, did not &#8216;democratize&#8217; video &#8211; it became &#8216;voyeur takes all&#8217;.</p>
<p>The only people to openly admit to me recently that they used &#8220;democratizing data&#8221; in a less than noble way were advertisers, confessing that &#8220;democratizing data&#8221; (to them) mostly meant trying to coerce Facebook into making it easier for their start-up to scrape and sell data. The advertising community has been capitalizing on big data for some time (more than <a href="http://www.quora.com/What-percent-of-Googles-revenue-comes-from-search-advertising">95% of Google&#8217;s revenue</a> is advertising targeted via big-data analytics) and they seem to be ahead of the curve. For them, it is not about democracy but simple capitalism &#8211; personal gain through someone else&#8217;s data. Respect my privacy, and more power to you. </p>
<p>The more serious problems arise when data can be used to harm an individual. I have lost count of how many &#8220;open data&#8221; or &#8220;information sharing&#8221; technologies have been enthusiastically called a &#8220;Swiss Army Knife&#8221;, followed by a list of many positive use cases. A Swiss Army Knife can be used to harm you in many more ways than it can be used to help you &#8211; it is a weapon. With the proliferation of cellphones and information sharing tools like Drupal, Twitter, Ushahidi and WordPress, anybody with a little technical knowledge can share masses of data. But a little knowledge is a dangerous thing, and the ease of use of many of these platforms means that we are sending people off to battle with inadequate weapons training.</p>
<p>I have also lost count of how many people have come to me over the last year asking for help with a real or planned map to document a crisis that they were passionate about. It has been from people world-wide, but not a democratic mix. People who launch crisis-maps are overwhelmingly the same demographic as those on ChatRoulette: excited young men with an internet connection. The deployments might serve and connect some of the least resourced people in the world, but they are not being curated by them. I try to give the same advice in all cases where there is a real element of physical danger: constantly review all your data in light of changing conditions; remove anything that is dangerous or irrelevant; and if you do not have the resources to constantly monitor and reevaluate what you have already published, discontinue your service. The overwhelming majority listen, and most of them decide that they cannot meet this requirement, instead serving their communities in more direct ways.</p>
<p><div id="attachment_619" class="wp-caption alignleft" style="width: 310px"><a href="http://www.junglelightspeed.com/files/bigdata_revolution.jpg"><img src="http://www.junglelightspeed.com/files/bigdata_revolution-300x164.jpg" alt="Big Data Revolution" title="Big Data Revolution" width="300" height="164" class="size-medium wp-image-619" /></a><p class="wp-caption-text">Left: the 1% of the revolution - open victory on tank. Right: the 99% of the revolution - a family huddled in the dark, trying to determine if the gunshots are coming closer. The only guarantee that the 99% have from open data is to shine a spotlight on them - would you?</p></div>It is common for someone to be taken from their home and killed in a conflict and for the cause to never be known. The two bloggers killed recently in Mexico, whose bodies were deliberately displayed with their social media handles, are the exception. We have to assume that contributing to social media leads to targeted deaths much more frequently. </p>
<p>The victims in Mexico knew that they were taking calculated risks. Open data means that someone could contribute to an open platform without even realizing it &#8211; someone else could take their words/reports and add them to an open platform, making them oblivious collaborators. Connecting with open data is uncertain &#8211; it can bring help or it can bring enemies. There is only one guarantee in publishing your information to open, social media in a conflict situation: it shines a spotlight on you. If you choose to publish information from/about someone in a conflict zone, you are shining a spotlight on them too. Republishing simply makes that spotlight brighter. The 1% of the revolution is a celebration on a now-still tank. The 99% of the revolution is huddling in the dark with your family close, trying to determine if the gunshots are coming closer. </p>
<p>Oblivious collaboration exists everywhere. There is no doubt that I am an oblivious collaborator with the advertising agencies mentioned above, looking to increase their market share having scraped information from my Twitter account or this site. I don&#8217;t care much about this context.</p>
<p>I am leading the construction of the largest humanitarian open data project to date &#8211; <a href="http://strataconf.com/stratany2011/public/schedule/detail/21499">EpidemicIQ is currently processing about 1 billion data points per day</a>, almost all of which are from open data. We do not yet republish open data &#8211; it is the struggle of coming to terms with the complexities of open humanitarian data at this scale that led me to write this article. </p>
<p>Take one example report: &#8220;a young girl from village X was treated for Y&#8221;. It is anonymous to me. If it is published openly, but only in medical circles, then she remains anonymous. If we republish this somewhere that people in village X will read, it might not be &#8211; perhaps only one young girl from the village was hospitalized at that time, so they will know who she is. Should we republish? What if the people in village X have been known to harm people with disease Y, because of a mixture of fear of disease and traditional beliefs? I have seen all these factors line up more than once. At the recent <a href="http://strataconf.com/stratany2011">Strata big data conference in New York</a>, a wealthy CEO insinuated that people were cowards for not republishing aggregated open data for fear of the legal implications. I don&#8217;t fear lawyers. I don&#8217;t fear billions of data points. I consistently worry about balancing the need to share information with the privacy and well-being of this girl, and many like her who are now oblivious collaborators in a global outbreak monitoring system.</p>
<p>Oblivious collaborators in conflict situations are a greater concern. This is not a fringe problem &#8211; 30% of the world (about 2 billion people) live in a conflict zone or a transitional situation. For obvious reasons, these are the most recent people to join the connected world, meaning that the <i>least</i> experienced populations now accessing social media are also the most vulnerable. We saw this with the recent <a href="http://blog.standbytaskforce.com/libya-crisis-map-report/">Libya Crisis-Map</a> that was commissioned by the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA) and initially implemented by the Stand By Task Force (SBTF), of which I&#8217;m a co-founder (full disclosure, I am also co-author of the <a href="https://docs.google.com/document/d/12meslH-Bo1WTnP3Y9dye-rsNJmHC6BzqRIFlLrzc3cE/edit?hl=en_US">SBTF Libya report</a> which I&#8217;ll be quoting). </p>
<p>The Libya Crisis Map aggregated information, primarily from traditional and social media, about the (then) mounting crisis in Libya, in order to support intelligence gathering by the UN in the leadup to their deployment. The feedback was positive:</p>
<dl>
<dd>&#8220;If you go back a couple of years, all of this information probably would have been available, but it would have been seen as noise coming at you in multiple formats &#8230; Libya Crisis Map has done an extraordinary job to aggregate all of this information.&#8221; Brendan McDonald, UN OCHA
</dl>
<p>But part-way through the deployment, UN OCHA decided to make the map public. This was a case of the 1% making a decision without the 99%. The people who submitted and structured the reports were not asked if they wanted to make the map public. The majority were not even informed. A compromise was reached where only partial and/or obfuscated data was published on the public-facing map. Even so, fearing for their security, the most important volunteers &#8211; those with knowledge of Libya &#8211; were driven away by the public map. In their rush to show the world that they were using crowdsourcing technologies, the UN excluded and endangered the crowd.</p>
<p>The UN OCHA response to this in the <a href="https://docs.google.com/document/d/12meslH-Bo1WTnP3Y9dye-rsNJmHC6BzqRIFlLrzc3cE/edit?hl=en_US&#038;pli=1">Libya Crisis Map Report</a> was unrepentant:</p>
<dl>
<dd>&#8220;why not allow full text of tweets already available? &#8230; if it is already fully available on the web&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
<dd>(re withholding/obfuscating information) &#8220;Bad instruction. All this became available on the web very quickly &#8230; belligerents know where camps and exit routes are, there is no security risk from this appearing on one more site on the web.&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
</dl>
<p>I don&#8217;t think it is productive to be so absolutist about something we know so little about, especially in big data&#8217;s <i>first</i> public use in a conflict setting. There are two clear reasons why publishing all information is dangerous:</p>
<p>1) You are showing your hand. Let&#8217;s say the bad guys know all the details that you do, and many more. If you have missed something, they now know that you don&#8217;t know: they know where to target. </p>
<p>2) You are creating oblivious collaborators. It is one thing for someone to tweet &#8220;there are many gunshots here, I wonder why&#8221;, but it is another for someone to aggregate this with reports of violence in the same area, using this tweet as further evidence, and publishing both together next to a logo of an organization considered to be an enemy. (This actually happened, but I&#8217;ve deliberately changed the wording). From the analogy above, you are turning the spotlight on that person up to 100. (Unless they don&#8217;t know about it, which is more like keeping them in the dark while giving all soldiers night-vision goggles).</p>
<p>There are two more reasons, both of which come from being in a newly connected world:</p>
<p>1) Not all bad guys will otherwise be resourced to collect data from disparate sources. Even if the information is open, if it is spread across dozens or hundreds of information points across the web, it takes a sizable operation to collect this information. Some bad guys belong to complex large networks that might be able to scrape and parse all this information. Most are just opportunistic but they might now have an internet connection. Publishing aggregate, structured data weaponizes everybody.</p>
<p>2) Information can be open and describe an entire region in fine detail before <i>any</i> one person on the ground knows the full extent. Previously, conditions would change much more quickly on the ground than the reports that made it through to aid agencies and, yes, the bad guys on the ground very often knew about the changes before the aid agencies. But big open data can be, and often is, ahead of the curve of any one individual, or any one organization. In the global disease outbreaks that we track, this is the norm, not the exception. You can get ahead of the bad guys on the ground for the first time. This is one of the most positive aspects of big open data (in parsing, if not republishing) &#8211; do not give away your advantage so quickly.</p>
<p>The most frequent response to these kinds of arguments made it into the report:</p>
<dl>
<dd>&#8220;If we can&#8217;t handle the info publicly, it&#8217;s off, we lack adequate security to handle confidential info reported&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
</dl>
<p>It is impossible to predict what information will become sensitive. A report that obliquely mentions doctors in a secure refugee camp is harmless, right up until that camp is later raided and the most educated witnesses are deliberately targeted (this has also happened, but again I&#8217;ve deliberately changed the details a little). To avoid any possibility of security implications is to collect no information. Any information that is held, whether privately or publicly, needs to be constantly reviewed according to a changing environment. There is no way around this. You need to collect data. You need to have the resources to continually review it.</p>
<p>The second most frequent response to these kinds of arguments also made it into the report:</p>
<dl>
<dd>&#8220;the personal responsibility [is] incumbent on the info sender.&#8221; Information Management Unit, UN OCHA, Libya Crisis Map Report.
</dl>
<p>In conflict situations, I think it is rare that someone caught up in the middle has a complete picture of their security situation. If you choose to publish aggregated information (regardless of your organization) then your act of publication is asserting a position of power and knowledge. That places at least some responsibility onto you. </p>
<p>If the security <i>is</i> wholly on the reporter, then it falls on the reporter to remove/edit any reports that they have contributed. If they lose communications (or are unreachable for unknown reasons) or may not have known that security was their responsibility, then the responsibility must still fall back on the publishers &#8211; the exact same situation.</p>
<p>For oblivious collaborators over open data, this will also put limits on how much data you can store, as you will need to maintain the manpower and/or technology to continually review all existing data. So just what are the limits on how much already-open data can be stored? I can give one answer:</p>
<dl>
<dd>29.
</dl>
<p><div id="attachment_622" class="wp-caption alignright" style="width: 170px"><a href="http://www.junglelightspeed.com/files/twitter_29.jpg"><img src="http://www.junglelightspeed.com/files/twitter_29.jpg" alt="Twitter 29" title="twitter_29" width="160" height="160" class="size-full wp-image-622" /></a><p class="wp-caption-text">29 tweets: your ethical upper limit on the number of tweets to republish from free open data.</p></div>That&#8217;s the maximum number of tweets that anyone should ever republish from free open data if security is the responsibility of the reporter. It is not exactly &#8220;big data&#8221;. The math is simple. Let&#8217;s assume that the person who tweeted &#8220;there are a lot of gunshots here&#8221; decided to delete their tweet &#8211; if security is their responsibility alone, then the republishers have the responsibility to also remove it. Let&#8217;s also say that an acceptable latency in deletion is five minutes and that you have an <a href="https://dev.twitter.com/docs/rate-limiting">OAuth key that allows you the maximum 350 free API calls per hour on Twitter</a>. You will need to check every existing tweet for deletion via the API every five minutes, at one call per tweet: (350/60)*5 &#8776; 29.17, which rounds down to 29. As soon as you have stored your 30th tweet, you will no longer be able to check for deletions every five minutes without hitting the API limit.</p>
<p>You could pay to increase the limits on Twitter, but this is no longer free open data. Or you could simply not honor people&#8217;s wish for their tweet to be deleted, possibly endangering them (for reasons that may only be apparent to them), but this is falling short in data vigilance. So if you want to put the security in the hands of the reporters, leveraging only free open data, then that is your ethical upper-limit for Twitter: 29 data points.</p>
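<p>The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration under the assumptions already stated here: one API call checks one stored tweet, the free quota is 350 calls per hour, and the acceptable deletion latency is five minutes. The function name and its parameters are hypothetical and are not part of any Twitter library.</p>

```python
def max_republishable_items(calls_per_hour, latency_minutes, calls_per_check=1):
    """Largest number of stored items that can each be re-checked for
    deletion once per latency window without exceeding the hourly quota.

    Assumes (hypothetically) that the quota is spread evenly across the
    hour and that every item costs `calls_per_check` API calls per check.
    """
    # Calls available in one latency window, kept as integer arithmetic:
    # (calls_per_hour / 60) * latency_minutes, divided by the cost per item.
    return (calls_per_hour * latency_minutes) // (60 * calls_per_check)


if __name__ == "__main__":
    # The figures used in this post: 350 free calls/hour, 5-minute latency.
    print(max_republishable_items(350, 5))  # 29
```

<p>Relaxing the latency or paying for a larger quota raises the number linearly, which is the point: the ceiling is set by how often you can afford to look again at what you have already published.</p>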
<p>I don&#8217;t want to be too harsh on the individuals in UN OCHA (or anyone entering big data for the first time &#8211; we are all new), and I greatly appreciate the willingness to discuss these points publicly. But we need to be critical of the idea that there are simple rules for collecting and publishing data that absolve us of responsibility once the data is out there. </p>
<p>By living most of my adult life outside my homeland, I have helped monitor elections more times than I have been permitted a vote. I would love to say that being at the forefront of big open data means taking part in democracy, but this simply isn&#8217;t the case. In big open data, I am the excited 1% trying to meet my obligations to the 99%. Open data is a form of freedom that can help liberate us from disease and oppression, but it is not a democratic freedom &#8211; it is extreme and potentially dangerous &#8211; we need to always keep watch.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.junglelightspeed.com/open-datanarchy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
