jungle light speed

Diversity Check-In for Figure Eight’s Technical Staff

Rob — Fri, 21 Sep 2018 17:34:59 +0000

(cross-post from https://www.figure-eight.com/technical-staff-diversity/)

We get to see and meet many different artificial intelligence teams at Figure Eight. After all, we have hundreds of customers using us as a core piece of their machine learning pipeline, as well as about the same number of research institutions using our free academic license. And somewhere around 50 of the data science teams I’ve visited have claimed to have the strongest team in AI. It can’t be true that all 50 of these AI teams are the strongest!

While a lot of these teams are quite exceptional, I often have to tell many of them the same thing: if your machine learning team consists of 90% men in hoodies, all within a narrow age range, then you do not have the strongest team in AI.

For AI to truly benefit all of humanity, all of humanity has to participate in creating AI. When Figure Eight was known as CrowdFlower, we were most well known for running the largest marketplace for human annotators to train and evaluate machine learning models. With more than 100,000 people from 150 countries regularly taking part in our marketplace, we provide the ability not only to create training data but to create training data that benefits from a diversity of experiences and points of view. We still run this marketplace, of course, but today I am reporting on our technical team’s internal demographics, something we are doing at Figure Eight for the first time.

The diversity of tech companies, especially AI companies, is problematic. Here, I’ll talk about the technology teams: Engineering, Product, and Machine Learning. Too often, technology companies hide their diversity problems by having greater diversity in non-technical roles. All roles in a tech company are important, but it’s not a truly diverse company if each role is dominated by one demographic. It’s also crucial to be transparent about our leadership team’s demographic breakdown, as leaders are more likely to be role models for junior staff, and as glass ceilings can prevent the right set of role models and diversity of input at strategic levels.

Figure Eight’s gains in gender diversity

When I joined the company about a year ago, I made a commitment to diversity. One year ago, only about 15% of the Technology team were non-male, and none of the technical leadership were. Today, 37% of the Technology team are non-male, as are 50% of our leadership, including two thirds of our executive leadership.

Gender diversity of Figure Eight Technology team over the last year.

In the graph above:

“Non-male” is anyone who identifies as female or non-binary
“Tech Executive” is anyone in technology who is also on the executive team that reports to the board (currently just three of us!)
“Tech Leader” is anyone who is a manager, director, or otherwise a leader of a function (eg: Senior Program Manager)

In 12 months, we have gone from 0% non-male Technology leadership to 50% today. This is the statistic that I am most proud of. Having a diverse leadership team in place will help develop our talented staff as we continue to grow.

What we did well

We are more diverse: we went from 0% to 50% non-male Technical Leadership in 1 year, including two non-male executives. We went from 16% to 37% non-male technical staff in the same period.

We are more efficient: in raw numbers, the Technical team only grew by 39%, while we grew by more than 100% in the volume of data and number of customers we support. So, our more diverse team is now doing more for more organizations.

We are happier: it is hard to quantify, but I have always found a diverse team to be more cohesive. It’s no different at Figure Eight. People take the time to more carefully listen to their colleagues when they know they are coming from different life experiences, and communication and respect are improved as a result.

What we did okay

More people identify as LGBT in our technology team than at the national (and even San Francisco) rate, both within staff and leadership. However, almost all are men. So, I’ll give us an “okay” for LGBT representation today, with room for growth.

For ethnic and geographic diversity, we are also better than most companies, but with room to grow. We have multiple staff members from each major region: Asia, Australia, Europe, the Middle East, North America, South America, and Sub-Saharan Africa.

What we can do better

When someone’s identity makes them feel like an outsider in multiple ways, they need the most support to overcome the inherent biases in our industry. I would rather give them leaders and colleagues that they identify with than give only my words as an ally. So, I aim to create a more diverse technology team in this way.

For combinations of underrepresented demographics, we can do better. This is partially a numbers game, as we are still much smaller than the large tech companies and the percentage of underrepresented demographics becomes smaller. We have individuals who identify as both LGBT and people of color, among both male and non-male Tech Leadership, so I am glad that we have representation and role models.

A few notes for further transparency

We’re not brushing anything under the carpet, so here are some more details about what I shared from our team:

This omits one person who was here for less than a month and three summer interns, all male.
These are the stats from the teams that I lead today, Engineering, Machine Learning, and Product, which I started running in January.
We also use international contractors, not all of whose identities I know. I am certain that the ratio of non-male to male is less balanced among our contractors, but I don’t have this historical data. The total number of contractors is less than the total number of in-house people in the data shown here, so it wouldn’t move the stats too much.
Figure Eight’s Sales and Success teams also have people with an engineering title. In these teams, 45% of engineering staff are non-male, and the two leaders are both male.
I only have ethnicity identity information at the company-level, and we are about 50/50 for people who identify as white/non-white.

AI for everyone

There are many more aspects of diversity that we should care about. For example, my personal expertise is diversity of languages in AI. English only makes up 5% of the world’s conversations daily, but makes up more than 90% of Machine Learning in the world. It is related to most other underrepresented demographics: speakers of minority languages are more likely to be from underrepresented ethnicities, and males are more likely to be educated in dominant languages like English.

I encourage every AI company, especially those claiming to have the strongest team in AI, to share similar diversity numbers about their technical staff and leadership.

Robert Munro
September 2018

World Cup Characters

Rob — Fri, 13 Jul 2018 05:53:38 +0000

Watching the World Cup, I noticed that the shirt manufacturers are doing a better job of including accented characters. It used to be that only the 26 a-z characters appeared on jerseys.

The use of characters beyond a-z is uneven. Kylian Mbappé will take part in the final this weekend. The first two results in a search for “Kylian Mbappé’s Jersey” turn up one with the accent and one without:

Kylian Mbappé’s Jersey

I was curious about what letters have appeared in World Cup players’ names. It’s impossible to know what went on every jersey back in time when photographs were rarer, and early jerseys probably didn’t have names at all. But, there’s a handy page of squads for each World Cup on Wikipedia, eg: 1930 FIFA World Cup squads. So I quickly calculated how often every character appeared across all squads, which gives the count of how often each letter is used across every single World Cup player’s (romanized) name:

Character	Appearances
a	14062
e	11017
o	9890
r	9888
i	9627
n	8896
l	7103
s	6676
m	4426
d	4405
t	4383
u	3789
h	3597
c	3447
g	3083
b	2738
k	2658
v	2236
j	1937
p	1683
z	1652
y	1647
f	1376
w	829
é	671
á	605
í	352
ć	305
ó	252
q	190
š	161
x	146
ö	108
ú	98
ü	95
č	84
ñ	66
ł	51
ž	40
ç	38
ã	37
ø	36
ä	31
ă	28
ý	25
ï	22
ë	20
ř	19
å	18
ş	17
ő	16
đ	15
æ	11
ð	11
ż	11
è	10
ě	10
ı	10
ô	9
ń	9
ō	9
ß	8
à	6
â	5
î	5
ą	5
ğ	4
ț	4
ś	3
ū	3
ș	3
ň	2
ť	2
ű	2
ò	1
õ	1
þ	1
ď	1
ľ	1
œ	1
ů	1
ź	1
ǎ	1

There are 83 in total! There are 9 that occur only once. The last of these in the list, “ǎ”, is from Ion Lăpușneanu, who was Romania’s goalkeeper in 1930 (in case you were wondering).
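For anyone who wants to repeat the count, here is a minimal sketch of one way to do it. It assumes the player names have already been collected into a list of strings (the scraping of the Wikipedia squad pages is not shown), and the function name and the two sample names are only for illustration; it is not necessarily how the table above was produced.

```python
import unicodedata
from collections import Counter

def count_name_characters(names):
    """Count how often each letter appears across a list of player names."""
    counts = Counter()
    for name in names:
        # Normalize to composed form so "é" is one character, not "e" plus an accent mark.
        composed = unicodedata.normalize("NFC", name)
        # Lowercase so "M" and "m" are counted together; skip spaces and punctuation.
        counts.update(ch for ch in composed.lower() if ch.isalpha())
    return counts

# Two sample names for illustration only; the real input would be every squad list.
sample_names = ["Kylian Mbappé", "Ion Lăpușneanu"]
counts = count_name_characters(sample_names)
for character, appearances in counts.most_common():
    print(character, appearances, sep="\t")
```

Normalizing first matters because the same accented letter can be stored either as one code point or as a base letter plus a combining mark, and the two would otherwise be counted differently.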

There are many players in the World Cup whose names are really written in Hangul, Cyrillic, Arabic, Hiragana and other completely different scripts. For a tournament that literally begins with “World”, the least we can do is ensure that players get to wear shirts that spell their names the same way that they do. It’s good to see the tournament moving in this direction!

Robert Munro
July, 2018

The Languages of Black Panther

Rob — Thu, 22 Feb 2018 07:32:33 +0000

If you haven’t seen Black Panther yet, go see it. It was one of the most fun action films I’ve seen in a long time, and one that embraced a pan-African aesthetic that is rare in cinema.

On International Mother Language Day, I feel justified in offering one criticism: they didn’t use enough languages from across Africa, and missed a huge opportunity to let speakers of African languages engage more directly with the characters.

This post is to correct this, and identify the languages spoken by each group of people in the film. I estimate that we could have had 10x as many African languages in the film for the cost of what Black Panther earned every 2 minutes on opening weekend.*

The only spoilers here are from the opening few minutes of the film.

Background on my love for the languages of Africa

The film begins with a meteorite crashing to earth in Wakanda. From the map they show, this is somewhere in/near Uganda. We then see people at the site of the meteorite speaking the Xhosa language. Xhosa** is spoken 3000+ kilometers away in South Africa, and while many languages in Uganda are related to it, they are not mutually intelligible. It is as if someone set a movie in Romania, but had everyone speak Portuguese, because Romania is in Europe and “Portuguese sounds European enough”. It is the same geographic and linguistic distance.

The movie asks us to suspend disbelief about more than a lost language. So my criticism is less about how out of place it seemed, and more about the lost opportunity to include so many more languages, and have the speakers of those languages identify much more closely with the film.

I work at the human-technology interface, which is a running theme I enjoy in Marvel films. My PhD focused on Artificial Intelligence for an African language. I’ve lived in Sierra Leone in West Africa and worked in refugee camps for the UN in neighboring Liberia. I traveled by bicycle across Kenya, Uganda, Rwanda, Burundi, Tanzania, Zambia, Malawi and Mozambique, covering more than the 3,000 kilometers that separates the fictional Wakanda from the real Xhosa, and often passing through several language regions a day, and enjoying briefly hearing each one.

Everyone identifies with a film more when they hear their own language in it. In addition to Xhosa, the film has a brief scene with the Hausa language. But the Hausa scene is only about kidnappers, so this doesn’t count as the film using a language in a positive way that its speakers would want to identify with.

The languages of Black Panther!

I owe thanks for the breakdown of cultures in the film to @DiasporicBlues’ analysis of where the different groups*** in Black Panther drew their biggest influences. I know about the languages, but less so the dress and other identifying characteristics, so I’m indebted to her identification of each culture. I’ll add the languages to each to get our list, and show her tweets for context and credit:

Zulu

Zulu headdress. Queen Ramonda wears a distinct headdress. It's reminiscent of the reed Zulu flared hats or "Isicholos." The Zulu headdresses are traditionally worn by married women for ceremonial celebrations. pic.twitter.com/5YSIqKjkMg

— Waris (@diasporicblues) February 17, 2018

A neighbor and sister-language to Xhosa, there are 12 Million Zulu speakers, also mainly in South Africa, and 16 Million more who speak it as an additional language.

Surmi and Amharic

Mursi and Surma Lip Plates. Lip plates or disks are a form of ceremonial body modification. While many cultures use them they're best known by the Surma and Mursi tribes in Ethiopia. #BlackPanther #Wakanda pic.twitter.com/gkrfA3AC70

— Waris (@diasporicblues) February 17, 2018

His lip-plate looks Mursi or Suri (Surma) of Ethiopia, but they are traditionally worn by women. His dress looks more like the Les Sapeurs of Congo:

Ethiopia and Congo (esp the major cities in Congo) are a long way from each other. In the film this group guards the water, so I’m going to assume that this group in Black Panther are well-traveled Suris (27,000 speakers), who traveled up the Nile and down the Congo, where they picked up the style, and also speak Ethiopia’s most widely spoken language, Amharic (22 million speakers).

Maasai

Many of the costumes have unique and futuristic ornamentation and details. These were made by emulating styles of the Masai people. The Maasai people of East Africa live in southern Kenya and northern Tanzania. #BlackPanther #Wakanda pic.twitter.com/SjTE7kMGYL

— Waris (@diasporicblues) February 17, 2018

Maasai has 1.3 Million speakers, who live very close to where Wakanda is located. I’ll assume they also speak the lingua franca of the region, Swahili, with 2 Million native speakers and up to 100 Million second-language speakers. My fiancee and I both understand a little and swear we heard Swahili greetings like “mzuri” in the film, but I can’t confirm this. Did anyone else notice, and can you confirm?

Southern Ndebele

Ndebele Neck Rings. Shuri and the Dora Milaje have outfits with a prominent collar. The South Ndebele peoples of Zimbabwe/South Africa wear neck rings as part of their traditional dress and as a sign of wealth and status. #BlackPanther #Wakanda pic.twitter.com/3L010CrUAU

— Waris (@diasporicblues) February 17, 2018

Southern Ndebele is spoken by 1.1 Million people as a first language, and is closely related to Xhosa and Zulu.

Sotho

Basotho Blanket. In several scenes, W'Kabi (Daniel Kaluuya) and others are shown wearing Basotho blankets around their necks. Though the blankets are originally from the Lesotho people the designs are synonymous with the Sesotho people. #BlackPanther #Wakanda pic.twitter.com/XU1RlspXTt

— Waris (@diasporicblues) February 17, 2018

Is anyone left in Southern Africa in the Marvel Cinematic Universe, or did they all move north to Wakanda? Within South Africa, let’s assume that the Lesotho spoke Sesotho, the Sotho Language, with 5.6 Million speakers as a first language and 7.9 Million additional speakers.

Himba

Many of the costumes have a distinctive red earthy tone. This was done by studying the colors used by the Himba people of north-western Namibia. Himba people are known for applying a red ochre paste, known as "otjize", to their skin and hair. #BlackPanther #Wakanda pic.twitter.com/K8eqmwNpcg

— Waris (@diasporicblues) February 17, 2018

In between Southern Africa and Wakanda, the Himba of Namibia speak Herero, which has about 192,000 speakers.

Yoruba

Forest Whittaker plays shaman Zuri who's the spiritual leader of Wakanda. He wears ornate flowing robes known as an Agbada. It's one of the names for a flowing wide-sleeved robe worn by men/women in much of West Africa, and North Africa. #BlackPanther #Wakanda pic.twitter.com/APqePaPMX1

— Waris (@diasporicblues) February 17, 2018

The word Agbada comes from the Yoruba language, so let’s assume that he is one of the 28 Million speakers … and the most lost person in this film, as Yoruba is mostly spoken in and around Nigeria, which is even further from Wakanda than Xhosa. To Forest Whitaker’s credit, his English accent was definitely East African, not West African.

Gisu

The final group in the film, called “the Jabari”, aren’t discussed by @DiasporicBlues. We’re told the Jabari are from the mountains, so I’m going to say that they are Gisu, who are from around Mt Elgon in Uganda. The Gisu language has 2.7 Million speakers. My brother lived in a Gisu community and we visited him there while cycling across Africa, so I’ll admit some personal bias in asking for it to be added to the movie.

The total!

Language	Native Speakers
Xhosa	8,200,000
Zulu	12,000,000
Surmi	27,000
Amharic	22,000,000
Maasai	1,300,000
Swahili	2,000,000
Southern Ndebele	1,100,000
Sotho	5,600,000
Himba	192,000
Yoruba	28,000,000
Gisu	2,700,000
Total	83,119,000

The actual languages would reach 83 Million first-language speakers in Africa, and more than double that if we counted people who speak these as second, third or fourth languages. That’s a big jump from 8 Million Xhosa speakers in the number of people who speak an African language and who would engage more deeply with the film!

If Marvel won’t go back and dub these languages in, I hope they consider them for the sequel. Imagine how much more amazing it would be to see such linguistic diversity in a film!

Robert Munro
February 21 (International Mother Language Day) 2018

* the math works out: assume it would cost $10K each for the 10 additional language coaches for the handful of lines in each language = $100K, which is 0.05% of the total budget. It took $192M in the opening 3-day weekend, so to put it another way, roughly every 2 minutes of revenue from Black Panther’s opening weekend would also cover it.

** Most languages have multiple names. For better or worse I defaulted to what Wikipedia had or the name I was most familiar with. I also left the prefixes off that literally mean ‘name’, like ‘isiXhosa’, ‘isiZulu’, ‘kiSwahili’ etc. Most of these differences are based on sub groups and personal preferences, and mostly there are no wrong or right names. Apologies if any of the spellings are wrong or offensive to you for any reason.

*** In the movie they refer to ‘tribes’. It’s 100% ok to make generalizations about a fictional tribe, that would be 100% offensive if that generalization was made about a real group of people. So, I used ‘group’ in this article to talk about a linguistic group of people sharing the same language in the real world. The languages often share the name with the tribe, but I’m not making any assumptions about tribal identity, ethnicity, or nationality beyond where it places people geographically.
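The arithmetic in the first footnote (*) can be checked with a few lines of Python. This is only a sketch using the footnote's own assumptions ($10K per coach, 10 coaches, $192M over a 3-day weekend); the variable names are mine, and the implied budget is simply what "$100K is 0.05%" works back to.

```python
coaches = 10
cost_per_coach = 10_000            # dollars, the footnote's assumption
opening_weekend = 192_000_000      # dollars taken over the 3-day opening weekend

total_cost = coaches * cost_per_coach

# Spread the weekend's revenue evenly over every minute of the 3 days.
minutes_in_weekend = 3 * 24 * 60
revenue_per_minute = opening_weekend / minutes_in_weekend
minutes_to_cover = total_cost / revenue_per_minute

# Budget implied by "$100K is 0.05% of the total budget" (0.05% = 0.0005).
implied_budget = total_cost / 0.0005

print(f"Total coaching cost: ${total_cost:,}")
print(f"Minutes of opening-weekend revenue needed: {minutes_to_cover:.2f}")
print(f"Implied total budget: ${implied_budget:,.0f}")
```

Averaged over all 72 hours, the coaching cost comes to a little over two minutes of revenue, which is where the "every 2 minutes" figure comes from.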

Haiti on the Rise

Rob — Wed, 24 Jan 2018 06:22:38 +0000

I don’t normally write posts to recommend not-for-profits, but I’m delighted to make an exception for Haiti on the Rise! (haitiontherise.org/)

Haiti on the Rise was founded by my friend Jackie (Jacqueline Oriscar Lee). I met Jackie at one of the toughest times in her life. On January 12, 2010, an earthquake struck Haiti, killing more than 100,000 people immediately and leaving many more without safe housing and access to vital resources like water, food and medicine. I was responsible for an emergency reporting service following the earthquake, where we set up a free number, 4636, where anyone within Haiti could send a text message to report their needs. The problem was that most people in Haiti only spoke Haitian Kreyol, while most international responders only spoke English. So, I found and managed 2,000 people from among the Haitian diaspora to translate messages in near real-time. It was called Mission 4636 and Jackie was one of the most committed volunteers, putting aside her own grief to help everyone in Haiti who still needed her help.

Jackie grew up in Haiti and now lives near me in the Bay Area, so I am happy to help her continue to re-build Haiti. I’m on the advisory board of Haiti on the Rise and I financed the website via Code for Haiti, ensuring that the resources to get the charity off the ground went back to Haiti where possible, and increased the experience and skills of people there.

If you are in the Bay Area and would like to support Haiti’s rebuilding efforts within Haiti on the Rise, please consider attending the 3rd Annual Haiti On The Rise Fundraising Dinner & Auction on Saturday, February 3rd, 2018 from 5-9PM at the St. Mary’s Cathedral Event Center in San Francisco, or donating on the website!

Robert Munro
January, 2018

In the Shadow of Bradman

Rob — Mon, 08 Jan 2018 05:35:25 +0000

I’ve been watching the latest cricket series, “The Ashes“, between Australia and England. Australia won and our captain Steven Smith was awarded player of the series. The series takes place every two years and is one of the longest standing and most watched sporting rivalries of any sport.

Steven Smith had an incredible series. He is now averaging 63.75 runs per game, which is the second highest average of any cricketer who has played enough games to make the statistics meaningful – 20 “innings” for those who know cricket. If you don’t know cricket, ‘runs’ are points like in baseball. This means that Smith scores an average of 63.75 runs for every time he gets out. You can see a comparison of cricket and baseball batting averages on the Wikipedia page on batting averages. The main reason that cricket is so high compared to baseball (without getting into the details) is that cricket players can ‘block’ the ball and they aren’t obliged to run, so they are harder to get out.

Among current players, Smith’s average of 63.75 is 10 above the next highest average, which happens to belong to his rival, England captain Joe Root. To be so far ahead of the nearest person, and second highest of all time, should be an incredible achievement. But cricket’s top career average makes the case for cricket having the greatest sportsperson of all time. Don Bradman’s career average was 99.94. To put Bradman’s 99.94 in perspective, here are the top 66 cricket career averages, spanning from 1890 to 2018:

The Top 66 Cricket Batting Averages from 1890 to 2018

From 1928 to 1948, Don Bradman averaged more than 30 points higher than anyone has achieved before or since! By comparison, the gap between #2 and #66 is less than 20 points. You don’t have to know cricket to appreciate how much better Bradman’s average is. Famously, he would have averaged 100 if he had scored just 4 in his final match, but instead he got out for 0.
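If you want to see how the average is worked out, here is a small sketch. A batting average is total runs divided by the number of times the batter got out. The career totals used for Bradman (6,996 runs and 70 dismissals) are not from this post; they are his widely published Test figures, and the function name is just for illustration.

```python
def batting_average(runs, dismissals):
    """Career batting average: total runs divided by the number of times out."""
    return runs / dismissals

# Bradman's widely published Test career totals (an outside figure, not from this post).
bradman_runs = 6996
bradman_dismissals = 70

actual = batting_average(bradman_runs, bradman_dismissals)

# The same career, but with 4 runs instead of 0 before getting out in the last innings.
with_four_more = batting_average(bradman_runs + 4, bradman_dismissals)

print(round(actual, 2), round(with_four_more, 2))
```

Because getting out is what counts in the denominator, not the number of games, an innings where the batter is "not out" adds runs without lowering the average.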

There are other great achievements in cricket. Current Australian player Ellyse Perry isn’t on this list as she hasn’t played many games, but is averaging 61.71 in addition to also being on the Australian football (soccer) team. But even looking across the other stats in cricket, nothing comes close to the gap in batting average.

Do you know of any sport where the most dominant person is so far ahead of everyone else? Even for cricket fans, this graph really drills it home: every cricket player will live in Bradman’s shadow.

Robert Munro
January, 2018

Sources: The stats are from CricInfo at http://stats.espncricinfo.com/ci/content/records/284197.html and http://stats.espncricinfo.com/wi/content/records/282910.html. If you want to look into the numbers or copy up-to-date data, you can see the Google Spreadsheet that I used here: https://docs.google.com/spreadsheets/d/191Zur0_MmOb_c1BruY1UgISKipVUf1fQRW_OIQCH_x0/edit?usp=sharing

A Step in the Right Direction for NLP

Rob — Thu, 28 Dec 2017 21:20:19 +0000

It was a delight to read Emily Bender’s post Putting the Linguistics in Computational Linguistics this week: https://naacl2018.wordpress.com/2017/12/19/putting-the-linguistics-in-computational-linguistics/.

Bender’s advice on how to bring better data practices into NLP is to: Know Your Data; Describe Your Data Honestly; Focus on Linguistic Structure at Least Some of the Time; and Do Error Analysis. I appreciate her link to my article on this site about languages at the ACL conference in 2015 (Languages at ACL this year), where I showed a shocking bias towards English.

This week, it was great to see a new data set that will help us push the boundaries. While still only in English, it tackles a new dimension, Toxic Comment classification. It is posted as a Kaggle competition, meaning that it is an open source data set that anyone can run experiments on and compare results, competing for the most accurate results: Toxic Comment Classification Challenge.

With 100,000+ comments in total, it is one of the larger data sets available for NLP. By tackling toxic comments on Wikipedia, they are hoping to enable people to build Machine Learning systems that can more broadly tackle hate speech, online harassment, cyberbullying and other negative online social behaviors. It’s an important area to study, because toxic communities tend to drive out diversity in their members. Unlike real-world toxicity, where ‘toxicity’ is measured as a physical effect on someone like LD50 – dose at which 50% of people will die – online toxicity is less well understood. How do you measure when someone stops participating in a community that is toxic for them, confirm the causal link, and measure additional adverse effects on their life outside of online communities? These are all questions that we can start to answer with this data set.

Source: XKCD https://xkcd.com/1260/

The data set was created by Google/Jigsaw, and the authors created this data set using our software at CrowdFlower. They also sourced a few thousand contributors through CrowdFlower to annotate the data according to what kind of toxic comment (if any) was made. See the original paper describing the data collection at: Ex Machina: Personal Attacks Seen at Scale, by Ellery Wulczyn at the Wikimedia Foundation and Nithum Thain and Lucas Dixon at Google/Jigsaw.

This data set was recently highlighted by Christopher Phipps as not conforming to Bender’s criteria: http://thelousylinguist.blogspot.co.il/2017/12/putting-linguistics-into-kaggle.html

I agree with most of his observations, but I am more optimistic about the data set and how those problems can be addressed, or even highlighted, by the data as it already exists.

One particular problem that Phipps highlighted was on quality control for the annotation process. He wrote:

“little IAA was done. Rather, the authors … used a set of 10 questions that they devised as test questions. Raters who had below 70% agreement on those ten were excluded”

Update: As I was writing this, Phipps removed this passage from his post after I pointed out the inaccuracy via Twitter. Thanks Christopher! I’ll keep this below in case other people are confused by the quality control method used.

This observation is correct but incomplete. In the article they report that in addition to 10 initial quiz questions, 10% of all remaining questions were also ones with ‘known’ answers from which they could track the accuracy of each contributor. From their original paper:

“Under the Crowdflower system, additional test questions are randomly interspersed with the genuine crowdsourcing task (at a rate of 10%) in order to maintain response quality throughout the task.”

So, they would have had 100’s or 1000’s of questions with known “gold” answers for evaluating quality, not 10. The annotators would only have been allowed to continue working so long as they were accurately answering the 10% known questions, and their results were included only so long as they maintained accuracy across those questions throughout their work. This is standard for the CrowdFlower platform. They had 10 people rate each piece of text, which is more than what we typically see: 3 to 5 is more common. At this level of quality control, the probability of deliberately bad or misinterpreted results is very low (less than 1 in a million), so the data annotation should be clean in the sense of being genuine. The authors also trialled and refined how the questions were asked in multiple iterations, before arriving at the annotation design that they launched for the 100,000+ comments that they ultimately published.
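To make the test-question idea concrete, here is a minimal sketch of how accuracy against known answers could be tracked per contributor. It is not CrowdFlower's actual implementation: the function names, the data layout, and the sample judgments are all hypothetical, and the 70% threshold is borrowed from the passage quoted above as an assumption.

```python
def contributor_accuracy(judgments, gold):
    """Accuracy of each contributor on questions that have a known answer.

    judgments: list of (contributor_id, question_id, answer) tuples.
    gold: dict mapping question_id to its known answer.
    """
    correct, seen = {}, {}
    for contributor, question, answer in judgments:
        if question not in gold:
            continue  # a genuine question, so there is nothing to score against
        seen[contributor] = seen.get(contributor, 0) + 1
        if answer == gold[question]:
            correct[contributor] = correct.get(contributor, 0) + 1
    return {c: correct.get(c, 0) / seen[c] for c in seen}

def trusted_contributors(judgments, gold, threshold=0.7):
    """Contributors whose accuracy on gold questions meets the threshold."""
    accuracy = contributor_accuracy(judgments, gold)
    return {c for c, score in accuracy.items() if score >= threshold}

# Hypothetical data: q1 and q2 are test questions, q3 is a genuine comment.
gold = {"q1": "toxic", "q2": "ok"}
judgments = [
    ("a", "q1", "toxic"), ("a", "q2", "ok"), ("a", "q3", "ok"),
    ("b", "q1", "ok"), ("b", "q2", "ok"), ("b", "q3", "toxic"),
]
accuracy = contributor_accuracy(judgments, gold)
trusted = trusted_contributors(judgments, gold)
print(accuracy, trusted)
```

In this toy data, contributor "b" gets one of the two test questions wrong, so only the judgments from "a" would be kept.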

Tracking agreement with test questions is separate from Inter-annotator agreement (IAA), which looks at how much different people agreed with each other. The authors use Krippendorff’s alpha for IAA and found 0.45, which is good agreement. I think this is fine for IAA on a task like this. There are some tasks where Krippendorff’s alpha isn’t the best choice, but for a small set of questions where there were a large number of responses per human annotator, like here, it is reliable.
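For readers who haven't met Krippendorff's alpha before, here is a minimal sketch of the nominal (category-label) version, computed as 1 minus observed disagreement over expected disagreement. It assumes each item's labels are given as a simple list, and it may not match the exact variant or distance metric the paper's authors used; the function name and toy data are mine.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists, where each inner list holds the labels that
    different annotators gave to one item.
    """
    # Coincidence counts: every ordered pair of labels within an item,
    # weighted so that each item contributes in proportion to its label count.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with one judgment says nothing about agreement
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    return 1 - observed / expected

# Toy data: two items, two annotators each.
perfect = krippendorff_alpha_nominal([["toxic", "toxic"], ["ok", "ok"]])
opposite = krippendorff_alpha_nominal([["toxic", "ok"], ["toxic", "ok"]])
print(perfect, opposite)
```

An alpha of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. The sketch does not handle the edge case where every label is identical, since expected disagreement is then zero.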

There are two other criticisms from Phipps which I think have deeper implications, but also have solutions: the bias in the demographics of the annotators and the bias in the demographics of the Wikipedia editors.

I believe that both can be addressed by looking at the data, and that this will be inherently interesting.

First, Phipps notes that the annotators were biased towards men (65%) and non-native English speakers. It’s possible this influenced their judgments. However, this is easily testable: the data is there to compare the annotations based on gender, age, education, and first language. If we have access to the gold/known questions, we can see if annotators from any demographic were more or less accurate.

Even without the gold/known answers information, we can look at the distributions of answers from different demographics, and see if there really are differences in response that correlate with each demographic.

Finally, we can look at the agreement on a per comment basis. We could test, for example, whether women annotators tend to agree with other women more than they agree with men (or vice versa) showing correlation bias from annotators based on gender. I don’t have intuitions about whether the demographics of annotators will correlate with their interpretations, so I’m curious to see the results! (Friendly callout: Christopher Phipps, perhaps you could run this analysis as penance for the IAA comment? ;-) )

If there are biases that correlate with annotator demographics, then in many cases they can be adjusted for. For example, if there are different responses according to gender, the data can be re-aggregated so that of the 10 judgments for each comment, an equal number of people identifying as men and women are chosen, rather than the default of 65% men. This doesn’t help with people who identified as neither men nor women, as they were not represented in large enough numbers to rebalance the data in this way, but it would address any man/woman bias. The same is true for any unbalanced demographics: provided that there are a large enough number of the underrepresented demographics across all the data, we can adjust for it.
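The re-aggregation described above could look something like the following sketch. It assumes each judgment is a dictionary with a "gender" field and a boolean "toxic" field; those field names, the group labels, and the sample data are hypothetical and not the actual schema of the released data set.

```python
import random

def balanced_sample(judgments, group_key="gender", groups=("man", "woman"), seed=0):
    """Pick an equal number of judgments from each group for one comment."""
    rng = random.Random(seed)  # fixed seed so the re-aggregation is repeatable
    by_group = {g: [j for j in judgments if j[group_key] == g] for g in groups}
    # The smallest group decides how many judgments we can take from each.
    k = min(len(members) for members in by_group.values())
    sample = []
    for g in groups:
        sample.extend(rng.sample(by_group[g], k))
    return sample

def toxic_fraction(judgments):
    """Share of judgments that marked the comment as toxic."""
    return sum(j["toxic"] for j in judgments) / len(judgments)

# Hypothetical comment with 10 judgments: 7 from men, 3 from women.
judgments_for_comment = (
    [{"gender": "man", "toxic": False} for _ in range(7)]
    + [{"gender": "woman", "toxic": True} for _ in range(3)]
)
balanced = balanced_sample(judgments_for_comment)
print(toxic_fraction(judgments_for_comment), toxic_fraction(balanced))
```

In this deliberately extreme example the raw judgments say 30% toxic, while the balanced sample says 50%. If a comment has no judgments from one of the groups, the balanced sample is empty and the comment would need to be skipped or handled separately.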

On the second more complicated problem, Phipps notes that Wikipedia contributors are a non-diverse group of people, very over-represented by white males. This is a more serious limitation of the data set. To the extent that demographics can be tracked on Wikipedia, this could also be interesting. This new data set will allow us to track which people on Wikipedia are specifically targeted with toxic comments. We can therefore track how many of them continued to be editors and how many dropped out, and see if non-white males are more likely to drop out. This could help explain why the community is already so non-diverse. So, it would be interesting from a sociological perspective to investigate this data set and the impact on those involved.

The remaining problem is that any Machine Learning model created on the data might be capable of only accurately identifying toxicity by white males. This is a valid concern and I hope there are evaluation data sets being produced that can test how true this is.

I know that the authors of the data set look at *many* different data sources in their work, so I presume the decision to release data only from Wikipedia was primarily driven by Wikipedia already being open data. At CrowdFlower, I don’t work with this team at Google/Jigsaw on a regular basis, but when I last spoke to them it was to give advice on reducing bias in Machine Learning models that could come from biased data. So, from what I’ve seen, I know they are thinking about broader problems than this data set encompasses, and I’m glad that they have open sourced one data set so far!

All the best to everyone entering the Kaggle competition! In addition to building great algorithms, I hope you have the chance for some linguistic analysis, too.

Robert Munro
December 2017

I’ve joined CrowdFlower!

Rob — Thu, 24 Aug 2017 00:20:50 +0000

I’m happy to announce that I’ve joined CrowdFlower as VP of Machine Learning!

I’ve been a fan of CrowdFlower since 2010, when I was helping respond to the earthquake in Haiti. At the time, I was managing 2,000 members of the Haitian diaspora to translate, categorize and geolocate emergency messages from the Haitian population, so that a plain text message in Haitian Kreyol could be used as a structured English report by the international response community. We used CrowdFlower as the technology for this work, so that we could pull out the most important and time-critical pieces of information. This was something that we launched in just 48 hours, showing how quickly the right technology can be deployed for a positive impact on the world.

Since 2010, I’ve used CrowdFlower in at least 5 or 6 different contexts, from other disaster response and epidemic tracking initiatives, to tech companies looking to leverage human annotation for machine learning. The original data from Haiti also became part of my Stanford PhD, where I researched how AI can be used to better process and prioritize time-critical information at scale during disasters and for health, especially in languages outside of English.

A presentation I gave about data and machine learning at a CrowdFlower conference in 2015

Today, CrowdFlower is the top company for providing human input to train AI systems. From the biggest research labs in multinational companies, to individual researchers and 9/10 self-driving car companies, CrowdFlower provides the human-in-the-loop technology. In other words, CrowdFlower is at the forefront of more advances in Artificial Intelligence than any other company right now. I believe that the interaction between humans and AI is the most important next step in technology for humanity.

I’m proud of what I achieved at Amazon AI in the time I was there. I led product for Natural Language Processing (NLP) and Machine Translation at Amazon AI, taking Amazon Web Services’ (AWS) first suite of NLP products from conception to internal launch. There aren’t many people who get to say they launched the first products in their area of expertise on the world’s largest cloud provider, and I am thankful to everyone that I worked with at AWS!

The opportunity to lead Machine Learning at Silicon Valley’s most successful AI startup was simply too good to pass up, so I’m happy to be joining the company after working with them for so long! Thanks also to VentureBeat for sharing this announcement.

Please stay tuned to hear more about what we are building at CrowdFlower!

best

Rob

Analogue Natural Language Processing

Rob — Sat, 25 Feb 2017 18:16:27 +0000

A recent Economist article got me thinking about the history of language technologies:

Finding A Voice, the Economist

The article gives a good overview of language technologies like Machine Translation and Speech Recognition, with more depth and fewer exaggerations about recent advances in AI than many similar publications. They give this too-late timeline, however, starting in 1954:

The Economist’s History of Language Technology

The article, and its wrong date, reminded me of a recent conversation I had with Ron Kaplan about the early days of Artificial Intelligence. Ron is currently the Chief Scientist in Amazon’s search division, A9, and he is also responsible for my day job! When he heard I was looking for work last year, he brought me into the Amazon Web Services (AWS) division. AWS is the largest cloud services provider, and I now run product for Natural Language Processing and Machine Translation there, building language technologies for millions of people and organizations world-wide.

I first met Ron when we were both at a search startup called Powerset about 10 years ago, and again when I returned to grad school at Stanford, where he participated in a class on the history of Computational Linguistics, run by two other current/future legendary language technologists, Martin Kay and Dan Jurafsky. When we caught up recently, I brought up a story that had stuck with me since that class: Ron had helped build a commercial language technology product that predated computers.

Their technology was a spell-checker for an electronic typewriter. You inserted their device into the cable between the keyboard and the ribbon. The device had a dictionary of words and possible prefixes & suffixes (“-ing”, “-ed”, etc) and it would ‘ping’ if it thought you misspelled a word. The business model was that it would save ribbons as fewer words needed to be retyped. It wasn’t on the market for long before electronic typewriters got replaced by computers with word-processors, and software spell-checkers replaced their hardware spell-checker.

I still love the idea of hardware-based language technologies. Today, even the devices that seem like hardware-based devices, like personal home assistants, are generic hardware technologies where most of the specialized processing is in the software.

Going back even further, I thought, were there analogue natural language processing technologies? Before the transistor revolution, did people try to build language technology products on machines powered by vacuum tubes?

Yes, people made language technologies out of vacuum tubes!

Introducing, the Voder:

Source: www.youtube.com/watch?v=0rAyrmm7vv0

Bell Telephone Laboratories’ Voder was revealed at the 1939 World’s Fair. It was invented by Homer Dudley, which is the most ‘1939’ name you can think of. Operated by a specialized keyboard, it also made it to Silicon Valley the same year, with demonstrations at the Golden Gate International Exposition on Treasure Island in the San Francisco Bay Area, as part of celebrations for engineering advances that included the recently constructed Golden Gate Bridge. I’m just going to leave this sentence to sit with you for a while:

Homer Dudley took his vacuum-tube speech synthesizer to Treasure Island to celebrate the Golden Gate Bridge

I like to imagine Homer Dudley’s long sea journey to the San Francisco Bay Area with his Voder. Every day, he would stand on the deck, staring out to where the waves meet the horizon, thinking about how to create technology to better mankind. Every evening, he would descend into the cargo hold to where the Voder was stored, pat it reassuringly, and make sure that the straps keeping it secure were tight (but not uncomfortably tight). I call this journey Voder’s Odyssey.

Alas, the Voder was not widely sold. It is difficult to imagine the use cases. It could help people with speech impairments, but many conditions that impair speech also impair motor control, which would make transporting and operating the Voder very difficult.

Fortunately, Homer Dudley succeeded elsewhere. The knowledge that went into the Voder also went into compressing people’s speech in early telephone conversations. In addition, he worked with Alan Turing on encryption during the Second World War. Finally, his work on voice synthesizers did lead to technologies that helped speech-impaired individuals, most famously Stephen Hawking. We should all aspire to be Homer Dudleys.

As an interesting aside, Stephen Hawking has chosen to keep using his now dated-sounding synthesized voice, as it has become so closely associated with him. It is a fascinating early example of how our electronic extensions will become important parts of our self-identity, which will only grow as language technologies become more ubiquitous.

In terms of the Economist article, then? It looks like they started their timeline at least 20 years too late, but I still recommend reading it. It gives a great overview of the progression of the field since 1954, including a topic that is dear to my heart: extending language technologies to less widely spoken languages.

Robert Munro
February, 2017

PS: Perhaps there was some language technology product earlier than 1939 that I don’t yet know about? Drop me a line: I’d love to hear about it!

How close was the U.S.A. election?

Rob — Sun, 01 Jan 2017 01:23:25 +0000

There have been a lot of articles about the 2016 U.S.A. Presidential election and possible outcomes. Most are long essays about a changing U.S.A. that aren’t backed by real numbers. So I decided to run the numbers myself.

To give a brief background for people who aren’t in the U.S. or don’t know how elections are won here, U.S. elections are won by state. Every state has a certain number of the 538 total electoral votes. If you get the most votes for president in a state, you get every electoral vote in that state (with exceptions in just two states). So, if a candidate wins Florida by just 1%, they still get all 29 Florida electoral votes.

The minimum change to swing the election

Many states were very close: in Michigan, Wisconsin, and Pennsylvania, the combined margin for the winner, Donald Trump, was just 79,848. In other words, if just 40,000 Trump voters in those three states had instead chosen Clinton, she would have won those states, and won the entire election by 18 electoral votes.

(Note: since I wrote this paragraph, updated counts make the minimum look closer to 35,000, so the argument is strengthened.)
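For anyone wondering why a margin of 79,848 only needs about 40,000 people to change their minds: every voter who switches takes one vote from the winner and gives one to the runner-up, so each switch closes the gap by two. Here is a small Python sketch of that arithmetic. The function name is my own, and it treats the combined margin quoted above as a single contest, which is a simplification; with per-state margins you would apply it to each state and add up the results.

```python
def min_switchers(margin: int) -> int:
    """Fewest voters who must switch from the winner to the runner-up to flip a result."""
    # Each switch lowers the winner's total by 1 and raises the runner-up's by 1,
    # so the margin shrinks by 2. We need it to drop below zero.
    return margin // 2 + 1

combined_margin = 79_848  # combined margin quoted in the post
print(min_switchers(combined_margin))  # 39925, i.e. roughly 40,000
```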

40,000 swing voters is just 0.011% of all people living in the U.S.A. About 1 in every 10,000 people. To appreciate how small a swing that is, here are 40,000 people next to everyone in the U.S.A. by how they voted or didn’t vote for President:

2016 U.S.A. Election Results
Minimum voters needed to change result:	i
Voted for Donald Trump:	i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i
Voted for Hillary Clinton:	i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i
Voted for another candidate:	i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i
Lives in U.S.A. but didn’t vote:	i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i 
i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i
Figure: Total number of votes or non-votes by every person in the U.S.A. for President in 2016. i = 40,000 people. Source: http://junglelightspeed.com

Still with us? Clearly, there are many factors that could have changed such a thin margin. Even if we look at the result across all states, not just the minimum in the closest states, it’s still only a 1.5% shift among voters and less than a 1% shift across the entire U.S.A. population.

Here’s a version that you can print out for arguments over the next 4 to 8 years. Keep it handy in case your Republican family and friends try to argue that the victory was a landslide, or if your Democratic family and friends argue that Hillary Clinton winning the popular vote meant that the majority of people in the U.S.A. voted for her:

There are many other ways to look at the data to realize how close it was. Clinton received about 2 to 3 million more votes for president (although not enough to get the majority of the votes) but didn’t win enough electoral votes from the states. Within the states, candidates don’t need more than 50% of votes to win a state. In 26% of states no candidate received more than 50% of the presidential votes, because many votes went to independent and third-party candidates.

In other democratic countries where I have lived, votes are counted differently and would have produced different outcomes.

In Sierra Leone, the 2007 election produced a result with no candidate receiving more than 50% of the vote for president, so an additional run-off election was called where people voted just for the top two candidates. It was a tense time to be living there, but thankfully it was a largely peaceful handover to the newly elected party. If the U.S.A. had this system, there would have been an additional election with only Clinton and Trump as candidates.

In Australia, votes are preferential, which gives a quicker method than a run-off when there is no clear winner: everyone ranks their preferred candidates in order. In Utah, for example, Trump received 46% of the vote, Clinton 29%, McMullin 21%, Johnson 3.4%, and Stein 0.8%. If Utah had a preferential system, the McMullin, Johnson, and Stein voters could have also expressed their preference for Trump or Clinton, giving one of the two candidates more than 50% and ensuring that every vote counted. In Australia it is common for minor parties to determine election outcomes when preferences are calculated. In several past U.S.A. elections, this would have given a losing party the majority, and in general it would make it easier for new parties to start attracting enough votes to battle the two major ones, as people wouldn’t feel like their votes were wasted.
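To make the preferential count concrete, here is a minimal Python sketch of an instant-runoff tally. It is a simplified illustration, not the exact rules used in Australian elections. The first-preference counts are the Utah percentages above scaled to roughly 1,000 ballots; the second preferences are entirely made up for the example, since nobody recorded them.

```python
from collections import Counter

def instant_runoff(ballots):
    """Run a simple instant-runoff count.

    ballots: dict mapping a ranking tuple (most preferred first) to a ballot count.
    Returns (winner, rounds), where rounds is the list of tallies after each round.
    Ties for last place are broken arbitrarily in this sketch.
    """
    remaining = {candidate for ranking in ballots for candidate in ranking}
    rounds = []
    while True:
        tally = Counter({candidate: 0 for candidate in remaining})
        for ranking, count in ballots.items():
            # A ballot counts for its highest-ranked candidate still in the race.
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:
                tally[top] += count
        rounds.append(dict(tally))
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader, rounds
        # Nobody has a majority yet: eliminate the last-placed candidate.
        remaining.remove(min(tally, key=tally.get))

# First preferences follow the Utah shares in the post; second preferences are hypothetical.
utah_ballots = {
    ("Trump",): 460,
    ("Clinton",): 290,
    ("McMullin", "Trump"): 140,
    ("McMullin", "Clinton"): 70,
    ("Johnson", "Trump"): 17,
    ("Johnson", "Clinton"): 17,
    ("Stein", "Clinton"): 8,
}
winner, rounds = instant_runoff(utah_ballots)
print(winner, rounds[-1])
```

With these invented preferences the count takes four rounds, and the final round is a straight Trump versus Clinton tally in which every ballot has been counted for one of the two.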

I also lived in one arcane country that has a house of government restricted to leaders of the state-sanctioned religion and the descendants of wealthy land-owners. The other house of government is theoretically more open, even though they meet in a Medieval Palace decorated with giant religious scriptures.

… but enough about the United Kingdom. Jokes aside, the U.K. is clearly a democratic nation today that would never be so medieval as to declare the earth was flat or deny what continent they are in.

In any of these countries, a difference of 40,000 people is tiny. To put it in perspective in the U.S.A., a little more than 40,000 people visit Disneyland each day. For American football, even the least popular NFL matches attract more than 40,000 people to the stadium. I enjoyed visiting Disneyland as a child, and I enjoyed watching the Seahawks demolish the Rams two weeks ago, cheering along with the other 70,000 people at CenturyLink Field. But I really hope those 40,000 swing voters put more consideration into their votes than into their visits to sporting events and theme parks. As people left CenturyLink in the fourth quarter, 40,000 did not seem like a lot of people as I looked around from my seat at the half-empty stadium.

Both Clinton and Trump played the U.S.A. voting system by campaigning more in swing states and knew this was a potential outcome that could advantage either candidate. So whichever way the election went, you can’t complain that your candidate lost. But if you are affected by the outcome of the election, you can complain about how few people determined the result, the method by which that result is determined, and the consequences it has for your life. You can, and should, continually question the democratic process in any country that you live in or that has influence over you.

A Split U.S.A.?

Don’t believe the long articles talking about how this election showed a newly fractured U.S.A. because of the appeal to right-wing extremists. The same reporters who wrote meaningless articles with terrible predictions in advance of the election are now trying to write long articles explaining the election results as if they were a landslide or a dramatic change. They weren’t. It was 40,000 people, just 0.01% of the population, which any number of factors could have swung either way.

The U.S.A. is slowly embracing equality and obviously has a long way to go. The first Back to the Future film was released closer to when Rosa Parks refused to give up her seat on a bus than to today. The Civil Rights Act that finally ended all state and local segregation laws only came into law in 1964. Or to put it another way, Janet Jackson’s Rhythm Nation was released closer to a time with legal racial segregation in the U.S.A. than to today.

Hint: it’s only in black & white for stylistic reasons – it’s not that long ago. The clip has 97.5% positive votes on YouTube (including mine – I was permitted to vote on this clip, but not the election, like the majority of people in the U.S.A.). So it’s clear the U.S.A. has a majority who believe in a nation united in rhythm.

Discrimination against minorities and foreigners is going to take longer, but it’s moving in the right direction. As I shared in my last post, every California county voted in favor of allowing bilingual education:

Every California county voted to repeal English immersion & reinstate bilingual education. A unified state in support of diversity. pic.twitter.com/sNDGLyq8Op

— Robert Munro (@WWRob) November 9, 2016

The U.S.A. is still a country that overwhelmingly embraces plurality and equality. The average U.S.A. household has 2.5 people. So not only did the majority of people not vote for Trump, the majority of U.S.A. households did not have a single person who voted for Trump. The same is true for Clinton. Neither candidate represents the majority of the nation, but we expressed our support for diversity in many other ways.

Robert Munro
December, 2016

California unified in voting for diversity

Rob — Thu, 10 Nov 2016 19:46:26 +0000

An interesting but overlooked result of the election earlier this week: every California county voted to repeal (mandatory) English immersion classes and reinstate bilingual education.

Some of you probably saw me tweet this yesterday:

Every California county voted to repeal English immersion & reinstate bilingual education. A unified state in support of diversity. pic.twitter.com/sNDGLyq8Op

— Robert Munro (@WWRob) November 9, 2016

Bilingual education is more common than monolingual education, globally, which is sometimes overlooked in countries like the USA with large swathes of monolingual speakers.

For every county in California to support any proposal is rare. California is the world’s 5th largest economy, so it is positive to see such an influential part of the world come together in support of diversity.

Robert Munro
November, 2016