November 23rd, 2013 by Rob

Cross-post from

I enjoyed taking part in the workshop on Social Impact Through Big Data & Crowdsourcing this week at the Wharton School in San Francisco.

The application of scalable information processing to the public good is still fairly nascent, primarily because scalable information processing is something we are still trying to solve in general, not just for social development. The most advanced companies in for-profit information processing are also working on using their technology for positive social change, and we were fortunate to have many of them in the room together this week. While all advances in information processing can improve the world, some advances have more direct and more amplified impacts.


The speakers

The day was framed by Devin Thorpe of Forbes, Doug Collom, the Vice Dean of the Wharton School in San Francisco, and Katherine Klein, the Vice Dean for Social Impact. They tied their knowledge of global economics and entrepreneurial ventures to social good and sustainable development.

Lukas Biewald, CEO of CrowdFlower, spoke about engaging a global workforce and the meritocracy of purely performance-based reputation, no matter where you are in the world. His examples of the complexities of quality control were impressive: ensuring good work from distributed human computing is a difficult task.

Deepak Puri, Director of BD at VMware, volunteer at CauseBrigade and organizer of the workshop, spoke about the importance of providing employment for those at the bottom of the economic pyramid, citing iMerit’s work in India. He was joined by Captain Ryan O’Connor, who spoke about high unemployment among returning veterans and the difficulty of finding work when military service leaves a gap in your resume.

Olivier Delarue, Innovation Lead for the UN Refugee Agency (UNHCR), showed just how diverse this workforce is among those ‘in the basement’ of the economic pyramid. Refugees are often undocumented and do not have bank accounts, but they have an incredible diversity of skills that are in demand globally. There is a huge opportunity to employ refugees within the information industry, creating sustainable employment while supporting the global economy.

Diving into analytics, Peter Fader, Professor of Marketing at the Wharton School, used data from donation patterns to explain predictive modeling. Given years of data about individuals’ past giving patterns, how can a charity decide whom to target next for future donations? He gave clear examples of where statistical models outperformed human intuition, but also of where care must be taken not to overfit the data and make erroneous predictions.

Representing social networks, Peter Skomoroch, former Principal Data Scientist at LinkedIn, spoke about a number of collaborative tools, like the popular ‘endorsement’ functionality that he spearheaded. He gave an interesting insight into the balance between user-generated categories and fixed ontologies, with the trade-offs between coverage, accuracy, and user acceptance. He made a solid argument that sustainable impact does not come from one-off events, but from continued professional engagement.

The work I knew least about before the workshop was Palantir’s philanthropic work. I was particularly impressed with their work with Team Rubicon, a non-profit disaster relief organization that dispatches volunteers from its network of 5,000 military veterans. Ari Gesher’s description of their disaster response work showed the benefits of using top engineers and reliable, well-tested software in critical environments.

Katherine Townsend of USAID talked about a range of initiatives for engaging more people in social good: lowering the barriers to giving through ‘donation by SMS’, and opening up data about not-for-profits, both for transparency and to allow analysis by a broader range of interested parties.

I spoke about work in global epidemic tracking, showing how a combination of crowdsourcing and machine learning allows billions of data points in dozens of languages to be quickly filtered and reviewed by a handful of domain experts, enabling the early identification of outbreaks and a rapid response. I also gave examples of where big data can do more harm than good: for privacy, for security, and through the perception of data alone as a meaningful product.

Paul Arnpriester, who works at Non-Profit BDM and CDW, concluded the day with a talk about his work at the intersection of non-profits and corporations, developing strategic and mutually beneficial partnerships that allow the non-profits to become economically sustainable.

Overall thoughts

I was impressed with the sophistication of efforts on a number of fronts. The people creating cutting-edge data science in industry are applying their technology to social good with the same care. Social good was also a common point of overlap for many of the organizations. Peter Skomoroch reported that LinkedIn for Good was inspired by talks he and Dylan Field had with me about my crowdsourced disaster response work in Haiti in 2010. That work was run on the CrowdFlower platform. Idibon’s CTO, Schuyler Erle, ran damage assessments for FEMA following Hurricane Sandy which (I learned at the workshop) were incorporated into Palantir’s dashboard for disaster response professionals and volunteer allocation. Before moving to the US, I worked for the UN Refugee Commission in Liberia. So while not all of our activities overlap, a large number of organizations in big data and crowdsourcing are regularly coming together for social development.

The flip side of this sophistication is that the technology needed for big data and crowdsourcing at scale requires the best and the brightest. The margins for crowdsourcing are relatively small, and the technologies, which need to run across multiple servers, cannot simply be downloaded as an application and run by small organizations. For people at the base of the economic pyramid to participate in this market, they need direct engagement with the large crowdsourcing companies and access to services provided by the organizations that live and breathe scale.

Rob Munro
November 2013
