Data-Pop Alliance is the first think-tank on Big Data and Development, co-created by the Harvard Humanitarian Initiative, MIT Media Lab, and Overseas Development Institute to promote a people-centered Big Data revolution.
We function as a distributed network, with our headquarters in New York, where we are incubated by ThoughtWorks NYC, co-directors and staff at HHI, MIT Media Lab and ODI, research affiliates in various organizations, and individual and institutional partners around the globe.
Our vision is a future where Big Data is leveraged to improve decisions and empower people, especially at the community level, in ways that avoid the pitfalls of a new digital divide, de-humanization and de-democratization.
Our work is organized along three main strategic streams:
Advancing Knowledge and Innovation with Big Data
Building Capacities and Connections in Big Data
Crafting Ethical and Equitable Systems for Big Data
We operate through research, training, technical assistance, convening, knowledge curation, and advocacy, with systematic engagement of local partners and communities.
Our six thematic areas of focus are official statistics, socio-demographic measures and methods, conflict and crime, humanitarian action, climate change and environmental stress, and data literacy and ethics.
Democratizing Big Data to Address Climate Change: The Science and Political Potential of Google Earth Engine
Last month I marched with 400,000 people in the streets of New York City for the biggest climate demonstration in history. Yet we are never going to fix problems as scientifically, culturally and economically complex as climate change unless we democratize data. Imagine if anyone could type their zip code in their browser and see how climate change will change their community? Or if any community around the world, no matter how small or poor, could do sophisticated hydrological assessments to prepare for coming changes? What if the 400,000 in New York last month could lobby their city planner or congressperson this month with localized climate impact data?
Our project answers these questions through one example – vulnerability to flooding. As a hydrologist and a climate change communications strategist, we believe the next innovation in Big Data will not be a scientific discovery or a more technical early warning system. The next breakthrough should challenge the dominant power dynamics and empower people by putting data into their hands.
We are building a climate change prediction and visualization tool for one hazard: flooding. As the magnitude and frequency of US flooding increase, there is a need for faster, more precise, and more reliable flood vulnerability mapping that accounts for social and physical factors, predicts who is at risk from oncoming storms, and projects scenarios for a changing climate. Through the state-of-the-art online geo-technology platform Google Earth Engine, our algorithm uses publicly available physical and social data to show governments, businesses and the public the science behind their vulnerability to disaster. The prototype of the model, built in 2013, draws on data in the cloud (including elevation, satellite imagery, and census data) to dynamically refine a surface of risk inside a coarse-resolution flood prediction zone from a weather service, typically NOAA.
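To make the idea of refining a risk surface inside a coarse flood zone more concrete, here is a minimal sketch in Python. It is an illustration only, not the model described above: the cell layout, the field names (`in_flood_zone`, `elevation_m`, `social_index`), the weights, and the linear scoring rule are all assumptions invented for this example, and it does not use the Google Earth Engine API.

```python
# Hypothetical sketch: turn a coarse flood zone into a per-cell risk surface.
# Field names, weights and data are illustrative assumptions, not the real model.

def normalize(values):
    """Rescale a list of numbers to the 0..1 range (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def risk_surface(cells, physical_weight=0.6, social_weight=0.4):
    """Return {cell_id: risk score in 0..1} for cells inside the flood zone.

    Each cell is a dict with:
      id            - cell identifier
      in_flood_zone - True if the coarse forecast zone covers the cell
      elevation_m   - elevation in metres (lower means more exposed)
      social_index  - 0..1 social vulnerability, e.g. derived from census data
    """
    zone = [c for c in cells if c["in_flood_zone"]]
    if not zone:
        return {}
    # Lower elevation means higher physical exposure, so invert the scaled value.
    exposure = [1.0 - e for e in normalize([c["elevation_m"] for c in zone])]
    return {
        c["id"]: physical_weight * x + social_weight * c["social_index"]
        for c, x in zip(zone, exposure)
    }


cells = [
    {"id": "A", "in_flood_zone": True, "elevation_m": 2.0, "social_index": 0.8},
    {"id": "B", "in_flood_zone": True, "elevation_m": 10.0, "social_index": 0.2},
    {"id": "C", "in_flood_zone": False, "elevation_m": 1.0, "social_index": 0.9},
]
scores = risk_surface(cells)
for cell_id, score in sorted(scores.items()):
    print(cell_id, round(score, 2))
```

In this toy data, cell A (low-lying, socially vulnerable) scores about 0.92 and cell B about 0.08, while cell C is skipped because the coarse forecast zone does not cover it. A real model would need validated predictors and weights rather than the arbitrary ones assumed here.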
This year, we are partnering again with Google Earth Outreach, Google Crisis Maps, and Data-Pop Alliance to use the model for climate change prediction and publicize the results in everyone’s Google Maps. Over the next year we will refine and validate the social and ecological predictors of vulnerability and convert the algorithm to the Google Earth Engine platform to make it available in the cloud. After this phase we will have a socio-ecological model that identifies hotspots of vulnerability. In the second phase, we will input various climate predictions for average precipitation, population, emissions scenarios and more to adjust vulnerability in a changing climate and in different policy futures.
The final product will be a web interface that shows what flood events will look like in the future. The localized science and analysis will help individuals understand the climate crisis and take control when preparing and responding to hazards. We could imagine the planner in Boulder, Colorado using this model to show the importance of adaptation when rebuilding after their recent floods. We have spoken with an aid organization in Somalia that wishes to fortify communities against devastating floods but does not know which regions are most at risk. A citizen in a small town might log on, notice that their flood risk will increase in the next 10 years, and take additional preparation steps. Simply put, the application facilitates adaptation, preparation, and risk communication.
Our model turns the common Big Data model on its head, demanding that we focus on getting more data to people rather than just ingesting data from people. We hope this project will be a springboard for questioning our obsession with acquiring more and better data, as well as what and whom our current science and technology privilege.
Data-Pop Alliance organized two sessions on Big Data and M&E that I moderated at the M&E and Tech Conference in DC on September 25th and at its sister Deep Dive in NYC on October 3rd, sponsored by The Rockefeller Foundation, FHI360, and the GSMA. Two months and an evaluation later, I am seizing the opportunity handed out by the co-organizers Wayan Vota, from FHI360 and Kurante, and Linda Raftree, from Kurante and many places, to summarize these discussions and my reflections on the topic.
This is especially timely as I am finalizing, with co-authors Sally Jackson, M&E specialist at UN Global Pulse Jakarta Lab, and Ana Aerias, our Program Manager, a chapter on the topic for a forthcoming book on Complex Evaluation to be published by SAGE next year, the “Year of Evaluation”.
The DC session, titled “How to Leverage Big Data for Better M&E in Complex Interventions?”, was a panel discussion with Kalev Leetaru, Terry Leach, Andrew Means, Jack Molyneaux and Bridget Sheerin; my objective was to frame the ‘big picture’ and surface how Big Data could be relevant to M&E activities, starting from the simple (and simplistic) notion that so much ‘new real-time data’ could surely be ‘leveraged’ for their purpose. The NY session, which I chose to call “M&E of complex interventions in the age of Big Data: obsolescence or emergence?” was designed to be more interactive, with only two panelists—Brian d'Alessandro and Evan Waters—and get closer to the crux of the applications and implications of Big Data for M&E of complex interventions—understood as large-scale and multi-actors.
Both sessions attracted about 50 participants and, despite their differences, raised three main sets of considerations.
The first set can be called definitional-conceptual. As several panelists and participants underscored, it is important to get a common understanding of what we mean by Big Data since changing how we define ‘it’ will change the question of how ‘it’ may affect M&E. A few participants considered Big Data as ‘just’ large datasets—by that token, as one participant noted, ‘Big Data’ would be old news. And despite a general consensus about the novelty and associated opportunities and challenges presented by Big Data, specifics are often lacking.
In both sessions, and in NY in particular, I presented my vision of Big Data, articulated in an article and our recent draft primer. I argued that Big Data should no longer be considered as ‘big data’ characterized by the intersection of 3Vs (for Volume, Velocity and Variety), but rather as the union of 3Cs, for Crumbs (personal digital data particles), Capacities (new tools and methods) and Community (new actors with incentives), creating a new complex system in its own right.
Considering Big Data as a complex system allows, or forces, us to reconsider the whole question of its relevance for and effect on M&E through a systems lens, and see the question’s complexity emerge more clearly. The question is not whether and how Big Data may improve existing M&E systems by providing more data. It is how this new complex system that reflects and affects the behaviors of complex human ecosystems subject to complex policies and programs may fundamentally change the theory and practice of monitoring and/or evaluating the effects of interventions. We also spent time ‘unpacking’ M&E between its ‘M’ and ‘E’ parts, and thought about how the ‘question’ may yield different answers for each.
This veered the discussion towards a second set of considerations: the question’s theoretical dimensions and their practical implications. One hypothesis that stirred great interest in both sessions is whether Big Data may mean the advent of a new kind of M, as a mix of monitoring and (predictive) modeling, and the demise of ‘classic’ E, tilting the practice scale towards the former.
In a blog post circulated ahead of the Conference, and during the DC session, Andrew Means criticized ‘classic’ program evaluation for being too reflective and not predictive enough, backward looking rather than forward looking, too focused, in his words, on “proving rather than improving”.
Many participants and commentators (and Andrew Means) disagreed with the notion that Big Data meant the death of evaluation; rather, it may accelerate its adaptation and a blending of real-time monitoring, predictive modeling and impact evaluation, depending on specific contexts.
These considerations point to the centrality and difficulty of causal inference. In contrast to the ideal-typical case of a perfect Randomized Controlled Trial, it is indeed very difficult to infer causality between treatment and effect (to find ‘what works’) in complex systems with feedback loops, recursive causality etc.
On the one hand, Big Data may indeed compound the evaluability challenge by adding feedback loops and complexity overall. On the other hand, highly granular data lend themselves to finding or creating natural or quasi-natural experiments, as a recent paper argued. A panelist in DC noted that new methods were being built, and tools made available, to infer causality in time series. This suggests that causal inference (and related issues of sample biases and their implications for internal and external validity) will receive significant attention as one of the next frontiers of Big Data’s applications to social problems.
The third set of issues included the question’s ethical and institutional aspects. Several participants noted that the term ‘Monitoring’ itself had taken on a new connotation in the post-Snowden era. As several participants, as well as recent controversies and conferences, underlined, the rise of Big Data as a system may require the development of new codes of ethics to reflect current technological abilities. In both sessions, panelists and participants agreed that M&E practitioners, given their experience and expertise in balancing competing demands for privacy (by and of individuals and groups) and for evidence and accountability (by and for citizens, donors, etc.), should play a more active role in these debates in the near future. To contribute to this process, comments and feedback are welcome!
‘What is the future of official statistics in the Big Data era?’ – a workshop and public event, co-organized by Data-Pop Alliance, on January 19th, 2015 at the UK Royal Statistical Society, London
Data-Pop Alliance, together with Overseas Development Institute and the Royal Statistical Society, is organizing an invitation-only half-day closed workshop followed by a public moderated discussion that aim to explore the challenging questions at the interface of Big Data and Official Statistics.
The events will take place at the Royal Statistical Society on the afternoon and evening of 19 January.
The aim of the workshop will be to discuss whether and how Big Data may complement official statistics, but also the extent to which it may be a potential substitute, and the deeper questions that this possibility evokes.
The concept note describes the events in more detail.
The public event will focus on the question ‘What is the future of official statistics in the Big Data era?’ The panel discussion brings together leading experts who will explore the potential of Big Data to transform national statistical systems as well as concerns about the reliability and representativeness of these data, and over ethics, privacy, and the blurring of lines between formal and informal data sources.
- Chair: Denise Lievesley (Dean of Faculty, King's College)
- Kenneth Cukier (Data Editor at The Economist)
- Haishan Fu (Director of Development Data Group at the World Bank)
- Alex ‘Sandy’ Pentland (Professor at MIT and Academic Director of Data-Pop Alliance)
- Nuria Oliver (Scientific Director at Telefonica)
- John Pullinger (UK National Statistician)
The public event will take place on January 19th from 6:00 pm – 7:30 pm and will be followed by a reception.
To register for this event, please email firstname.lastname@example.org
The invitation to the public event can be found here.
People and energy flowed in ThoughtWorks' gallery space for several hours; several MIT Media Lab students and graduates and Data-Pop Alliance's affiliates demoed their work to guests who could also look at data-inspired cartoons while enjoying food and drinks.
As a reflection of the festive nature of the event, we had prepared an accordion fold brochure to present our team, strategy, and ongoing work, which aim at achieving 3 main outcomes to promote a people-centered Big Data revolution:
- Advancing Knowledge and Innovation with Big Data
- Building Capacities and Connections in Big Data
- Crafting Ethical and Equitable Systems for Big Data
We are and will be pursuing these through research, training, technical assistance, content curation, and convening, leveraging the unique strengths and unusual collaboration of our three parent institutions, partners and affiliates in a number of priority thematic areas, including official statistics, conflict and humanitarian action, climate change and environmental stress, to improve policies and empower people.
Two draft documents—a Primer on Big Data and Development and a White Paper on Big Data and Official Statistics—were uploaded and are made available for comments here; we hope to receive extensive feedback and suggestions that will help us finalize them in the next couple of weeks. Other documents including a white paper titled "Reflections on the Ethics of CDR Analytics" will be uploaded shortly.
The launch reception followed a strategy meeting of our Advisory Board, Steering Committee, and funding and technical partners, held at The Rockefeller Foundation in the afternoon, to present our progress to date and plans for 2015:
The Data-Pop Alliance team conveys its gratitude to our founding institutions HHI, MIT Media Lab and ODI (whose Executive Director had sent us this short video), incubator ThoughtWorks NYC, main funder The Rockefeller Foundation, initial supporters, partners, and affiliates, and all of those who attended the event or sent expressions of support and interest. We would also like to thank our guests who traveled from afar, notably our partners from AFD, Philippe Orliange and Thomas Roca, and from DANE Colombia, Diego Silva.
Video of Kevin Watkins, Executive Director of the Overseas Development Institute (ODI):
Launch reception today
After months of collective work, it is our pleasure, on behalf of Data-Pop Alliance and its three founding institutions—the Harvard Humanitarian Initiative (HHI), MIT Media Lab and the Overseas Development Institute (ODI)—to announce our launch reception on Monday, November 17th, from 5:30pm–9:00pm, co-hosted by The Rockefeller Foundation and our incubator ThoughtWorks NYC.
The event will be held in ThoughtWorks NY’s modern work and gallery space located at 99 Madison Avenue to celebrate our Rockefeller grant, thank our initial supporters, partners and affiliates, showcase some of our ongoing and upcoming projects and events, and, above all, enjoy good conversations, food and drinks with about 250 invitees working at the intersection of data and development.
The event's schedule will be as follows:
5:30pm: Doors open, networking, bubbles
6:30pm: Welcoming remarks
6:45pm: Showcasing of selected projects and events
9:00pm: We say goodbye
We hope you will join us! RSVP here.
Data-Pop Alliance will release three papers and make them available for comment online. A data science competition will also be announced.
This post summarizes the main highlights and perspectives of the discussions at and since the Ethics of Data in Civil Society Conference held on September 15 and 16 at Stanford University Center on Philanthropy and Civil Society (Stanford PACS) where I represented Data-Pop Alliance.
The conference, which gathered over 100 participants from academia, civil society organizations, and the private sector, was co-organized by Lucy Bernholz and Mark Hansen, both members of Data-Pop’s Advisory Board, as well as Rob Reich, and co-hosted by Stanford PACS, the Columbia School of Journalism-Stanford School of Engineering Brown Institute for Media Innovation, and one of Data-Pop Alliance’s three co-founding institutions, the Harvard Humanitarian Initiative. Data-Pop Alliance’s Director Emmanuel Letouzé and co-Director Patrick Vinck had both served on the Conference’s planning committee with several of our partners since September 2013.
The idea of holding this conference arose in the wake of Edward Snowden’s revelations in June of last year. More recently, the spate of serious leaks of personally identifiable information, and the public outcry surrounding NHS England’s plan to share medical records data through the care.data program, have further contributed to making the ‘ethics of data’ a red-hot topic. Over the past 18 months, there has been growing recognition, as our Academic Director Alex ‘Sandy’ Pentland reiterated recently, that failure to effectively deal with the ethical dimensions and implications of Big Data may offset its potential benefits for societies, or just bring out the darkest sides of Big Data.
The conference was structured around work-streams that met over both days to tackle a wide array of topics—from the ‘big picture’ question of whether data may need its own ethical code to the details of operationalizing the responsible use of data by civil society organizations and citizens in the age of Big Data.
Does Big Data need its own ethical code?
Several groups tackled this central question: is data ethics an issue different from the broader discussion and standards of bioethics altogether? Or is it merely an issue of consent, autonomy, and privacy applied to a new context? Is the institutional framework created to deal with the ethical issues surrounding the advancements of medicine and science, such as Institutional Review Boards and the professionalization of bioethics, something that could be adopted for data ethics—and would it be sufficient? Or does Big Data raise new questions and risks that cannot be adequately addressed within existing ethical and legal frameworks, such as codes adopted by the medical professions or humanitarian organizations? A main message from these work-streams was the need to foster dialogue and connections between data practitioners and ethicists.
One group looked specifically at the case of the OECD Privacy Principles, developed in the late 1970s and adopted in 1980, which currently underlie many of the privacy regulations and laws around the world, and asked whether they were good enough for Big Data. The group identified various scenarios in which they were not sufficient. For example, the OECD Principle of Collection Limitation is at odds with the current practice of collecting as much data as possible. Similarly, the Purpose Specification Principle is not well suited to the age of Big Data, since potential uses of data can be difficult to anticipate at the time of initial collection. For instance, in 2004, Kaiser Permanente was able to identify adverse reactions to a popular painkiller, which was subsequently pulled off the market, by linking clinical and cost data for 1.4 million patients, data that had not initially been collected for that purpose. Perhaps a more appropriate principle for a Big Data age might include a Collection Maximization Principle for Non-profits.
Guidelines for the responsible use of data by practitioners
Much in the same way that airplane pilots and medical doctors follow pre-defined step-by-step processes that have led to fewer airplane and medical accidents, two work-streams proposed data responsibility ‘checklists’. The proposed checklists would bring together ethical decisions that have to be made by organizations gathering data to try to mitigate potential sources of harm across the data pipeline (that is, before, during and after data collection). The Ethical Data Lifecycle includes an initial risk assessment taking into account the sensitivity of the data, as well as the organizational capacity to handle data security and constant monitoring of the data in order to guard against data breaches.
‘Malpractice’ risks are real: crowd-sourced data used to draw maps meant to improve aid delivery and enhance transparency can also be used to gather intelligence by repressive governments and those that seek to target activists and volunteers in conflict zones. Lack of technical and ethical standards on the data collected and its use means that crisis mappers may inadvertently put themselves and the communities they are trying to help in danger.
A problem is that some of the requirements in the checklist, especially those relating to ensuring data security, require financial and technical capacities that many grassroots organizations do not have. In the development world, bootstrapping operations and relying on volunteers is an ethos, but the checklist is there to remind us of the risks of not ensuring a responsible use of data.
How do we empower and protect citizens in a data-driven society?
As we depend more and more on algorithms to automate parts of our life, privately-held algorithms may have potential negative societal effects. One example discussed at the Conference was the fact that Google search results for traditionally “black” names were more likely to display ads about arrest records and other negative information. Although there was a large consensus to resist the prospect of a fully automated algorithmic future, some participants argued that algorithms do have the benefit of being potentially auditable; in contrast, when a judge hands out a sentencing decision, it is hard to know how much came down to the judge’s racial prejudices or to the judge being hangry.
Many participants echoed calls for the auditing of algorithms by external third parties, and a group sketched out just what this auditing would entail. While ensuring the secrecy of a company’s algorithm in light of competition concerns, a trusted third party, referred to as an “AlgoCompass”, would get access to the source code in order to evaluate the training data used, the problem specification, the existence of self-reporting tools and even the cultural values of the algorithm creators. The audit would generate certification badges and consumer-facing reports as to the algorithm’s integrity, while businesses themselves would benefit from a better quantification of the business risk and reputational risk associated with potential outcomes.
Deep learning algorithms are trickier, in that they sometimes yield results that not even their creators could have foreseen. While these results may be ‘rational’ from a purely statistical standpoint, the consensus was that a strong ethical argument existed for human override to bypass those that reinforced stereotypes, such as Google autocomplete results potentially perpetuating racist, sexist and homophobic stereotypes.
Towards a “creative commons” of data ethics
The work-stream I participated in looked at how information about terms of service could be provided in ways that would be understandable by all or most. Currently, many are so inscrutable that most users accept all terms and conditions without actually reading them—for all they know, they could very well have signed away their firstborn.
My group was inspired by the Creative Commons movement that simplified copyright notices and made them more accessible by creating six licenses with associated visual symbols and an easy-to-understand, one-page explanation of rights. This movement sought to give creators the right to decide how their works should be distributed. We explored options for icons related to Data Deletion, No Resale and Data Security. The main impediment or requirement that we identified was the necessity of a third party to verify the compliance of organizations with the icons they adopt. Other projects that have sought to do something similar include Disconnect.me and the Mozilla privacy icons project, created by Aza Raskin, who also wrote this article on the possibilities and limitations of this idea.
Follow-up and way forward
A “First quick summary” of the Conference by Lucy Bernholz can be found on her blog. As Lucy notes, work-streams have already generated follow-up action plans. For example, our friends at the engine room are fleshing out the Ethical Data Lifecycle started at the Conference during their Responsible Data Sprints.
Another area that is poised to get increasing attention is the notion of ‘group privacy’, understood as rights held by a group as a group rather than by its members individually. We at Data-Pop Alliance are writing a chapter dedicated to the issue for an upcoming book co-edited by Linnet Taylor.
Also, as we noted in a recent blog post, technical solutions like OpenPDS—short for Personal Data Store—developed at MIT Media Lab by a team led by Yves-Alexandre de Montjoye are being developed to support the vision of a New Deal on Data put forth as early as 2009, which would give people greater rights over the use of the data they emit—their data.
Harvard Business Review interview with Alex ‘Sandy’ Pentland on Big Data, The Internet of Things, Privacy, and The New Deal on Data
In an interview available in a Google Hangout video and edited formats, conducted for the November issue of HBR on The Internet of Things, HBR senior editor Scott Berinato recently spoke with MIT Professor and Data-Pop Alliance Academic Director Alex ‘Sandy’ Pentland about Big Data, the Internet of Things, Privacy and the New Deal on Data that he started talking about six years ago.
Professor Pentland, who pioneered the use of sensor technology to collect data about people’s actions and interactions, has long warned against the backlash and deleterious effects that may result from our collective failing to adequately address privacy concerns in the age of Big Data.
The gist of Professor Pentland’s proposed New Deal on Data is to recognize that companies that collect data about people do not hold the rights to ownership of these data—rather, that people should, giving them greater control over their data and their lives, and a say in all discussions on the future of Big Data.
Professor Pentland’s Human Dynamics group at MIT Media Lab has also developed tools to empower consumers by allowing them to gain control of their data. A team led by doctoral student and Data-Pop Alliance Research Affiliate Yves-Alexandre de Montjoye has developed openPDS, a personal data management tool that allows individuals to collect and store their data and grant differential access to third parties.
What’s being done with data?
Bloomberg Beta, a venture capital firm that launched in June of last year, is using predictive analytics to try to find entrepreneurs before they even start a company, using algorithms to evaluate work history and educational history, all of it information that is publicly available online.
Perhaps most interesting is that many of the future entrepreneurs they identified challenge common stereotypes about entrepreneurs. For example, forty percent were over the age of 40 and some didn't have any technical experience.
Counting trees to save the woods: using big data to map deforestation
Global Forest Watch (GFW) is an online platform that combines hundreds of thousands of satellite images, high-tech data processing and crowd-sourcing to provide near-real-time data on the world’s forests. Their goal is “to enable governments, companies, NGOs, and the public to better manage forests, track illegal deforestation and more”. Since its inception, GFW has faced challenges common to initiatives that aim to enable public use of big data, such as a lack of public data, barriers to participation, and confusion over terminology. In this article, they offer the main lessons learned.
What should we watch out for?
When overreliance on predictive technologies makes us predictable, "facsimiles of ourselves".
Just the Facts? This Dossier Goes Further
A journalist experiences the risks of data profiling when she discovers her data profile for sale. Compiled by a data broker from public sources like voter records, real estate records, motor vehicle records and the author’s public social media accounts, the profile contained factual errors and, most disturbingly, speculated on her personal motives or feelings, with the potential to affect her reputation.
Transparency, Big Data and Internet Activism
In this age of Big Data, it’s still true that “Important data flows upward, not sideways.” Sifry calls the current paradigm of personal data ownership “digital sharecropping”. He explains:
“The original sharecroppers were people who were allowed to live on agricultural land in exchange for giving the landlord a share of the crops they produced. Often they barely eked out a subsistence living. Today, people who post their content on platforms like Facebook, LinkedIn, Google, YouTube, and the like are all being digitally sharecropped. As the saying goes, “If you’re not paying for it, you’re the product.” The real customers are advertisers and other people who want to make money from our data”
Big Data, by allowing the collection of massive amounts of information and providing real-time feedback loops, could theoretically make the socialist dream of centralized government planning a reality, without the market distortions of past experiments. But as Morozov argues, success on this front ultimately depends on getting the governance of Big Data right. He writes:
“For all its utopianism and scientism, its algedonic meters and hand-drawn graphs, Project Cybersyn got some aspects of its politics right: it started with the needs of the citizens and went from there. The problem with today’s digital utopianism is that it typically starts with a PowerPoint slide in a venture capitalist’s pitch deck. As citizens in an era of Datafeed, we still haven’t figured out how to manage our way to happiness. But there’s a lot of money to be made in selling us the dials.”
The data revolution is coming and it will unlock the corridors of power
As the UN Expert Advisory Group on the Data Revolution meets in New York, Claire Melamed, official raconteur of the group and Data-Pop Co-director, argues that the main task of the expert group is to figure out:
“How to bring together the old and the new worlds of data – the government statisticians with the Silicon Valley developers; the citizen movements mapping their unmapped cities with the custodians of global numbers in UN agencies. All of these groups, and more, have something to contribute to increase the quantity, quality and usability of data – and to put it to work to improve people’s lives.”
As part of preparing for our panel “How to Leverage Big Data for Better M&E in Complex Interventions?” at the M&E Tech conference we’ve spent some time thinking lately about how big data is changing the nature of M&E and what this means for practitioners. Here are some of the articles that informed our debate:
On why a theory-less data science is useless in informing policy and the need for theory-driven models.
The death of program evaluation and a follow up article on The role of data
Argues that traditional program evaluation is obsolete in the age of big data.
Evaluation is Dead! Long Live Evaluation!
Counterpoint to Andrew Means’s articles above; argues instead that evaluation as we know it is evolving.
Attributing causation in complex systems with big data is, well, complex. The following articles discuss some important methodological concerns raised by using big data for M&E:
Big data is not immune to classic “small data” problems: sampling populations, confounders, multiple testing, bias, and overfitting.
‘Data mining’ with more variables than observations
Discusses methodological issues when dealing with “fat” data sets with more variables than observations.
Is machine learning less useful for understanding causality, thus less interesting for social science?
Crowdsourced discussion ongoing on the statistics and machine learning forum Cross Validated.
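To make the “fat” data concern above concrete, here is a minimal Python sketch. It is not taken from any of the linked articles; it assumes NumPy is available, and the function name in_sample_r2 and the sample sizes are illustrative choices. It fits ordinary least squares to pure random noise, once with many more observations than variables and once with more variables than observations.

```python
import numpy as np


def in_sample_r2(n_obs: int, n_vars: int, seed: int = 0) -> float:
    """Fit ordinary least squares to pure noise and return the in-sample R^2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_vars))       # predictors: random noise
    y = rng.normal(size=n_obs)                 # outcome: unrelated random noise
    X1 = np.column_stack([np.ones(n_obs), X])  # add an intercept column
    # lstsq returns the minimum-norm solution when there are more columns than rows
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# "Thin" data: 200 observations, 5 variables (illustrative sizes)
thin_r2 = in_sample_r2(n_obs=200, n_vars=5)
# "Fat" data: 20 observations, 50 variables (illustrative sizes)
fat_r2 = in_sample_r2(n_obs=20, n_vars=50)

print(f"in-sample R^2, thin data: {thin_r2:.3f}")
print(f"in-sample R^2, fat data:  {fat_r2:.3f}")
```

In the fat case the model reproduces the noise outcome almost exactly in-sample, although by construction there is nothing to find. This is one reason such data sets call for regularization or out-of-sample validation before any causal or policy reading.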
Two recent articles included in the inaugural post of our "Links We Like" series have focused on what may be missing in the current specialized discourse and thinking about the data revolution’s implications and requirements.
First, Evgeny Morozov’s piece, tellingly titled "The rise of data and the death of politics", provides an in-depth assessment of the far-reaching democratic implications of the possible advent of an "algorithmic regulation". His starting point is how technology in general—and in particular the advent of the Internet of Things—is fueling a new approach to governance in the US. As Morozov describes, smart sensors and meters ubiquitously found in everyday appliances and devices create real-time data ripe for automated analysis of patterns and trends that can be used to give dietary advice or alert the National Security Agency of suspicious activity. This way of monitoring our actions to automatically implement policy is what Morozov calls "algorithmic regulation".
But in Morozov's view, data analysis should not replace other means of designing and implementing policy. "By assuming that the utopian world of infinite feedback loops is so efficient that it transcends politics," he says, "the proponents of algorithmic regulation fall into the same trap as the technocrats of the past." His argument is that while such methods can appear apolitical, they pose, in fact, a real threat to democracy: by entrusting the task of monitoring and directing human behavior to algorithms, we would deprive individuals of traditional policy-making means and systems. Thus, such approaches are not apolitical; rather they leave the political decisions to those who design the algorithms. Typically these decisions also aim to deal with symptoms (which can be tracked by sensors and devices) rather than causes (which are often much harder to trace and go beyond the reach of algorithms and datasets).
Morozov’s point is best illustrated by the case of law enforcement tactics. Algorithmic regulation can help identify whether people are paying their taxes in accordance with the law. But this information becomes irrelevant if the tax code contains loopholes that allow certain people to pay less than their fair share of taxes. This gets at the crux of the issue: the question of what constitutes a ‘fair share’ of taxes is fundamentally a political one, and not one that can be resolved by algorithms. To answer it, Morozov argues, we must stick to traditional politics and avoid adopting the technocratic bias that currently dictates our use of data. In Morozov's words, "algorithmic regulation is perfect for enforcing the austerity agenda while leaving those responsible for the fiscal crisis off the hook."
In a post on the ICTs for Development blog titled "The Data Revolution Will Fail Without A Praxis Revolution", Richard Heeks makes a similar argument regarding the use of Big Data in development. Just as algorithmic regulation should not replace political debates about fiscal issues, Heeks stresses that Big Data is no panacea for development: what's missing in this case, he says, is a "praxis revolution". Praxis refers to the process through which policy decisions are implemented in the form of actions, and Heeks suggests that data-revolution-for-development proponents have put too much focus on turning data into information, and not enough focus on using that information to decide which actions will lead to desired results.
Heeks argues that the resources spent on data collection and analysis might better be directed towards improving praxis: by this, he means rethinking the way we design data-revolution-for-development projects to focus more on decision-making and concrete action. Like Morozov, Heeks warns against relying on Big Data at the implementation level of the policy-making chain, stressing that the technocratic approach "assumes digital decisions and actions are some apolitical and rational optimum", "denies the importance of politics and thus neuters political debate", and "diverts attention from the causes of society's ills to their effects with the attitude: ‘there's an app for that’."
Morozov and Heeks bring up important issues that are too often left aside in discussions about the potential and implications of the data revolution—including the risks posed by a technocratic-technological overreliance on and overconfidence in the power and soundness of data-driven models. There needs to be a greater, richer, public debate to ensure that better data and solid data analytics reinforce, but do not replace, democratic processes.
Note: Haishan Fu, Director of the Data Development Group at the World Bank, and Emmanuel Letouzé, Director of Data-Pop Alliance, will discuss related considerations in a forthcoming blogpost.
Read more on our blog
About data privacy and ownership
First of three posts leading up to the Conference on the Ethics of Data in Civil Society, which will be hosted by the Stanford Center on Philanthropy and Civil Society's Digital Civil Society Lab on September 15 and 16. The conference will bring together scholars, activists, policy makers and funders to discuss the ethical challenges raised by digital data. The Harvard Humanitarian Initiative, one of Data-Pop Alliance’s founding institutions, is one of the hosting organizations, and Data-Pop Alliance co-directors Patrick Vinck and Emmanuel Letouzé are on the conference planning committee.
Health Apps can sell your data to insurance companies, and there’s nothing you can do about it
Explores some of the fundamental privacy-related issues around data in mobile health and health apps, and argues that there are currently no regulations safeguarding patient information collected by health-tracking apps.
On the requirements and risks of the data revolution
Explores what turning data into outcomes requires, arguing that "we have not yet connected the data revolution to a praxis revolution for development".
The rise of data and the death of politics
Discusses the limits of the new data-based approach to governance – 'algorithmic regulation' - in guiding policy, and asks "if technology provides the answers to society's problems, what happens to governments?".
The Data Revolution: Orwellian nightmare or boon to people power?
Lays out a vision where data empowers citizens, arguing that we need to "harness the political energy around the data revolution to ensure some concrete improvements in data, accountability and ultimately outcomes for the world’s poorest people".
Big Data, privacy, and civil disobedience
Makes the case that the rising use of data collected from self-tracking devices in informing health policy may lead people to believe that health issues, such as obesity, are the sole responsibility of the individual and their "bad" behaviour and choices, rather than being the result of political/economic/social structures.
On strengthening national statistical systems
Summarizes the outcomes of a recent discussion at an Experts' Workshop on "Towards a Strategy for the Data Revolution", held July 11-12, organized by ODI, stating that "country action should drive the [data] revolution, bottom-up not top-down."
Data revolution in development: countries come first!
Emphasizes the importance of international support for increasing national statistical capacity, arguing that "without quality data produced at the country level, along with serious coordination between international players, regional institutions, and countries themselves, we will once again be left with assumptions, estimates and guesses".
In my view, Big Data can fuel a Data Revolution. Much of its appeal stems from its potential — true or false — to find and refine data to yield ‘insights’ about human populations that can power more agile and better targeted policies and programmes. I have at least three issues with this general line of reasoning, and propose instead a “knowledge security-centred” approach to Big Data for human development.
My three issues are the following. First, the ‘insights’ approach says nothing about how data will actually be turned into policy; it, largely erroneously as history shows, assumes that bad policies and outcomes result primarily from lack of data or information on the part of decision makers — such that better ‘insights’ about poverty will somehow mechanistically lead to less of it.
Second, what qualifies as an ‘insight’, or, more fundamentally what ‘insights’ are is unclear; the term is used to avoid having to talk about ‘information’ —and even more so, knowledge — both of which have well-defined meanings backed by significant theoretical work.
Third, its explicit or implicit reference to fossil fuel (Big data being the “new oil”) overlooks or downplays the negative impacts that the ‘old oil’ has typically had on human development, which led to the development of the ‘resource-curse’ theory — rooted in elite capture. It also overlooks the historical and historiographical lessons of another Revolution — the Green one— and of much of the literature on food security since then — with their central message that defeating hunger and famine is as much a political as a techno-scientific endeavour; as much about press freedom as about fertilizer use.
With this in mind, I wonder: how can the ‘Big Data revolution’ serve human development and avoid the advent of a ‘techno-scientifically induced data curse’ caused by the de-humanization and de-democratization of decision-making processes; a situation where only a handful can access and analyse data and have the ability to extract and use the resulting ‘insights’? In other words, what is or should — in a normative sense — the ‘Big Data revolution’ be about?
My answer is: knowledge security.
Knowledge is commonly considered to be the last stage of the data-information-knowledge transformation chain — in much the same way that nutrition is the ultimate goal of the food chain. The parallel between food and data as inputs in processes affecting human populations’ well-being through their bodies and minds is an especially rich one.
And I want to argue that a ‘real’ or desirable Big Data revolution entails and requires putting in place the conditions necessary for societies to enjoy knowledge security, a concept mirrored on that of food security.
According to the United Nations, food security “exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life”. Knowledge security is centred on data as inputs and its four pillars or preconditions are the same as those of food security: availability, access, utilization and stability.
What would the pillars of knowledge security look like? In what follows, the only major change to the description of the official FAO food security framework is the substitution of ‘data’ or ‘knowledge’ for ‘food’, as appropriate, plus a few minor edits left apparent.
Promoting ‘knowledge security’ in the age of Big Data may entail improving:
1. Data availability — i.e. “the availability of sufficient quantities of data of appropriate quality, supplied through domestic production or imports (including data aid)”.
This principle brings out the importance of producing data that meets societal demands and needs. It also stresses how ‘quality’, as characterized in the 2nd Fundamental Principle of Official Statistics, has to remain a central concern in the production of data by official statistical systems — i.e. official statistics.
2. Data access — i.e. “the access by individuals to adequate resources (entitlements) for acquiring appropriate data to enhance their knowledge. Entitlements are defined as the set of all commodity bundles over which a person can establish command given the legal, political, economic and social arrangements of the community in which they live”.
This highlights the importance of transparency, user-friendliness and visibility in the presentation of data. For example, as underlined by Enrico Giovannini, knowing that “95% of Google users do not go beyond the first page, it is clear that either institutes of statistics structure their information in such a way as to become easily findable by such algorithms, or their role in the world of information will become marginal.”
3. Data utilization — i.e. “the utilization of data through individual and collective processing to reach a state of knowledge where all information needs are met. This brings out the importance of non-data inputs in knowledge security.”
This critical point stresses the fundamental importance, mentioned by Enrico Giovannini, of “considering how [information] is brought to the final user by the media, so as to satisfy the greatest possible number of individuals …, the extent to which users trust that information (and therefore the institution that produces it), and their capacity to transform data into knowledge (what is defined as statistical literacy)”—to which I prefer the concept of data literacy and add that of graphic literacy (or ‘graphicacy’).
4. Data stability — i.e. “to be knowledge secure, a population, household or individual must have access to adequate data at all times. They should not risk losing access to data as a consequence of sudden shocks (e.g. an economic or climatic crisis) or cyclical events (e.g. seasonal data insecurity). The concept of stability can therefore refer to both the availability and access dimensions of knowledge security.”
Sustainable legal and policy frameworks are needed to ensure steady and predictable access to some data (aggregated, anonymized) held by corporations, in contrast to the ad hoc way researchers have tended to access CDRs in recent months (excepting the recent Orange D4D challenge).
This knowledge security framework would complement the Fundamental Principles of Official Statistics by sketching the four preconditions of knowledge security in the Big Data age. As with the food security framework, it does not provide concrete guidance as to how these preconditions can be met. But recognizing knowledge security as a central objective and key determinant of a Big Data revolution that would serve human development broadly may be an important first step.
Source: http://post2015.org/2014/04/01/the-big-data-revolution-should-be-about-knowledge-security/
Why we created the Data-Pop Alliance: a global alliance & call for a people-centered Big Data revolution
As our lives have become increasingly digital, the amount and variety of data that the world’s population generates every day is growing exponentially, as are our capacities to extract ‘insights’ from them. The potential of ‘Big Data’ for human development and humanitarian action has stirred a great deal of both excitement and skepticism since the concept became mainstream at the dawn of the decade. But simply opposing the ‘promise and perils’ of Big Data is a dead end; recognizing their co-existence a mere starting point.
Looking a generation ahead, observing the persistent prevalence of absolute poverty, the rise of global inequality, and the many walls and ceilings impeding well-being, we wondered: what will it take for Big Data to have by then served the cause of human progress to the best of its ability and ours, as part of the larger “data revolution”? Our answer—our contribution—is the creation of Data-Pop.
There is no shortage of valuable publications and conferences, initiatives and working groups, proofs of concepts and lab projects, in the fast expanding universe of ‘Big Data for social good’. But we are frustrated by its high level of institutional fragmentation and corresponding lack of a coherent intellectual direction—especially in relation to the context and concerns of poor developing countries. Individual projects and research do not sufficiently build upon or learn from each other, and movement beyond the project and pilot stage towards the use of Big Data at scale will thus be difficult and probably inefficient. Too many discussions are rooted in ideologies and assumptions rather than in solid empirical findings and a clear theory of social change.
What we saw and see as missing is ‘something’—a player or a group of players—serving as a connecting hub, sounding board, and driving force, with the credibility and agility, the intent and capacity, to promote the kind of ‘Big Data revolution’ we feel is needed. What brought us and our organizations together is the conviction that Big Data must increase and not reduce the power of citizens: that the kinds of low granularity, high frequency, digital personal data (these 'digital breadcrumbs') passively emitted by humans ought to be leveraged to impact policies and politics for the benefit of people. We want to see Big Data amplify the voice and knowledge of the emitters of data, not just improve the insights and means of surveillance of corporations and governments. This will require a better informed, more empowered, global citizenry, and a deeper understanding of the appropriate balance between individual, social, governmental, and commercial interests—with the overarching ethical dimensions and implications.
This is why we created Data-Pop: to spur a ‘humanistic’, people-centered, Big Data revolution, cautiously, humbly, but resolutely, by providing an enabling environment for learning, information sharing, experimentation, evaluation and capacity building; to catalyze and coordinate developments and innovations in the use of Big Data to help serve the cause of human progress.
Data-Pop will be a place for the exchange of ideas and information and a broker and implementer of projects. We believe that structural impact will only come about through a range of connected activities, rather than through a single big initiative or a myriad of disjointed projects. We don’t know yet how Big Data can be best used for human development and social progress. Answers will come from a combination of opportunistic and strategic decisions and actions both on the supply and demand sides of the field. But these should be taken with an eye on the main prize: a future where Big Data improves lives and reduces inequalities, rather than one characterized by a new and widening digital divide.
It is only by linking and leveraging skills, perspectives, and resources in an inter-disciplinary, systematic, and collegial manner that we will collectively be able to make the most of the tremendous potential offered by Big Data to create more agile and more accountable sociopolitical ecosystems, while avoiding its main traps and pitfalls. In this, we are fortunate enough to be joined by an incredible number of institutional and individual partners in a wide range of fields and sectors, from computer science to humanitarian assistance, official statistics to statistical machine-learning, working in small non-governmental organizations and large international institutions, official bodies and academic establishments.
Of course, differences of views are and will be represented in Data-Pop—along, and at times at odds with, ‘expected’ political lines and economic interests. An obviously contentious question is: in a post-Snowden era, how much, how, by and for whom, when and for what purpose, should cell-phone data be collected, shared and analyzed? Addressing that question—and many others—won’t be easy. But our conviction, based on the lessons of past revolutions and our own experiences, is that the confrontation of competing perspectives coupled with the constant recall of our common objectives is the best and indeed only way to create constructive change.
And so this ‘launch blog post’ is also a call to action and connection to everyone willing to contribute to our mission statement: promoting a people-centered Big Data revolution for development and social progress.
We work through a variety of means, including collaborative research, training and capacity building, technical assistance, convening, knowledge brokering and curation, and advocacy.
Our work focuses on a number of thematic areas, notably official statistics, socio-economic and demographic methods, climate change and environment, conflict and crime, literacy and ethics, and human capital.
This is reflected in our work tracks.
Our work tracks:
1. Knowledge brokering and curation
We are developing a customizable curated e-library of key contributions on Big Data and development, tagged by themes and structured around questions, with analytic summaries and bibliographical references.
The e-library, to be completed in early 2015, will be the centerpiece of Data-Pop's Resources content, which aims to provide comprehensive information on the Big Data and development space. Funding for the library is provided by the United Nations Population Fund (UNFPA), with additional support from the International Peace Institute (IPI).
We are also curating a calendar of key events on Big Data and Development jointly with the World Bank Innovation Labs. You can suggest events!
2. Big Data and official statistics
A key component of our work for the next three years deals with the applications and implications of Big Data for official statistics and the systems, institutions and people that produce them.
The initial output of this track will be a White Paper on Big Data and Official Statistics, in partnership with PARIS21, available in mid-November.
Another activity involves providing technical assistance and advice to support the Big Data strategy of DANE (Colombia's National Statistical Office) as part of its overall modernization efforts; this project will end by March 2015.
In addition, we will be organizing several events related to Big Data and official statistics, including a dedicated workshop and public discussion in January at the UK Royal Statistical Society, and another on Big Data and the future of policy at MIT Media Lab in April.
3. Socio-economic and demographic methods
We have developed and are fundraising for a multi-year research program dedicated to building sound methods for analyzing key indicators of human welfare, such as poverty, inequality, mobility, and vital statistics, notably in fragile and conflict regions.
The first research paper of the program, funded by the Agence française de développement (AFD) and involving researchers from five partner institutions, will focus on sample bias correction and small area estimation.
We are also working with QCRI to analyze millions of tweets from Egypt to understand poverty and inflation patterns and trends.
We will also be contributing a chapter on the opportunities and challenges of Big Data for M&E for an upcoming book on Complex Evaluation.
4. Big Data literacy and capacity
We are developing a research and training program to understand the determinants of and enhance Big Data literacy (and ‘graphicacy’) and capacities of key actors in developing countries, notably journalists, researchers, officials and community organizers, through a range of means and activities (including visual art).
The stepping stone of this track, funded by and implemented with our partner Internews, will be the publication of a White Paper on "Data Literacy in the Age of Big Data", followed by a dedicated workshop in mid-December 2014.
We are also working with our partners the Partnership for African Social and Governance Research (PASGR), Carnegie Mellon University in Rwanda, SciDev.Net and the UNSSC to develop and roll out a series of courses on Big Data and development for various constituencies in 2015.
5. Ethics and human rights
Another priority component and concern of our work focuses on the ethical and human rights dimensions and implications of Big Data, notably cell-phone metadata analysis.
We are currently finalizing, in partnership with the D4D Team, a White Paper that proposes an analytical framework and suggests operational avenues for the responsible collection, sharing and analysis of cell-phone metadata.
We are also developing a text mining project to study and fight discrimination against women with our partners the International Center for Advocates against Discrimination and data science consultancy CKM Advisors.
We are also writing a chapter on Big Data for a forthcoming book "Group Privacy: New Challenges of Data Technologies", edited by the Oxford Internet Institute.
6. Climate change and environment
Enhancing populations' resilience to climate change and environmental shocks is one of Big Data's most promising uses.
One project, led by Data-Pop Affiliates Beth Tellman and Bessie Schwartz, has been awarded a Google Grant to develop a socio-ecological model in Google Earth Engine to estimate and address community-level vulnerability to flooding.
This socio-ecological model will be augmented with Call Detail Records (CDRs) from Senegal as part of the Data for Development Challenge organized by Orange.
This work track aims to produce outputs and ideas to bolster the Big Data and climate change agenda in the lead up to and during the United Nations Climate Change Conference, which will take place in Paris on Nov. 30th-Dec. 11th, 2015.
7. Conflict and Crime
Data-Pop Alliance will leverage the expertise of its leadership (Patrick Vinck, Phuong Pham), affiliates (Bruno Lepri, Jen Welch) and partners (HHI, IPI) to advance the responsible use of Big Data to understand, predict and prevent violence.
Two journal articles will be published in 2015:
Another ongoing project with the World Bank Innovation Labs focuses on the analysis of crime patterns and trends in Bogotá, using demographic and public transportation data.
Data-Pop Alliance is also a partner of the upcoming Build Peace 2015 conference.
8. Fellowship program
Starting in 2015, we will launch the development of an ambitious Fellowship program to facilitate collaborative work and exchanges between students, researchers and practitioners of partnering organizations, with a focus on South-North collaborations.
We are actively seeking funding for this exciting and important track.
9. Convening and connecting
An important role that Data-Pop Alliance intends to play is that of a convener and connector in the Big Data and development space, as well as with and within the larger 'data revolution' ecosystem.
Recently, we organized two sessions on Big Data and M&E at the M&E and Tech Conference that took place in DC on September 25th, and its sister Deep Dive held in NYC on October 3rd.
One of our first major events will be an expert workshop followed by a public moderated discussion on “What is the future of Official Statistics in the Big Data era?” organized with the UK Royal Statistical Society and ODI at the RSS in London, on 19 January 2015.
On April 20-22, we will be co-hosting the Cartagena Data Festival, together with our partners ODI, UNDP, CEPEI and PARIS21, which will bring together 300 participants from across the globe. Data-Pop Alliance is leading two tracks: the Big Data and development track, and the Big Data challenge, to be announced soon.
We will also organize and lead a Big Data track at the Build Peace 2015 conference in Cyprus on April 25-26, to discuss how to use Big Data for peacebuilding and conflict transformation with activists and technologists from around the world.
Data-Pop Alliance is also actively involved in discussions and efforts around the post-2015 'Data Revolution' as a core member of the Partners for a People-Centered Data Revolution group.
One of our priorities is to provide easy-to-access and searchable resources about Big Data and development to anyone interested in learning or knowing more about the field.
This section currently features a recent article on key papers and actors in the field, and provides links to additional content. Many more resources will be made available soon.
As described in the Projects section, we are currently developing a fully customizable and curated library of key academic papers, institutional reports, press articles, etc., tagged by themes and structured around ten key questions, with analytical summaries and bibliographical references for all contributions. It will be the centerpiece of Data-Pop's Resources content.
This article was written by Emmanuel Letouzé as part of SciDev.Net's Spotlight on Big Data for development published April 15th, 2014
Big data: early years and foundational pieces
An early mention of the upcoming “Industrial Revolution of data” can be found in a blog post by Joe Hellerstein, a computer scientist at the University of California, Berkeley. It was published in November 2008, a few months after Wired had claimed that the ‘data deluge’ would signify “the end of theory” and make the ‘scientific method’ obsolete as numbers would speak for themselves. Then, in 2009, a group of leading computer and social scientists published a commentary in Science describing a new academic field that explores data to reveal patterns of individual and group behaviours: computational social science. In early 2010, The Economist ran an article on the data deluge as part of a special report that stirred significant interest and remains highly informative today. The Wall Street Journal’s feature The really smart phone, published in 2011, and the New York Times’ opinion article The age of Big Data, published in 2012, both had a similar impact.
Recently, there has been an explosion in the number of publications about big data and international development, but three reports published within a few months in 2011-2012 can be considered seminal pieces in the field: the McKinsey Global Institute’s Big data: the next frontier for innovation, competition and productivity, the World Economic Forum’s Big data, big impact: new possibilities for international development, and UN Global Pulse’s Big data for development: challenges and opportunities. Other noteworthy contributions include Martin Hilbert’s literature review Big data for development: from information- to knowledge societies and the chapter ‘Big data for conflict prevention’ in a report by the International Peace Institute. [...]
Read more in Full resources
|Algorithms (and algorithmic future)||in mathematics and computer science, an algorithm is a series of predefined instructions or rules written in a programming language designed to tell a computer how to sequentially solve a recurrent problem through calculations and data processing. The use of algorithms for decision-making has grown in several sectors and services such as policing and banking. This has led to hopes — and worries — about the advent of an ‘algorithmic future’ where algorithms may replace human functions, or even become an instrument for repression.|
|Big data||an umbrella term that, simply put, stands for one or more of three trends: the growing volume of digital data generated daily as a by-product of people’s use of digital devices; the new technologies, tools and methods available to analyse large data sets that are not designed for analysis; and the intention to extract policymaking insights from these data and tools.|
|Call Detail Records (CDRs)||the technical name for mobile phone data recorded by all telecom operators. CDRs contain information about the locations of those sending and receiving calls or text messages through operators’ networks, as well as data on time and duration.|
|Data revolution||a common term in development discourse since the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda called for a ‘data revolution’ to “strengthen data and statistics for accountability and decision-making purposes”. It refers to a larger phenomenon than big data or the ‘social data revolution’ — defined as the shift in human communication patterns towards greater personal information sharing, and the implications of this.|
|Data scientist or data science||a professional or a field that focuses on solving real-world problems using large amounts of data by combining skills from often distinct areas of expertise: maths, computer science (for example, hacking and coding), statistics, social science and even storytelling or art.|
|(New) Digital divide||the differential access and ability to use information and communications technologies between individuals, communities and countries — and the resulting socioeconomic and political inequalities. The skills and tools required to absorb and analyse the growing amounts of data produced by such technologies may lead to a ‘new digital divide’.|
|False positives versus false negatives (or type I versus type II errors)||a false positive or type I error refers to a prediction or conclusion that an event or effect has occurred when it has not: for example, a fire alarm going off when there is no fire, or an experiment indicating a medical treatment has worked when it has not. A false negative or type II error refers to cases when a study or a monitoring system fails to identify an event or effect that has occurred. Attempts to predict rare events, such as political revolutions, using increasingly rich data and powerful tools are expected to lead to more false positive than false negative results (also known as over-prediction).|
|Internal versus external validity||internal validity refers to the extent to which a causal relationship can be confidently established between two phenomena — a reduction in speed limit and a fall in road deaths, for example. This requires all other factors that may affect the outcome and offer alternative explanations to be taken into account; in this case, this would include a change in drinking habits. External validity refers to the extent to which a study’s conclusions can be confidently generalised to other situations and people. In other words, whether they would hold beyond the area and time for which they were established.|
|Statistical machine learning||a subset of data science, falling at the intersection of traditional statistics and machine learning. Machine learning refers to the construction and study of computer algorithms — step-by-step procedures used for calculations and classification — that can ‘learn’ when exposed to new data. This enables better predictions and decisions to be made based on what was experienced in the past, as with filtering spam emails, for example. The addition of “statistical” reflects the emphasis on statistical analysis and methodology, which is the main approach to modern machine learning.|
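The glossary's point that predicting rare events tends to produce more false positives than false negatives can be illustrated with a small calculation. The sketch below is only an illustration: the function name `expected_counts` and all the figures (1,000 cases, a 1% event rate, a warning system that is right 90% of the time in both directions) are hypothetical assumptions, not numbers from any Data-Pop study.

```python
from fractions import Fraction


def expected_counts(cases, base_rate, sensitivity, specificity):
    """Expected outcome counts for a yes/no warning system.

    Rates are passed as Fractions so the arithmetic stays exact.
    Returns (true_pos, false_pos, false_neg, true_neg).
    """
    events = cases * base_rate            # cases where the event really occurs
    non_events = cases - events           # cases where it does not
    true_pos = events * sensitivity       # events correctly flagged
    false_neg = events - true_pos         # events missed (type II errors)
    true_neg = non_events * specificity   # non-events correctly left unflagged
    false_pos = non_events - true_neg     # false alarms (type I errors)
    return true_pos, false_pos, false_neg, true_neg


# Hypothetical figures, chosen only for illustration.
tp, fp, fn, tn = expected_counts(
    cases=1000,
    base_rate=Fraction(1, 100),      # the event occurs in 1% of cases
    sensitivity=Fraction(9, 10),     # 90% of real events are flagged
    specificity=Fraction(9, 10),     # 90% of non-events are left unflagged
)
precision = tp / (tp + fp)           # share of alarms that are real events

print(f"false positives: {fp}, false negatives: {fn}")
print(f"share of alarms that are real events: {precision}")
```

Under these assumed figures the system raises 99 false alarms and misses only 1 real event, so just 9 of its 108 alarms (1 in 12) correspond to an actual event. The imbalance comes from the rarity of the event rather than from a poor tool: even a small error rate applied to the many non-events outnumbers the few real ones.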
Listen to our Academic director Alex 'Sandy' Pentland and Advisory Board members Patrick Ball and Philipp Schönrock interviewed by Lou del Bello for SciDev.Net's Spotlight on Big Data for development
Additional resources are available on our Full resources page and by clicking on the links below:
Ali Kamil, Research Associate
Bruno Lepri, Research Affiliate
Julia Manske, Research Affiliate
Yves-Alexandre de Montjoye, Research Associate
Maurice Nsabimana, Research Affiliate
Nuria Oliver, Research Affiliate
Espen Beer Prydz, Research Affiliate
Thomas Roca, Research Affiliate
Nadia Said, Designer
Bessie Schwarz, Research Affiliate
Fredrik M Sjoberg, Research Affiliate
Romesh Silva, Research Affiliate
Jacopo Staiano, Research Affiliate
Beth Tellman, Research Affiliate
Jennifer Welch, Research Affiliate
Emilio Zagheni, Research Affiliate
Data-Pop’s governance structure involves a 15-member Advisory Board and a 15-member Steering Committee, whose members bring deep expertise in a wide array of fields that will be critical to Data-Pop's work and development.
- The Advisory Board will meet at least once a year and provide high-level feedback and guidance to Data-Pop; some AB members may also supervise or participate in specific projects.
- The Steering Committee will aim to meet twice a year to provide more detailed and project-specific advice; some members will also animate thematic working groups and participate in projects.
Executive Director, ODI
Director of HHI and Professor, Harvard University
Professor and Director of IQSS, Harvard University
Director of the Brown Institute, Columbia & Stanford Universities
Director, World Bank Development Data Group
Executive Director, Center for Effective Global Action & Development Impact Lab, Berkeley
Visiting Scholar, Stanford University Center on Philanthropy and Civil Society
General Director, Colombia National Statistical Office (DANE)
Professor, University of Southampton
Executive Director, Human Rights Data Analysis Group
Director, Centro De Pensamiento Estrategico Internacional (CEPEI)
Executive Director, Partnership For African Social and Governance Research (PASGR)
Distinguished Fellow at the Center for Policy Dialogue, Chair of Southern Voice on Post-MDG International Development Goals
Director, Technical Division, United Nations Population Fund (UNFPA)
Head of Post-2015 Team, United Nations Development Programme (UNDP)
Assistant Professor, Stanford University
Head of Research, Oxfam UK
Lead Economist, World Bank Development Data Group
Executive Director, Internews Center for Innovation and Learning
Vice President of Marketing Anticipation, Orange
Data Technical Specialist, United Nations Population Fund (UNFPA)
Head of Data-driven development, World Economic Forum USA
Manager, Paris21 Initiative
Head of Big Data Task Force, Eurostat
Alliance Program Director and Adjunct Associate Professor, Columbia University
Director, Paris Île-de-France Complex Systems Institute (ISC-PIF)
Senior Director of Research, International Peace Institute
General Partner, Susa Ventures
Special Advisor, ICT4Peace Foundation
Co-founder, Build Up
“Data-Pop Alliance is the result of a very unusual collaboration between three of the world's leading institutions in their fields, joined by a myriad of institutional and individual actors that have decided to combine their unique strengths and perspectives on the opportunities and challenges of the Big Data revolution to make it a humanistic one.”
— Alex 'Sandy' Pentland, Academic director
The Harvard Humanitarian Initiative is a Harvard university-wide center created in 2005 to provide expertise in public health, social science, and other disciplines to relieve human suffering in war and disaster by advancing the science and practice of humanitarian response. HHI is supported by the Office of the Provost, the Harvard School of Public Health, and the participation of faculty from over 12 Harvard schools.
Established in 1985, the MIT Media Lab actively promotes a unique, antidisciplinary culture, exploring beyond known boundaries and disciplines and encouraging the most unconventional mixing and matching of seemingly disparate research areas. The Lab is committed to looking beyond the obvious to ask the questions not yet asked–questions whose answers could radically improve the way people live, learn, express themselves, work, and play.
The Overseas Development Institute is a leading development policy think tank in the United Kingdom with an established international reputation. For 50 years the institute has been working with public and private sector partners in developing countries to reduce poverty, alleviate suffering, and achieve sustainable livelihoods.
Core seed funding ($400,000) was generously provided by The Rockefeller Foundation "in support of establishing [...] Data-Pop Alliance, a global network that brings together individual and institutional actors involved in the ‘Big Data revolution’ to advance common objectives through the use of new digital data and analytics tools and methods that can foster human development and societal progress".
Additional project-based funding has also been provided by the Internews Center for Innovation and Learning, the International Peace Institute, the Agence Française de Développement (AFD), and the World Bank Innovation Labs.
We are incubated in New York City by ThoughtWorks NY, "a software company and a community of passionate, purpose-led individuals who think disruptively to deliver technology to address their clients' toughest challenges, all while seeking to revolutionize the IT industry and create positive social change".
As a global alliance, Data-Pop relies on and helps connect individuals and institutions operating in four main sectors: academia, non-governmental and civil society, official and intergovernmental, and private sector.
Institutions and individuals have become, and may become, network members or 'partners' under various modalities, ranging from a Memorandum of Understanding to a handshake. We are currently finalizing a lightweight charter stating our strategic objective and guiding principles, which we will ask network members and partners to adhere to.
Besides its three founding institutions, initial members of Data-Pop's network include:
- Academic and research institutes: Harvard University Institute for Quantitative Social Science, UC Berkeley Center for Effective Global Action and Development Impact Lab, Columbia University Brown Institute and Alliance Program, The University of Southampton Web Science Institute, Yale University Climate and Energy Institute and School of Forestry and Environmental Studies, Stanford University Digital Civil Society Lab, the Institut des Systèmes Complexes-Paris Île-de-France, the Qatar Computing Research Institute (QCRI), Bruno Kessler Foundation (FBK).
- Non-governmental and civil society organizations: Internews, the International Peace Institute (IPI), Oxfam UK, the engine room, the Partnership for African Social and Governance Research (PASGR), The Centro de Pensamiento Estratégico Internacional (CEPEI), the International Center for Advocates Against Discrimination, The United States Institute of Peace, the Partnership on Open Data, the World Economic Forum USA.
- Official and intergovernmental institutions: The United Nations Population Fund (UNFPA), the Agence Française de Développement (AFD), the Paris21 initiative, Eurostat's Big Data unit, Colombia's Departamento Administrativo Nacional de Estadística (DANE), The World Bank Institute and Development Data Group.
- Individuals in both the for-profit and not-for-profit sectors: Jay Ulfelder (independent consultant), Alison Cole (Open Society Justice Initiative), Linda Raftree (Kurante), Gary Milante (Stockholm International Peace Research Institute), Antoine Heuty (ULULA), Fredrik Sjoberg (New York University), Maurice Nsabimana (World Bank), Romesh Silva (Johns Hopkins School of Public Health), Simone Sala (Columbia University), Andreas Stuhlmüller (Stanford University), Mark Latonero (USC Annenberg), Nathan Eagle (Jana).
Who and what is Data-Pop Alliance?
Data-Pop (short for 'Big Data & People Project') Alliance is a new ‘think-&-do’ global initiative jointly created by the Harvard Humanitarian Initiative (HHI), the MIT Media Lab and the Overseas Development Institute (ODI), starting as a three-year joint initiative. In addition, Data-Pop brings together individual and institutional actors involved in the ‘Big Data revolution’ to advance common principles and objectives.
Data-Pop Alliance's leadership is composed of MIT Professor Alex ‘Sandy’ Pentland as Academic Director, co-founder and director Emmanuel Letouzé, co-founder and co-director Patrick Vinck (HHI), Phuong Pham as HHI co-director, and Claire Melamed and Emma Samman as ODI co-directors. We have offices in Cambridge (USA), New York City and London, and a global network of members and partners.
Data-Pop Alliance’s activities are supported by a team of research affiliates, associates, and assistants, which we hope to see grow over time; its governance structure also involves a 15-member Advisory Board and a 15-member Steering Committee.
Data-Pop Alliance’s mission is to promote a 'humanistic', people-centered ‘Big Data revolution’ to foster human development and societal progress. Data-Pop was created to help fill gaps and connect dots, and aims to become, as articulated in our launch blog post, a "connecting hub, sounding board, and driving force" in the 'Big Data for social good' space and the “Data revolution” at large.
The name 'Data-Pop', suggested by Advisory Board member Paul Ladd, intends to convey our primary thematic anchoring in the exploding 'Big Data' field and our key strategic objective of ensuring that these data and tools serve the interests of people across the globe, especially those of poor and vulnerable populations.
Our starting point is the recognition of Big Data's promise and perils. Optimists posit that Big Data presents a historic opportunity for more agile and targeted policies and programmes. Pessimists (or healthy skeptics) warn that Big Data will not mechanistically lead to human and societal progress, especially for the poor, and may have detrimental effects if it leads to a new digital divide and an overly technocratic approach to data collection, usage and decision-making.
We see three main challenges to overcome, which provide the rationale for creating Data-Pop:
- a scientific-technological bias in many ongoing discussions, at the expense of more careful consideration of the sociopolitical implications—including ethical and human rights dimensions
- poor institutional connectivity between humanitarians, development actors, data and computer scientists, and ethicists—characterized and caused by the lack of mechanisms to facilitate knowledge sharing
- limited political channels and technical capacities for the primary producers and users of data—local communities and groups, governmental bodies and officials, researchers, journalists—to be engaged fully and systematically in shaping the Big Data revolution.
What does or will Data-Pop try to do?
Data-Pop will seek to contribute to five key strategic outcomes:
- Big Data ethics, to promote the ethical use of personal data within the context of development and humanitarian action, with specific emphasis on strengthening societies’ ability to weigh in on related debates
- Big Data literacy, to enhance ordinary citizens’ and social amplifiers’ ability to use and understand data and graphics
- Big Data capacity, to evaluate, improve, design and/or help apply Big Data methodologies and tools, notably on issues of poverty measurement and sample bias
- Big Data strategy, to evaluate, improve, design and/or help implement strategies, policies and programs around the use of Big Data
- Big Data community, to provide an exchange platform and sounding board for all partners and interested individuals to share ideas, information and questions, and more.
Data-Pop's six initial thematic areas of focus are:
- Urban poverty
- Conflict prevention
- Humanitarian action
- Data journalism
- Official statistics
- Environmental vulnerability
We will be operating through seven key modalities:
- A website featuring an active blog that aims to draw international and national experts, give a voice to local actors, spur and facilitate knowledge creation and sharing, as well as a fully customizable and curated library of key resources in the field
- Training materials, modules and curriculums, both online and on site, developed and implemented leveraging the tremendous resources of our network
- Research and policy papers, with a focus on evaluation and cooperation between researchers in the network
- Technical assistance on Big Data strategies and policies, for example working with Official Statistical Offices and cities in developing countries
- Working groups around specific themes, animated and coordinated by Data-Pop partners
- A Data-Pop Fellowship Programme allowing onsite collaboration between researchers and fellows from partnering institutions
- A Data-Pop event series, with conferences, thematic workshops and Datadives in developing countries.
How will Data-Pop Alliance operate?
Data-Pop will function as a broker and implementer of activities and ideas serving its mission, leveraging and connecting the resources and needs in the network. It will do so by soliciting, developing, seeking funding for, supporting and implementing projects leveraging the skills and resources of its members.
The Advisory Board will aim to meet yearly and provide high-level guidance to the network. The Steering Committee will meet at least twice per year to provide more detailed and project-specific advice.
Data-Pop seeks to raise both core and additional project-based funding for the first year of operations (September 2014 - August 2015). Contributions from funding partners may include monetary as well as in-kind or personnel resources. Partial seed funding has been provided by the three founding institutions as well as through generous project-based support from Internews, the Agence Française de Développement (AFD), the World Bank Institute and the International Peace Institute.
Other funding partners currently being approached for both core and project-based funding include major foundations, bilateral donors, intergovernmental and multilateral organizations, and universities, as well as philanthropists and private corporations.
Data-Pop was turned from an idea into a reality in less than six months, but we still have a lot of hard and exciting work ahead. An initial strategy meeting was held at ODI in London on January 17th, 2014, gathering about 20 key partners from the US East Coast and Europe.
Other initial key milestones are:
|September 2013 - April 2014||
|May - August 2014||
|September 2014 - August 2015||
|September 2015 - September 2017||
How can I get in touch or involved?
The best way is to contact us by filling out the form below. (We will neither share nor sell your data.)
If you have specific questions you can also email us at email@example.com.
Logo design: Nadia Said
Web design: Emmanuel Letouzé & Nadia Said
Web development: Emmanuel Letouzé & Gabriel Pestre