Data-Pop Alliance is an initiative on Big Data and development jointly created by the Harvard Humanitarian Initiative (HHI), the MIT Media Lab, and the Overseas Development Institute (ODI).
Our goal is to promote 'a people-centered Big Data revolution': that is, to spur a fundamental change in how personal digital data is leveraged to improve decision-making processes and empower people in ways that avoid the pitfalls of a new digital divide, de-humanization and de-democratization.
Data-Pop Alliance serves as a designer, broker and implementer of ideas and activities, bringing together institutions and individuals around common principles and objectives and focusing on a limited number of thematic areas, notably official statistics, conflict and violence, demo-economic methods, climate change, data literacy and ethics, and human capital.
We work through a variety of means, including collaborative research projects, training and capacity building, technical assistance and strategic support, knowledge creation and curation, and advocacy.
Data-Pop Alliance has representatives and office spaces in Cambridge (Massachusetts), London, and New York City, where we are incubated by ThoughtWorks NYC. Our main funder is the Rockefeller Foundation.
Harvard Business Review interview with Alex ‘Sandy’ Pentland on Big Data, The Internet of Things, Privacy, and The New Deal on Data
In an interview conducted for the November issue of HBR on the Internet of Things, and available both as a Google Hangout video and in edited form, HBR senior editor Scott Berinato recently spoke with MIT Professor and Data-Pop Alliance Academic Director Alex “Sandy” Pentland about Big Data, the Internet of Things, privacy, and the New Deal on Data, which Pentland started discussing about six years ago.
Professor Pentland, who pioneered the use of sensor technology to collect data about people’s actions and interactions, has long warned against the backlash and deleterious effects that may result from our collective failure to adequately address privacy concerns in the age of Big Data.
The gist of Professor Pentland’s proposed New Deal on Data is the recognition that companies collecting data about people do not own those data; rather, people should, which gives them greater control over their data and their lives, and a say in all discussions on the future of Big Data.
Professor Pentland’s Human Dynamics group at the MIT Media Lab has also developed tools that empower consumers by allowing them to take control of their data. A team led by doctoral student and Data-Pop Alliance Research Affiliate Yves-Alexandre de Montjoye has developed openPDS, a personal data management tool that allows individuals to collect and store their data and to grant differential access to third parties.
What’s being done with data?
Bloomberg Beta, a venture capital firm that launched in June of last year, is using predictive analytics to try to find entrepreneurs before they even start a company, with algorithms that evaluate work and educational histories drawn from information publicly available online.
Perhaps most interesting is that many of the future entrepreneurs they identified challenge common stereotypes about entrepreneurs. For example, forty percent were over the age of 40 and some didn't have any technical experience.
Counting trees to save the woods: using big data to map deforestation
Global Forest Watch (GFW) is an online platform that combines hundreds of thousands of satellite images, high-tech data processing and crowdsourcing to provide near-real-time data on the world’s forests. Its goal is “to enable governments, companies, NGOs, and the public to better manage forests, track illegal deforestation and more”. Since its inception, GFW has faced challenges common to initiatives that aim to open up big data for public use, such as a lack of public data, barriers to participation, and confusion over terminology. In this article, the GFW team shares the main lessons learned.
What should we watch out for?
On how overreliance on predictive technologies can make us predictable, turning us into "facsimiles of ourselves".
Just the Facts? This Dossier Goes Further
A journalist experiences the risks of data profiling first-hand when she discovers her own data profile for sale. Compiled by a data broker from public sources such as voter, real estate and motor vehicle records and the author’s public social media accounts, the profile contained factual errors and, most disturbingly, speculated about her personal motives and feelings in ways that could affect her reputation.
Transparency, Big Data and Internet Activism
In this age of Big Data, it’s still true that “Important data flows upward, not sideways.” Sifry calls the current paradigm of personal data ownership “digital sharecropping”. He explains:
“The original sharecroppers were people who were allowed to live on agricultural land in exchange for giving the landlord a share of the crops they produced. Often they barely eked out a subsistence living. Today, people who post their content on platforms like Facebook, LinkedIn, Google, YouTube, and the like are all being digitally sharecropped. As the saying goes, ‘If you’re not paying for it, you’re the product.’ The real customers are advertisers and other people who want to make money from our data.”
Big Data, by allowing the collection of massive amounts of information and providing real-time feedback loops, could theoretically make the socialist dream of centralized government planning a reality, without the market distortions of past experiments. But as Morozov argues, success on this front ultimately depends on getting the governance of Big Data right. He writes:
“For all its utopianism and scientism, its algedonic meters and hand-drawn graphs, Project Cybersyn got some aspects of its politics right: it started with the needs of the citizens and went from there. The problem with today’s digital utopianism is that it typically starts with a PowerPoint slide in a venture capitalist’s pitch deck. As citizens in an era of Datafeed, we still haven’t figured out how to manage our way to happiness. But there’s a lot of money to be made in selling us the dials.”
The data revolution is coming and it will unlock the corridors of power
As the UN Expert Advisory Group on the Data Revolution meets in New York, Claire Melamed, the group’s official rapporteur and Data-Pop Co-director, argues that the main task of the expert group is to figure out:
“How to bring together the old and the new worlds of data – the government statisticians with the Silicon Valley developers; the citizen movements mapping their unmapped cities with the custodians of global numbers in UN agencies. All of these groups, and more, have something to contribute to increase the quantity, quality and usability of data – and to put it to work to improve people’s lives.”
While preparing for our panel “How to Leverage Big Data for Better M&E in Complex Interventions?” at the M&E Tech conference, we have spent some time thinking about how big data is changing the nature of monitoring and evaluation (M&E) and what this means for practitioners. Here are some of the articles that informed our debate:
On why a theory-less data science is useless in informing policy and the need for theory-driven models.
The death of program evaluation and a follow-up article on The role of data
Argues that traditional program evaluation is obsolete in the age of big data.
Evaluation is Dead! Long Live Evaluation!
Counterpoint to Andrew Means’s articles above; argues instead that evaluation as we know it is evolving.
Attributing causation in complex systems with big data is, well, complex. The following articles discuss some important methodological concerns raised by using big data for M&E:
Big data is not immune to classic “small data” problems: sampling populations, confounders, multiple testing, bias, and overfitting.
‘Data mining’ with more variables than observations
Discusses methodological issues when dealing with “fat” data sets with more variables than observations.
Is machine learning less useful for understanding causality, thus less interesting for social science?
An ongoing crowdsourced discussion on Cross Validated, the statistics and machine learning Q&A forum.
Two recent articles included in the inaugural post of our "Links We Like" series have focused on what may be missing in the current specialized discourse and thinking about the data revolution’s implications and requirements.
First, Evgeny Morozov’s piece, tellingly titled "The rise of data and the death of politics", provides an in-depth assessment of the far-reaching democratic implications of the possible advent of an "algorithmic regulation". His starting point is how technology in general—and in particular the advent of the Internet of Things—is fueling a new approach to governance in the US. As Morozov describes, smart sensors and meters ubiquitously found in everyday appliances and devices create real-time data ripe for automated analysis of patterns and trends that can be used to give dietary advice or alert the National Security Agency of suspicious activity. This way of monitoring our actions to automatically implement policy is what Morozov calls "algorithmic regulation".
But in Morozov's view, data analysis should not replace other means of designing and implementing policy. "By assuming that the utopian world of infinite feedback loops is so efficient that it transcends politics," he says, "the proponents of algorithmic regulation fall into the same trap as the technocrats of the past." His argument is that while such methods can appear apolitical, they pose, in fact, a real threat to democracy: by entrusting the task of monitoring and directing human behavior to algorithms, we would deprive individuals of traditional policy-making means and systems. Thus, such approaches are not apolitical; rather, they leave the political decisions to those who design the algorithms. Typically these decisions also aim to deal with symptoms (which can be tracked by sensors and devices) rather than causes (which are often much harder to trace and go beyond the reach of algorithms and datasets).
Morozov’s point is best illustrated by the case of law enforcement tactics. Algorithmic regulation can help identify whether people are paying their taxes in accordance with the law. But this information becomes irrelevant if the tax code contains loopholes that allow certain people to pay less than their fair share of taxes. This gets at the crux of the issue: the question of what constitutes a ‘fair share’ of taxes is fundamentally a political one, and not one that can be resolved by algorithms. To answer it, Morozov argues, we must stick to traditional politics and avoid adopting the technocratic bias that currently dictates our use of data. In Morozov's words, "algorithmic regulation is perfect for enforcing the austerity agenda while leaving those responsible for the fiscal crisis off the hook."
In a post on the ICTs for Development blog titled "The Data Revolution Will Fail Without A Praxis Revolution", Richard Heeks makes a similar argument regarding the use of Big Data in development. Just as algorithmic regulation should not replace political debates about fiscal issues, Heeks stresses that Big Data is no panacea for development: what's missing in this case, he says, is a "praxis revolution". Praxis refers to the process through which policy decisions are implemented in the form of actions, and Heeks suggests that data-revolution-for-development proponents have put too much focus on turning data into information, and not enough focus on using that information to decide which actions will lead to desired results.
Heeks argues that the resources spent on data collection and analysis might better be directed towards improving praxis: by this, he means rethinking the way we design data-revolution-for-development projects to focus more on decision-making and concrete action. Like Morozov, Heeks warns against relying on Big Data at the implementation level of the policy-making chain, stressing that the technocratic approach "assumes digital decisions and actions are some apolitical and rational optimum", "denies the importance of politics and thus neuters political debate", and "diverts attention from the causes of society's ills to their effects with the attitude: ‘there's an app for that’."
Morozov and Heeks bring up important issues that are too often left aside in discussions about the potential and implications of the data revolution—including the risks posed by a technocratic-technological overreliance on and overconfidence in the power and soundness of data-driven models. There needs to be a greater, richer, public debate to ensure that better data and solid data analytics reinforce, but do not replace, democratic processes.
Note: Haishan Fu, Director of the Data Development Group at the World Bank, and Emmanuel Letouzé, Director of Data-Pop Alliance, will discuss related considerations in a forthcoming blogpost.
About data privacy and ownership
First of three posts leading up to the Conference on the Ethics of Data in Civil Society, which will be hosted by the Stanford Center on Philanthropy and Civil Society's Digital Civil Society Lab on September 15 and 16. The conference will bring together scholars, activists, policy makers and funders to discuss the ethical challenges raised by digital data. The Harvard Humanitarian Initiative, one of Data-Pop Alliance’s founding institutions, is one of the hosting organizations, and Data-Pop Alliance co-directors Patrick Vinck and Emmanuel Letouzé are on the conference planning committee.
Health Apps can sell your data to insurance companies, and there’s nothing you can do about it
Explores some of the fundamental privacy-related issues around data in mobile health and health apps and argues that there are currently no regulations safeguarding patient information collected by health-tracking apps.
On the requirements and risks of the data revolution
Explores what turning data into outcomes requires, arguing that "we have not yet connected the data revolution to a praxis revolution for development".
The rise of data and the death of politics
Discusses the limits of 'algorithmic regulation', the new data-based approach to governance, in guiding policy, and asks "if technology provides the answers to society's problems, what happens to governments?".
The Data Revolution: Orwellian nightmare or boon to people power?
Lays out a vision where data empowers citizens, arguing that we need to "harness the political energy around the data revolution to ensure some concrete improvements in data, accountability and ultimately outcomes for the world’s poorest people".
Big Data, privacy, and civil disobedience
Makes the case that the rising use of data collected from self-tracking devices in informing health policy may lead people to believe that health issues, such as obesity, are the sole responsibility of the individual and their "bad" behaviour and choices, rather than being the result of political/economic/social structures.
On strengthening national statistical systems
Summarizes the outcomes of a recent discussion at an Experts' Workshop on "Towards a Strategy for the Data Revolution", held July 11-12, organized by ODI, stating that "country action should drive the [data] revolution, bottom-up not top-down."
Data revolution in development: countries come first!
Emphasizes the importance of international support for increasing national statistical capacity, arguing that "without quality data produced at the country level, along with serious coordination between international players, regional institutions, and countries themselves, we will once again be left with assumptions, estimates and guesses".
In my view, Big Data can fuel a Data Revolution. Much of its appeal stems from its potential — true or false — to find and refine data to yield ‘insights’ about human populations that can power more agile and better targeted policies and programmes. I have at least three issues with this general line of reasoning, and propose instead a “knowledge security-centred” approach to Big Data for human development.
My three issues are the following. First, the ‘insights’ approach says nothing about how data will actually be turned into policy; it assumes, largely erroneously as history shows, that bad policies and outcomes result primarily from a lack of data or information on the part of decision makers, such that better ‘insights’ about poverty will somehow mechanistically lead to less of it.
Second, what qualifies as an ‘insight’, or more fundamentally what ‘insights’ are, is unclear; the term is used to avoid having to talk about ‘information’ and, even more so, ‘knowledge’, both of which have well-defined meanings backed by significant theoretical work.
Third, its explicit or implicit reference to fossil fuel (Big Data being the “new oil”) overlooks or downplays the negative impacts that the ‘old oil’ has typically had on human development, which gave rise to the ‘resource-curse’ theory, rooted in elite capture. It also overlooks the historical and historiographical lessons of another Revolution, the Green one, and of much of the literature on food security since then, whose central message is that defeating hunger and famine is as much a political as a techno-scientific endeavour: as much about press freedom as about fertilizer use.
With this in mind, I wonder: how can the ‘Big Data revolution’ serve human development and avoid the advent of a ‘techno-scientifically induced data curse’ caused by the de-humanization and de-democratization of decision-making processes, a situation where only a handful can access and analyse data and extract and use the resulting ‘insights’? In other words, what is the ‘Big Data revolution’ about, and what should it be about, in a normative sense?
My answer is: knowledge security.
Knowledge is commonly considered to be the last stage of the data-information-knowledge transformation chain — in much the same way that nutrition is the ultimate goal of the food chain. The parallel between food and data as inputs in processes affecting human populations’ well-being through their bodies and minds is an especially rich one.
I want to argue that a ‘real’ or desirable Big Data revolution entails and requires putting in place the conditions necessary for societies to enjoy knowledge security, a concept modelled on that of food security.
According to the United Nations, food security “exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life”. Knowledge security is centred on data as inputs and its four pillars or preconditions are the same as those of food security: availability, access, utilization and stability.
What would the pillars of knowledge security look like? In what follows, the only major changes to the official FAO description of the food security framework are the substitution of ‘data’ or ‘knowledge’ for ‘food’, as appropriate, and a few minor edits.
Promoting ‘knowledge security’ in the age of Big Data may entail improving:
1. Data availability — i.e. “the availability of sufficient quantities of data of appropriate quality, supplied through domestic production or imports (including data aid)”.
This principle brings out the importance of producing data that meets societal demands and needs. It also stresses how ‘quality’, as characterized in the 2nd Fundamental Principle of Official Statistics, has to remain a central concern in the production of data by official statistical systems — i.e. official statistics.
2. Data access — i.e. “the access by individuals to adequate resources (entitlements) for acquiring appropriate data to enhance their knowledge. Entitlements are defined as the set of all commodity bundles over which a person can establish command given the legal, political, economic and social arrangements of the community in which they live”.
This highlights the importance of transparency, user-friendliness and visibility in the presentation of data. For example, as underlined by Enrico Giovannini, knowing that “95% of Google users do not go beyond the first page, it is clear that either institutes of statistics structure their information in such a way as to become easily findable by such algorithms, or their role in the world of information will become marginal.”
3. Data utilization — i.e. “the utilization of data through individual and collective processing to reach a state of knowledge where all information needs are met. This brings out the importance of non-data inputs in knowledge security.”
This critical point stresses the fundamental importance, noted by Enrico Giovannini, of “considering how [information] is brought to the final user by the media, so as to satisfy the greatest possible number of individuals …, the extent to which users trust that information (and therefore the institution that produces it), and their capacity to transform data into knowledge (what is defined as statistical literacy)”. To statistical literacy I prefer the concept of data literacy, to which I would add that of graphic literacy (or ‘graphicacy’).
4. Data stability — i.e. “to be knowledge secure, a population, household or individual must have access to adequate data at all times. They should not risk losing access to data as a consequence of sudden shocks (e.g. an economic or climatic crisis) or cyclical events (e.g. seasonal data insecurity). The concept of stability can therefore refer to both the availability and access dimensions of knowledge security.”
Sustainable legal and policy frameworks are needed to ensure steady and predictable access to some aggregated, anonymized data held by corporations, in contrast to the ad hoc way researchers have tended to access call detail records (CDRs) in recent months (the recent Orange D4D challenge being an exception).
This knowledge security framework would complement the Fundamental Principles of Official Statistics by sketching the four preconditions of knowledge security in the Big Data age. As with the food security framework, it does not provide concrete guidance as to how these preconditions can be met. But recognizing knowledge security as a central objective and key determinant of a Big Data revolution that would serve human development broadly may be an important first step.
Source: http://post2015.org/2014/04/01/the-big-data-revolution-should-be-about-knowledge-security/
Why we created the Data-Pop Alliance: a global alliance & call for a people-centered Big Data revolution
As our lives have become increasingly digital, the amount and variety of data that the world’s population generates every day is growing exponentially, as are our capacities to extract ‘insights’ from them. The potential of ‘Big Data’ for human development and humanitarian action has stirred a great deal of both excitement and skepticism since the concept became mainstream at the dawn of the decade. But simply contrasting the ‘promise and perils’ of Big Data is a dead end, and recognizing their co-existence is a mere starting point.
Looking a generation ahead, observing the persistent prevalence of absolute poverty, the rise of global inequality, and the many walls and ceilings impeding well-being, we wondered: what will it take for Big Data to have by then served the cause of human progress to the best of its ability and ours, as part of the larger “data revolution”? Our answer—our contribution—is the creation of Data-Pop.
There is no shortage of valuable publications and conferences, initiatives and working groups, proofs of concepts and lab projects, in the fast expanding universe of ‘Big Data for social good’. But we are frustrated by its high level of institutional fragmentation and corresponding lack of a coherent intellectual direction—especially in relation to the context and concerns of poor developing countries. Individual projects and research do not sufficiently build upon or learn from each other, and movement beyond the project and pilot stage towards the use of Big Data at scale will thus be difficult and probably inefficient. Too many discussions are rooted in ideologies and assumptions rather than in solid empirical findings and a clear theory of social change.
What we saw and see as missing is ‘something’—a player or a group of players—serving as a connecting hub, sounding board, and driving force, with the credibility and agility, the intent and capacity, to promote the kind of ‘Big Data revolution’ we feel is needed. What brought us and our organizations together is the conviction that Big Data must increase and not reduce the power of citizens: that the kinds of low granularity, high frequency, digital personal data (these 'digital breadcrumbs') passively emitted by humans ought to be leveraged to impact policies and politics for the benefit of people. We want to see Big Data amplify the voice and knowledge of the emitters of data, not just improve the insights and means of surveillance of corporations and governments. This will require a better informed, more empowered, global citizenry, and a deeper understanding of the appropriate balance between individual, social, governmental, and commercial interests—with the overarching ethical dimensions and implications.
This is why we created Data-Pop: to spur a ‘humanistic’, people-centered, Big Data revolution, cautiously, humbly, but resolutely, by providing an enabling environment for learning, information sharing, experimentation, evaluation and capacity building; to catalyze and coordinate developments and innovations in the use of Big Data to help serve the cause of human progress.
Data-Pop will be a place for the exchange of ideas and information and a broker and implementer of projects. We believe that structural impact will only come about through a range of connected activities, rather than through a single big initiative or a myriad of disjointed projects. We don’t know yet how Big Data can be best used for human development and social progress. Answers will come from a combination of opportunistic and strategic decisions and actions both on the supply and demand sides of the field. But these should be taken with an eye on the main prize: a future where Big Data improves lives and reduces inequalities, rather than one characterized by a new and widening digital divide.
It is only by linking and leveraging skills, perspectives, and resources in an inter-disciplinary, systematic, and collegial manner that we will collectively be able to make the most of the tremendous potential offered by Big Data to create more agile and more accountable sociopolitical ecosystems, while avoiding its main traps and pitfalls. In this, we are fortunate enough to be joined by an incredible number of institutional and individual partners in a wide range of fields and sectors, from computer science to humanitarian assistance, official statistics to statistical machine-learning, working in small non-governmental organizations and large international institutions, official bodies and academic establishments.
Of course, differences of views are and will be represented in Data-Pop—along, and at times at odds with, ‘expected’ political lines and economic interests. An obviously contentious question is: in a post-Snowden era, how much, how, by and for whom, when and for what purpose, should cell-phone data be collected, shared and analyzed? Addressing that question—and many others—won’t be easy. But our conviction, based on the lessons of past revolutions and our own experiences, is that the confrontation of competing perspectives coupled with the constant recall of our common objectives is the best and indeed only way to create constructive change.
And so this ‘launch blog post’ is also a call to action and connection to everyone willing to contribute to our mission statement: promoting a people-centered Big Data revolution for development and social progress.
Data-Pop Alliance's main added value is to develop and deliver projects with the following features:
- our projects will seek to contribute to a few key strategic outcomes, so far identified as: strengthening Big Data ethics, literacy, capacity, strategy, and community (see Question 3 in the About section for details)
- we will employ a range of means, including training, research, technical assistance, working groups, events, fellowships, and online content creation and curation, as well as art
- we will initially focus on a limited number of thematic areas and sectors, notably urban poverty, environmental vulnerability, conflict prevention, humanitarian crises, data journalism and official statistics
- our projects will be systematically developed and delivered in collaboration or consultation with partners in the communities or countries they seek to benefit, and be subject to systematic evaluation feeding into an annual report.
Projects being developed include:
1. An online library and exchange platform
We are developing a fully customizable and curated online library of key contributions (articles, papers, reports, blog posts, books) on Big Data for development, tagged by themes and structured around ten key questions, with analytical summaries and bibliographical references of all contributions.
The online library will be the centerpiece of Data-Pop's Resources content, which aims to provide comprehensive information on the state of the 'Big Data for development' space. Our blog, open to original and cross-posted contributions, will feature content both as part of thematic series and as standalone pieces.
Partial funding for the development of the library is provided by the International Peace Institute and the Harvard Humanitarian Initiative.
2. Research on welfare and poverty monitoring
We have developed and are fundraising for an ambitious three-year research program, focusing on the analysis of exhaust and social media data for the purposes of monitoring key indicators of human welfare, such as poverty and vital rates, particularly in fragile and conflict regions.
The research agenda also places emphasis on methodological improvements such as sample bias correction and causal inference, as well as on the involvement of researchers from developing countries.
The first paper of the program will be funded by a generous contribution from the Agence française de développement (AFD) and involve researchers from five partner institutions.
3. Research and training on data literacy and 'graphicacy'
We are developing a research and training program to understand the determinants of and enhance data literacy and ‘graphicacy’ of key 'social amplifiers' in developing countries, notably of journalists, with hands-on applications using Big Data, online and offline training and other means and activities including visual arts.
The stepping stone of this project, generously funded by and implemented with our partner Internews, will be the publication, scheduled for September, of a White Paper on data literacy, followed by a dedicated workshop.
4. Technical assistance on Big Data and official statistics
An important component of our work for the next three years will deal with the applications and implications of Big Data for official statistics and the systems, institutions and people that produce them.
The initial output of this project will be a discussion paper titled "Big Data and Official Statistics: Think Again, Act Already", which will be available by mid-June.
5. Human rights implications and applications
Another priority component and concern of our work focuses on the human rights dimensions and implications of Big Data. Particular emphasis will be placed on the ethical concerns and considerations raised by cell-phone metadata analysis in terms of privacy, confidentiality, security and literacy.
As part of this work, we are currently finalizing, in partnership with the D4D team, a White Paper that aims to provide an analytical framework and suggest operational avenues for the responsible collection, sharing and analysis of cell-phone metadata.
We will also be looking at the positive applications of Big Data for human rights, starting by developing a research project that leverages Big Data to study and fight violence and discrimination against women with our partner, the International Center for Advocates Against Discrimination.
6. Data-Pop Alliance event series
Once our first round of fundraising is completed, we will engage in the development of a 'Data-Pop event series' leveraging the network's intellectual and convening power.
The series will include a yearly conference, research workshops and more operational thematic round-tables.
Our first four events will focus on data literacy, Big Data ethics and human rights, Big Data and official statistics, and sample bias correction methods.
7. Data-Pop fellowship programme
Starting in 2015, we will launch the development of an ambitious fellowship programme meant to facilitate and support collaborative work and exchanges between students, researchers and practitioners of partnering organisations. More information will be available in the next few weeks.
One of our priorities is to provide easy-to-access and searchable resources about Big Data and development to anyone interested in learning or knowing more about the field.
This section currently features a recent article on key papers and actors in the field, and provides links to additional content. Many more resources will be made available soon.
As described in the Projects section, we are currently developing a fully customizable and curated library of key academic papers, institutional reports, press articles, etc, tagged by themes and structured around ten key questions, with analytical summaries and bibliographical references of all contributions, which will be the centerpiece of Data-Pop's Resources content.
This article was written by Emmanuel Letouzé as part of SciDev.Net's Spotlight on Big Data for development published April 15th, 2014
Big data: early years and foundational pieces
An early mention of the coming “Industrial Revolution of data” can be found in a blog post by Joe Hellerstein, a computer scientist at the University of California, Berkeley. It was published in November 2008, a few months after Wired had claimed that the ‘data deluge’ would signal “the end of theory” and make the ‘scientific method’ obsolete, as numbers would speak for themselves. Then, in 2009, a group of leading computer and social scientists published a commentary in Science describing a new academic field that explores data to reveal patterns of individual and group behaviours: computational social science. In early 2010, The Economist ran an article on the data deluge as part of a special report that stirred significant interest and remains highly informative today. The Wall Street Journal’s feature The really smart phone, published in 2011, and the New York Times’ opinion article The age of Big Data, published in 2012, both had a similar impact.
Recently, there has been an explosion in the number of publications about big data and international development, but three reports published within a few months of each other in 2011-2012 can be considered seminal pieces in the field: the McKinsey Global Institute’s Big data: the next frontier for innovation, competition and productivity, the World Economic Forum’s Big data, big impact: new possibilities for international development and UN Global Pulse’s Big data for development: challenges and opportunities. Other noteworthy contributions include Martin Hilbert’s literature review Big data for development: from information- to knowledge societies and the chapter ‘Big data for conflict prevention’ in a report by the International Peace Institute. [...]
Read more in Full resources
|Algorithms (and algorithmic future)||in mathematics and computer science, an algorithm is a series of predefined instructions or rules written in a programming language designed to tell a computer how to sequentially solve a recurrent problem through calculations and data processing. The use of algorithms for decision-making has grown in several sectors and services such as policing and banking. This has led to hopes — and worries — about the advent of an ‘algorithmic future’ where algorithms may replace human functions, or even become an instrument for repression.|
|Big data||an umbrella term that, simply put, stands for one or more of three trends: the growing volume of digital data generated daily as a by-product of people’s use of digital devices; the new technologies, tools and methods available to analyse large data sets that are not designed for analysis; and the intention to extract policymaking insights from these data and tools.|
|Call Detail Records (CDRs)||the technical name for mobile phone data recorded by all telecom operators. CDRs contain information about the locations of those sending and receiving calls or text messages through operators’ networks, as well as data on time and duration.|
|Data revolution||a common term in development discourse since the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda called for a ‘data revolution’ to “strengthen data and statistics for accountability and decision-making purposes”. It refers to a larger phenomenon than big data or the ‘social data revolution’ — defined as the shift in human communication patterns towards greater personal information sharing, and the implications of this.|
|Data scientist or data science||a professional or a field that focuses on solving real-world problems using large amounts of data by combining skills from often distinct areas of expertise: maths, computer science (for example, hacking and coding), statistics, social science and even storytelling or art.|
|(New) Digital divide||differences in access to, and ability to use, information and communications technologies between individuals, communities and countries, and the resulting socioeconomic and political inequalities. The skills and tools required to absorb and analyse the growing amounts of data produced by such technologies may lead to a ‘new digital divide’.|
|False positives versus false negatives (or type I versus type II errors)||a false positive, or type I error, refers to a positive prediction or conclusion (that an event or effect is present) that turns out to be false, for example a fire alarm going off when there is no fire, or an experiment indicating that a medical treatment worked when it did not. A false negative, or type II error, refers to cases in which a study or a monitoring system fails to identify an event or effect that has occurred. Attempts to predict rare events, such as political revolutions, using increasingly rich data and powerful tools are expected to produce more false positives than false negatives (a tendency known as over-prediction).|
|Internal versus external validity||internal validity refers to the extent to which a causal relationship can be confidently established between two phenomena — a reduction in speed limit and a fall in road deaths, for example. This requires all other factors that may affect the outcome and offer alternative explanations to be taken into account; in this case, this would include a change in drinking habits. External validity refers to the extent to which a study’s conclusions can be confidently generalised to other situations and people. In other words, whether they would hold beyond the area and time for which they were established.|
|Statistical machine learning||a subset of data science, falling at the intersection of traditional statistics and machine learning. Machine learning refers to the construction and study of computer algorithms — step-by-step procedures used for calculations and classification — that can ‘learn’ when exposed to new data. This enables better predictions and decisions to be made based on what was experienced in the past, as with filtering spam emails, for example. The addition of “statistical” reflects the emphasis on statistical analysis and methodology, which is the main approach to modern machine learning.|
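To make the distinction between type I and type II errors concrete, and to show what over-prediction of a rare event looks like, here is a minimal Python sketch that tallies the four possible outcomes of a yes/no prediction. The function name `confusion_counts` and all of the data are hypothetical toy values chosen for illustration; they are not drawn from any Data-Pop study.

```python
from typing import Dict, Sequence


def confusion_counts(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, int]:
    """Tally true/false positives and negatives for paired yes/no predictions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1  # event predicted and it occurred
        elif p and not a:
            counts["FP"] += 1  # type I error: alarm raised, no event
        elif not p and a:
            counts["FN"] += 1  # type II error: event occurred but was missed
        else:
            counts["TN"] += 1  # no alarm, no event
    return counts


# Hypothetical toy data: 20 country-years, with an uprising in only 2 of them.
actual = [False] * 18 + [True] * 2
# A hypothetical model that flags 5 country-years, including both real events.
predicted = [False] * 15 + [True] * 3 + [True] * 2

print(confusion_counts(predicted, actual))
# → {'TP': 2, 'FP': 3, 'FN': 0, 'TN': 15}
```

In this toy case the model misses no real event (zero false negatives) but raises three false alarms, which is the over-prediction pattern described in the glossary entry: when the event is rare, even a model that catches every occurrence can produce more false positives than true ones.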
Listen to our Academic director Alex 'Sandy' Pentland and Advisory Board members Patrick Ball and Philipp Schönrock interviewed by Lou del Bello for SciDev.Net's Spotlight on Big Data for development
Additional resources are available on our Full resources page.
Ali Kamil, Research Associate
Bruno Lepri, Senior Research Affiliate
Julia Manske, Research Affiliate
Yves-Alexandre de Montjoye, Research Associate
Espen Beer Prydz, Research Affiliate
Nadia Said, Designer
Bessie Schwarz, Research Affiliate
Beth Tellman, Research Affiliate
Jennifer Welch, Research Affiliate
Emilio Zagheni, Senior Research Affiliate
Data-Pop’s governance structure involves a 15-member Advisory Board and a 15-member Steering Committee, whose members bring deep expertise in a wide array of fields that will be critical to Data-Pop's work and development.
- The Advisory Board will meet at least once a year and provide high-level feedback and guidance to Data-Pop; some AB members may also supervise or participate in specific projects.
- The Steering Committee will aim to meet twice a year to provide more detailed and project-specific advice; some members will also animate thematic working groups and participate in projects.
Executive Director, ODI
Director of HHI and Professor, Harvard University
Professor and Director of IQSS, Harvard University
Director of the Brown Institute, Columbia & Stanford Universities
Director, World Bank Development Data Group
Executive Director, Center for Effective Global Action & Development Impact Lab, Berkeley
Visiting Scholar, Stanford University Center on Philanthropy and Civil Society
General Director, Colombia National Statistical Office (DANE)
Professor, University of Southampton
Executive Director, Human Rights Data Analysis Group
Director, Centro De Pensamiento Estrategico Internacional (CEPEI)
Executive Director, Partnership For African Social and Governance Research (PASGR)
Distinguished Fellow at the Center for Policy Dialogue, Chair of Southern Voice on Post-MDG International Development Goals
Director, Technical Division, United Nations Population Fund (UNFPA)
Head of Post-2015 Team, United Nations Development Programme (UNDP)
Assistant Professor, Stanford University
Head of Research, Oxfam UK
Lead Economist, World Bank Development Data Group
Executive Director, Internews Center for Innovation and Learning
Vice President of Marketing Anticipation, Orange
Data Technical Specialist, United Nations Population Fund (UNFPA)
Head of Data-driven development, World Economic Forum USA
Manager, Paris21 Initiative
Head of Big Data Task Force, Eurostat
Alliance Program Director and Adjunct Associate Professor, Columbia University
Director, Paris Île-de-France Complex Systems Institute (ISC-PIF)
Senior Director of Research, International Peace Institute
General Partner, Susa Ventures
Special Advisor, ICT4Peace Foundation
Co-founder, Build Up
“Data-Pop Alliance is the result of a very unusual collaboration between three of the world's leading institutions in their fields, joined by a myriad of institutional and individual actors that have decided to combine their unique strengths and perspectives on the opportunities and challenges of the Big Data revolution to make it a humanistic one.”
— Alex 'Sandy' Pentland, Academic director
The Harvard Humanitarian Initiative is a Harvard University-wide center created in 2005 to provide expertise in public health, social science, and other disciplines to relieve human suffering in war and disaster by advancing the science and practice of humanitarian response. HHI is supported by the Office of the Provost, the Harvard School of Public Health, and the participation of faculty from over 12 Harvard schools.
Established in 1985, the MIT Media Lab actively promotes a unique, antidisciplinary culture, exploring beyond known boundaries and disciplines and encouraging the most unconventional mixing and matching of seemingly disparate research areas. The Lab is committed to looking beyond the obvious to ask the questions not yet asked, questions whose answers could radically improve the way people live, learn, express themselves, work, and play.
The Overseas Development Institute is a leading development policy think tank in the United Kingdom with an established international reputation. For 50 years the institute has been working with public and private sector partners in developing countries to reduce poverty, alleviate suffering, and achieve sustainable livelihoods.
Core seed funding ($400,000) was generously provided by The Rockefeller Foundation "in support of establishing [...] Data-Pop Alliance, a global network that brings together individual and institutional actors involved in the ‘Big Data revolution’ to advance common objectives through the use of new digital data and analytics tools and methods that can foster human development and societal progress".
Additional project-based funding has also been provided by the Internews Center for Innovation and Learning, the International Peace Institute, the Agence Française de Développement (AFD), and the World Bank Innovation Labs.
We are incubated in New York City by ThoughtWorks NY, "a software company and a community of passionate, purpose-led individuals who think disruptively to deliver technology to address their clients' toughest challenges, all while seeking to revolutionize the IT industry and create positive social change".
As a global alliance, Data-Pop relies on and helps connect individuals and institutions operating in four main sectors: academia, non-governmental and civil society, official and intergovernmental, and private sector.
Institutions and individuals have become, or may become, network members or 'partners' under various modalities, ranging from a Memorandum of Understanding to a handshake. We are currently finalizing a lightweight charter stating our strategic objective and guiding principles, to which we will ask network members and partners to adhere.
Besides its three founding institutions, initial members of Data-Pop's network include:
- Academic and research institutes: Harvard University Institute for Quantitative Social Science, UC Berkeley Center for Effective Global Action and Development Impact Lab, Columbia University Brown Institute and Alliance Program, the University of Southampton Web Science Institute, Yale University Climate and Energy Institute and School of Forestry and Environmental Studies, Stanford University Digital Civil Society Lab, the Institut des Systèmes Complexes-Paris Île-de-France, the Qatar Computing Research Institute (QCRI), Bruno Kessler Foundation (FBK).
- Non-governmental and civil society organizations: Internews, the International Peace Institute (IPI), Oxfam UK, the engine room, the Partnership for African Social and Governance Research (PASGR), The Centro de Pensamiento Estratégico Internacional (CEPEI), the International Center for Advocates Against Discrimination, The United States Institute of Peace, the Partnership on Open Data, the World Economic Forum USA.
- Official and intergovernmental institutions: the United Nations Population Fund (UNFPA), the Agence Française de Développement (AFD), the Paris21 initiative, Eurostat's Big Data unit, Colombia's Departamento Administrativo Nacional de Estadística (DANE), the World Bank Institute and Development Data Group.
- Individuals in both the for-profit and not-for-profit sectors: Jay Ulfelder (independent consultant), Alison Cole (Open Society Justice Initiative), Linda Raftree (Kurante), Gary Milante (Stockholm International Peace Research Institute), Antoine Heuty (ULULA), Fredrik Sjoberg (New York University), Maurice Nsabimana (World Bank), Romesh Silva (Johns Hopkins School of Public Health), Simone Sala (Columbia University), Andreas Stuhlmüller (Stanford University), Mark Latonero (USC Annenberg), Nathan Eagle (Jana).
Who and what is Data-Pop Alliance?
Data-Pop (short for 'Big Data & People Project') Alliance is a new ‘think-&-do’ global initiative jointly created by the Harvard Humanitarian Initiative (HHI), the MIT Media Lab and the Overseas Development Institute (ODI), starting as a three-year joint initiative. In addition, Data-Pop brings together individual and institutional actors involved in the ‘Big Data revolution’ to advance common principles and objectives.
Data-Pop Alliance's leadership is composed of MIT Professor Alex ‘Sandy’ Pentland as Academic Director, co-founder and director Emmanuel Letouzé, co-founder and co-director Patrick Vinck (HHI), Phuong Pham (HHI) as HHI co-director, and Claire Melamed and Emma Samman as ODI co-directors. We have offices in Cambridge (USA), New York City, London, and a global network of members and partners.
Data-Pop Alliance’s activities are supported by a team of research affiliates, associates, and assistants, which we hope to see grow over time; its governance structure also involves a 15-member Advisory Board and a 15-member Steering Committee.
Data-Pop Alliance’s mission is to promote a 'humanistic', people-centered ‘Big Data revolution’ to foster human development and societal progress. Data-Pop was created to help fill gaps and connect dots, and aims to become, as articulated in our launch blog post, a "connecting hub, sounding board, and driving force" in the 'Big Data for social good' space and the “Data revolution” at large.
The name 'Data-Pop', suggested by Advisory Board member Paul Ladd, intends to convey our primary thematic anchoring in the exploding 'Big Data' field and our key strategic objective of ensuring that these data and tools serve the interests of people across the globe, especially those of poor and vulnerable populations.
Our starting point is the recognition of Big Data's promise and perils, with optimists positing that Big Data presents a historical opportunity for more agile and targeted policies and programmes, and pessimists (or healthy skeptics) warning that Big Data will not mechanistically lead to human and societal progress, especially for the poor, and may have detrimental effects if it leads to the creation of a new digital divide and an overly technocratic approach to data collection, usage and decision-making.
We see three main challenges to overcome, which provide the rationale for creating Data-Pop:
- a scientific-technological bias in many ongoing discussions, at the expense of more careful consideration of the sociopolitical implications—including ethical and human rights dimensions
- poor institutional connectivity between humanitarians, development actors, data and computer scientists, and ethicists—characterized and caused by the lack of mechanisms to facilitate knowledge sharing
- limited political channels and technical capacities for the primary producers and users of data—local communities and groups, governmental bodies and officials, researchers, journalists—to be engaged fully and systematically in shaping the Big Data revolution.
What does or will Data-Pop try to do?
Data-Pop will seek to contribute to five key strategic outcomes:
- Big Data ethics, to promote the ethical use of personal data within the context of development and humanitarian action, with specific emphasis on strengthening societies’ ability to weigh in on related debates
- Big Data literacy, to enhance the ability of ordinary citizens and social amplifiers to use and understand data and graphics
- Big Data capacity, to evaluate, improve, design and/or help apply Big Data methodologies and tools, notably on issues of poverty measurement and sample bias
- Big Data strategy, to evaluate, improve, design and/or help implement strategies, policies and programs around the use of Big Data
- Big Data community, to provide an exchange platform and sounding board for all partners and interested individuals to share ideas, information and questions, and more.
Data-Pop's six initial thematic areas of focus are:
- Urban poverty
- Conflict prevention
- Humanitarian action
- Data journalism
- Official statistics
- Environmental vulnerability
We will be operating through seven key modalities:
- A website featuring an active blog that aims to draw international and national experts, give a voice to local actors, spur and facilitate knowledge creation and sharing, as well as a fully customizable and curated library of key resources in the field
- Training materials, modules and curriculums, both online and on-site, developed and implemented leveraging the tremendous resources of our network
- Research and policy papers, with a focus on evaluation and cooperation between researchers in the network
- Technical assistance on Big Data strategies and policies, for example working with Official Statistical Offices and cities in developing countries
- Working groups around specific themes, animated and coordinated by Data-Pop partners
- A Data-Pop Fellowship Programme allowing onsite collaboration between researchers and fellows from partnering institutions
- A Data-Pop event series, with conferences, thematic workshops and Datadives in developing countries.
How will Data-Pop Alliance operate?
Data-Pop will function as a broker and implementer of activities and ideas serving its mission, leveraging and connecting the resources and needs in the network. It will do so by soliciting, developing, seeking funding for, supporting and implementing projects leveraging the skills and resources of its members.
The Advisory Board will aim to meet yearly and provide high-level guidance to the network. The Steering Committee will meet at least twice per year to provide more detailed and project-specific advice.
Data-Pop seeks to raise both core and additional project-based funding for the first year of operations (September 2014 - August 2015). Contributions from funding partners may include monetary as well as in-kind or personnel resources. Partial seed funding has been provided by the three founding institutions as well as through generous project-based support from Internews, the Agence française de développement (AFD), the World Bank Institute and the International Peace Institute.
Other funding partners currently being approached for both core and project-based funding include major foundations, bilateral donors, intergovernmental and multilateral organizations, and universities, as well as philanthropists and private corporations.
Data-Pop was turned from an idea into a reality in less than six months, but we still have a lot of hard and exciting work ahead. An initial strategy meeting was held at ODI in London on January 17th, 2014, gathering about 20 key partners from the US East Coast and Europe.
Other initial key milestones are:
|September 2013 - April 2014||
|May - August 2014||
|September 2014 - August 2015||
|September 2015 - September 2017||
How can I get in touch or involved?
The best way is to contact us by filling out the form below. (We will neither share nor sell your data.)
Logo design: Nadia Said
Web design: Emmanuel Letouzé & Nadia Said
Web development: Emmanuel Letouzé & Gabriel Pestre