Open Technology Institute
Data and Discrimination: Collected Essays (Policy Paper)
Seeta Peña Gangadharan
October 27, 2014
Despite significant political and cultural transformations since the Civil Rights movement and other social upheavals of the Sixties and Seventies, discrimination remains a problem. And while persistent inequities stem from a complex set of factors, digitally automated systems may be adding to these problems in new ways. From White House officials to civil rights advocates to “quants” and “techies,” many have begun to question the power of algorithmically driven systems to categorize, nudge, prime, and differentially treat people in ways that can exacerbate social, economic, and racial inequities. This collection of essays features contributions from eleven researchers addressing different facets of data-driven discrimination, including its political, social, and historical implications. The collection grows out of a research convening held by New America’s Open Technology Institute in May 2014.
Sunlight Foundation
Avoiding prejudice in data-based decisions
It is important to understand the problems that can arise from bad data management practices when individual-level data are released, as well as the practices that can lessen the likelihood of individual harm. At the same time, we should also consider the release of microdata (individual-level data) in a broader context. Beyond the harms faced by specific people whose private information is incorrectly released, advocates have observed regular ways that microdata release can heighten the risk of harm for specific communities. A series of projects published over the last several years has provided essential texts for thinking about how microdata, aggregated into “big data,” can reinforce existing patterns of societal discrimination.
The essence of these arguments lies in an important, and perhaps counterintuitive, observation: using data and technology in a decision-making process does not automatically make a decision free of problematic (and possibly illegal) social discrimination. Advocates have observed that in a number of situations, the additional collection and use of individual-level data can entrench discriminatory patterns even as it becomes harder to see how this happens. The “big data” used for algorithmic judgments about financial risk, housing, insurance or employment fitness invisibly incorporates the effects of human prejudices. As a result, relying on these large datasets to operate without oversight can lead decision-makers to discriminate against people who are already more likely to face discrimination, even while these data-based judgments stem less obviously from human prejudice.
In order to prevent data-driven decision-making from reincorporating patterns of prejudice, it is essential that datasets and algorithms be evaluated and audited for potentially discriminatory effects. Reviewers should consider how data collection, machine-learning processes and training materials, and category definitions might introduce — even inadvertently — elements of bias to the analysis.
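One simple form such an audit can take is a comparison of outcome rates across groups. The sketch below is a minimal illustration in Python, using invented decisions and group labels, of the “four-fifths rule” that US employment regulators use as a rough screen for disparate impact; it is a starting point for review, not a complete fairness test.

```python
# A minimal sketch of a disparate-impact audit using the "four-fifths rule":
# the selection rate for any group should be at least 80% of the rate for
# the most-favored group. All data here is hypothetical, for illustration.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, selected) pairs -> selection rate per group."""
    totals, chosen = defaultdict(int), defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        if selected:
            chosen[group] += 1
    return {g: chosen[g] / totals[g] for g in totals}

def four_fifths_check(decisions, threshold=0.8):
    """Flag groups whose selection rate falls below threshold * the best group's rate."""
    rates = selection_rates(decisions)
    best = max(rates.values())
    return {g: rate / best >= threshold for g, rate in rates.items()}

# Hypothetical loan decisions, labeled by an invented applicant group.
audit = [("A", True), ("A", True), ("A", False),
         ("B", True), ("B", False), ("B", False)]
print(selection_rates(audit))    # approx. {'A': 0.67, 'B': 0.33}
print(four_fifths_check(audit))  # {'A': True, 'B': False} -> possible disparate impact
```

A real audit would go further, testing for proxy variables and statistical significance, but even a simple screen like this can surface outcome gaps that warrant human review.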
Because data stemming from criminal-justice-related events are often legally used to prevent people from accessing social goods — such as the right to vote, certain kinds of employment, or places to live — the question of whether released individual-level data can systemically harm traditional subjects of discrimination looms particularly large. Arrest data provide a good example: because individual-level, identified arrest records can create additional problems for the people named in a dataset — affecting their ability to get employment, housing or credit — their release may especially disadvantage people from groups that are disproportionately arrested. And because an arrest is not the same thing as a conviction, releasing these data can incorrectly label an individual as a criminal.
Just as potentially problematic as data collection and release is the use of microdata in automatic, nontransparent decision-making. Automatic decisions, produced by computer algorithms, can effectively be discriminatory when those algorithms were developed from discriminatory materials. Algorithms are built using “training sets” drawn from past, human-made decisions. Algorithms to determine who constitutes a good employment candidate, for example, are trained on lists of characteristics of existing employees. If humans regularly made discriminatory employment choices in the materials used to develop the algorithm, the computer will as well.
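To make that mechanism concrete, here is a minimal Python sketch with entirely hypothetical records. The “model” simply memorizes the majority outcome of past human decisions for each zip code; because zip code can act as a proxy for a protected group, the trained model reproduces the historical disparity without ever being given group labels.

```python
# A minimal sketch of how a model trained on historically biased decisions
# reproduces them. The "model" here just memorizes the majority outcome per
# feature value; real systems are more complex, but the mechanism is the same.
# All records, including the zip codes, are hypothetical.
from collections import defaultdict, Counter

# Past human hiring decisions. Zip code correlates with a protected group,
# so it acts as a proxy even though group membership is never recorded.
training_set = [
    {"zip": "60601", "hired": True},  {"zip": "60601", "hired": True},
    {"zip": "60601", "hired": False}, {"zip": "60620", "hired": False},
    {"zip": "60620", "hired": False}, {"zip": "60620", "hired": True},
]

def train(records):
    """Learn the majority hiring outcome for each zip code."""
    outcomes = defaultdict(Counter)
    for record in records:
        outcomes[record["zip"]][record["hired"]] += 1
    return {z: counts.most_common(1)[0][0] for z, counts in outcomes.items()}

model = train(training_set)
# New, equally qualified candidates get different predictions based only
# on zip code: the past prejudice is now baked into the algorithm.
print(model["60601"])  # True
print(model["60620"])  # False
```

Real hiring models are far more sophisticated, but the failure mode is the same: whatever pattern sits in the training decisions, including prejudice, becomes the model’s definition of a good candidate.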
The potential for microdata use and collection to regularly, if inadvertently, amplify the effects of social discrimination is a serious problem associated with microdata release, and several recent projects on rights problems in big data have articulated ways for data managers to approach it. One valuable collection of materials accompanies a year-long set of projects launched through the UC Berkeley Center for Law & Technology. In April, a symposium entitled “Open Data: Addressing Privacy, Security, and Civil Rights Challenges” brought together legal, technical and privacy scholars to consider the specific problems that open data poses for privacy law and principles. Another valuable collection of perspectives was developed by the New America Foundation’s Open Technology Institute, which articulated a range of specific concerns in Data and Discrimination: Collected Essays. And although it addresses a separate legal regime, Kieron O’Hara’s examination of the problem in the British context offers a useful comparative perspective.
Perhaps the most comprehensive set of recommendations in this area has been put forward by the Leadership Conference on Civil and Human Rights (LCCHR). Organizing a group of leading civil rights organizations, the LCCHR identified a variety of concerns about how big data (including big criminal justice data) negatively affects communities of color. Its Civil Rights Principles for the Era of Big Data puts forth a set of rights to protect in order to prevent systemic problems such as the following.
Discriminatory automatic decisions
Taking the problem of flawed data-based decision-making one step further, human judgment is sometimes removed altogether from important administrative processes and replaced with an algorithm. Particularly when this is combined with stigmatized social categories — like people identified as “convicted felons” — correction can be challenging and the impact consequential. The purging of over 50,000 voters from Florida’s list of eligible voters in advance of the 2000 elections offers an important example of how unsupervised algorithmic decision-making can exacerbate existing social discrimination. Instructed to develop a broad, “fuzzy-matching” list of potential felons from Florida’s voter rolls, ChoicePoint/DBT Online produced a large dataset; Florida officials did not independently verify its accuracy, and tens of thousands of voters were disenfranchised. Many of these people were from already politically underrepresented groups.
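The following Python sketch, using invented names, illustrates how the threshold choice in a fuzzy-matching process drives this kind of error: a strict match flags only the exact record, while a loose threshold of the sort used to build a “broad” list sweeps in people whose names merely resemble a listed name.

```python
# A minimal sketch of why broad "fuzzy matching" inflates purge lists.
# Matching names by edit distance with a loose threshold flags voters whose
# names merely resemble a listed name. All names below are invented.
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

felon_list = ["JOHN JACKSON"]
voters = ["JOHN JACKSON", "JON JACKSON", "JOAN JACKSON", "JOHN JOHNSON"]

# With a strict threshold only the true match is flagged; a "fuzzy"
# threshold of 3 sweeps in three innocent voters as well.
for threshold in (0, 3):
    flagged = [v for v in voters
               if any(edit_distance(v, f) <= threshold for f in felon_list)]
    print(threshold, flagged)
# 0 ['JOHN JACKSON']
# 3 ['JOHN JACKSON', 'JON JACKSON', 'JOAN JACKSON', 'JOHN JOHNSON']
```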
Lack of control over personal information and inability to correct inaccuracies
Since the birth of a major movement to improve privacy law in the 1970s, a core principle of government data use has been the requirement to provide “notice and consent” — that is, to tell people the use to which their information will be put and to give them the option to refuse. The full list of Fair Information Practice Principles (FIPPs) includes a number of specific ways that individuals can be guaranteed fair notice and access to their information. Implementing these practices remains the right goal for any data collection or aggregation effort, but the guidelines are frequently disregarded by both public and private data managers. Indeed, the problem of how best to implement notice and consent principles in the current technological environment remains pressing and unsolved.
Where data are collected and used without transparency, inaccurate or illegal uses of data also become invisible. Documented illegal uses include cases that have resulted in legal action, such as LeapLab’s sale of the personal financial data of hundreds of thousands of people to an entity that withdrew money from their bank accounts. Inaccurate data can be particularly harmful to individuals: legal cases have arisen, for example, from data brokers sharing records that falsely labeled individuals as sex offenders.
Advocates are especially concerned about how nontransparent data collection by both governments and private actors ultimately produces storehouses of large, complex datasets about individuals, which then get used in new and potentially discriminatory ways. Commercial data brokers, who collect, package and resell access to personally identifying data, have long been a subject of official concern. Indeed, the Federal Trade Commission (FTC) identified “the lack of transparency among companies providing consumer data for credit and other eligibility determinations [as leading] … to the adoption of the Fair Credit Reporting Act.” In 2014, the FTC issued a new analysis of the risks and benefits that data brokers, who use billions of pieces of personal data to guide marketing, assess risks and detect fraud, currently pose to American consumers. It observed that “many data broker practices fall outside of any specific laws that require the industry to be transparent, provide consumers with access to data, or take steps to ensure that the data that they maintain is accurate.” The FTC’s recommendations for mitigating these risks include passing legislation to require data brokers to adhere to notice and consent principles about their data holdings, including new approaches to how this works for large data aggregations.
For the law enforcement community, these recommendations are particularly important in connection with data brokers’ risk mitigation and people-search services, since the problems of nontransparent broker practices affect not only consumers but also government agencies. Law enforcement agencies contract with data brokers to obtain information about the communities they police. This practice — contentious among rights advocates, who see it as potentially evading legal warrant requirements — was most publicly exposed by the FBI’s contracting with ChoicePoint, beginning in 2002, for access to its data about all American consumers. (ChoicePoint was acquired by LexisNexis and currently operates as LexisNexis Risk Solutions, marketing itself directly to law enforcement.) The problems of inaccurate data become greatly magnified when those data are used in law enforcement activities.
---
As we increasingly depend on data and technology in our public and private decision-making, it is critical to recognize that while using a computer can produce important value, it does not guarantee a substantively “objective” outcome. Because of the ways data are collected and interpreted, the additional use of individual-level data can reinforce existing problematic patterns if its users are not alert to the potential for this to occur.