Data for Diversity-Aware and Non-Discriminatory Technology

Ethical questions from the project “WeNet – the Internet of Us”

Presented at the Critical Data Studies Workshop at ICWSM 2019 in Munich, 11 June 2019 by Laura Schelenz, International Center for Ethics in the Sciences and Humanities – University of Tübingen

Recently, researchers in science and technology studies and the philosophy of technology have pointed out the problem of biased training data.

Training data informs the models inferred through machine learning; biased data thus produces biased algorithms that make decisions and provide recommendations to software users – in recruitment, college and loan applications, and criminal justice. Algorithms based on biased training data then unintentionally reinforce racism, sexism, and other forms of oppression (O’Neil 2016; Noble 2018; Seaver 2017; Sandvig et al. 2016; Zarsky 2016; Wachter-Boettcher 2017; Tufekci 2015; Fefegha 2018; Criado Perez 2019). Training data is biased because data is always collected with a certain interest and prioritizes some categories of data over others.

Often, data exists only about the major ethnic group of a country but not about minorities or people who do not conform to hegemonic ideas of a “normal” person. Ethnic minorities, women, gender non-conforming individuals, differently abled people, and others who are marginalized in society are often not included in research, studies, or government surveys; hence there is a lack of data about them.

Against this backdrop, the question is whether collecting massive amounts of data guarantees inclusion and justice for those who have previously been discriminated against by software. Should we create large-scale diversity-aware datasets that minimize bias against people outside the norm?

For more info, download the “Data for Diversity-Aware and Non-Discriminatory Technology” presentation.

To stay updated on WeNet news and developments, subscribe to our newsletter.