A new report out of Princeton University suggests that computers that learn human language will, at the same time, learn human prejudices.
Written by computer scientists, the report, "Semantics derived automatically from language corpora necessarily contain human biases," describes how a typical language-learning algorithm was used to measure the associations between English words. The results show biases similar to those documented in traditional psychology research, across a range of topics. The authors note that they were able to replicate every bias they tested using the computer model.
One example showed gender bias: male names were more often associated with words such as "management" and "salary," while female names were more often associated with words such as "home" and "family."
The report also found racial bias: European American names were more likely than African American names to be associated with "pleasant" words.
Researchers for the study tested the same type of program that underlies Google's search interface and Apple's Siri, along with other software that works with human language. Such programs are known as machine-learning algorithms.
The report states that this type of algorithm learns only by example. The algorithm was trained on close to one trillion words of text from the internet. It was not looking for bias; rather, it inferred the meanings of words from how often they appear near one another.
To gauge how well the computer picked up biases, researchers measured the strength of association between occupation words, such as "doctor" and "teacher," and words that describe women, such as "female" and "woman." They found that the computer's associations closely tracked the actual share of women working in each profession, even though the program has no understanding of jobs or work.
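The idea of measuring association strength between words can be sketched with a toy example. In systems like the one tested, each word becomes a list of numbers (a vector) learned from how words co-occur in text, and "association" is how closely two vectors point in the same direction. The two-dimensional vectors below are made-up illustrative values, not real learned embeddings, and the comparison words are chosen only for the example:

```python
import math

# Toy word vectors (illustrative values only; real embeddings learned
# from text have hundreds of dimensions).
vectors = {
    "nurse":    [0.2, 0.9],
    "engineer": [0.9, 0.1],
    "woman":    [0.1, 1.0],
    "man":      [1.0, 0.2],
}

def cosine(a, b):
    """Cosine similarity: how closely two word vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def gender_association(word):
    """Positive means closer to 'woman', negative means closer to 'man'."""
    return cosine(vectors[word], vectors["woman"]) - cosine(vectors[word], vectors["man"])

for occupation in ("nurse", "engineer"):
    print(occupation, round(gender_association(occupation), 3))
```

With these toy numbers, "nurse" scores positive (closer to "woman") and "engineer" scores negative (closer to "man"). The study's finding was that, at scale, such scores for occupation words lined up with real-world employment statistics.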
Algorithms are used every day to make decisions in fields including health care, criminal justice, and advertising. Because of this, the researchers say, it is important to understand the biases such systems can hold.
A story published earlier in the year by ProPublica discussed racial bias held by systems that were used to assign "risk scores" to criminal defendants. Risk scores that were created with this bias could cause harm by placing more black people behind bars for longer periods of time than white people in a similar situation.
Researchers suggest combating this by creating a long-term interdisciplinary research program that includes cognitive scientists and ethicists. In the meantime, they say, the corpora used to train machine-learning systems should be chosen to contain as little prejudice as possible. They add that more complex artificial intelligence architectures, including cognitive systems, should be considered.