Algorithms are the wrong usual suspect!

Translated from French. Article originally published in Le Point on December 7th, 2017, in Le Postillon, under the title "N’accusons pas les algorithmes !", under the supervision of Said Mahrane.

When we think of artificial intelligence, most of us fear killer robots and the loss of millions of jobs, but only a few of us are aware of a threat that is already present in many technologies: algorithmic discrimination. Algorithmic discrimination arises when some users are excluded because they are not accurately represented in the set of criteria built into an algorithm. The large majority of these discriminations are unintentional and often stem from the unspoken assumptions of the people who create the technologies. Some experts, such as Cathy O’Neil, author of Weapons of Math Destruction, have even stated that algorithms are opinions embedded in code. We are not fully and explicitly aware of these biases which, once revealed, turn out to be morally unacceptable or legally punishable. It has become critical for developers and users alike to take more responsibility for the technologies we use and develop. We tend to trust algorithms blindly to make decisions for us because we think that mathematics is impartial, but the reality is more complex, and the algorithms are not the guilty party. We should take responsibility ourselves!

Let’s mention a few examples of past technology discrimination. In 2016, Joy Buolamwini, an African-American research student at the Media Lab of the Massachusetts Institute of Technology, realized that the facial recognition application she wanted to use for her research did not recognize her face. The bias had gone undetected because all the tests had been conducted on faces with white skin tones. When Apple released its Health application in 2014, women had to wait for the second version to be able to enter their menstruation dates. This bias was revealed by female users and was quickly fixed by the engineers. In the field of natural language processing, it is possible to define mathematically a level of correlation between two words. Two words that can be swapped in a sentence without changing its meaning have a high level of correlation. This method is widely used in semantic analysis. In 2016, it was applied to Google News articles in order to pair job titles by gender: the goal was to relate a male job to its female counterpart. The result is sadly surprising: the female counterpart of medical doctor is nurse, and that of software developer is housewife. Last but not least, a 2012 research study from the University of Washington highlighted the under-representation of women CEOs in Google Image results: women appeared in only 11% of the results, whereas they make up 30% of CEOs in reality. The examples listed above are in many cases due to flaws in the representativeness of the data collected, in the way it is analyzed, and in the set of assumptions and criteria retained in the model, and hence in the algorithm developed to answer a question or to make a prediction.
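
To make the correlation between two words more concrete, here is a minimal Python sketch. It assumes that each word is represented by a vector of numbers and that the correlation is measured as the cosine of the angle between two vectors, which is one common choice and not necessarily the one used in the 2016 study. The three-dimensional vectors in `TOY_VECTORS` are invented for this illustration and deliberately carry a gender component for each job, and the function names are hypothetical. The output therefore only shows how a bias that is present in the data resurfaces in the answer.

```python
import math

# Hypothetical word vectors, invented for illustration only.
# Assumed axes: [medical, technical, gender direction].
# Real embeddings are learned from text and have hundreds of dimensions.
TOY_VECTORS = {
    "man":       [0.0, 0.0, 1.0],
    "woman":     [0.0, 0.0, -1.0],
    "doctor":    [1.0, 0.0, 0.6],
    "nurse":     [1.0, 0.0, -0.6],
    "developer": [0.0, 1.0, 0.6],
    "housewife": [0.0, 0.2, -0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def female_counterpart(job, vectors):
    """Find the word closest to job - man + woman, excluding the three input words."""
    target = [
        j - m + w
        for j, m, w in zip(vectors[job], vectors["man"], vectors["woman"])
    ]
    excluded = {job, "man", "woman"}
    candidates = [word for word in vectors if word not in excluded]
    return max(candidates, key=lambda word: cosine_similarity(target, vectors[word]))


print(female_counterpart("doctor", TOY_VECTORS))     # nurse
print(female_counterpart("developer", TOY_VECTORS))  # housewife
```

The arithmetic itself is neutral. The stereotyped answers come entirely from the gender component that was placed in the job vectors, just as vectors learned from news articles inherit the associations present in those articles.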

Artificial intelligence, and more specifically machine learning, in a way amplifies the consequences of algorithmic bias, because it adds further biases through the data on which the algorithm is trained. In machine learning, in order to answer a question, the algorithm learns to categorize the possible answers during supervised training. In this supervised training, the algorithm, which starts from an a priori set of predefined criteria, answers questions to which we already know the exact answers. The algorithm provides answers based on its original set of criteria, and we tell it whether each answer is correct, so that it gradually builds the categorization it will later use to determine the correct answer. It follows that an algorithm trained on a data set that is not representative leads to what we call data bias and algorithmic bias. Take the example of the image recognition algorithm developed by Google in 2015 that mistakenly identified people with dark skin tones as gorillas. Most likely, the algorithm had not been trained on a representative sample of images covering all skin tones. If no image of a dark-skinned person was presented to the algorithm during its supervised training, it might fail to recognize a dark-skinned person as human. Worse, it might identify this person as a gorilla. By contrast, the categorization process of human beings is much more sophisticated. For instance, a person from a tribe in the Amazon forest who meets, for the first time, a person from a Western country with an unfamiliar outfit and attitude will have no difficulty identifying this person as a human being. There are significant differences between computers and human beings that we should be aware of. The anthropomorphic characteristics that we attribute to any robot, computer or algorithm may make us misjudge its real and current capabilities.
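
The effect of an unrepresentative training set can be sketched in a few lines of Python. The example assumes a deliberately simple supervised learner, a nearest-centroid classifier, which is not the method used by any of the products mentioned above. The two groups, the labels `match` and `no_match`, the function names and every number are made up for illustration. The same learner is trained twice, once on examples from group A only and once on examples from both groups, and is then evaluated separately on each group.

```python
from collections import defaultdict

# Each example is (features, known label, group). All values are invented.
TRAIN_GROUP_A = [
    ((1.0, 1.0), "match", "group_a"),
    ((1.2, 0.8), "match", "group_a"),
    ((0.8, 1.2), "match", "group_a"),
    ((0.0, 0.0), "no_match", "group_a"),
    ((0.2, -0.2), "no_match", "group_a"),
    ((-0.2, 0.2), "no_match", "group_a"),
]
TRAIN_GROUP_B = [
    ((-1.0, 1.0), "match", "group_b"),
    ((-1.2, 0.8), "match", "group_b"),
    ((-0.8, 1.2), "match", "group_b"),
    ((0.0, -1.0), "no_match", "group_b"),
    ((0.2, -1.2), "no_match", "group_b"),
    ((-0.2, -0.8), "no_match", "group_b"),
]
TEST_SET = [
    ((0.9, 1.1), "match", "group_a"),
    ((1.1, 0.9), "match", "group_a"),
    ((0.1, 0.1), "no_match", "group_a"),
    ((-0.1, -0.1), "no_match", "group_a"),
    ((-0.9, 1.1), "match", "group_b"),
    ((-1.1, 0.9), "match", "group_b"),
    ((0.1, -0.9), "no_match", "group_b"),
    ((-0.1, -1.1), "no_match", "group_b"),
]


def train_centroids(examples):
    """Supervised training: average the feature vectors of each known label."""
    sums = {}
    counts = defaultdict(int)
    for features, label, _group in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }


def predict(centroids, features):
    """Answer with the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy_by_group(centroids, examples):
    """Return the share of correct answers, computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for features, label, group in examples:
        total[group] += 1
        if predict(centroids, features) == label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


model_a_only = train_centroids(TRAIN_GROUP_A)
model_both = train_centroids(TRAIN_GROUP_A + TRAIN_GROUP_B)
print(accuracy_by_group(model_a_only, TEST_SET))
print(accuracy_by_group(model_both, TEST_SET))
```

Trained on group A only, the sketch answers correctly for every group A test example but for only half of the group B examples. Trained on both groups, it answers correctly for all of them. The learning procedure is identical in both runs; only the representativeness of the training data changes.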

It is high time to take more responsibility for biases in artificial intelligence, in order to guarantee its benefits for all of us, and to develop the critical sense that enables us to take part in the societal debates on technology and AI. Everyone involved in technology, from the developer to the user, should understand the underlying mechanisms of the current digital and analytic transformation, which adds the collection and analysis of data to the use of digital tools. Solutions need to be investigated and explored not only at the scale of individuals but also at the scale of the state, in order to maximize our chances of success.

A diverse Tech ecosystem. Diversity of gender, ethnicity, language, religion, age or sexual orientation within teams allows them to develop better ideas and products. Diversity is not a political mission; it is a real and urgent mission for society and for the economy. Americans who say "Diversity is good for business" have already understood the benefits of diversity for a company's economic growth, for the development of its products and services, and for its agility in adapting to a continuously evolving world.

More Open Data platforms. By feeding the Open Data platforms currently available, people provide organizations with data that helps build a representative sample on which algorithms are trained. By using Open Data platforms, technology developers have limited control over the source of the data, which limits the risk of bias. In addition, anyone can extract and use this data, for research purposes for instance, and run analyses and correlations that can help detect inconsistencies in the data itself. The city of New York successfully used its Open Data platform to significantly improve its infrastructure.
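
One analysis of this kind is a simple check of representativeness. The Python sketch below compares the share of each group in a data sample with a reference share, for instance one taken from census figures. The group names, the counts, the reference shares, the function names and the 5-point tolerance are all hypothetical and chosen only for illustration.

```python
def representation_gaps(sample_counts, reference_shares):
    """Return observed share minus reference share for each reference group."""
    total = sum(sample_counts.values())
    gaps = {}
    for group, expected_share in reference_shares.items():
        observed_share = sample_counts.get(group, 0) / total
        gaps[group] = observed_share - expected_share
    return gaps


def under_represented(sample_counts, reference_shares, tolerance=0.05):
    """List the groups whose share falls short of the reference by more than tolerance."""
    gaps = representation_gaps(sample_counts, reference_shares)
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Invented example: 1,000 records in the sample, three groups.
SAMPLE_COUNTS = {"group_a": 650, "group_b": 300, "group_c": 50}
REFERENCE_SHARES = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}

print(under_represented(SAMPLE_COUNTS, REFERENCE_SHARES))  # ['group_c']
```

A check like this does not remove a bias, but it flags a gap early, before an algorithm is trained on the sample.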

Learning how to code. Learning how to code is an eye-opener that helps us think about, criticize or simply understand the underlying mechanisms of technologies. Even without becoming a software developer, being exposed to coding is a powerful way to grasp the vocabulary of this discipline and to understand the process of categorization, or the influence and impact of the choice of algorithm. It allows anyone to understand the concepts of algorithmic bias and technology discrimination.

The collaboration between algorithms and human beings has a long and beautiful life ahead of it. This union will succeed through the understanding and inclusion of our differences, and through a better understanding of technologies and their challenges by all of us, users and developers.

Dr Computational Scientist and Entrepreneur
