The veil of Maya: black-boxes and Explainable AI

Tags: AI, XAI
Date: 2025-04-06

Imagine traveling back in time and finding yourself in a world that has never known electricity, appliances, telephones, computers, or the internet. Would you be able to explain how all these things work? With a few rare exceptions, probably not.

Industrial society is the first society in human history in which most individuals don't understand the tools and mechanisms on which it is founded. What reassures us, however, is that there is always someone who thoroughly understands how each of the technologies around us works, and because of this we trust them. Even if we don't know how the internet works, for example, we could understand it if we decided to ruin our lives by learning about it.

However, a new revolution is underway and even this certainty might fade away.

One of the intrinsic problems of AI and machine learning is that some models (often the best-performing ones) aren't truly comprehensible to anyone. Those who implement a machine learning model always know the mechanism by which the "machine" will learn, but they may not understand what the machine has actually learned, and consequently how it makes its decisions. These types of models are called "black boxes."

Black boxes

In high school we were taught that a function from a set X to a set Y assigns to each element of X exactly one element of Y. In simpler terms, a function allows us to consistently transform an input X into an output Y.
But what happens if we have a series of X's and their corresponding Y's, but don't know the function that transforms inputs into outputs? If we wanted to calculate the output Y for a new value of X, what should we do? We would need to find a way to obtain the function, or at least approximate it. Machine learning is nothing more than this: a technique for approximating a function.
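To make this concrete, here is a minimal sketch in Python (using NumPy and scikit-learn, chosen purely for illustration): the model never sees the underlying formula, only sample inputs and outputs, and it approximates the function from those examples alone.

```python
# Minimal sketch of "machine learning as function approximation":
# we only observe pairs (X, y) produced by an unknown function and ask
# a model to approximate it well enough to predict y for new values of X.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Pretend the hidden function is y = 3x + 2 plus noise; in practice we
# would only ever see the data, never the formula.
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(scale=0.5, size=100)

model = LinearRegression()
model.fit(X, y)                        # learn an approximation from examples

print(model.predict([[4.0]]))          # estimate Y for an X never seen before
print(model.coef_, model.intercept_)   # close to 3 and 2, recovered from data alone
```

Here the approximation happens to be perfectly readable, a slope and an intercept. The trouble begins when the approximating function has millions of parameters.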

Take, for example, the case of a bank that needs to decide which clients to grant mortgages to. The bank typically requests information from potential clients, including age, salary, marital status and property value, among others. Over time, the bank has accumulated a history of this information, including the "output", which in this case is whether the client repaid their debt or not. The bank might decide to implement a machine learning model to determine whether to grant a mortgage to a client once they have provided the "input", in this case the information regarding age, salary, and so on.
As long as this tool simply facilitates calculations, it isn't dangerous, provided there is always someone overseeing how decisions are made. The problem arises when machine learning models evolve to the point where humans are unable to understand how decisions are being made.
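A hypothetical sketch of what such a model might look like is shown below. The feature names, the synthetic data and the random forest are all invented for illustration; a real credit-scoring model would require far more care (and regulatory scrutiny).

```python
# Hypothetical mortgage example: invented applicant records with the features
# mentioned above, and an invented historical outcome (1 = repaid, 0 = defaulted).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 500

X = np.column_stack([
    rng.integers(21, 70, n),            # age
    rng.normal(40_000, 12_000, n),      # salary
    rng.integers(0, 2, n),              # marital status (0/1)
    rng.normal(200_000, 60_000, n),     # property value
])
y = (X[:, 1] / 40_000 + rng.normal(0, 0.3, n) > 1).astype(int)  # toy repayment rule

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

new_applicant = [[35, 45_000, 1, 180_000]]   # age, salary, married, property value
print(model.predict_proba(new_applicant))     # [P(default), P(repaid)]
```

Even this modest ensemble of a few hundred decision trees starts to resist inspection: no single tree "decides", and the overall decision surface is hard to summarize in words.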

These types of models are called "black boxes" because it is possible to see what goes into the box and what comes out, but not what happens inside. Neural networks and LLMs fall into this category.

Unethical AI

While many examples could be cited regarding the importance of ethical and understandable AI, the most frequently mentioned is the case of bias in the COMPAS algorithm (https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing). COMPAS has been used in many US jurisdictions to support judges in assessing a defendant's risk of recidivism. Since the algorithm is proprietary, it is impossible to know exactly how it works. However, ProPublica's analysis showed that its decisions may be biased: black defendants were significantly more likely than white defendants to be labeled as high risk. The investigation highlights how biased data leads to equally biased outcomes.
As extreme as this example may be, it is useful for highlighting two points: first, the dangers of using artificial intelligence without understanding it, which can lead to decisions driven by biases that are unacceptable by the ethics of our time; second, the need to develop tools capable of mitigating these dangers.

A tool for addressing these situations is legislative intervention, such as the General Data Protection Regulation or the AI Act adopted in the European Union.

A second valuable tool in mitigating AI risks is the development of methods aimed at understanding AI itself. In this sense, the spread of so-called black-box models paved the way for what is known as Explainable Artificial Intelligence (XAI): a family of methods aimed at making AI processes and outcomes understandable to humans. These methods allow for transparent collaboration between humans and algorithms, which is necessary if the latter are to be developed consciously.
While models that are interpretable by design should be preferred when possible, given their simplicity, interpretability and lower resource consumption, none of them currently provides results comparable to those of modern LLMs. It is in these cases that it becomes essential to invest in research on methods capable of explaining what machines learn and how they make decisions.
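As a taste of what a model-agnostic XAI method looks like, here is a small sketch using permutation importance from scikit-learn (a simple, well-established technique; the data and feature names are again invented): each feature is shuffled in turn, and the drop in the model's accuracy tells us how much the black box relied on it.

```python
# Model-agnostic explanation sketch: permutation importance on a synthetic
# black-box classifier. Shuffling a feature the model depends on should
# noticeably degrade its accuracy; shuffling an irrelevant one should not.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
n = 1000

# Invented features; by construction only "salary" (column 1) drives the outcome.
X = rng.normal(size=(n, 4))
y = (X[:, 1] > 0).astype(int)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
for name, score in zip(["age", "salary", "married", "property_value"],
                       result.importances_mean):
    print(f"{name:>15}: {score:.3f}")   # "salary" should dominate, as constructed
```

Techniques in the same family, such as LIME and SHAP, follow the same spirit: they do not open the box, but they probe it systematically enough to tell us which inputs its decisions hinge on.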

Entrusting control to AI means sacrificing our ethical and moral principles in favor of the principle of optimization. If we want to remain in control, it is essential that human intervention always be decisive, and for this to happen, it is necessary to understand what happens inside the box.