Depression: an “explainable” AI assists doctors in diagnosis

Artificial Intelligence (AI) can significantly aid doctors in identifying depression, but transparency is essential: how does it arrive at its conclusions? The FAITH project has developed an AI system designed to assess the risk of depression in cancer survivors, and Deep Blue has focused on creating an interface that reveals the algorithm’s reasoning to doctors.


Artificial Intelligence: explainable is better

What should one expect from an AI algorithm? That it works, of course. But also that it is “transparent”: the logic and reasoning behind its results should be understandable, especially for those with no programming expertise. This is not a secondary requirement; AI Explainability is crucial for building trust among its users, not just in the algorithm’s results but in the reasoning that led to those outcomes. This trust is vital for the effectiveness of the human-machine partnership, in all fields, including healthcare.


AI IN MEDICINE: POTENTIAL AND CHALLENGES

The applications of AI in medicine are widely discussed, though not yet as pervasive as one might expect given the amount of attention they receive. AI has already established a strong foothold in diagnostic imaging: a well-trained virtual “eye” can detect potential signs of disease – such as a tumour – in an X-ray or histological report, sometimes even more accurately than a human doctor. Another, more futuristic area of interest is predictive AI: using algorithms to predict the risk of developing certain diseases (even years before symptoms appear, as in the case of Parkinson’s) or to forecast a patient’s response to a treatment based on their characteristics or medical history. There is also great optimism about the computational power of AI in drug discovery, particularly in finding new antibiotics.

AI-supported diagnosis and disease risk prediction extend to mental health disorders as well. The ability to detect – often early – signs of certain behavioural disorders through data collected by wearable devices (such as smartphones or smartwatches) and analysed by AI algorithms is a valuable opportunity. However, it is also controversial, as it involves the collection of sensitive data and, as mentioned, the inherent opacity of AI may hinder these systems’ adoption by doctors. The risk, therefore, is that these models end up being shelved and left unused.


EXPLAINING AI RESULTS TO DOCTORS: THE APPROACH USED IN FAITH

“One of the major challenges in AI is how to explain an algorithm’s results to people – a field known as XAI, or explainable Artificial Intelligence. There isn’t a single approach; the explanations depend on the roles of the people they’re intended for, the subject matter, and the AI models used,” says Giuseppe Frau, Computer Scientist and Head of the Tech Division at Deep Blue. Human-AI teaming is one approach: rather than simply presenting the final output, the process is broken down into multiple stages, allowing the user to consult and intervene at intermediate steps. “Instead of just explaining the result, we reach it together, breaking the task into parts where each person can contribute, making the user more aware of how the algorithm arrived at its decision,” Frau explains. Alternatively, explanation by analogy can be used: showing the user similar past cases that led to the same conclusion.
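To make the idea of explanation by analogy concrete, here is a minimal sketch in Python: for a new case, it retrieves the most similar past cases and shows what was concluded for them. The feature names, values, and function names are invented for illustration and are not taken from FAITH.

```python
# Minimal sketch of "explanation by analogy": retrieve the most similar past
# cases and show their outcomes. All feature names and values are invented.
import numpy as np

# Past cases: rows are people, columns are hypothetical behavioural features
# (average daily steps, hours of sleep, number of night awakenings).
past_cases = np.array([
    [9500, 7.5, 1],
    [3200, 5.0, 4],
    [7800, 6.8, 2],
    [2500, 4.5, 5],
], dtype=float)
past_outcomes = ["low risk", "high risk", "low risk", "high risk"]

def explain_by_analogy(new_case, cases, outcomes, k=2):
    """Return the k past cases most similar to `new_case`, with their outcomes."""
    # Standardise each feature so that no single scale dominates the distance.
    mean, std = cases.mean(axis=0), cases.std(axis=0)
    distances = np.linalg.norm((cases - mean) / std - (np.asarray(new_case) - mean) / std, axis=1)
    nearest = np.argsort(distances)[:k]
    return [(int(i), outcomes[i], float(distances[i])) for i in nearest]

for idx, outcome, dist in explain_by_analogy([3000, 5.2, 4], past_cases, past_outcomes):
    print(f"similar past case #{idx}: {outcome} (distance {dist:.2f})")
```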

In FAITH, a European project aimed at developing an AI system to help doctors assess the risk of depression in cancer survivors (which we previously discussed here), the consortium’s researchers opted for a feature-ranking-based approach. Before diving into what that means, a few words about the recently concluded project. The AI-powered smartphone application developed by FAITH’s programmers was downloaded by over 200 former cancer patients from two hospitals in Spain and Portugal. Over several months, it collected data on participants’ physical activity, sleep, nutrition, and tone of voice (there’s scientific literature supporting the idea that how we speak can indicate mood disorders such as depression).

Using this data, an AI model was developed (trained on the psychological assessments made by mental health professionals at the two hospitals that recruited the former patients) to estimate the risk of depression in cancer survivors, based on information about their daily habits. “It’s a monitoring tool, not a diagnostic one, which enables doctors to decide whether to further investigate a patient’s health status, acting as a bridge during periods when in-person consultations might be more challenging (as happened during the pandemic or due to other reasons like resource scarcity or physical distance),” Frau clarifies.
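As a rough sketch of this supervised setup – behavioural features collected by the app as inputs, clinicians’ assessments as training labels – the snippet below fits a simple classifier on synthetic data. The model choice (logistic regression), the feature names, and the labelling rule are assumptions for illustration; the project’s actual models are not described here.

```python
# Illustrative sketch only: behavioural features in, clinician-assessed labels
# out. Data, features, and the model are synthetic stand-ins, not FAITH's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical features per participant: daily steps, hours of sleep,
# night awakenings, tone-of-voice variability (standardised values).
X = rng.normal(size=(200, 4))
# Hypothetical clinician assessment: 1 = at risk of depression, 0 = not at risk.
y = (X[:, 1] < -0.3).astype(int)  # toy labelling rule standing in for real assessments

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# The output is a monitoring signal, not a diagnosis: a probability the doctor
# can use to decide whether a follow-up consultation is needed.
risk = model.predict_proba(X_test[:1])[0, 1]
print(f"estimated depression risk for one participant: {risk:.0%}")
```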


A DOCTOR-CENTRIC INTERFACE

Creating an AI interface tailored to doctors’ needs, with a focus on Explainability, was one of the most challenging aspects of the project, as Giuseppe Frau, who led this effort at Deep Blue, explains. “We began by exploring existing XAI solutions but soon realised that they were designed for developers – those who build and refine the technology – rather than for doctors,” says Frau. “This led us to rethink how we could represent the algorithm’s ‘reasoning’ in a way that is more accessible and intuitive. We chose a feature-ranking approach, which highlights the most significant factors that contribute to a particular outcome.”
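The article does not say how the ranking itself is computed. One common, model-agnostic way to obtain such a ranking is permutation importance: shuffle one feature at a time and measure how much the model’s performance drops. The sketch below uses synthetic data and invented feature names, so it illustrates the technique rather than FAITH’s implementation.

```python
# Feature ranking via permutation importance (illustrative data and names).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
feature_names = ["daily_steps", "hours_of_sleep", "night_awakenings", "voice_pitch_var"]

# Synthetic dataset in which sleep and activity genuinely drive the label.
X = rng.normal(size=(300, len(feature_names)))
y = (0.8 * X[:, 1] + 0.4 * X[:, 0] + rng.normal(scale=0.5, size=300) < 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Rank features by how much shuffling them degrades the model's score.
ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda item: item[1], reverse=True)
for name, importance in ranking:
    print(f"{name:20s} importance {importance:.3f}")
```

Per-metric importances like these could then be aggregated within each category (activity, sleep, nutrition, voice) to give the category-level weights shown to the doctor, as described below.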

For FAITH, these “candidate” factors were selected in collaboration with doctors, psycho-oncologists, and psychiatrists, focusing on variables widely recognised as important for mental health. The data collection, which followed this protocol, was detailed in a study published in BMC Psychiatry. The app gathered information from former patients on aspects such as physical activity, sleep, nutrition, and tone of voice, with each category broken down into specific metrics – like daily steps, calories burned, hours of sleep, number of nighttime awakenings, water intake, weight fluctuations, and more.

“The graphical interface presents doctors with a depression risk score, along with an explanation based on the ‘weight’ of these various factors,” Frau continues. “For instance, the system might indicate that a former cancer patient has a 75% risk of developing severe depression, and that physical activity as a category and hours of sleep as a metric contributed 30% and 12%, respectively, to this risk.”

Detail of the graphical interface for doctors. In this section, the main prediction result (in this case, “Severe Depression”) for the person is presented to the doctor, along with the confidence level, the time horizon on which the prediction is based, and an initial explanation of the result. This includes the category of indicators that most influenced the result (in this case, “Activity,” meaning physical activity) and the specific indicator that had the most impact (Hours of sleep, as shown in the image).


This initial level of analysis is followed by additional, more detailed layers of insight, revealing the importance of different factors for the individual. These insights are provided both for a specific point in time (Latest Prediction) and in general (General Behaviour). The interface also allows comparisons between the individual’s data (Individual Level) and broader population trends (Population Level).


Second detail of the graphical interface for doctors showing the initial explanation of the result provided by the AI. In this section, it is possible to view the contribution of the categories of indicators, both for the specific person and for the general population. Additionally, an indication is given regarding the contribution of factors related to the individual at the specific moment (“Latest prediction” in the image) and compared to their average behaviour (“General behaviour” in the image). This information is complemented by the amount of data used for the prediction and training.


Third detail of the graphical interface for doctors focusing on the differences, in terms of indicator contribution, between the individual and the population. In the image, the category of indicators related to sleep (“Sleep”) is considered more impactful for the individual (green icon) compared to the population average (yellow icon).
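The views described in these interface details boil down to comparing sets of contribution values: the latest prediction against the person’s general behaviour, and the individual against the population. The sketch below illustrates such comparisons with invented numbers; it is not the project’s actual implementation.

```python
# Illustrative contribution values per category (invented numbers).
latest_prediction = {"activity": 0.30, "sleep": 0.12, "nutrition": 0.05, "voice": 0.08}
general_behaviour = {"activity": 0.22, "sleep": 0.18, "nutrition": 0.07, "voice": 0.06}
population_level  = {"activity": 0.25, "sleep": 0.10, "nutrition": 0.09, "voice": 0.07}

def compare(current, reference, threshold=0.05):
    """Flag categories that weigh noticeably more or less in `current` than in `reference`."""
    report = {}
    for category, value in current.items():
        delta = value - reference[category]
        if delta > threshold:
            report[category] = "more impactful"
        elif delta < -threshold:
            report[category] = "less impactful"
        else:
            report[category] = "in line"
    return report

print("Individual vs population (latest prediction):")
for category, verdict in compare(latest_prediction, population_level).items():
    print(f"  {category:10s} {verdict}")

print("Latest prediction vs the person's general behaviour:")
for category, verdict in compare(latest_prediction, general_behaviour).items():
    print(f"  {category:10s} {verdict}")
```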


WHY AN INDIVIDUAL AND A POPULATION MODEL?

To safeguard patient privacy, FAITH implemented an approach known as Federated Learning, which involves creating an AI model for each individual, trained locally on their smartphone so that no sensitive data is sent to a central server. Each model is thus tailored to a specific person’s data, habits, and current condition. These individual models – without sharing sensitive data – are then aggregated into a central model that reflects the observed population, allowing for broader comparisons and further analysis. “The relative importance of different factors inherently generates various levels of analysis: individual, population, specific moments, and general habits and lifestyle,” explains Frau. “Recognising the diversity of these factors and applying them to the clinical reality in a way that is understandable for doctors was the most challenging part of the work.”
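A common aggregation rule in Federated Learning is federated averaging (FedAvg): the server averages the clients’ model parameters, weighted by how much data each client holds, without ever seeing the raw data. The snippet below is a minimal sketch of that rule with made-up parameter vectors; the project’s actual aggregation scheme is not detailed in the article.

```python
# Minimal sketch of federated averaging: only model parameters leave the phone.
import numpy as np

def federated_average(client_params, client_sizes):
    """Aggregate per-client parameter vectors into one global model,
    weighting each client by the amount of data it trained on."""
    sizes = np.asarray(client_sizes, dtype=float)
    params = np.stack(client_params)                     # shape: (clients, n_params)
    return (params * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three hypothetical local models, each trained only on its owner's data.
local_models = [np.array([0.9, -0.2, 0.4]),
                np.array([1.1, -0.1, 0.3]),
                np.array([0.8, -0.3, 0.5])]
samples_per_client = [120, 80, 200]                      # raw data never shared

population_model = federated_average(local_models, samples_per_client)
print("aggregated (population) model parameters:", population_model)
```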


CERTAINTY ABOUT DATA RELIABILITY

The interface also provides crucial information about data reliability: for each category and metric, it indicates the amount of data collected, so the doctor can gauge how reliable its contribution to the risk estimate is. “This way, the doctor can understand how much trust to place in a given data point,” Frau continues. “For example, if the model indicates that physical activity is not significant for a former patient, is this because it truly isn’t important, or because the model received limited data on that aspect? Tracking the data collected and communicating which areas are covered is fundamental to transparency and Explainability.”
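As an illustration of this kind of coverage check, the sketch below counts how many days of data were actually collected for each metric and flags those whose contribution should be read with caution. The metric names, the monitoring period, and the threshold are assumptions for illustration.

```python
# Data-coverage sketch: low coverage means a metric's contribution to the risk
# score should be interpreted cautiously. All values here are illustrative.
observation_days = {"daily_steps": 85, "hours_of_sleep": 82,
                    "water_intake": 12, "voice_samples": 40}
monitoring_period_days = 90

def coverage_report(observations, total_days, minimum_coverage=0.5):
    """Return per-metric coverage and a reliability flag."""
    report = {}
    for metric, days in observations.items():
        coverage = days / total_days
        status = "reliable" if coverage >= minimum_coverage else "limited data"
        report[metric] = (coverage, status)
    return report

for metric, (coverage, status) in coverage_report(observation_days, monitoring_period_days).items():
    print(f"{metric:16s} coverage {coverage:4.0%}  -> {status}")
```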

“While there’s much talk about AI Explainability, in practice, few solutions address this issue effectively,” Frau says. “It’s not just about incorporating a large amount of information in a suitable way, which is complex in itself. It’s necessary to first ‘design’ Explainability: identifying available data, dealing with the technicalities of AI models, and translating this into useful explanations within the specific context, which always has its intrinsic complexities.” “In this respect, FAITH has been a pioneering project. We have laid the groundwork for a model that, while still needing refinement, is already defined and focused on the needs of the ‘real’ users of the technology – professionals who, for the most part, are unfamiliar with mathematical functions but seek an AI ally that is reliable and, above all, understandable,” concludes Frau.
