Ethics of Machine Learning – Uncertainty

I saw something this morning that highlights one of the problems with machine learning systems and the limits of their capabilities. It goes to an extreme but it aptly displays the point.

https://www.nytimes.com/interactive/2024/07/18/technology/spain-domestic-violence-viogen-algorithm.html

The story highlights something important about ML/AI: a model can be systematically wrong if there were material omissions from its training data. This may not be the point of the article, but it's still true. Unseen aspects of the data can also show up in the outputs, whether intended or not. The issues with the model in the article could be as simple as survivorship bias. If none of the training data came from unreported crimes, it is hard to evaluate how effective the model can be without testing it in vivo and generating more data, which can then be used to re-train (read: improve) the model.
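To make the omission problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the hidden "isolated" factor, the probabilities, and the function name. None of it comes from the VioGen system or the article. It only shows that when the cases most at risk are also the least likely to be reported, the rate you measure from reported cases understates the real one.

```python
import random


def simulate_reporting_bias(n=100_000, seed=0):
    """Compare the harm rate in the whole population with the rate
    seen only among reported cases. All numbers are hypothetical."""
    rng = random.Random(seed)
    all_cases = []
    reported_cases = []
    for _ in range(n):
        # Hypothetical hidden factor that the data collector never sees.
        isolated = rng.random() < 0.5
        # Assumption: isolated people face higher risk but report less often.
        p_harm = 0.30 if isolated else 0.10
        p_report = 0.10 if isolated else 0.60
        harmed = rng.random() < p_harm
        all_cases.append(harmed)
        if rng.random() < p_report:
            reported_cases.append(harmed)
    true_rate = sum(all_cases) / len(all_cases)
    observed_rate = sum(reported_cases) / len(reported_cases)
    return true_rate, observed_rate


true_rate, observed_rate = simulate_reporting_bias()
print(f"harm rate in the full population: {true_rate:.3f}")
print(f"harm rate among reported cases:   {observed_rate:.3f}")
```

With these made-up probabilities the full-population rate works out to about 0.20, while the rate among reported cases is about 0.13. A model trained only on the reported cases would learn the lower number and have no way of knowing it was low.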

Re-training on new data does lead to improvement, however, and there's still value in creating these approximations and finding ways to keep feeding new data points back into the model. Adding more parameters to allow more nuanced relationships between inputs can help push the approximation closer to 100%, but it will never reach 100%. Computation takes time, and as soon as time passes, things have changed.

In the case of the algorithm in Spain, having it as a tool is helpful; however, continuous data collection is necessary to ensure the model keeps learning as time moves forward. It's easy to focus on how we can better predict and prevent negative outcomes, but we may be missing the point. Perhaps laws need to be tougher or better enforced, cultural values have to evolve, or the New York Times has to publish an article about domestic abuse in Spain to bring more attention to the matter before there is meaningful change. If that's the case, the AI did its job as the researchers who created the model intended. My guess is that the researchers just wanted to help victims, and they did that with the accurate predictions from the model they assembled. Was their goal to catch more than 50% of dangerous situations? 75%? 90%? It can't get to 100% in an analog, continuous world. Physics struggles with the same concept: uncertainty. And that's the biggest problem with machine learning and AI: you can't get rid of uncertainty, ever.

This is a universal hazard of discretized systems. If you keep moving halfway between where you are and your destination, you'll never get there, but you can always keep going.
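The halfway idea is easy to check with a few lines of Python. This is just a sketch of the arithmetic, with a made-up function name, and it uses exact fractions so that floating-point rounding doesn't hide the leftover gap.

```python
from fractions import Fraction


def remaining_gap(steps, start=Fraction(0), destination=Fraction(1)):
    """Move halfway from the current position to the destination
    `steps` times and return the distance still left to cover."""
    position = start
    for _ in range(steps):
        position += (destination - position) / 2
    return destination - position


for steps in (1, 10, 50):
    print(steps, remaining_gap(steps))
```

Each step halves whatever distance is left, so after n steps the gap is 1/2^n of the original distance: tiny, but never zero.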
