Machine Learning and its flaws. Or, imdb’s confusion between two Steve… | by Vuk Ivanovic | Predict | Nov, 2020

Or, IMDb’s confusion between two Steve McQueens

I don’t think I need to go deeper into the meaning of the subtitle; the image is worth a thousand words. But in case some of you aren’t aware of the famous actor of the 20th century or the director of the 21st century: according to IMDb, the actor of the 20th century never directed a single movie, not even a short film, and he was not of the same race as Steve McQueen the director, who, according to IMDb, has acted only in short films. Bonus: the younger McQueen has also won an Oscar.

I have been seeing this happen on IMDb for quite some time. Sometimes the thumbnail image for an article shows a different movie with the same title, or a scene from that other movie. At times it’s understandable: for example, when the article is about a movie or TV show that has just been announced and is a remake or re-imagining of an earlier version, it makes perfect sense to use a still or a poster from an earlier incarnation. Either way, as a human being I can read the text, compare it to the featured image, and determine rather quickly whether or not it’s a mistake. But whatever algorithm IMDb is using can’t make that distinction.
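IMDb’s actual pipeline isn’t public, so the following is only a minimal sketch, assuming a hypothetical record store keyed by name, of how this kind of mix-up can happen: a lookup that matches on the name alone returns whichever Steve McQueen it finds first, while even a crude check of the surrounding text can tell the two apart. The `Person` records, `match_by_name`, `match_with_context`, and the headline are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    role: str  # the role the person is best known for


# Hypothetical mini knowledge base: two different people, one name.
PEOPLE = [
    Person("Steve McQueen", "actor"),     # the 20th-century actor
    Person("Steve McQueen", "director"),  # the 21st-century director
]


def match_by_name(name):
    """Naive lookup: return the first record whose name matches."""
    for person in PEOPLE:
        if person.name == name:
            return person
    return None


def match_with_context(name, text):
    """Keep only same-name records whose role word appears in the text;
    return None unless exactly one candidate survives."""
    words = set(text.lower().split())
    hits = [p for p in PEOPLE if p.name == name and p.role in words]
    return hits[0] if len(hits) == 1 else None


# Hypothetical headline for illustration only.
headline = "Oscar-winning director Steve McQueen announces his next film"
print(match_by_name("Steve McQueen").role)                 # actor (wrong person)
print(match_with_context("Steve McQueen", headline).role)  # director
```

Returning `None` when the context doesn’t single out one candidate is a deliberate choice: flagging the case for a human is safer than silently picking the first match, which is the kind of mistake the thumbnails show.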

NLTK — A quick and simple explanation
NLTK stands for Natural Language Toolkit. It’s a Python collection of libraries for NLP (natural language processing). For a more in-depth tutorial, click here.
The basic idea is to teach a computer how to interpret books, articles, and text in general. A simple sentence has a subject, a verb, and an object. And while NLTK has ways to break a sentence down like that, most of the time the context is the trickier part.
An example (the NLTK book has far better examples, but mine has a dog 🙂): “Dog knocked over a vase on the table.” Did the dog knock over a vase that was on the table, did the dog knock over a vase in such a way that it fell onto the table, or was the dog on the table when it knocked over the vase? I hope you get the point. But there are also double meanings and whatnot: “Duck!” “Where?” I hope I don’t need to explain this one, but if it doesn’t make you laugh, well, it makes me laugh 🙂
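To make the dog sentence concrete: a parser that only knows grammar rules finds more than one valid structure for it and has no built-in way to pick the intended one. NLTK ships chart parsers for this (for example `nltk.ChartParser` over a grammar built with `nltk.CFG.fromstring`), but to keep the sketch dependency-free, here is a plain-Python CKY parse counter instead. The toy grammar, the lexicon, and treating “knocked over” as a single verb token are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # "a vase" + "on the table"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # "knocked over a vase" + "on the table"
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: each token maps to the categories it can take.
LEXICON = {
    "dog": {"NP"},
    "knocked over": {"V"},  # phrasal verb treated as one token
    "a": {"Det"},
    "the": {"Det"},
    "vase": {"N"},
    "table": {"N"},
    "on": {"P"},
}


def count_parses(tokens, start="S"):
    """Count distinct parse trees of category `start` covering all tokens."""
    n = len(tokens)
    # chart[i][j][cat] = number of trees of category cat spanning tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for cat in LEXICON[tok]:
            chart[i][i + 1][cat] += 1
    # Build longer spans from pairs of shorter, already-finished spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


tokens = ["dog", "knocked over", "a", "vase", "on", "the", "table"]
print(count_parses(tokens))  # 2
```

The two structures correspond to attaching “on the table” to “a vase” (the vase was on the table) or to the whole verb phrase (the knocking over happened on, or onto, the table). The grammar alone cannot say which one the writer meant; that is the context problem.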

It’s all fun and games until something or someone confuses games and life
When you confuse the director of Tenet with the director of Interstellar while joking around with your friends, it’s a laugh. When you confuse the green light with the red light, or fail to stop at a STOP sign, well, it can be deadly.
Now imagine a flawed algorithm being used for something more serious. And, not to increase fear and paranoia, but the following screenshot proves my point:

In case you didn’t catch it, AI and machine learning are mentioned as ways to find cures for various diseases. At first glance you may think: well, good, computers can perform far more calculations, far more accurately and much faster, than a group of the smartest people. But if that’s the case, it’s a bit scary, to say the least, that there are still no cures for any of those diseases.

Just assumptions and guesses. Not a single one says CURE FOUND, but there are plenty of words like “could”, “possible”, “in the future”, etc.

Curious, innit? But the other side of this concern is: what if a flawed algorithm is used to attack cancer cells, for example? What if it attacks the healthy cells instead? And yet there are far more news stories and articles about the benefits of A.I. But let’s be honest, praising it is an easier sell than criticizing it. That’s why I’m here 🙂

Algorithm bias
As recently as 2019, according to this article from The Washington Post, a flaw in an algorithm affected the health of real, living human beings. Now, I haven’t analyzed the algorithm in question, which I would have happily done had I not witnessed the IMDb issue. While the IMDb issue has zero impact on anyone’s health (unless there’s confusion between discussing Steve McQueen’s movies and race in the wrong crowd), the fact that I, a (mostly) human being, have been noticing it for many years and never bothered to contact anyone at IMDb proves my point. If I can figure out what’s what, why wouldn’t others? It also means that some of the folks writing those algorithms could have been, and probably still are, counting on the assumption that if there’s a flaw, a human will detect it, hopefully in time.

Humans make mistakes, but some are also doing it on purpose
For the longest time, I assumed that a flaw in anything human-made had to be the fault of the human(s) who made it. The same goes for computer algorithms. While the idea behind machine learning is for machines to eventually be able to teach themselves, humans have a big hand in getting there. And humans require sleep, shelter, food, and a happy life, which tends to mean not starving, not having to walk 10 miles in rough weather, and so on. In short, humans need money. Aside from volunteering, human beings work for money. Some work for less, some for more; some are paid fairly, some aren’t. And some people just want more without working too hard for it. This means some algorithms are developed with bias in mind because their developers want or need more money than they are earning. I mean, there’s corruption everywhere else, especially in politics, so there’s no reason to consider the tech industry immune to it, at least not as long as humans are involved without some heavy vetting.

To err is human, but
If a mistake of yours ends up being reviewed by a fellow human being, they will most likely assume it was an honest mistake, and even if it wasn’t, they may still accept your apology. But if a mistake you failed to notice gets processed by a computer program that assumes you haven’t made any mistakes (why would it assume otherwise?), you may end up with the wrong tax return amount, to use a simple example.

