Why Is AI Overly Confident In Giving Incorrect Responses?

A TikTok user recently took to the platform to share a very strange interaction with the famous ChatGPT. In the chat, she asks the AI how many “R”s are in the word “strawberry”, and the AI confidently answers “2”. The user goes as far as asking the bot to spell out the word and count the letters, but without success: the AI goes back to its original answer, even apologising for the mistake of saying “3” when it counted the spelled-out word.

Although the interaction was humorous, it raises a serious concern about how much other information is delivered with confidence but is completely incorrect. The University of Maryland explains it well in a guide, saying, “As of 2023, a typical AI model isn’t assessing whether the information it provides is correct. Its goal when it receives a prompt is to generate what it thinks is the most likely string of words to answer that prompt. Sometimes this results in a correct answer, but sometimes it doesn’t – and the AI cannot interpret or distinguish between the two. It’s up to you to make the distinction.”

Why Does AI Do This?

The question of why AI gives confidently wrong answers has been discussed across the net. In one Reddit post, users debate the reasoning, with one saying, “ChatGPT is just a machine without consciousness or confidence, and when it gives a wrong answer, it’s not being confidently wrong; it’s just wrong.” Another user argued that even though ChatGPT creates answers based on patterns learned from large amounts of data, its responses sometimes appear confident, which can mislead users into accepting incorrect information.

Meet the “Thermometer” Calibration Tool

To tackle this problem, researchers at MIT and the MIT-IBM Watson AI Lab have announced a new tool called “Thermometer.” This tool helps with the issue of large language models being too confident in their wrong answers or too doubtful of their right ones. Thermometer stands out because it can work across many different tasks, giving it a wide range of applications.
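To make "too confident" or "too doubtful" concrete, here is a minimal sketch of one common way to measure the gap between stated confidence and actual correctness, usually called expected calibration error. The article does not name this metric or say how the researchers measure calibration. The function name, the bin count, and the sample data below are all assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Average gap between stated confidence and actual accuracy.

    confidences: probabilities in [0, 1] the model assigned to its answers.
    correct: booleans saying whether each answer was actually right.
    """
    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for group in bins:
        if not group:
            continue
        avg_conf = sum(c for c, _ in group) / len(group)
        accuracy = sum(1 for _, ok in group if ok) / len(group)
        # Weight each bin's gap by how many answers fall into it.
        ece += (len(group) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical data: a model that is very sure of itself but often wrong.
confidences = [0.95, 0.90, 0.92, 0.60, 0.55]
correct = [True, False, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(f"calibration gap: {ece:.2f}")
```

A score of 0 would mean the model's confidence matches how often it is right. Larger values mean it is overconfident, underconfident, or both.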

Better Or Worse Than Other Methods?

Thermometer uses less computing power than other methods. It relies on a smaller model that works alongside the main LLM to adjust its confidence accurately. This method makes sure that the LLM keeps its accuracy while being flexible enough to handle tasks it has never seen before. Temperature scaling is used here to match the model’s confidence with how correct its answers are.
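As a rough illustration of temperature scaling in general, the sketch below divides a model's raw scores by a single number, the temperature, before turning them into probabilities. This is not Thermometer's actual code. The scores and the temperature of 2.0 are hypothetical values picked by hand, whereas the article describes a smaller side model doing the adjusting.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate answers.
logits = [4.0, 1.0, 0.5]

original = softmax(logits, temperature=1.0)    # top answer gets about 0.93
calibrated = softmax(logits, temperature=2.0)  # top answer gets about 0.72

# The ranking of the answers is unchanged; only the confidence shrinks.
best_before = original.index(max(original))
best_after = calibrated.index(max(calibrated))
print(best_before == best_after, round(max(original), 2), round(max(calibrated), 2))
```

Because every score is divided by the same positive number, the model still picks the same answer, which is why this kind of adjustment can change confidence without changing accuracy. A temperature above 1 tones confidence down, and one below 1 raises it.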

Thermometer’s side model is initially trained across a few datasets but can adapt to new scenarios without needing new data. So if trained on algebra and medical question datasets, it can later calibrate an LLM that is answering queries in geometry or biology.

What Are Social Media Users Saying?

On Threads, koumouz said, “I find it pretty funny that we’re all super concerned that large language models, when posed with a question they don’t know the answer to, make things up rather than state they don’t know.
And yet this is precisely the behaviour of many (most?) social media influencers and presumed experts.”

This point highlights the main takeaway when it comes to AI: fact-checking is essential when interacting with these technologies and tools, to guard against misleading responses. Combining this with tools such as the one MIT has developed should help with the misinformation concerns many have about AI.