OpenAI Announces CriticGPT, A Tool That Finds ChatGPT Errors

OpenAI has announced a new model, CriticGPT, which is built to identify errors in code. Based on GPT-4, it’s designed to analyse responses generated by ChatGPT and help human reviewers find mistakes, mainly during the training process the AI goes through.

GPT models are trained using “Reinforcement Learning from Human Feedback,” or RLHF for short. This method uses human feedback to train the AI so that its responses move closer to what people actually prefer.

How RLHF Works In Training


The humans involved are referred to as “AI trainers.” In the process, they review different responses the AI gives to their requests. The responses are then rated, and feedback is given depending on whether a response was good and accurate or contained hallucinated information.

If you’ve ever used ChatGPT, you may have seen a simple example of this process. Sometimes, the AI will ask you whether its response was helpful, accurate, incomplete, or inaccurate, and that feedback is then taken in.

So, through constantly receiving that feedback, the AI learns over time by repeating the patterns or behaviours seen in the positive feedback. The opposite would then apply with negative feedback, where it would try to avoid those patterns.
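
To make the idea concrete, here is a minimal Python sketch of how pairwise ratings from trainers could be turned into a numeric score per response. It is a toy illustration only, not OpenAI's pipeline: the function name `fit_rewards`, the example ratings, and the learning rate are all assumptions made for this example, and a real RLHF system would train a neural reward model and then optimise the chatbot against it.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_rewards(comparisons, n_items, lr=0.1, epochs=200):
    """Fit one score per response from (preferred, rejected) index pairs.

    Hypothetical helper: a Bradley-Terry style preference model trained
    with plain gradient ascent.
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in comparisons:
            # Probability the current scores give to the trainer's choice.
            p = sigmoid(scores[winner] - scores[loser])
            # Raise the preferred response and lower the rejected one,
            # with larger steps when the current scores disagree more.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores


# Made-up ratings: response 0 beat 1, response 0 beat 2, response 1 beat 2.
comparisons = [(0, 1), (0, 2), (1, 2)]
scores = fit_rewards(comparisons, n_items=3)
print(scores)
```

Responses that trainers preferred more often end up with higher scores, which is the signal the model is then nudged towards. If trainers miss a subtle mistake, the flawed response can win the comparison and be rewarded anyway.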

What Is CriticGPT’s Purpose For AI?


The purpose of this tool is to make AI more reliable, as experience has shown that the technology’s accuracy cannot always be trusted. “As we make advances in reasoning and model behaviour, ChatGPT becomes more accurate and its mistakes become more subtle. This can make it hard for AI trainers to spot inaccuracies when they do occur, making the comparison task that powers RLHF much harder.

“This is a fundamental limitation of RLHF, and it may make it increasingly difficult to align models as they gradually become more knowledgeable than any person that could provide feedback,” explained OpenAI in its announcement.

How Useful Is CriticGPT?


“We found that when people get help from CriticGPT to review ChatGPT code they outperform those without help 60% of the time. We are beginning the work to integrate CriticGPT-like models into our RLHF labeling pipeline, providing our trainers with explicit AI assistance,” said OpenAI.

In addition, CriticGPT’s critiques were preferred over those generated by ChatGPT in 63% of cases. When there’s an error, CriticGPT highlights it and, alongside it, gives a critique explaining why it is an error.

The tool is trained to give precise and brief critiques, and it is proving to be very useful. OpenAI is expected to keep refining it over time, but for now it is an efficient way to catch the small errors ChatGPT is known for making.

“Sometimes real-world mistakes can be spread across many parts of an answer. Our work focuses on errors that can be pointed out in one place, but in the future, we need to tackle dispersed errors as well,” added OpenAI.

This means that, for now, the tool is focused on errors that can be pinpointed in a single place, rather than mistakes spread across several parts of an answer. So, even though the tool isn’t 100% accurate yet, it could be a useful step towards addressing ChatGPT’s hallucination problem.