Remember when we reported a month or so ago that Anthropic had discovered that what is actually happening inside AI models is very different from how the models themselves describe their "thinking" process? Well, to that mystery surrounding the latest large language models (LLMs) you can now add, along with countless others, a worsening problem: hallucination. And that's according to testing by the best-known name in chatbots, OpenAI.
As The New York Times reports, OpenAI's own investigations have shown that its latest o3 and o4-mini models are far more prone to making up false information than the previous o1 model.
"The company found that o3, its most powerful system, hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than double the hallucination rate of OpenAI's previous reasoning system, called o1.
"When running another test, called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time."
OpenAI says more research is needed to understand why the latest models hallucinate more. But according to some industry observers, the so-called "reasoning" models are the prime suspect.
The Times claims that "the newest and most powerful technologies, so-called reasoning systems from companies like OpenAI, Google and the Chinese startup DeepSeek, are generating more errors, not fewer."
In simple terms, reasoning models are a type of LLM designed to tackle complex tasks. Instead of just spitting out text based on statistical probability, reasoning models break questions or tasks down into individual steps, in a process akin to human thought.
OpenAI's first reasoning model, o1, came out last year and was claimed to match the performance of PhD students in physics, chemistry, and biology, and to beat them at math and coding, thanks to the use of reinforcement learning techniques.
"Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem," OpenAI said when o1 was released.
However, OpenAI has pushed back against the narrative that reasoning models are driving the rise in hallucination rates. "Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini," OpenAI's Gaby Raila told the Times.
Whatever the truth, one thing is for sure: AI models need to largely cut out the nonsense and lies if they are to be anywhere near as useful as their proponents claim. As it stands, it's hard to trust the output of any LLM. Pretty much everything has to be carefully double-checked.
That's fine for some tasks. But where the main benefit is saving time or labor, the need to carefully proofread and fact-check the output rather defeats the purpose of using these models in the first place. It remains to be seen whether OpenAI and the rest of the LLM industry can get a handle on all these unwanted robot dreams.