
It is a strange fact that we do not understand how large language models (LLMs) actually work. We designed them. We built them. We trained them. But their inner workings remain largely mysterious. Or at least they were. That is now a little less true, thanks to Anthropic’s new research: a technique inspired by brain scanning that helps explain why chatbots make things up and are so terrible with numbers.
The problem is that while we understand how a model is built, we do not know how all the zillions of weights and parameters, the relationships between data inside the model that emerge from the training process, actually give rise to what looks like coherent output.
“Open up a large language model and what you will see are billions of parameters,” Joshua Batson, a research scientist at Anthropic, told MIT Technology Review. That is what you get if you look inside the black box that is a fully trained AI model. “It’s not illuminating,” he noted.
To understand what is actually going on, Anthropic researchers developed a new technique, called circuit tracing, that lets them track a model’s step-by-step decision-making process. They then applied it to their Claude 3.5 Haiku LLM.
Anthropic says its approach was inspired by the brain-scanning techniques used in neuroscience, and it can identify which components of the model are active at different times. In other words, it is a bit like a brain scanner showing which parts of the brain are firing during a cognitive process.
Anthropic made a number of interesting discoveries using this approach, not least why LLMs are so terrible at basic mathematics. “Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95,” the MIT article explains.
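To make that description concrete, here is a minimal Python sketch of that kind of two-path arithmetic: one path builds a rough estimate of the sum’s magnitude from a few approximate additions, another works out the exact last digit, and the two are combined at the end. The particular rounding choices and the combination rule are illustrative assumptions for this sketch, not Anthropic’s actual findings about Claude’s circuits.

```python
def fuzzy_add(a: int, b: int) -> int:
    """Caricature of the two-path addition behaviour described above.
    An illustrative sketch only, not the model's actual circuitry."""
    # Path 1: rough magnitude, from a selection of approximate additions
    # ("add 40ish and 60ish", and so on), averaged into one estimate.
    rough_sums = [
        round(a, -1) + round(b, -1),   # 36 + 59 -> 40 + 60 = 100
        a + round(b, -1),              # 36 + 59 -> 36 + 60 = 96
        round(a, -1) + b,              # 36 + 59 -> 40 + 59 = 99
    ]
    estimate = sum(rough_sums) / len(rough_sums)   # roughly "90-something"

    # Path 2: the exact last digit, worked out from the ones digits alone.
    last_digit = (a % 10 + b % 10) % 10            # (6 + 9) % 10 = 5

    # Combine: the number nearest the estimate whose ones digit matches.
    candidates = [n for n in range(int(estimate) - 10, int(estimate) + 11)
                  if n % 10 == last_digit]
    return min(candidates, key=lambda n: abs(n - estimate))


print(fuzzy_add(36, 59))  # 95
```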
But here’s the kicker. If you then ask Claude how it got the correct answer of 95, it will apparently tell you something like: “I added the ones (6+9=15), carried the 1, then added the tens (3+5+1=9), resulting in 95.” But that reflects the common explanations of addition found in its training data rather than what the model actually did.
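For contrast, here is the textbook carry procedure Claude claims to have followed, again just a sketch of the standard schoolbook method rather than anything traced from the model itself:

```python
def carry_add(a: int, b: int) -> int:
    """Schoolbook column addition: add the ones, carry, then add the tens."""
    ones = a % 10 + b % 10                  # 6 + 9 = 15
    carry, ones_digit = divmod(ones, 10)    # carry the 1, keep the 5
    tens = a // 10 + b // 10 + carry        # 3 + 5 + 1 = 9
    return tens * 10 + ones_digit           # 95


print(carry_add(36, 59))  # 95
```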
In other words, not only does the model use a very strange method to do its math, but you cannot trust its account of what it has just done. This matters: it shows that model outputs alone cannot be relied on when designing guardrails for AI; their internal workings need to be understood too.
Another surprising result of the research is that LLMs do not, as is widely assumed, simply work by predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of the verse first, then filled in the rest of the line.
“The planning thing in poems blew me away,” says Batson. “Instead of trying to make the rhyme work at the very last minute, it knows where it is going.”
Anthropic also found, among other things, that Claude “sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal ‘language of thought.’”
That said, this line of research still has a long way to go. According to Anthropic, “It currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words.” And the research does not show how the structures inside LLMs form in the first place.
But it has shed at least some light on how these strange, mysterious AI creations that we have built but do not understand actually work. And that can only be a good thing.