01 Apr

Anthropic recently unveiled the underlying mechanisms powering its AI model, Claude, offering insights into how the system plans, reasons, and composes its responses.
In two research papers, Anthropic introduced techniques known as circuit tracing and attribution graphs, designed to illuminate the model's internal operations. The company emphasized that Claude does not merely mimic human linguistic patterns but engages in a form of "thinking."
For instance, when tasked with composing poetry, Claude plans the rhyme scheme before writing; when answering geography questions, it first identifies the relevant state and then names its capital. This suggests that Claude builds a coherent framework for its answers and reasons its way toward them, rather than relying on the simple step-by-step matching typical of traditional search engines.
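As a rough illustration of what an attribution graph can expose, the sketch below builds a toy graph whose nodes are human-readable features and whose weighted edges record how strongly one feature contributes to the next; tracing the strongest path from an input token to an output token surfaces the intermediate "state capital" step described above. The class, feature names, and weights are all hypothetical, invented here for illustration, and are not Anthropic's actual tooling or data.

```python
from collections import defaultdict

class AttributionGraph:
    """Toy attribution graph: nodes are human-readable features, weighted edges
    record how strongly one feature contributes to the next."""

    def __init__(self):
        self.edges = defaultdict(list)  # source -> [(target, attribution weight)]

    def add_edge(self, source, target, weight):
        self.edges[source].append((target, weight))

    def strongest_path(self, start, end, path=None, score=1.0):
        """Greedily follow the highest-weight edge from start until end is reached."""
        path = (path or []) + [start]
        if start == end:
            return path, score
        if not self.edges[start]:
            return None, 0.0  # dead end: no outgoing attribution
        target, weight = max(self.edges[start], key=lambda e: e[1])
        return self.strongest_path(target, end, path, score * weight)

# Hypothetical features and weights for the geography example above.
graph = AttributionGraph()
graph.add_edge("token: 'Texas'", "feature: US state", 0.9)
graph.add_edge("feature: US state", "feature: state capital", 0.8)
graph.add_edge("feature: state capital", "output: 'Austin'", 0.95)

path, score = graph.strongest_path("token: 'Texas'", "output: 'Austin'")
print(" -> ".join(path), f"(combined attribution ~{score:.2f})")
```

Tracing the path makes the hidden intermediate reasoning step visible, which is the point of the technique: the answer is not produced in one opaque jump from question to capital city.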
The research also explores Claude's approach to multilingual queries. The model maps words from different languages into a shared abstract "language." For example, when prompted with words meaning "small" in several languages, Claude first maps them to this universal abstraction and then selects the appropriate word in each target language. This lets the model interpret queries accurately across languages and handle cross-lingual tasks more efficiently.
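One way to picture this shared conceptual space is the toy lookup below: words from several languages are mapped onto a single language-independent concept, and the answer is then rendered back in whichever language is requested. The tables and function names are hypothetical, purely for illustration; Claude's actual internal representations are learned features, not dictionaries.

```python
# Toy illustration of a shared conceptual space. The tables below are invented;
# in Claude the "concept" is a learned internal representation, not a lookup table.
WORD_TO_CONCEPT = {
    "small": "CONCEPT_SMALL",    # English
    "petit": "CONCEPT_SMALL",    # French
    "pequeño": "CONCEPT_SMALL",  # Spanish
}

# The opposite of "small", expressed per output language.
OPPOSITE_IN_LANGUAGE = {
    ("CONCEPT_SMALL", "en"): "large",
    ("CONCEPT_SMALL", "fr"): "grand",
    ("CONCEPT_SMALL", "es"): "grande",
}

def opposite(word: str, output_language: str) -> str:
    """Map a word to its language-independent concept, then answer in the requested language."""
    concept = WORD_TO_CONCEPT[word]
    return OPPOSITE_IN_LANGUAGE[(concept, output_language)]

print(opposite("petit", "en"))  # -> "large": French input, English output via the shared concept
```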
Anthropic also addressed the common phenomenon of AI "hallucinations." When the model recognizes familiar vocabulary in a query, it may proceed to generate a response even if it lacks genuine knowledge of the topic. If the system mistakenly assumes it knows the answer, it may fabricate one, producing inaccuracies.
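A crude way to sketch this failure mode is the gating logic below: a "familiar name" signal can switch the model from declining to answering even when no real fact has been retrieved. The threshold, scores, and function names are hypothetical and exist only to make the described mechanism concrete.

```python
from typing import Optional

def respond(familiarity_score: float, retrieved_fact: Optional[str]) -> str:
    """Hypothetical gate: familiarity with a name can trigger answering even
    when no actual fact has been retrieved."""
    KNOWN_ENTITY_THRESHOLD = 0.7  # invented value, for illustration only

    if familiarity_score < KNOWN_ENTITY_THRESHOLD:
        return "I don't know."   # unfamiliar topic: the default refusal wins
    if retrieved_fact is not None:
        return retrieved_fact    # familiar topic backed by a real fact
    # Familiar-sounding topic but nothing retrieved: the answer gate is already
    # open, so a plausible-sounding response gets fabricated (a hallucination).
    return "A confident but fabricated answer."

print(respond(0.2, None))                               # unfamiliar -> declines
print(respond(0.9, "Austin is the capital of Texas."))  # familiar + fact -> correct
print(respond(0.9, None))                               # familiar, no fact -> hallucination
```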
As such, Anthropic concludes that the model's tendency to confidently assert incorrect information stems from this flawed inference process. By uncovering the specific factors that trigger such errors, researchers may eventually mitigate more significant issues and enhance the reliability of AI systems.
