Building LLM-based agents that reason explicitly about their tasks helps to unlock the full potential of LLMs. It increases performance, facilitates coordination, improves explainability, and addresses governance issues. But the quality of LLM reasoning is paramount. Poor reasoning can be worse than no reasoning at all.
Reasoning quality can be assessed by analyzing so-called reasoning traces: the argumentative texts generated as intermediate outputs in multi-step LLM apps.
For this purpose, one can draw on the sophisticated tools and manifold methods of argumentation analysis developed in the critical thinking literature.
We discuss how our Logikon demonstrator (off-the-shelf) turns the 7B OpenChat LLM into a legal reasoner that generates effective CoT traces, increasing accuracy on a LegalBench task by up to 80% above the random baseline.
We show how our Logikon demonstrator (off-the-shelf) enables the Deepseek-Coder-7B LLM to improve its own code completions on a code reasoning task.