Latest
Evaluating AI Agents: A Hybrid Deterministic and Rubric-Based Framework
How Hebbia measures agent quality at scale with a hybrid evaluation methodology.
FFTxt: 30k Parameters Is All You Need
Hebbia researchers leveraged classic signal processing techniques to build a text detection model smaller than most of the images it classifies.
Reaching Autonomous Consensus on Agentic Outputs
We built a statistically rigorous, consensus-based framework for evaluating LLM outputs and used it to benchmark today's leading models on the tasks that matter most to finance professionals.

A Look Inside Hebbia's "Deeper" Research Agent
We built a multi-agent system that goes beyond public web search to synthesize insights from any data source, including proprietary ones.

The Multi-Agent Redesign Behind Matrix
At the end of last year, we returned to the drawing board and redesigned Matrix Agent.

The Distributed System Behind Hebbia's High-Scale AI
We built a distributed LLM request scheduler that intelligently routes billions of tokens per day across multiple providers so high-priority work always gets through, even under rate limits.

Goodbye, RAG: How Hebbia solved Information Retrieval for LLMs
After pioneering semantic search and RAG, we found both fell short on the hardest questions, so we scrapped them and built a new information retrieval system from scratch.