Beyond the Hype: Humains Achieves World-Leading 92% on Google FACTS Grounding Benchmark

Let's be honest. The AI revolution is here, and it's amazing. But for every headline-grabbing breakthrough, there's a quiet whisper of doubt: can we actually trust this thing?

We've all heard the stories. The AI chatbot that went rogue and fired a customer. The recruiting tool that somehow learned to discriminate. The trading algorithm that decided a flash crash was a good idea. These aren't just funny anecdotes for your next happy hour; they're real-world enterprise nightmares that cost millions and erode trust.

The problem? Most AI, for all its brilliance, has a tiny flaw: it can hallucinate. It can confidently invent facts, misinterpret data, and generally make things up. Great for fiction writers, disastrous for your regulated industry.

At Humains, we decided enough was enough. We didn't just build another AI. We built a truth serum for AI: our patented Cognitive OS. Think of it as the ultimate fact-checker, the reliability guardian, the AI that makes sure your AI isn't just smart, but honest.

The Ultimate Lie Detector Test: Google FACTS Grounding Benchmark

We don't just talk the talk. We put our Humains Agent to the ultimate lie detector test: the Google FACTS Grounding Benchmark. This isn't some fluffy marketing metric; it's the industry's toughest challenge for AI factuality. It measures an AI's ability to stick to the facts, based only on provided information, without veering into creative fiction.
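For readers curious how a grounding score like this can be tallied, here is a minimal sketch. It rests on a simplifying assumption: each response receives a pass/fail verdict from one or more automated judges, and the headline number is the average pass rate across judges. The function name, judge names, and verdict data below are hypothetical illustrations, not the benchmark's actual code or our results.

```python
from statistics import mean


def grounding_score(verdicts):
    """Return a 0-100 score from per-judge verdicts.

    `verdicts` maps a (hypothetical) judge name to a list of
    (eligible, grounded) booleans, one pair per response:
      eligible - the response actually addresses the request
      grounded - every claim is supported by the provided document
    A response counts only if it is both eligible and grounded.
    """
    per_judge_rates = []
    for judge, results in verdicts.items():
        if not results:
            raise ValueError(f"judge {judge!r} has no verdicts")
        passed = sum(1 for eligible, grounded in results if eligible and grounded)
        per_judge_rates.append(passed / len(results))
    # Average the judges so that no single judge decides the score.
    return 100 * mean(per_judge_rates)


# Made-up verdicts for four responses, scored by two judges.
example = {
    "judge_a": [(True, True), (True, False), (True, True), (False, True)],
    "judge_b": [(True, True), (True, True), (True, True), (False, True)],
}
print(grounding_score(example))  # 62.5
```

In this toy setup, a response that invents a detail or dodges the question lowers the score in exactly the same way, which is the point: sounding helpful is not enough if the answer is not anchored in the source material.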

And the results? Nothing short of a world-class achievement for a startup. Our Agent took precisely the same test as every other model, with no open information and no cheating, and achieved an astounding 92% on the Google FACTS Grounding Benchmark. This isn't just a win; it's a decisive victory that places us firmly at the #1 position on the leaderboard, significantly ahead of even the largest tech giants.

To put this into perspective, consider how the top performers on the public Kaggle leaderboard stacked up as of June 19, 2025.

Our 92% score represents a 4.2% lead over the next best model, Gemini 2.5 Pro. This isn't just an incremental gain; it's a monumental leap that transforms a "very good" AI into the most reliable in its class. It means our Cognitive OS fundamentally improves AI's ability to maintain factual accuracy and avoid generating misleading information, regardless of the underlying Large Language Model (LLM) it's paired with. As our internal tests have shown, the difference between leading LLMs like Claude or GPT-4o is minimal; the real differentiator is our Cognitive OS.

Human-Level Performance at Superhuman Scale

Beyond just being factual, our Humains Agents are designed to perform at a human level, but at a superhuman scale. Imagine:

• Conversations in Less Than 2 Hours: Our agents can handle a massive volume of interactions with speed and precision.

• 2x Higher Engagement Rate vs. Call Center: Customers prefer interacting with an AI that understands and responds effectively.

• 2.5x Faster Conversion Rate vs. Call Center: Efficiency that directly impacts your bottom line.

• Higher CSAT Rate vs. Call Center: Happy customers, every time.

This isn't just about automation; it's about elevating every customer interaction to a new standard of excellence, consistently and at scale. No more human agents getting tripped up by thin patience, sheer volume, or that spat with the boss five minutes before clocking in.

Enterprise Trust: Built on a Decade of Innovation

We've been transforming customer experiences with AI for over a decade, delivering measurable business impact for our global clients. Our track record includes:

• 8 Patents: A testament to our proprietary technology, including our Dialogue-to-Action™ Cognitive OS.

• Award-Winning: Recognized by the Voice Capital Partners AI Challenge and the CAR Human Machine Interface Berlin Award.

• Trusted by Global Enterprises: Featured by OpenAI CEO Sam Altman at the Microsoft Ignite Summit.

Unlike most conversational AI vendors who simply wrap off-the-shelf LLMs, Humains Agents are powered by our patented Cognitive OS. This unique architecture allows our agents to reason, adapt, and execute complex tasks far beyond what generic LLM-based chatbots can do. We train our models on proprietary, real-world datasets from years of live enterprise conversations, ensuring unmatched accuracy, empathy, and enterprise-grade reliability.

The Bottom Line: AI You Can Trust

For industries where mistakes are simply not an option, such as finance, insurance, and utilities, this is definitive proof. You get an AI you can trust with critical, regulated processes where accuracy isn't a feature; it's a non-negotiable requirement. Humains provides the foundational AI reliability necessary for deploying AI in regulated industries, ensuring compliance and effectively mitigating risks associated with AI-generated inaccuracies.

The era of experimental AI is giving way to a demand for production-ready, inherently reliable AI. At Humains, we are dedicated to leading this transition, providing the core technology that transforms cutting-edge research into dependable enterprise solutions.

July 7, 2025