Tech News

Reducto releases Deep Extract

Why This Matters

Reducto's Deep Extract introduces an advanced extraction method that self-verifies and corrects its output, significantly improving accuracy on complex, lengthy documents. This innovation addresses longstanding challenges in processing large-scale, detailed data, reducing reliance on manual review and enhancing efficiency for the tech industry and consumers alike.

Key Takeaways

Today we’re launching our most powerful update yet for structured extraction: Deep Extract.

Deep Extract is a new agent-harness approach to extraction that verifies and corrects its own output until the results are accurate. Much like human-in-the-loop review, Deep Extract has an agent in the loop, offloading the human reviewer’s burden with an autonomous verification cycle that holds itself accountable for accuracy.

This is particularly powerful when you're dealing with a long list of items to extract — think invoice line items, brokerage statement transactions, equipment manifests, and more. Deep Extract has already extracted over 28 million fields on documents up to 2,500 pages long in our production beta, and we're continuing to expand what's possible.

For the documents that matter most, it gets to 99–100% field accuracy, even outperforming expert human labelers on extraction tasks.

The challenge with long extraction solutions today

Over the past year, we kept hearing the same thing from customers. Their existing extraction pipelines were breaking down on long, complex documents: invoices running dozens of pages, financial statements spanning hundreds. Totals didn't reconcile, and teams found that line items had been dropped completely. When we asked how they were handling it, the answer was almost always the same: they'd hired people to act as a human-in-the-loop (HITL) and manually check the output.

The issue isn't that models are bad at reading documents. It's that single-pass extraction has no mechanism to catch its own mistakes, and models get lazy. Models are prone to shortcuts on long, repetitive tasks. Given a thousand line items to extract, they'll often stop short, consolidate, or skip entries rather than working through every last row.
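To make the failure mode concrete, here is a minimal Python sketch of the kind of check a single-pass pipeline lacks: comparing the extracted line items against the total and row count the document itself states. The function name, the field name "amount", and the input shapes are assumptions chosen for illustration; this is not Reducto's implementation or API.

```python
from decimal import Decimal

def reconcile(line_items, stated_total, expected_count=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: `line_items` is a list of dicts with an "amount"
    string, `stated_total` is the total printed on the document.
    """
    problems = []
    # Decimal avoids the rounding drift that float sums pick up on currency.
    extracted_total = sum(
        (Decimal(item["amount"]) for item in line_items), Decimal("0")
    )
    if extracted_total != Decimal(stated_total):
        problems.append(
            f"total mismatch: extracted {extracted_total}, "
            f"document states {stated_total}"
        )
    # A dropped or merged row shows up as a count mismatch.
    if expected_count is not None and len(line_items) != expected_count:
        problems.append(
            f"row count mismatch: extracted {len(line_items)}, "
            f"expected {expected_count}"
        )
    return problems

items = [{"amount": "19.99"}, {"amount": "5.01"}]
print(reconcile(items, "25.00"))     # no problems reported
print(reconcile(items, "30.00", 3))  # reports both a total and a count mismatch
```

A check like this only detects that rows were skipped or consolidated; it does not say which ones, which is why a correction step is still needed afterwards.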

The problem is amplified when citations are needed. For many of our customers, citations are not just a nice-to-have but a requirement for substantiating their outputs.

Reducto’s agent harness approach

The rise of long-horizon agents and agent harness architectures pointed to a better way. If agents could reliably tackle complex, multi-step tasks in other domains, the same approach should work for extraction: break the problem down, verify the work, and iterate until it's right.
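The break-down, verify, iterate pattern described above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not Reducto's harness: the function extract_with_verification, its extract, verify, and correct callables, and the max_rounds limit are all hypothetical names introduced here.

```python
from typing import Callable

def extract_with_verification(
    chunks: list[str],
    extract: Callable[[str], list[dict]],
    verify: Callable[[str, list[dict]], list[str]],
    correct: Callable[[str, list[dict], list[str]], list[dict]],
    max_rounds: int = 3,
) -> list[dict]:
    """Extract rows chunk by chunk, re-checking each chunk until it passes.

    All three callables are placeholders for model or rule-based steps.
    If max_rounds is exhausted, the last attempt is kept unverified.
    """
    results: list[dict] = []
    for chunk in chunks:                  # break the problem down
        rows = extract(chunk)
        for _ in range(max_rounds):
            issues = verify(chunk, rows)  # verify the work
            if not issues:
                break
            rows = correct(chunk, rows, issues)  # iterate on the failures
        results.extend(rows)
    return results

# Toy stand-ins: the extractor drops the last row of every chunk.
def lazy_extract(chunk):
    return [{"text": line} for line in chunk.splitlines()[:-1]]

def count_verify(chunk, rows):
    expected = len(chunk.splitlines())
    if len(rows) == expected:
        return []
    return [f"expected {expected} rows, got {len(rows)}"]

def full_correct(chunk, rows, issues):
    return [{"text": line} for line in chunk.splitlines()]

pages = ["a\nb\nc", "d\ne"]
rows = extract_with_verification(pages, lazy_extract, count_verify, full_correct)
print(len(rows))  # 5
```

Bounding the loop with max_rounds is a design choice made for the sketch: it keeps a verifier that can never be satisfied from running indefinitely, at the cost of occasionally returning rows that still have open issues.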
