AI model updates and democratisation of OCR

March Edition


Mar 10, 2025

Welcome to the March edition of The Token 🌸 Last month, all frontier models got an update: OpenAI's GPT-4.5, Anthropic's Claude 3.7, xAI's Grok 3, and Google's Flash 2.0. Each of these models is noteworthy for different reasons, but none signals a major breakthrough, as we are approaching the limits of pre-training scaling.

In this issue, we also dive into how vision language models are transforming document processing with next-generation OCR technology 📄 and share a case study on how we helped a healthcare non-profit automate the extraction of critical information from complex medical documents 🏥

📰 Bites from AI news

🧩 A wave of AI model upgrades: OpenAI's GPT-4.5 joins Claude 3.7, Grok 3, and Flash 2.0 in the latest round of frontier model improvements. While each claims significant advances, GPT-4.5 shows more modest gains despite reportedly using 10x the compute and data of its predecessor. OpenAI Announcement
🔍 DeepResearch transforms information discovery: OpenAI's autonomous research agent can independently search the web, reason through complex topics, and synthesize information from multiple sources. This tool represents a significant step toward AI systems that can conduct multi-step research tasks without constant human guidance. OpenAI DeepResearch
⚗️ AI as co-scientist accelerates breakthrough discoveries: Google's AI co-scientist platform demonstrates how AI can significantly speed up scientific discovery by generating novel research hypotheses, designing experiments, and analyzing results. In a dramatic example, the system reportedly took just two days to solve a superbug problem that had stumped human scientists for years. Google Research Blog
🧠 Research shows fine-tuning unlocks latent abilities: The s1 approach reveals that fine-tuning with just 1,000 reasoning examples can enable models to reach performance comparable to more complex systems when forced to "think more." This suggests current models may already possess more capabilities than previously thought, with fine-tuning acting as a key to access these abilities rather than creating them from scratch. s1: Simple test-time scaling

💼 AI-powered medical document analysis for non-profit

We helped a large healthcare non-profit automate the extraction of critical information from complex medical product specifications. By combining rule-based table extraction with LLM processing, we created a flexible solution that handles various document formats while maintaining high accuracy. The system now enables rapid analysis of product profiles, supporting better decision-making in medical diagnostics development. Read the full case study.

💡 OCR reimagined: How vision language models are transforming document processing

Traditional OCR systems have long struggled with complex document layouts, handwriting, and maintaining the logical flow of text. The new generation of OCR tools built on vision language models (VLMs) addresses these challenges by understanding documents more holistically – recognising not just text, but the relationships between text elements, tables, equations, and images.

What makes this development particularly exciting is the combination of three key benefits:

1️⃣ Superior accuracy: VLM-based OCR systems maintain reading order and properly handle complex elements like tables and equations that traditional systems often misinterpret 🎯

2️⃣ Improved efficiency: These new implementations are significantly faster and more cost-effective than previous solutions, making advanced document processing accessible to more organizations 💸

3️⃣ Open-source availability: With tools like olmOCR being released as open source, organizations can customize and integrate these capabilities into their document workflows without dependency on proprietary solutions 🔓

This affects all industries that handle large volumes of documents – from legal and financial services to healthcare and research. Organizations can now extract structured information from documents more reliably, enabling better data analysis, compliance monitoring, and knowledge management.

For businesses looking to optimize their document processing workflows, this new generation of OCR tools represents an opportunity to significantly improve both accuracy and efficiency while reducing costs 📈 Interested in seeing how your organisation can optimize its processes by integrating AI? Get in touch with us for a free consultancy call with one of our AI experts: https://mantisnlp.com/contact/#cta
