We focus on advancing Natural Scene Text Recognition (NST) so that machines can accurately read and interpret text in real-world environments. From license plates to storefront signs, our research tackles challenges such as distorted, occluded, or multilingual text in natural settings. This work has broad applications in autonomous systems, accessibility tools, and data digitization.
Ongoing Research
- End-to-End Scene Text Recognition
  - Objective: Develop robust pipelines for detecting and recognizing text in complex, unstructured environments.
  - Progress: Designed a hybrid model combining CNNs and Transformers, achieving over 90% accuracy in cluttered urban settings.
- Multilingual Text Recognition
  - Objective: Build systems capable of identifying and interpreting text in multiple languages and scripts.
  - Progress: Created a unified OCR model supporting 25+ languages with seamless script-switching, exceeding previously reported benchmark results by 15%.
- Text Recognition in Challenging Conditions
  - Objective: Improve accuracy for distorted, occluded, or low-resolution text in images.
  - Progress: Developed an adaptive attention mechanism that improves recognition accuracy under poor lighting and extreme viewing angles.
- Zero-Shot and Few-Shot Learning
  - Objective: Recognize unseen fonts or languages with minimal training data.
  - Progress: Introduced a meta-learning framework that achieves 85% recognition accuracy on novel fonts with only a handful of examples.
- Real-Time Applications
  - Objective: Enable real-time text recognition on mobile devices and edge systems.
  - Progress: Optimized lightweight models for deployment, achieving real-time text detection on smartphones at 30 frames per second.
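
The accuracy figures above do not state which metric they use. Two measures commonly reported for scene text recognition are word-level accuracy (the share of predictions that match the reference exactly) and character error rate (total edit distance divided by total reference characters). The following is a minimal sketch of both, assuming exact, case-sensitive string matching; the function names and the sample strings are illustrative and are not taken from our evaluation code.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters of a reaches the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def _check_lengths(predictions: Sequence[str], references: Sequence[str]) -> None:
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")


def word_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference exactly."""
    _check_lengths(predictions, references)
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def character_error_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Total edit distance divided by the total number of reference characters."""
    _check_lengths(predictions, references)
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        return 0.0
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical recognizer outputs for three cropped signs.
    preds = ["EXIT", "PARK1NG", "OPEN"]
    refs = ["EXIT", "PARKING", "OPEN"]
    print(f"word accuracy: {word_accuracy(preds, refs):.3f}")
    print(f"character error rate: {character_error_rate(preds, refs):.3f}")
```

Word accuracy penalizes a single wrong character as heavily as a completely wrong word, so reporting character error rate alongside it gives a finer view of how close the near misses are.
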
Key Results and Milestones
- Cross-Domain Performance: Achieved state-of-the-art results on multiple international benchmarks, including the ICDAR Robust Reading competitions.
- Accessibility Innovations: Partnered with assistive technology organizations to deploy text recognition tools for the visually impaired, enabling seamless navigation in real-world scenarios.
- Document Digitization: Improved OCR for scanned documents, boosting text extraction accuracy in historical archives by 20%.
- Retail and Logistics Applications: Delivered text recognition solutions for inventory management and automated checkout systems, increasing operational efficiency by 30%.
Future Directions
Our next steps aim to expand the capabilities and applications of NST. They include:
- Improving text recognition in augmented reality for live translation and navigation.
- Enhancing security and privacy in OCR systems to safeguard sensitive data.
- Developing universal text recognition frameworks for highly diverse languages and rare scripts.
- Integrating NST with autonomous systems for improved scene understanding and navigation.