DocuSmart
Agentic Intelligent Document Processing (IDP) with Zero-Shot Extraction

The open-source toolkit for deep learning-based document image analysis and structured data extraction.
Layout Parser is a comprehensive Python framework for building document image analysis pipelines. As of 2026, it remains a key infrastructure component for developers building high-accuracy OCR and document understanding applications. It provides a unified interface to state-of-the-art deep learning models, so complex layouts (tables, figures, headers, and multi-column text) can be detected through a single API. In this way, it bridges the gap between raw document images (scanned PDFs, photographs) and structured digital formats.

Through integrations with backends such as Detectron2 and PaddleDetection, it offers a plug-and-play architecture for loading pre-trained weights from the 'Layout Bank'. It also orchestrates OCR, supporting engines such as Tesseract and Google Cloud Vision.

In the 2026 market, Layout Parser is positioned as the go-to open-source alternative to proprietary services such as Amazon Textract, favored for the flexibility to self-host and to fine-tune models on niche datasets. Its modularity lets enterprises build custom parsing pipelines that keep data in-house and avoid the recurring API costs of commercial SaaS offerings.
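The detect-then-OCR flow described above can be sketched in a library-agnostic way. This is a minimal illustration, not Layout Parser's own API: `Block`, `LayoutDetector`, `OCREngine`, `run_pipeline`, `FakeDetector`, and `FakeOCR` are hypothetical names chosen for this example. In a real deployment, a Detectron2- or PaddleDetection-backed model and an OCR engine such as Tesseract would fill the two roles; the fakes only let the sketch run without any models installed.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass
class Block:
    """A detected layout region: a label plus an (x1, y1, x2, y2) box in pixels."""
    label: str
    box: Tuple[int, int, int, int]
    text: str = ""


class LayoutDetector(Protocol):
    def detect(self, image) -> List[Block]: ...


class OCREngine(Protocol):
    def read(self, image, box) -> str: ...


def run_pipeline(image, detector: LayoutDetector, ocr: OCREngine,
                 keep=("Text", "Title")) -> List[Block]:
    """Detect regions, keep only the textual ones, then OCR each kept region."""
    blocks = [b for b in detector.detect(image) if b.label in keep]
    for block in blocks:
        block.text = ocr.read(image, block.box)
    return blocks


# Hypothetical stand-ins so the sketch runs without any models installed.
class FakeDetector:
    def detect(self, image):
        return [Block("Title", (0, 0, 100, 20)),
                Block("Figure", (0, 30, 100, 80)),
                Block("Text", (0, 90, 100, 140))]


class FakeOCR:
    def read(self, image, box):
        return f"text@{box[1]}"  # pretend OCR output tagged with the box's top edge


result = run_pipeline(image=None, detector=FakeDetector(), ocr=FakeOCR())
print([(b.label, b.text) for b in result])
```

Because the pipeline depends only on the two small interfaces, swapping the detection backend or the OCR engine means supplying another object with the same method, which is the plug-and-play property described above.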
A repository of pre-trained deep learning models specialized for various document types like academic papers and newspapers.
Interact with your PDF documents through intelligent, context-aware AI conversations.
Deterministic Python-based data extraction from PDF and image invoices using template matching.
Transform unstructured documents into actionable data with world-class machine learning.
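The deterministic template-matching approach listed above for invoices can be sketched with the standard library alone. This illustrates the general keyword-plus-regex technique only; the template format, `TEMPLATES`, and `extract` are hypothetical and do not reproduce any particular tool's schema. A template is selected when all of its keywords appear in the text, and each field is read from the first capture group of its pattern.

```python
import re
from typing import Dict, List, Optional

# Hypothetical templates: keywords identify the issuer, and each field has
# one regular expression whose first capture group holds the value.
TEMPLATES: List[dict] = [
    {
        "issuer": "Acme Corp",
        "keywords": ["Acme Corp"],
        "fields": {
            "invoice_number": r"Invoice\s+No\.?\s*:?\s*(\S+)",
            "date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
            "amount": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",
        },
    },
]


def extract(text: str, templates=TEMPLATES) -> Optional[Dict[str, str]]:
    """Return fields from the first template whose keywords all appear in text."""
    for tpl in templates:
        if all(k in text for k in tpl["keywords"]):
            result = {"issuer": tpl["issuer"]}
            for name, pattern in tpl["fields"].items():
                match = re.search(pattern, text)
                if match:
                    result[name] = match.group(1)
            return result
    return None  # no template matched: nothing is guessed


sample = """Acme Corp
Invoice No: INV-1042
Date: 2026-01-15
Total: $1,250.00"""
print(extract(sample))
```

The trade-off is the one implied by "deterministic": results are fully predictable and auditable, but every new invoice layout needs its own template.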
A wrapper that provides a consistent interface for different OCR engines, including Tesseract (via PyTesseract) and Google Cloud Vision.
Stores layout information in a nested structure allowing for parent-child relationship tracking between blocks.
A visualization module for overlaying detected bounding boxes and segmentation masks on images.
Dual-backend support for two of the most popular computer vision frameworks: Detectron2 (PyTorch-based) and PaddleDetection (PaddlePaddle-based).
Allows users to load their own PyTorch or PaddlePaddle model weights directly into the layout detection pipeline.
Tools to handle coordinate scaling and rotation corrections during the detection phase.
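The nested block structure and coordinate scaling in the feature list above can be illustrated with a small tree of regions. This is a hypothetical sketch: the `Block` class and its `add`, `walk`, and `scaled` methods are not Layout Parser's API. Each child keeps a reference to its parent, and `scaled` maps a whole subtree from scan pixels to PDF points. Rotation correction is left out to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Block:
    """A layout region that can contain nested child regions."""
    label: str
    box: Box
    # compare=False / repr=False keep the back-reference from causing infinite recursion
    parent: Optional[Block] = field(default=None, compare=False, repr=False)
    children: List[Block] = field(default_factory=list)

    def add(self, child: Block) -> Block:
        """Attach a child and record this block as its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, Block]]:
        """Yield (depth, block) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def scaled(self, sx: float, sy: float) -> Block:
        """Return a scaled copy of this subtree; the original is left untouched."""
        x1, y1, x2, y2 = self.box
        copy = Block(self.label, (x1 * sx, y1 * sy, x2 * sx, y2 * sy))
        for child in self.children:
            copy.add(child.scaled(sx, sy))
        return copy


# A 200 DPI scan of a US Letter page is 1700 x 2200 px; the PDF page is 612 x 792 pt.
page = Block("Page", (0, 0, 1700, 2200))
table = page.add(Block("Table", (100, 500, 1600, 1200)))
table.add(Block("Cell", (100, 500, 850, 600)))

pdf_page = page.scaled(612 / 1700, 792 / 2200)
for depth, block in pdf_page.walk():
    print("  " * depth + block.label, [round(v, 1) for v in block.box])
```

Scaling the tree as a unit keeps child boxes consistent with their parents, which matters when detections made on a rendered image must be mapped back onto the original PDF page.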
Complex multi-column layouts and varying font sizes in archived documents.
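For multi-column pages, detected regions must be put into reading order before their text is concatenated. The sketch below shows one simple heuristic, assuming equal-width columns and axis-aligned boxes: regions wider than one column (titles, footers) act as section breaks, and within each section regions are read column by column, top to bottom. `Region` and `reading_order` are hypothetical names; this is not a Layout Parser API.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def reading_order(regions: List[Region], page_width: float, n_cols: int = 2) -> List[Region]:
    """Order regions column by column, treating full-width regions as section breaks."""
    col_width = page_width / n_cols
    ordered: List[Region] = []
    section: List[Region] = []

    def column_of(r: Region) -> int:
        center = (r.x1 + r.x2) / 2
        return min(int(center // col_width), n_cols - 1)

    def flush() -> None:
        # Within a section: left column first, then top to bottom.
        section.sort(key=lambda r: (column_of(r), r.y1))
        ordered.extend(section)
        section.clear()

    for r in sorted(regions, key=lambda r: r.y1):
        if r.x2 - r.x1 > col_width:  # wider than one column: spans the page
            flush()
            ordered.append(r)
        else:
            section.append(r)
    flush()
    return ordered


page = [
    Region("right-1", 520, 150, 980, 400),
    Region("title", 20, 20, 980, 100),
    Region("left-2", 20, 420, 480, 700),
    Region("left-1", 20, 150, 480, 400),
    Region("right-2", 520, 420, 980, 700),
    Region("footer", 20, 750, 980, 800),
]
print([r.label for r in reading_order(page, page_width=1000)])
```

Archived documents with irregular column widths or varying font sizes usually need a more robust approach (for example, clustering region x-coordinates to find the columns), but the section-then-column structure stays the same.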
Registry Updated: 2/7/2026
Extracting key-value pairs from semi-structured financial documents without manual entry.
Converting PDF research papers into Markdown or LaTeX for indexing.
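Once regions are labeled, OCR'd, and placed in reading order, converting a paper to Markdown is largely a mapping from layout labels to Markdown syntax. The sketch below assumes labels such as Title, Text, List, Table, and Figure; `to_markdown` is a hypothetical helper, not part of Layout Parser. Tables are passed through as plain paragraphs here because rebuilding their cell structure is a separate problem.

```python
from typing import List, Tuple


def to_markdown(blocks: List[Tuple[str, str]]) -> str:
    """Render (label, text) pairs, already in reading order, as Markdown."""
    parts = []
    for label, text in blocks:
        text = text.strip()
        if label == "Title":
            parts.append(f"## {text}")
        elif label == "List":
            # One bullet per non-empty OCR line.
            items = [line.strip() for line in text.splitlines() if line.strip()]
            parts.append("\n".join(f"- {item}" for item in items))
        elif label == "Figure":
            parts.append(f"*[Figure: {text or 'no caption'}]*")
        else:  # Text, Table, and unrecognized labels become plain paragraphs
            parts.append(text)
    return "\n\n".join(parts)


doc = [("Title", "Introduction"),
       ("Text", "Layout analysis precedes OCR."),
       ("List", "detect regions\nrun OCR\n"),
       ("Figure", "")]
print(to_markdown(doc))
```

A LaTeX target would follow the same pattern with a different label-to-syntax mapping (for example, `\section{...}` for titles and an `itemize` environment for lists).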