
Docling

Docling is an open-source document processing toolkit that parses a wide range of formats, with particular strength in PDF understanding, and integrates with generative AI ecosystems.


About Docling

Docling is an open-source document processing solution that simplifies extracting and understanding content from many document formats. Its PDF parsing performs layout analysis, reading-order detection, and structure recognition. Parsed content is held in a unified DoclingDocument representation, which can be exported to Markdown, HTML, JSON, and other formats. For enterprise and AI applications, Docling can run entirely locally on sensitive data, offers extensive OCR support, and provides plug-and-play integrations with AI frameworks such as LangChain, LlamaIndex, and Haystack.
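The convert-then-export flow described above can be sketched as follows. This is a minimal illustration, assuming Docling 2.x is installed (`pip install docling`); it relies on `DocumentConverter.convert()` and the DoclingDocument methods `export_to_markdown()`, `export_to_html()`, and `export_to_dict()`. The helpers `export_all` and `main`, the `out` directory, and the `<stem>.<suffix>` file layout are hypothetical choices for this example, not part of Docling.

```python
"""Convert one document with Docling and write Markdown, HTML, and JSON exports."""
import json
import sys
from pathlib import Path


def export_all(doc, out_dir: Path, stem: str) -> dict[str, Path]:
    """Write the Markdown, HTML, and JSON renderings of a DoclingDocument.

    `doc` only needs export_to_markdown(), export_to_html(), and
    export_to_dict(), which DoclingDocument provides.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {
        "md": doc.export_to_markdown(),
        "html": doc.export_to_html(),
        # export_to_dict() returns the full structure; serialise it as JSON.
        "json": json.dumps(doc.export_to_dict(), indent=2),
    }
    paths = {}
    for suffix, text in outputs.items():
        path = out_dir / f"{stem}.{suffix}"
        path.write_text(text, encoding="utf-8")
        paths[suffix] = path
    return paths


def main(source: str, out_dir: str = "out", stem: str = "document") -> None:
    # Imported lazily so export_all() can be reused without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # a local path or a URL
    for suffix, path in export_all(result.document, Path(out_dir), stem).items():
        print(f"{suffix}: {path}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage: `python convert.py report.pdf` writes `out/document.md`, `out/document.html`, and `out/document.json`. Keeping the export step separate from conversion lets the same helper handle any DoclingDocument, whatever its source format.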

Key Features

  • Multi-format document parsing including PDF, DOCX, PPTX, XLSX, HTML, audio files, and images
  • Advanced PDF understanding with page layout analysis, reading order, table structure, and formula recognition
  • Unified DoclingDocument representation format for consistent data handling
  • Multiple export formats including Markdown, HTML, DocTags, and lossless JSON
  • Local execution capabilities for air-gapped and sensitive data environments
  • Built-in integrations with AI frameworks like LangChain, LlamaIndex, CrewAI, and Haystack
  • Comprehensive OCR support for scanned PDFs and images
  • Visual Language Model support with GraniteDocling
  • Audio processing via Automatic Speech Recognition (ASR)
  • MCP server for connecting Docling to agentic applications
  • Structured information extraction capabilities
  • Command-line interface for easy automation
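Several of the features above (multi-format input, OCR for scanned PDFs, table structure recognition) come together in a batch job. The sketch below assumes Docling 2.x and uses `PdfPipelineOptions` with its `do_ocr` and `do_table_structure` flags, passed through `PdfFormatOption`; formats without a custom option keep Docling's defaults. The `SUFFIXES` set is an assumed subset of the formats listed above, and `find_inputs`, `build_converter`, and `convert_folder` are hypothetical helpers written for this example.

```python
"""Batch-convert a folder with OCR and table-structure recognition enabled."""
import sys
from pathlib import Path

# Assumed subset of the formats listed above; extend as needed.
SUFFIXES = {".pdf", ".docx", ".pptx", ".xlsx", ".html", ".png", ".jpg", ".jpeg"}


def find_inputs(folder: Path) -> list[Path]:
    """Return supported files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in SUFFIXES
    )


def build_converter(ocr: bool = True, tables: bool = True):
    """Create a DocumentConverter whose PDF pipeline runs OCR and table recognition."""
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    opts = PdfPipelineOptions()
    opts.do_ocr = ocr  # run OCR on scanned pages and bitmap content
    opts.do_table_structure = tables  # recover table rows and columns
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}
    )


def convert_folder(folder: Path) -> None:
    converter = build_converter()
    for path in find_inputs(folder):
        doc = converter.convert(path).document
        # "report.pdf" -> "report.pdf.md", so "a.pdf" and "a.docx" do not collide.
        out = path.with_name(path.name + ".md")
        out.write_text(doc.export_to_markdown(), encoding="utf-8")
        print(f"converted {path.name} -> {out.name}")


if __name__ == "__main__" and len(sys.argv) > 1:
    convert_folder(Path(sys.argv[1]))
```

Docling modules are imported inside `build_converter` so the file-discovery logic can be checked without the library installed; in a real project, top-level imports are equally reasonable.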


Built with 💚 by Pawel Boguta
