SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling’s most popular features while ensuring full compatibility with Docling through seamless support for DoclingDocuments.
DocTags define a structured vocabulary of tags and rules that separates a document’s text from its structure. This reduces ambiguity for Image-to-Sequence models. By contrast, converting directly to formats like HTML or Markdown can be messy: it often loses details, does not clearly convey the document’s layout, and increases the number of tokens, making processing less efficient. DocTags are integrated with Docling, which allows export to HTML, Markdown, and JSON. These exports can be offloaded to the CPU, which reduces token generation overhead and improves efficiency.
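
To make the idea of separating text from structure concrete, here is a minimal sketch of how a tagged sequence could be turned into Markdown on the CPU after the model has finished generating. It uses only the Python standard library. The tag names (`title`, `section_header`, `text`), the `<loc_N>` token format, and the helper functions `parse_elements` and `to_markdown` are simplified assumptions for illustration; they are not the actual DocTags specification or the Docling API.

```python
import re

# Hypothetical, simplified tag set loosely modeled on DocTags. The real
# vocabulary and grammar are defined by Docling, not by this sketch.
TAG_TO_MARKDOWN_PREFIX = {
    "title": "# ",
    "section_header": "## ",
    "text": "",
}

# Matches one <tag>...</tag> element; assumes elements are not nested.
ELEMENT_RE = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)
# Matches assumed location tokens such as <loc_42>.
LOC_RE = re.compile(r"<loc_\d+>")


def parse_elements(tagged):
    """Split a tagged sequence into (tag, text) pairs, dropping location tokens."""
    elements = []
    for match in ELEMENT_RE.finditer(tagged):
        text = LOC_RE.sub("", match.group("body")).strip()
        elements.append((match.group("tag"), text))
    return elements


def to_markdown(elements):
    """Render parsed elements as Markdown; unknown tags fall back to plain text."""
    lines = [TAG_TO_MARKDOWN_PREFIX.get(tag, "") + text for tag, text in elements]
    return "\n\n".join(lines)


# Illustrative input only, not real model output.
sample = (
    "<title><loc_10><loc_10><loc_200><loc_30>Quarterly Report</title>"
    "<section_header><loc_10><loc_40><loc_200><loc_55>Summary</section_header>"
    "<text><loc_10><loc_60><loc_200><loc_120>Revenue grew this quarter.</text>"
)

elements = parse_elements(sample)
markdown = to_markdown(elements)
print(markdown)
```

The point of the sketch is that the model only has to emit the compact tagged sequence once. Rendering to Markdown (or, by the same pattern, HTML or JSON) is plain string processing that needs no further token generation.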