
PDF to Markdown Converter

Turn any PDF into clean Markdown with inferred headings

Markdown is the lingua franca of writing systems: Obsidian, Bear, iA Writer, Notion exports, static site generators such as Hugo and Jekyll, GitHub READMEs, and nearly every modern docs platform. Getting a PDF into Markdown means getting it into any of those workflows. This tool reads a PDF locally and writes clean Markdown with headings inferred from font size, bullets detected from glyph patterns, and paragraph breaks preserved. No upload, no signup, no length limit.

Why Convert PDF to Markdown

Markdown is plain text, so it works in every editor, version-controls cleanly in git, copies anywhere, and converts to other formats (HTML, .docx, PDF) with standard tools. PDFs are the opposite: visual containers that resist editing and lock content away from text-based workflows. Converting PDF to Markdown gets research papers into Obsidian, meeting notes into your second brain, a colleague's report into your Jekyll site, and the contents of a long PDF into an AI chat context window without copy-pasting page by page. The conversion preserves enough structure to be readable and strips enough formatting to be portable.

How Heading Inference Works

PDF font runs carry size data. The tool collects every font size used in the document, identifies the most common size as body text, and bands the larger sizes into Markdown headings: the largest distinct band becomes a single hash (H1), the next becomes two hashes (H2), the smallest still-larger-than-body band becomes three hashes (H3). The result is a Markdown file with real heading hierarchy that renders correctly in every Markdown viewer and that any static site generator will read as a proper outline. Bullet lists are detected from leading glyphs (•, ◦, –, -, *) and converted to standard Markdown dashes.
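The banding described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names, the rounding to one decimal place, and the choice to fold any fourth or smaller band into H3 are all assumptions made for the example. Input is a list of (text, font size) pairs, one per extracted line.

```python
from collections import Counter


def heading_levels(sizes):
    """Map each larger-than-body font size to a heading level (1 to 3).

    Assumption: sizes are rounded to one decimal so tiny float
    differences between runs do not create separate bands.
    """
    rounded = [round(s, 1) for s in sizes]
    if not rounded:
        return {}
    # The most common size is treated as body text.
    body = Counter(rounded).most_common(1)[0][0]
    # Distinct sizes above body, largest first.
    larger = sorted({s for s in rounded if s > body}, reverse=True)
    # Largest band -> H1, next -> H2, everything else -> H3 (assumed clamp).
    return {size: min(rank, 3) for rank, size in enumerate(larger, start=1)}


def to_markdown(lines):
    """Render (text, font_size) pairs as Markdown paragraphs and headings."""
    levels = heading_levels([size for _, size in lines])
    out = []
    for text, size in lines:
        level = levels.get(round(size, 1))
        out.append(f"{'#' * level} {text}" if level else text)
    return "\n\n".join(out)


sample = [
    ("Annual Report", 24.0),
    ("Overview", 16.0),
    ("Revenue grew this year.", 11.0),
    ("Costs stayed flat.", 11.0),
    ("Regional detail", 13.0),
    ("Sales rose in every region.", 11.0),
]
print(to_markdown(sample))
```

Text smaller than the body size (footnotes, captions) gets no heading level in this sketch and passes through as an ordinary paragraph.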

Use Cases for the Converted Markdown

Academic researchers convert papers to Markdown to feed them into Obsidian or Roam with proper outline structure for note-taking and linking. Writers convert source documents to Markdown to draft against in iA Writer or Bear. Engineers convert technical PDFs to Markdown to include as context in LLM prompts, where Markdown spends fewer tokens on markup than verbose HTML and keeps structure that plain text loses. Docs teams convert legacy PDF documentation to Markdown to migrate into Docusaurus, MkDocs, or a Jekyll-based site. Knowledge workers convert downloaded PDFs to feed into Notion, where Markdown imports preserve heading structure.

How We Compare to Pandoc and Marker

Pandoc is the gold standard for document conversion, but it does not read PDF as an input format. It can write PDF through a LaTeX or other PDF engine, so a PDF to Markdown workflow built on Pandoc needs a separate extraction step first (for example pdftotext or a PDF to HTML converter), and heading structure is typically lost along the way. Marker (an open-source Python tool from Vik Paruchuri) does excellent ML-based PDF to Markdown conversion but needs a Python environment and downloads several GB of models. This tool covers the common case: in-browser, no install, no models to download, with heuristic heading inference that handles standard digital PDFs cleanly. For research-grade conversion of academic PDFs with equations and complex layouts, Marker is the better choice; for everyday PDFs, in-browser conversion is faster and needs no setup.

The tool sits in the Ultimate Design Tools (UDT) writing pipeline alongside PDF Text Extractor for plain text, HTML to Markdown for the other common source format, and Markdown Preview for verifying the output renders correctly. Markdown is the connective tissue between text-based workflows, and getting your PDFs into Markdown is the first step to keeping them there.

Frequently Asked Questions

How does the tool decide what becomes a heading?
The tool collects every font size used in the PDF, identifies the most common size as body text, and bands the larger sizes into Markdown headings. The largest distinct band becomes H1 (one hash), the next becomes H2 (two hashes), and the smallest still-larger-than-body band becomes H3 (three hashes). Bold body-size runs are treated as body text, not headings, because bold paragraphs are common inside body content.
Are bullet lists preserved as Markdown lists?
Yes. Bullet glyphs at the start of a line (•, ◦, –, -, *) are detected and converted to standard Markdown dashes, with indentation preserved for nested lists. Numbered lists are detected from patterns like "1.", "2.", "a)", and emitted as proper Markdown ordered lists.
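As a rough illustration of that answer, the sketch below normalizes one line at a time with two regular expressions. The patterns and the function name are assumptions for the example, not the tool's actual rules. Ordered items are all emitted as `1.` because CommonMark renderers take the start number from the first item and number the rest sequentially.

```python
import re

# Leading bullet glyph followed by whitespace; indentation is captured.
BULLET = re.compile(r"^(\s*)[•◦–\-*]\s+(.*)$")
# "1." / "12." or a single lowercase letter with ")" such as "a)".
NUMBERED = re.compile(r"^(\s*)(?:\d+\.|[a-z]\))\s+(.*)$")


def normalize_list_line(line):
    """Rewrite a bullet or numbered line as a Markdown list item.

    Lines that match neither pattern are returned unchanged.
    """
    match = BULLET.match(line)
    if match:
        indent, text = match.groups()
        return f"{indent}- {text}"
    match = NUMBERED.match(line)
    if match:
        indent, text = match.groups()
        return f"{indent}1. {text}"
    return line


for raw in ["• First point", "    ◦ Nested point", "2. Second step", "a) Option A"]:
    print(normalize_list_line(raw))
```

Requiring whitespace after the marker keeps lines such as "3.5 million users" or "**bold**" from being mistaken for list items.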
Does it upload my PDF anywhere?
No. The PDF is parsed locally in your browser by pdf.js and the Markdown is built in memory. Nothing leaves your tab. Confidential documents, drafts, and anything sensitive can be converted with full privacy.
What does the output look like for a research paper?
Title becomes an H1 heading. Section headings (Introduction, Methods, Results, etc.) become H2 or H3 depending on the source font sizes. Body paragraphs are preserved as standard Markdown paragraphs. References at the end come through as numbered list items when the source PDF formats them that way.
Will it preserve LaTeX equations?
Equations in PDFs are typeset as positioned glyphs, not stored as LaTeX source, so the conversion captures only a plain-text approximation of the visible symbols. For PDFs where equations matter, consider running them through an ML tool like Marker or Mathpix; this tool is optimized for prose-heavy documents.
Can I copy or download the Markdown?
Both. The converted Markdown shows in a textarea on the page — click Copy to grab the whole thing, or click Download to save it as a .md file. The download uses the source PDF's filename with the .pdf extension swapped for .md.
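The filename rule can be sketched as follows. The function name is hypothetical, and appending .md when the source name has no .pdf extension is an assumption for the example; the answer above only describes the swap.

```python
from pathlib import PurePath


def markdown_filename(pdf_name):
    """Swap a trailing .pdf (any letter case) for .md."""
    path = PurePath(pdf_name)
    if path.suffix.lower() == ".pdf":
        return path.with_suffix(".md").name
    # Assumed fallback: no .pdf extension, so just append .md.
    return path.name + ".md"


print(markdown_filename("quarterly-report.pdf"))
```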
Does the output preserve tables?
Tables are emitted as Markdown pipe tables when the source PDF has clean rectangular column alignment. Complex tables with merged cells or visual grouping fall back to indented paragraphs. For table-heavy PDFs where the data is the point, use PDF to Excel instead and export from there.
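A minimal sketch of the pipe-table step, assuming the cells have already been split into rows by column alignment (the hard part, which is not shown here). The function name and the rule of returning `None` for non-rectangular input are illustrative assumptions; a caller would then fall back to paragraphs.

```python
def to_pipe_table(rows):
    """Return a Markdown pipe table, or None if the rows are not rectangular.

    The first row is treated as the header (an assumption of this sketch).
    """
    width = len(rows[0]) if rows else 0
    if len(rows) < 2 or width == 0 or any(len(r) != width for r in rows):
        return None

    def fmt(cells):
        # Escape literal pipes so they do not split a cell.
        cleaned = (c.strip().replace("|", "\\|") for c in cells)
        return "| " + " | ".join(cleaned) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


print(to_pipe_table([["Item", "Qty"], ["Apples", "3"], ["Pears", "5"]]))
```

Rows with differing cell counts, which is what merged cells usually produce after extraction, return `None` rather than a misaligned table.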
What about scanned PDFs without a text layer?
A scanned PDF is an image of text, not text. The tool needs a text layer to extract anything. Run the scan through the PDF OCR tool first to add a recognized text layer, then convert the OCR'd version to Markdown.

Built by Derek Giordano · Part of Ultimate Design Tools
