Question 1

How does the tool decide what becomes a heading?

Accepted Answer

The tool collects every font size used in the PDF, identifies the most common size as body text, and bands the larger sizes into Markdown headings. The largest band becomes H1 (one hash), the next becomes H2 (two hashes), and the smallest band that is still larger than body text becomes H3 (three hashes). Bold runs at body size are treated as body text, not headings, because bold paragraphs are common inside body content.
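The banding described above can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's actual code: the function names are hypothetical, each distinct larger size is assumed to be its own band, and any band beyond the third is assumed to be clamped to H3.

```python
from collections import Counter

def heading_levels(font_sizes, max_level=3):
    """Return (body_size, {size: heading_level}) for a list of font sizes.

    Assumptions for illustration: every distinct size above body size is its
    own band, and bands past max_level are clamped to max_level.
    """
    # Body text is taken to be the most frequently used size.
    body_size = Counter(font_sizes).most_common(1)[0][0]
    # Distinct sizes larger than body, largest first.
    larger = sorted({s for s in font_sizes if s > body_size}, reverse=True)
    levels = {size: min(i + 1, max_level) for i, size in enumerate(larger)}
    return body_size, levels

def to_markdown_line(text, size, levels):
    """Prefix text with hashes if its size is a heading band, else leave it."""
    level = levels.get(size)
    return f"{'#' * level} {text}" if level else text

# One entry per text run: mostly 10 pt body, with 24, 18, and 14 pt headings.
sizes = [10] * 50 + [24] + [18] * 3 + [14] * 5
body, levels = heading_levels(sizes)
print(to_markdown_line("Introduction", 18, levels))
```

Because the lookup is keyed on size alone, a bold run at body size never matches a band and stays plain body text.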

Question 2

Are bullet lists preserved as Markdown lists?

Accepted Answer

Yes. Bullet glyphs at the start of a line (•, ◦, –, -, *) are detected and converted to standard Markdown dashes, with indentation preserved for nested lists. Numbered lists are detected from patterns like "1.", "2.", and "a)" and emitted as proper Markdown ordered lists.
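A minimal sketch of that kind of line-level detection, in Python. The regular expressions and the function name are assumptions made for illustration, and turning lettered markers such as "a)" into numbers is one possible choice rather than the tool's confirmed behavior.

```python
import re

# Leading whitespace, one bullet glyph, whitespace, then the item text.
BULLET = re.compile(r"^(\s*)[•◦–\-*]\s+(.*)$")
# Leading whitespace, digits or one lowercase letter, "." or ")", then text.
NUMBERED = re.compile(r"^(\s*)(\d+|[a-z])[.)]\s+(.*)$")

def convert_list_line(line):
    """Rewrite a bullet or numbered line as a Markdown list item."""
    m = BULLET.match(line)
    if m:
        indent, text = m.groups()
        return f"{indent}- {text}"
    m = NUMBERED.match(line)
    if m:
        indent, marker, text = m.groups()
        # Letters are mapped to their position in the alphabet (a -> 1).
        number = int(marker) if marker.isdigit() else ord(marker) - ord("a") + 1
        return f"{indent}{number}. {text}"
    return line

for raw in ["• First", "  ◦ Nested", "2. Second", "a) Alpha", "Plain text"]:
    print(convert_list_line(raw))
```

A pattern this loose will also match a sentence that happens to begin with "a." or a dash, so a real implementation would likely need extra context checks.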

Question 3

Does it upload my PDF anywhere?

Accepted Answer

No. The PDF is parsed locally in your browser by pdf.js and the Markdown is built in memory. Nothing leaves your tab. Confidential documents, drafts, and anything sensitive can be converted with full privacy.

Question 4

What does the output look like for a research paper?

Accepted Answer

The title becomes an H1 heading. Section headings (Introduction, Methods, Results, etc.) become H2 or H3 depending on the source font sizes. Body paragraphs are preserved as standard Markdown paragraphs. References at the end come through as numbered list items when the source PDF formats them that way.

Question 5

Will it preserve LaTeX equations?

Accepted Answer

Not as LaTeX. Equations in a PDF are typeset as glyphs, not stored as LaTeX source, so the conversion captures only a plain-text approximation of what is visible. For PDFs where equations matter, consider running them through an ML tool like Marker or Mathpix; this tool is optimized for prose-heavy documents.

Question 6

Can I copy or download the Markdown?

Accepted Answer

Both. The converted Markdown appears in a textarea on the page. Click Copy to grab the whole thing, or click Download to save it as a .md file. The download uses the source PDF's filename with the .pdf extension swapped for .md.
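The filename rule is simple enough to show in a few lines. This Python sketch uses a hypothetical helper name, and its handling of names that do not end in .pdf (appending .md) is an assumption rather than documented behavior.

```python
from pathlib import PurePath

def markdown_filename(pdf_name):
    """Swap a trailing .pdf extension (any letter case) for .md."""
    path = PurePath(pdf_name)
    if path.suffix.lower() == ".pdf":
        return path.with_suffix(".md").name
    # Assumed fallback: no .pdf extension to swap, so append .md instead.
    return path.name + ".md"

print(markdown_filename("quarterly-report.pdf"))
```

Only the final extension is replaced, so a name like Thesis.v2.PDF keeps its inner dot and becomes Thesis.v2.md.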

Question 7

Does the output preserve tables?

Accepted Answer

Tables are emitted as Markdown pipe tables when the source PDF has clean rectangular column alignment. Complex tables with merged cells or visual grouping fall back to indented paragraphs. For table-heavy PDFs where the data is the point, use PDF to Excel instead and export from there.
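For reference, emitting a pipe table from already-aligned rows might look like the Python sketch below. It assumes the cells have been extracted into a list of rows and that the first row is the header. The function name is hypothetical, and returning None for ragged input stands in for the fallback to paragraphs.

```python
def to_pipe_table(rows):
    """Render rows of cell strings as a Markdown pipe table.

    Returns None when the rows are empty or not rectangular, so the caller
    can fall back to plain paragraphs.
    """
    if not rows or not rows[0] or len({len(r) for r in rows}) != 1:
        return None

    def fmt(cells):
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in body])

print(to_pipe_table([["Name", "Qty"], ["Apple", "3"], ["Pear", "12"]]))
```

Merged cells usually show up as rows with differing cell counts, which is why the rectangularity check is the trigger for the fallback in this sketch.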

Question 8

What about scanned PDFs without a text layer?

Accepted Answer

A scanned PDF is an image of text, not text. The tool needs a text layer to extract anything. Run the scan through the PDF OCR tool first to add a recognized text layer, then convert the OCR'd version to Markdown.

PDF to Markdown Converter