
PDF to Excel Converter

Pull tables out of any text-based PDF and into a real .xlsx spreadsheet

Tables locked inside PDFs are some of the content people most often want to edit. Bank statements, invoices, vendor price lists, academic data appendices, and financial reports all ship as PDFs, and the numbers in them are hard to work with until they are in a spreadsheet. This tool extracts those tables and writes a real .xlsx file with one sheet per page, entirely in your browser; the PDF never leaves your tab.

Why Browser-Based PDF to Excel Beats Upload Services

Bank statements, vendor invoices, payroll records, and confidential financial reports are exactly the kind of PDFs people want to convert — and exactly the kind of PDFs you should not be uploading to a stranger's server. This tool parses the PDF in your browser using pdf.js, runs heuristic row and column alignment on the text positions, and writes a real .xlsx file using SheetJS, all locally. Nothing is uploaded. Nothing is logged. Nothing sits on a free-tier server waiting for an unknown retention window to expire.

How the Table Detection Works

PDF does not mark tables explicitly. What it has is text runs with x and y coordinates. The tool bins items by y-coordinate, using a small tolerance so that runs whose baselines differ slightly still land on the same line, and that gives you rows. Within each row, items are sorted by x-coordinate to give you columns. Column boundaries are inferred from the gaps between cell texts and aligned across rows so the same x-position lands in the same spreadsheet column across the whole table. Multi-page tables get one sheet per page, with the header row repeated when the tool detects matching column structure. The output opens in Excel, Google Sheets, Numbers, and LibreOffice Calc.
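The row and column steps above can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's source. It works on a simplified `Run` type ({ str, x, y }). With pdf.js, one way to get these values is from `page.getTextContent()`, taking `x = item.transform[4]` and `y = item.transform[5]` for each text item. The tolerances (`yTol = 2`, `xTol = 8`, in PDF points) and all function names are hypothetical. The column step also simplifies things: it clusters left edges instead of measuring the gaps between cells.

```typescript
// Simplified text run. With pdf.js, a TextItem's position is in its
// transform matrix: x = transform[4], y = transform[5] (y grows upward).
interface Run {
  str: string;
  x: number;
  y: number;
}

// Bin runs into rows: a run joins the current row when its baseline is
// within yTol points of the row's first baseline. Rows come out top to bottom.
function binRows(runs: Run[], yTol = 2): Run[][] {
  const sorted = [...runs].sort((a, b) => b.y - a.y);
  const rows: Run[][] = [];
  let rowY = 0;
  for (const run of sorted) {
    if (rows.length === 0 || Math.abs(run.y - rowY) > yTol) {
      rows.push([run]);
      rowY = run.y;
    } else {
      rows[rows.length - 1].push(run);
    }
  }
  // Within a row, left-to-right order is cell order.
  return rows.map((row) => row.sort((a, b) => a.x - b.x));
}

// Cluster the left edges of all runs in the table into column anchors:
// an x more than xTol points right of the previous x starts a new column.
function columnAnchors(rows: Run[][], xTol = 8): number[] {
  const xs: number[] = [];
  for (const row of rows) for (const r of row) xs.push(r.x);
  xs.sort((a, b) => a - b);
  const anchors: number[] = [];
  let prev = -Infinity;
  for (const x of xs) {
    if (x - prev > xTol) anchors.push(x);
    prev = x;
  }
  return anchors;
}

// Put each run in the column with the nearest anchor so the same x position
// maps to the same spreadsheet column on every row. Runs that collide in
// one column are joined with a space.
function toGrid(rows: Run[][], anchors: number[]): string[][] {
  return rows.map((row) => {
    const cells = new Array<string>(anchors.length).fill("");
    for (const run of row) {
      let best = 0;
      for (let i = 1; i < anchors.length; i++) {
        if (Math.abs(run.x - anchors[i]) < Math.abs(run.x - anchors[best])) best = i;
      }
      cells[best] = cells[best] ? `${cells[best]} ${run.str}` : run.str;
    }
    return cells;
  });
}
```

PDF y grows upward, so sorting by descending y yields rows from top to bottom. In this sketch, a cell whose text wraps onto a second line shows up as its own row.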

Use Cases and Limitations

Best results come from PDFs with consistent column alignment: financial statements, scientific data tables, structured invoices, and any PDF where columns line up visually. Tables with merged cells, rotated text, or visually grouped rows (alternating shading) work but may need light cleanup. Tables embedded in flowing prose (Wikipedia-style inline tables, narrative reports) are often misdetected because the surrounding text gets mixed in; the workaround is to use the page-range input to convert only the pages with actual tabular content. Scanned PDFs without a text layer cannot be parsed directly; run them through OCR first.

How We Compare to Paid PDF-Table Tools

Tabula is the established open-source desktop tool for PDF table extraction and works well, but it requires a Java install and a separate download. Cometdocs, Smallpdf, and PDFTables.com all offer web-based extraction with subscriptions in the $5–$15 per month range, with the trade-off of uploading your file to their servers. ABBYY FineReader includes excellent PDF-table extraction but is a $200+ desktop purchase. This tool covers the in-between case: free, browser-based, no install, and no upload, backed by pdf.js (Apache 2.0) for parsing and SheetJS (Apache 2.0) for the .xlsx writer. For occasional table extraction from digital PDFs, working in the browser skips both the install and the upload, which makes it quicker to start and more private than the paid alternatives.

The tool fits alongside the rest of the Ultimate Design Tools PDF cluster: PDF Text Extractor for non-tabular text, PDF to Word for prose documents, and the JSON ↔ CSV converter for further transforms once the data is in spreadsheet form. Tables go to Excel; prose goes to Word; data interchange goes to CSV.

Frequently Asked Questions

Does the output preserve table structure as real cells?
Yes. The tool detects rows by y-coordinate and columns by x-coordinate alignment, then writes the result to a real .xlsx file with each detected cell in its own spreadsheet cell. The output opens in Excel, Google Sheets, Numbers, and LibreOffice Calc as a proper spreadsheet, not a single column of text.
How are multi-page tables handled?
Each PDF page becomes one Excel sheet, named Page 1, Page 2, and so on. When the column structure matches across pages, the same column positions are used for every sheet so you can copy data between them or stack them into a single sheet manually. The page-range input lets you limit conversion to specific pages.
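Here is one way to decide whether two pages share a column structure, as a TypeScript sketch. It assumes each page has already been reduced to a sorted list of column anchors (x positions in PDF points) and that every page is compared against the first page. The tolerance, the compare-to-first-page rule, and the helper names are assumptions. The tool does not document any of them.

```typescript
// Two pages share a column structure when they have the same number of
// column anchors and each pair of anchors is within tol points.
function sameColumns(a: number[], b: number[], tol = 8): boolean {
  return a.length === b.length && a.every((x, i) => Math.abs(x - b[i]) <= tol);
}

// Hypothetical rule: pages whose anchors match the first page's reuse the
// first page's anchors, so column N holds the same field on every sheet.
// Pages that do not match keep their own anchors.
function anchorsPerPage(pages: number[][], tol = 8): number[][] {
  if (pages.length === 0) return [];
  const reference = pages[0];
  return pages.map((anchors) => (sameColumns(reference, anchors, tol) ? reference : anchors));
}

// Sheet names follow the "Page 1", "Page 2", ... convention.
function sheetName(pageNumber: number): string {
  return `Page ${pageNumber}`;
}
```

Each page's grid and its `sheetName(n)` could then be passed to SheetJS through `XLSX.utils.aoa_to_sheet` and `XLSX.utils.book_append_sheet`.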
What about merged cells or rotated text?
Merged cells are detected when the row contains fewer text runs than the dominant row width — the merged value is placed in the first column it spans and the trailing columns are left blank. Rotated text (90 or 270 degrees) is read as a separate column because the y-coordinate dominates over the visual flow; if the rotation is in headers only, the headers may need a manual transpose.
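The merged-cell rule above can be sketched in TypeScript. In this sketch, the dominant row width is the most common number of runs per row, and ties go to the wider count (an assumption). Each run is placed in the rightmost column whose anchor sits at or left of the run's left edge, which is the first column it spans. Columns that no run starts in stay blank. `Cell`, `dominantWidth`, `isMergedRow`, and `placeRow` are hypothetical names, and the 2-point tolerance is assumed.

```typescript
interface Cell {
  str: string;
  x: number; // left edge in PDF points
}

// Most common run count across rows; ties go to the wider count.
function dominantWidth(runCounts: number[]): number {
  const freq = new Map<number, number>();
  for (const n of runCounts) freq.set(n, (freq.get(n) ?? 0) + 1);
  let best = 0;
  let bestCount = 0;
  freq.forEach((count, n) => {
    if (count > bestCount || (count === bestCount && n > best)) {
      best = n;
      bestCount = count;
    }
  });
  return best;
}

// A row is flagged as holding a merged cell when it has fewer runs
// than the dominant width.
function isMergedRow(row: Cell[], dominant: number): boolean {
  return row.length < dominant;
}

// Place each run in the first column it spans: the rightmost anchor at or
// left of its left edge (within tol). Columns no run starts in stay blank.
function placeRow(row: Cell[], anchors: number[], tol = 2): string[] {
  const out = new Array<string>(anchors.length).fill("");
  for (const cell of row) {
    let col = 0;
    for (let i = 0; i < anchors.length; i++) {
      if (anchors[i] <= cell.x + tol) col = i;
    }
    out[col] = out[col] ? `${out[col]} ${cell.str}` : cell.str;
  }
  return out;
}
```

With anchors at 50, 200, and 320, a subtotal row whose label starts at x = 50 and whose amount starts at x = 320 becomes ["Subtotal", "", "99.00"]. The label keeps the first column, and the column it visually covers is left blank.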
Will my file get uploaded anywhere?
No. The PDF is parsed locally by pdf.js, the rows and columns are detected locally, and the .xlsx is written locally by SheetJS. Nothing leaves your browser. This is the key reason to use a browser-based tool for bank statements, payroll, and other sensitive financial documents.
What if the PDF has prose mixed with tables?
The detection works best when the page is mostly tabular. Mixed pages tend to bring some surrounding paragraphs into the spreadsheet as long single-cell rows. The cleanest workaround is to enter a page range like "3-5" before conversion to limit the output to pages that contain tables you actually want.
Can I convert specific pages only?
Yes. The page range input accepts notation like "1-5,10,15-20" and limits conversion to those pages. This is useful for grabbing only the financial-statements pages from a long quarterly report or only the data appendix from an academic paper.
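Here is a small parser for the page-range notation above, as a TypeScript sketch. It assumes 1-based page numbers, silently drops pages beyond the document's page count, and rejects malformed or descending tokens. The real input may handle these edge cases differently, and `parsePageRange` is a hypothetical name.

```typescript
// Parse notation such as "1-5,10,15-20" into a sorted list of unique
// 1-based page numbers, clamped to the document's page count.
function parsePageRange(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const m = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!m) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(m[1]);
    const end = m[2] === undefined ? start : Number(m[2]);
    if (start > end) throw new Error(`Descending range: "${token}"`);
    for (let p = Math.max(start, 1); p <= Math.min(end, pageCount); p++) {
      pages.add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Because the result is a set, overlapping ranges such as "2,2-3" collapse to a single list, and each page is converted once.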
Does it work with scanned bank statements?
Not directly. A scanned PDF is an image of text and has no positioning data to work with. The fix is to run the PDF through OCR first to add a text layer; the OCR text layer then has x and y coordinates the tool can read. Some bank statement PDFs from older systems are scanned; most modern ones include a text layer and convert cleanly.
Is there a row or page limit?
There is no hard limit. The work happens in your browser, so the practical ceiling is your device memory. A 50-page bank statement with one table per page converts in a few seconds on modern hardware. If a conversion stalls on a very large file, narrow the page range and run it in chunks.

Built by Derek Giordano · Part of Ultimate Design Tools
