
CSV Cleaner

De-dupe, trim, normalize, and fill in one pass — entirely in your browser


Real-world CSV files are messy. Export the same spreadsheet from two different tools and you get trailing whitespace in some cells, NA as a string in others, blank rows in the middle, and duplicate header rows pasted in by accident. This tool runs the four most common cleanup passes — de-duplication, whitespace trim, null normalization, and empty-cell fill — in a single sweep, entirely in your browser. Nothing leaves the tab.

Why Clean a CSV in the Browser

Most online CSV cleaners upload your file to a server and run the cleanup remotely. That is fine for fully public data, but it is the wrong default for the CSVs people actually have on their desktops: customer exports, internal payroll snapshots, sales lists, address books, and dozens of other categories where uploading to a stranger is the worst option. This tool runs the entire pipeline locally. The CSV is parsed by PapaParse in your browser, transformed in memory, and offered back as a download. No upload, no temporary file on someone else's disk, and the tab can be closed at any point without leaving a trace anywhere.
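
The whole round trip fits in a few lines of browser code. A minimal sketch, assuming PapaParse for parsing as the page states; `cleanFileLocally` and the `cleanRows` placeholder are illustrative names, not the tool's actual source:

```ts
import Papa from "papaparse";

// Placeholder for the four passes sketched in the next section.
function cleanRows(rows: string[][]): string[][] {
  return rows;
}

// Parse → transform → download, all inside the tab; nothing is uploaded.
function cleanFileLocally(file: File): void {
  Papa.parse<string[]>(file, {
    skipEmptyLines: true,
    complete: (results) => {
      const csv = Papa.unparse(cleanRows(results.data));
      const blob = new Blob([csv], { type: "text/csv" });
      const link = document.createElement("a");
      link.href = URL.createObjectURL(blob); // object URL, not a network URL
      link.download = "cleaned.csv";
      link.click();
      URL.revokeObjectURL(link.href);
    },
  });
}
```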

How the Four Cleanup Passes Work

First, every cell is trimmed of leading and trailing whitespace, including the byte-order mark that Windows Excel exports often add to the first cell. Second, common null markers are normalized: empty strings, NA, N/A, null, NULL, and a hyphen on its own all become true empty cells. Third, the fill pass replaces those empty cells either with a specified default (zero, an empty string, the literal text NULL) or with the previous non-empty value from the same column — useful for hierarchical exports where the parent value only appears in the first row of each group. Finally, duplicate rows are removed; by default the entire row is the key, but you can pick a subset of columns so rows with matching keys (and possibly different timestamps) collapse to one.
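
In code, the order matters: trim before null-matching (so "  NA " is caught), normalize before filling (so markers count as empty), and fill before de-duping (so filled rows compare equal). A compact sketch under those assumptions; the marker list mirrors this page's description, and the option names are invented for illustration:

```ts
// Assumes `rows` excludes the header row.
const NULL_MARKERS = new Set(["", "na", "n/a", "null", "-"]);

function trimCell(cell: string): string {
  // Strip a leading BOM (U+FEFF) along with ordinary whitespace.
  return cell.replace(/^\uFEFF/, "").trim();
}

interface CleanOptions {
  fillDefault?: string;   // e.g. "0", "", or the literal text "NULL"
  fillForward?: boolean;  // carry the previous non-empty value down
  keyCols?: number[];     // dedupe key; whole row when omitted
}

function cleanRows(rows: string[][], opts: CleanOptions = {}): string[][] {
  // Passes 1 and 2: trim every cell, then normalize null markers to "".
  const normalized = rows.map((row) =>
    row.map((cell) => {
      const t = trimCell(cell);
      return NULL_MARKERS.has(t.toLowerCase()) ? "" : t;
    })
  );

  // Pass 3: fill empty cells with a default, or forward-fill per column.
  const lastSeen: string[] = [];
  const filled = normalized.map((row) =>
    row.map((cell, col) => {
      if (cell !== "") {
        lastSeen[col] = cell;
        return cell;
      }
      return opts.fillForward ? lastSeen[col] ?? "" : opts.fillDefault ?? "";
    })
  );

  // Pass 4: drop duplicate rows, keyed on the whole row or chosen columns.
  const seen = new Set<string>();
  return filled.filter((row) => {
    const key = JSON.stringify(opts.keyCols?.map((i) => row[i]) ?? row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```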

Use Cases That Justify a Cleanup Pass

Customer-list dedup before importing to a CRM is the single most common case — marketing exports stack up over time, and importing a CSV with 12% duplicates into HubSpot or Salesforce inflates contact counts and skews deduplication downstream. Form-submission exports routinely carry leading whitespace and stray N/A strings because users paste values in. Spreadsheet exports from Google Sheets often have trailing blank rows that break naive Python csv.reader loops. Survey results need null normalization before any statistical tool will treat them sensibly. The same four-pass cleanup that takes an hour of manual work in Excel becomes a two-second browser operation.

How We Compare to Desktop Tools

Excel and Google Sheets can do all four passes, but each requires several manual steps: Remove Duplicates, Find & Replace with regex, conditional column fills via VLOOKUP or array formulas. OpenRefine is the Cadillac for this kind of work and can do far more (faceted editing, clustering, undo history), but it is a desktop install with a learning curve. This tool covers the case where you know exactly what cleanup you want, your file is reasonably sized, and you would rather not spin up a separate application for a one-shot job. PapaParse 5.4 powers the parsing, the same library used by Tableau Public's CSV upload path and by hundreds of data-pipeline tools.

Pair this cleaner with the rest of the UDT data cluster: CSV Viewer for a quick read of the file, CSV Row Filter for predicate-based row selection, CSV to SQL to emit INSERT statements, and JSON ↔ CSV Converter for cross-format work. Use cleanup, then filter, then convert — the sequence that catches the most issues with the least re-work.

Frequently Asked Questions

Is the CSV uploaded anywhere during cleaning?
No. The file is parsed locally by PapaParse 5.4 running in your browser, transformed in memory, and offered back as a download. Nothing is sent to any server. You can verify this in your browser's Network tab while the tool runs.
Which markers count as null when normalizing?
By default: empty strings, NA, N/A, null, NULL, and a standalone hyphen -. The normalization is case-insensitive. You can customize the list in the options if your data uses other markers like #N/A (Excel error code) or None (Python pandas default).
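
A minimal sketch of the matching rule with those two custom markers added; `isNullMarker` is a hypothetical helper name, not the tool's API:

```ts
const DEFAULT_MARKERS = ["", "na", "n/a", "null", "-"];
const CUSTOM_MARKERS = [...DEFAULT_MARKERS, "#n/a", "none"];

function isNullMarker(cell: string, markers: string[] = CUSTOM_MARKERS): boolean {
  return markers.includes(cell.trim().toLowerCase()); // case-insensitive
}

isNullMarker("  N/A ");  // true
isNullMarker("#N/A");    // true — Excel error code, via the custom list
isNullMarker("0");       // false — a real value, never normalized
```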
How does the de-dupe step decide which row to keep?
By default, the first occurrence of each duplicate group is kept and the rest are dropped. If you select specific key columns, rows with matching keys collapse to one regardless of what the other columns contain. There is also a Last option that keeps the most recently seen row instead of the first — useful when later rows are presumed to override earlier ones.
Can I dedupe on a subset of columns?
Yes. By default the full row is the dedupe key. You can pick any combination of columns from a list, and two rows are considered duplicates if their selected key columns match. This is the right setting for cases like a customer list where the email column is the natural identity and other columns may legitimately differ between exports.
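
Both answers reduce to one small routine: build a key from the chosen columns (or the whole row) and decide whether a later occurrence overwrites an earlier one. An illustrative sketch, not the tool's source; rows are modeled as objects keyed by column name:

```ts
type Row = Record<string, string>;

function dedupe(rows: Row[], keyCols: string[], keep: "first" | "last" = "first"): Row[] {
  const byKey = new Map<string, Row>();
  for (const row of rows) {
    const key = keyCols.map((c) => row[c]).join("\u0000");
    // A Map keeps first-seen insertion order; with "last" we overwrite
    // the stored row so the newest contents win.
    if (keep === "last" || !byKey.has(key)) byKey.set(key, row);
  }
  return [...byKey.values()];
}

// Keep the newest row per email address:
// dedupe(rows, ["email"], "last");
```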
What does fill-forward do?
Fill-forward replaces empty cells with the most recent non-empty value from the same column. It is useful for hierarchical exports where a parent value (department, region, category) only appears in the first row of each group and the rest are blank. After fill-forward, every row carries its parent value explicitly, which is what every downstream tool actually needs.
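
On a single column the pass is a three-line loop. A standalone sketch with made-up data:

```ts
function fillForward(column: string[]): string[] {
  let last = "";
  return column.map((cell) => (cell === "" ? last : (last = cell)));
}

fillForward(["Sales", "", "", "Support", ""]);
// → ["Sales", "Sales", "Sales", "Support", "Support"]
```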
Does the cleaner detect the delimiter automatically?
Yes. PapaParse 5.4 auto-detects comma, semicolon, tab, and pipe delimiters by sampling the first few lines and picking the one that yields the most consistent column count. If the auto-detection picks the wrong delimiter (rare, but it can happen with files where the wrong character is more common in the data than between fields), you can override it manually.
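
In PapaParse terms, leaving the delimiter blank requests auto-detection, and the guessed character comes back in the parse metadata; passing an explicit character overrides the guess. A minimal illustration:

```ts
import Papa from "papaparse";

const text = "name;city\nAda;London\nGrace;Arlington";

// An empty `delimiter` asks PapaParse to guess; the winning character
// is reported back in the parse metadata.
const auto = Papa.parse(text, { delimiter: "" });
console.log(auto.meta.delimiter); // ";"

// Explicit override for the rare misdetection:
const forced = Papa.parse(text, { delimiter: "|" });
```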
How big a CSV can the cleaner handle?
There is no hard cap because the work runs in your browser. Practical limits are set by available memory; files up to a few hundred megabytes work fine on a modern laptop. Very large files (multi-gigabyte) may stall a single tab; in those cases split the file first, clean each piece, and concatenate the results.
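
If splitting is impractical, streaming is the usual workaround: PapaParse's step callback delivers rows one at a time, so memory stays bounded. A hedged sketch of that approach (not necessarily how this tool is implemented); `onRow` is a placeholder for whatever writes the cleaned output:

```ts
import Papa from "papaparse";

// Only the de-dupe key set stays resident; rows stream through.
function streamDedupe(file: File, onRow: (row: string[]) => void): void {
  const seen = new Set<string>();
  Papa.parse<string[]>(file, {
    skipEmptyLines: true,
    step: ({ data }) => {
      const key = JSON.stringify(data);
      if (!seen.has(key)) {
        seen.add(key);
        onRow(data);
      }
    },
    complete: () => console.log("done"),
  });
}
```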
Can I clean a TSV or pipe-delimited file?
Yes. Drop the file in and the parser auto-detects the delimiter. The output will use the same delimiter as the input by default. If you want to convert between delimiters at the same time as cleaning, choose the output delimiter explicitly in the options.
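
Delimiter conversion falls out of the parse/serialize split: auto-detect on the way in, set the delimiter explicitly on the way out via Papa.unparse. For illustration:

```ts
import Papa from "papaparse";

const tsvText = "sku\tqty\nA-100\t4\nB-200\t7";

// Read as TSV (auto-detected), write back as comma-separated.
const rows = Papa.parse<string[]>(tsvText, { delimiter: "" }).data;
const csvOut = Papa.unparse(rows, { delimiter: "," });
// csvOut === "sku,qty\r\nA-100,4\r\nB-200,7"
```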

Built by Derek Giordano · Part of Ultimate Design Tools
