Remove Duplicate Lines
Paste any list and get a deduplicated version back, with options for case sensitivity, whitespace trimming, and preserving original order vs. sorting. Works on emails, names, URLs, log lines, code identifiers, or anything separated by newlines.
Why Deduplication Comes Up Constantly
Lists from real-world sources almost always contain duplicates: mailing lists merged from two events, URL lists scraped across multiple pages, CSV exports run twice. The duplicates are usually invisible until you're staring at the file, and they cause downstream pain: duplicate emails get flagged as spam, duplicate URLs waste crawl budget, duplicate IDs corrupt joins. Removing them is one of the most common cleanup steps in a data pipeline.
The trick is that "duplicate" is rarely as simple as byte-identical. Alice@example.com and alice@example.com reach the same mailbox at most providers but are different strings. "hello" and " hello" differ only by leading whitespace but represent the same line. http://example.com and http://example.com/ are equivalent URLs. The dedup options below cover the cases that come up most often.
How It Works
The input gets split on newlines into an array. Each line is then normalized according to the active options (lowercased if case-insensitive, trimmed if whitespace-insensitive), and the normalized value becomes a key in a Set. If a normalized key is already in the Set, the line is dropped; otherwise it's kept. "Preserve order" keeps the first occurrence of each unique line in its original position; "sort output" returns the deduplicated set in alphabetical order, much like piping through sort | uniq on Unix.
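The steps above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the tool's actual source: the option names (caseSensitive, trimWhitespace, sortOutput) and the function name dedupeLines are hypothetical, and it assumes that trimmed text is what gets emitted when trimming is on and that sorting uses plain string order.

```typescript
// Hypothetical option names; the tool's real settings may be named differently.
interface DedupeOptions {
  caseSensitive: boolean;
  trimWhitespace: boolean;
  sortOutput: boolean;
}

function dedupeLines(input: string, opts: DedupeOptions): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];

  // Assumption: both \n and \r\n count as line breaks.
  for (const line of input.split(/\r?\n/)) {
    // Build the comparison key according to the active options.
    let key = opts.trimWhitespace ? line.trim() : line;
    if (!opts.caseSensitive) key = key.toLowerCase();

    if (seen.has(key)) continue; // key already seen: drop the line
    seen.add(key);

    // Keep the first occurrence (assumption: emitted trimmed when trimming is on).
    kept.push(opts.trimWhitespace ? line.trim() : line);
  }

  // Assumption: default code-unit sort; a locale-aware comparison is also plausible.
  return opts.sortOutput ? [...kept].sort() : kept;
}

const sample =
  "Alice@example.com\nbob@example.com\n alice@example.com \nBob@example.com";

// Case-insensitive, whitespace-insensitive, original order preserved.
console.log(
  dedupeLines(sample, { caseSensitive: false, trimWhitespace: true, sortOutput: false })
);
// → ["Alice@example.com", "bob@example.com"]
```

Because the Set stores normalized keys while the output keeps the original text, the first spelling encountered (here Alice@example.com) is the one that survives.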
Need to dedupe and then alphabetize? Toggle both options, or pipe through the sort lines tool after. Need to dedupe a CSV column rather than a flat list? Use the CSV viewer to extract the column first, then dedupe.
Common Use Cases
Cleaning a mailing list before import: case-insensitive dedup is the standard pattern, since the domain part of an email address is always case-insensitive and almost every major provider treats the local part the same way. Removing redundant entries from a sitemap or a list of canonical URLs. Deduping a list of git commit messages to find the unique work that happened across branches. Cleaning a vocabulary list, glossary, or tag set. Removing repeated log lines so you can read the unique error patterns instead of the same stack trace repeating.
How We Compare
Unix sort -u or uniq are the canonical CLI tools and well worth learning if you spend any time in a terminal. The catch with uniq specifically is that it only collapses adjacent duplicates, so you have to sort first — which loses original order. awk '!seen[$0]++' preserves original order but is the kind of incantation people Google every time. For one-off dedup work without a terminal, a web tool is faster, and this one runs entirely in your browser with no data leaving the page.
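To make the difference between the two CLI approaches concrete, here is a small TypeScript sketch. The function names collapseAdjacent and dedupePreservingOrder are made up for illustration; the first mimics what uniq does on unsorted input, the second mimics the awk one-liner.

```typescript
// Collapses only adjacent repeats, like `uniq` run on unsorted input.
function collapseAdjacent(lines: string[]): string[] {
  return lines.filter((line, i) => i === 0 || line !== lines[i - 1]);
}

// Drops every repeat while keeping first-seen order, like awk '!seen[$0]++'.
function dedupePreservingOrder(lines: string[]): string[] {
  const seen = new Set<string>();
  return lines.filter((line) => {
    if (seen.has(line)) return false;
    seen.add(line);
    return true;
  });
}

const lines = ["b", "a", "b", "b", "a"];

console.log(collapseAdjacent(lines));
// → ["b", "a", "b", "a"]  (non-adjacent repeats survive)

console.log(dedupePreservingOrder(lines));
// → ["b", "a"]  (original order kept)

console.log(collapseAdjacent([...lines].sort()));
// → ["a", "b"]  (sorting first removes all repeats but loses the original order)
```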
Related Tools
Sort Lines Alphabetically
Built by Derek Giordano · Part of Ultimate Design Tools