HTML cleanup tool & simplifier
HTML Washer is a tool that reduces an HTML document (or fragment) to basic HTML tags and attributes, producing clean HTML.
Paste (a piece of) HTML to clean it up
Check out our Apify actor for web scraping.
Powered by Trafilatura, a battle-tested Python library that accurately extracts main content from web pages while filtering out boilerplate like navigation, ads, and sidebars.
Ideal for building RAG pipelines, training datasets, or content analysis at scale.
What exactly does it do?
- Fixes malformed HTML (unclosed tags, invalid nesting)
- Reduces the markup to:
  <html lang>, <head>, <meta charset name content>, <title>, <body>, <p>, <a href title target>, <strong>, <em>, <br>, <ul>, <ol>, <li>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <img src alt width height>, <table>, <thead>, <tbody>, <tr>, <th colspan rowspan>, <td colspan rowspan>
- Replaces:
  <b> to <strong>, <i> to <em>
- Reformats the HTML (line breaks, indents)
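
The whitelist approach described above can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not HTML Washer's actual implementation: the names `ALLOWED`, `RENAME`, `Washer`, and `wash` are hypothetical, dropping the contents of `<script>` and `<style>` is an assumption, and the sketch only closes tags left open rather than fully repairing invalid nesting. The reformatting step (line breaks, indents) is omitted.

```python
from html import escape
from html.parser import HTMLParser

# Whitelist taken from the tag/attribute list above: tag -> allowed attributes.
ALLOWED = {tag: set() for tag in (
    "head", "title", "body", "p", "strong", "em", "br", "ul", "ol", "li",
    "h1", "h2", "h3", "h4", "h5", "h6", "table", "thead", "tbody", "tr",
)}
ALLOWED.update({
    "html": {"lang"},
    "meta": {"charset", "name", "content"},
    "a": {"href", "title", "target"},
    "img": {"src", "alt", "width", "height"},
    "th": {"colspan", "rowspan"},
    "td": {"colspan", "rowspan"},
})
RENAME = {"b": "strong", "i": "em"}   # replacements listed above
VOID = {"br", "img", "meta"}          # elements that take no closing tag
SKIP_CONTENT = {"script", "style"}    # assumption: drop their text entirely


class Washer(HTMLParser):
    """Hypothetical minimal washer: keeps whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # output fragments
        self.open_tags = []    # stack of emitted, not yet closed tags
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skipping = True
            return
        tag = RENAME.get(tag, tag)
        if tag not in ALLOWED:
            return  # drop the tag itself, keep its text content
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED[tag]
        )
        self.out.append(f"<{tag}{kept}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skipping = False
            return
        tag = RENAME.get(tag, tag)
        if tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        # Close anything left open inside this element, then the element.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

    def close(self):
        super().close()  # flush any buffered text first
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def wash(html: str) -> str:
    parser = Washer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(wash('<div class="x"><p style="color:red">Hi <b>there</b></p></div>'))
```

Non-whitelisted elements such as `<div>` are unwrapped rather than deleted, so their text survives. A production tool would more likely build on a full HTML5 parser, which also handles implied end tags (for example, consecutive `<li>` items without closing tags).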