What is HTML?

HTML (short for Hypertext Markup Language) is a markup language used to create web pages. Earlier versions of HTML were defined as an application of SGML (Standard Generalized Markup Language).

Why would I need to clean HTML?

There are various reasons, such as:
  • Migration to a new website or system

    When you are a developer rewriting an old app, you often need to reuse the existing HTML from the old app, but with new formatting. Before reformatting, it is better to get rid of the old formatting. This is what HTML Washer is good for.
  • Using generated HTML

    When you export HTML from a system, it is sometimes formatted in a very sophisticated way, but perhaps you don't need such complicated formatting.
  • Using HTML from someone else

    When someone gives you HTML text and you want the HTML structure but not their formatting, you need to strip it down to the basics.

How do I clean the HTML?

You can do that by copying and pasting the text or by uploading an HTML file.
  • Copy-Pasting

    Go to the Homepage, paste your text there, and hit the Wash button. Then copy the clean HTML and paste it wherever you need it.
  • Uploading

    Go to the Upload page and upload an HTML file. You will then be able to download the cleaned-up HTML.

What exactly does it do?

  • Fixes or removes malformed tags and attributes (e.g. adds alt attributes to images if missing)
  • Converts the markup to HTML5 (if it is XHTML for example)
  • Reduces the markup to: <a href>, <body>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <head>, <hr>, <html>, <i>, <img src width height alt>, <li>, <ol>, <p>, <ruby>, <strong>, <table>, <tbody>, <td colspan rowspan>, <th colspan rowspan>, <title>, <tr>, <ul>
  • Replaces: <b> with <strong>, <div> with <p>
  • Reformats the HTML (line breaks, indents)


Input: <p class="funny" onclick="alert('LOL')">bla bla</p>
Will be simplified to: <p>bla bla</p>
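
For readers who want to see how such a reduction could work in principle, here is a minimal sketch in Python using only the standard library's html.parser module. It is not HTML Washer's actual implementation: the names wash, Washer, ALLOWED, RENAMED and VOID are hypothetical, and the sketch assumes that tags outside the list are dropped while their text is kept. It covers only the allowlist, the two replacements and the missing alt attribute. It does not reformat the output or repair broken nesting, and it keeps the text inside removed tags such as <script>.

```python
from html import escape
from html.parser import HTMLParser

# Tags that survive, with the attributes kept on each (from the list above).
ALLOWED = {tag: set() for tag in (
    "body h1 h2 h3 h4 h5 h6 head hr html i li ol p ruby strong "
    "table tbody title tr ul"
).split()}
ALLOWED.update({
    "a": {"href"},
    "img": {"src", "width", "height", "alt"},
    "td": {"colspan", "rowspan"},
    "th": {"colspan", "rowspan"},
})
RENAMED = {"b": "strong", "div": "p"}  # replacements from the list above
VOID = {"hr", "img"}                   # allowed tags that have no closing tag


class Washer(HTMLParser):
    """Hypothetical cleaner: re-emits only allowed tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        tag = RENAMED.get(tag, tag)
        if tag not in ALLOWED:
            return  # assumption: drop the tag itself, keep its text content
        kept = [(k, v or "") for k, v in attrs if k in ALLOWED[tag]]
        if tag == "img" and not any(k == "alt" for k, _ in kept):
            kept.append(("alt", ""))  # add the missing alt attribute
        rendered = "".join(
            ' {}="{}"'.format(k, escape(v, quote=True)) for k, v in kept
        )
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_endtag(self, tag):
        tag = RENAMED.get(tag, tag)
        if tag in ALLOWED and tag not in VOID:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def wash(html):
    parser = Washer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(wash("<p class=\"funny\" onclick=\"alert('LOL')\">bla bla</p>"))
```

Run on an input like the example above, the sketch prints <p>bla bla</p>. Keeping the allowlist in one dictionary means the set of surviving tags and attributes can be changed without touching the parser logic.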