TidyMarkup
TidyMarkup is a Data Format that uses TagSoup to tidy up HTML. It can be used to parse ugly HTML and return it as pretty, well-formed HTML.
Camel eats our own -dog food- soap
TidyMarkup only supports the unmarshal operation, as we really don't want to turn well-formed HTML into ugly HTML.
Example
An example where a file consumer provides some HTML, which is tidied and then written to another directory:
from("file://site/inbox").unmarshal().tidyMarkup().to("file://site/blogs");
Dependencies
To use TidyMarkup in your Camel routes, you need to add a dependency on camel-tagsoup, which implements this data format.
If you use Maven, you can add the following to your pom.xml, substituting the version number for the latest & greatest release (see the download page for the latest versions).
<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-tagsoup</artifactId>
  <version>1.6.0</version>
</dependency>