TSaxon

TSaxon is a minor but convenient repackaging of Michael Kay's Saxon 6.5.5 XSLT 1.0 processor that makes it understand HTML as well as XML input. TSaxon's version of saxon.jar is a drop-in replacement for Saxon's. The only source file changed is com.icl.saxon.StyleSheet.java.

To process HTML input, run java -jar saxon.jar -H html-doc style-doc. All other options are the same as in Saxon.
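As a rough sketch of that invocation, the script below runs TSaxon on one HTML file. The file names page.html, style.xsl and result.xml are hypothetical, and saxon.jar is assumed to sit in the current directory; only the -H flag and the argument order come from the description above.

```shell
#!/bin/sh
# Hypothetical file names; adjust to your own setup.
SAXON_JAR=saxon.jar     # TSaxon's replacement jar (assumed to be in the current directory)
HTML_DOC=page.html      # hypothetical HTML input document
STYLE_DOC=style.xsl     # hypothetical XSLT 1.0 stylesheet

if [ -f "$SAXON_JAR" ] && [ -f "$HTML_DOC" ] && [ -f "$STYLE_DOC" ]; then
    # -H asks TSaxon to read the source document as HTML instead of XML.
    # The result goes to standard output, so it is redirected to a file here.
    java -jar "$SAXON_JAR" -H "$HTML_DOC" "$STYLE_DOC" > result.xml
else
    echo "need $SAXON_JAR, $HTML_DOC and $STYLE_DOC in the current directory" >&2
fi
```

Without -H the same command should behave like plain Saxon 6.5.5 and expect well-formed XML input.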

If you prefer to use a later version of Saxon that supports XSLT 2.0 and XQuery, you can pass the standard Saxon option -x org.ccil.cowan.tagsoup.Parser, after making sure that TagSoup is on your Java classpath.
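A minimal sketch of the later-Saxon route follows. The jar names saxon8.jar and tagsoup.jar and the input file names are assumptions, not part of any particular release. Because java -jar ignores the -cp setting, the sketch names Saxon's net.sf.saxon.Transform entry point explicitly so that both jars end up on the classpath. The option is spelled as in the text above; newer Saxon releases may expect the colon forms instead (-x:org.ccil.cowan.tagsoup.Parser, -s:page.html, -xsl:style.xsl), so check the documentation for your version.

```shell
#!/bin/sh
# Hypothetical jar and file names; adjust to your own setup.
SAXON_JAR=saxon8.jar      # assumed name of a later Saxon jar
TAGSOUP_JAR=tagsoup.jar   # assumed name of the TagSoup jar
HTML_DOC=page.html        # hypothetical HTML input document
STYLE_DOC=style.xsl       # hypothetical stylesheet

if [ -f "$SAXON_JAR" ] && [ -f "$TAGSOUP_JAR" ] && [ -f "$HTML_DOC" ] && [ -f "$STYLE_DOC" ]; then
    # Put both jars on the classpath (use ';' instead of ':' on Windows)
    # and tell Saxon to parse the source with TagSoup's SAX parser.
    java -cp "$SAXON_JAR:$TAGSOUP_JAR" net.sf.saxon.Transform \
        -x org.ccil.cowan.tagsoup.Parser "$HTML_DOC" "$STYLE_DOC" > result.xml
else
    echo "need $SAXON_JAR, $TAGSOUP_JAR, $HTML_DOC and $STYLE_DOC in the current directory" >&2
fi
```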