[html5] Using validator.nu as a standalone library

Trubin, Stanislav trubin at amazon.com
Thu Apr 21 17:34:37 PDT 2011


Hello all!
I am trying to build an offline solution that provides HTML4/5 validation with as few dependencies as possible. I got stuck creating a basic Java application that uses validator.nu’s core as a standalone library. I would greatly appreciate any code examples that would help in creating such an application.

Here is a rough outline of the program that I’d like to achieve:

Foo.java
import …

public class Foo {
    public static void main(String[] args) {
        String input = "<html><head><title>test</title></head></html>";

        // initialize parser

        // parse, providing the HTML chunk as a String input (or URL)

        // print out all results in some sort of for-loop
        // (ParserResult is a placeholder for whatever the parser returns)
        for (int i = 0; i < ParserResult.length; i++) {
            System.out.println("Error #" + i + ": Line " + ParserResult[i].getLineNumber() + ", Char " + ParserResult[i].getColumnNumber() + ", Message " + ParserResult[i].getMessage());
        }
    }
}

This is how far I’ve got before getting stuck:
import java.io.IOException;
import java.io.OutputStreamWriter;
import nu.validator.htmlparser.sax.HtmlParser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import nu.validator.htmlparser.sax.HtmlSerializer;
import nu.validator.htmlparser.test.TreeDumpContentHandler;
import nu.validator.xml.SystemErrErrorHandler;

public class Foo {
    public static void main(String[] args) throws SAXException, IOException {

        TreeDumpContentHandler treeDumpContentHandler = new TreeDumpContentHandler(new OutputStreamWriter(System.out, "UTF-8"));
        ContentHandler serializer = new HtmlSerializer(System.out);
        SystemErrErrorHandler eh = new SystemErrErrorHandler();
        HtmlParser htmlParser = new HtmlParser();
        htmlParser.setContentHandler(serializer);
        htmlParser.setLexicalHandler(treeDumpContentHandler);
        htmlParser.setErrorHandler(eh);

        htmlParser.parse("http://www.google.com");
    }
}

I get a few basic errors back (such as an unescaped ampersand character in the URL), but nothing HTML-specific (for instance, “Attribute height not allowed on element tr at this point.”).
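
For the reporting part of the outline (line, column, message per error), a collecting org.xml.sax.ErrorHandler could stand in for SystemErrErrorHandler. What follows is only a minimal sketch under stated assumptions: it uses JDK classes alone, with the JDK's XML parser as a stand-in for HtmlParser, and the names ErrorReportDemo, CollectingErrorHandler and check are hypothetical. It covers the reporting side only and does not by itself produce the missing HTML-specific messages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ErrorReportDemo {
    // Parses markup given as a String and returns everything the reader reported.
    static List<SAXParseException> check(XMLReader reader, String markup) throws IOException {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // The JDK XML parser aborts after a fatal error; the handler
            // has already recorded it, so nothing more to do here.
        }
        return handler.getProblems();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the JDK's XML reader is used here only as a stand-in.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String input = "<html><head><title>test</head></html>"; // </title> is missing

        List<SAXParseException> problems = check(reader, input);
        for (int i = 0; i < problems.size(); i++) {
            SAXParseException p = problems.get(i);
            System.out.println("Error #" + i + ": Line " + p.getLineNumber()
                    + ", Char " + p.getColumnNumber() + ", Message " + p.getMessage());
        }
    }
}

// Hypothetical helper: stores every reported problem instead of printing it.
class CollectingErrorHandler implements ErrorHandler {
    private final List<SAXParseException> problems = new ArrayList<>();

    @Override public void warning(SAXParseException e) { problems.add(e); }
    @Override public void error(SAXParseException e) { problems.add(e); }
    @Override public void fatalError(SAXParseException e) { problems.add(e); }

    List<SAXParseException> getProblems() { return problems; }
}
```

Since the program above already calls setErrorHandler and parse on HtmlParser, an HtmlParser instance could presumably be passed to check in place of the JDK reader; that substitution is an assumption and has not been tried here.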

Thank you and best regards,
Stan
