From t.broyer at gmail.com Mon Sep 10 01:41:41 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Mon, 10 Sep 2007 10:41:41 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
Message-ID:

Hi all,

I'm having small problems with the tree-construction test cases: it seems the expected parse errors are erroneous in some cases. Namely, in the fifth and seventh test from test1.dat ("" and ""), shouldn't there be a second parse error raised while processing the EOF token (in the "main" phase) due to the stack of open elements having two elements with the second not being "body"?

I'm hacking the EOF processing in Twintsam to always generate a head and body (the "Big issue" in the current draft); I now have the correct output but I generate 2 parse errors (missing doctype and unexpected EOF in head) while the tests expect just one (missing doctype).

I'd say the tests (and html5lib) are wrong but maybe someone could enlighten me? or should we just ignore such errors until the "big issue" is solved?

--
Thomas Broyer

From annevk at opera.com Mon Sep 10 15:31:41 2007
From: annevk at opera.com (Anne van Kesteren)
Date: Tue, 11 Sep 2007 00:31:41 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

On Mon, 10 Sep 2007 10:41:41 +0200, Thomas Broyer wrote:
> I'd say the tests (and html5lib) are wrong but maybe someone could
> enlighten me? or should we just ignore such errors until the "big
> issue" is solved?

Given that , , , , and are all optional in the language it didn't seem logical to make this a parse error. I like to believe I'm correct in that interpretation. (Incidentally, I also wrote the implementation. Incidentally, this was tested against testcases written by Hixie himself.)
--
Anne van Kesteren

From t.broyer at gmail.com Mon Sep 10 23:21:06 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Tue, 11 Sep 2007 08:21:06 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

2007/9/11, Anne van Kesteren:
> On Mon, 10 Sep 2007 10:41:41 +0200, Thomas Broyer wrote:
> > I'd say the tests (and html5lib) are wrong but maybe someone could
> > enlighten me? or should we just ignore such errors until the "big
> > issue" is solved?
>
> Given that , , , , and are all
> optional in the language it didn't seem logical to make this a parse
> error. I like to believe I'm correct in that interpretation.
> (Incidentally, I also wrote the implementation. Incidentally, this was
> tested against testcases written by Hixie himself.)

Hmm, that's a pretty good point! ;-)

...so let's fix the spec (or rather, note it for when we'll solve the "big issue")

--
Thomas Broyer

From t.broyer at gmail.com Tue Sep 11 01:00:55 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Tue, 11 Sep 2007 10:00:55 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

2007/9/11, Thomas Broyer:
> 2007/9/11, Anne van Kesteren:
> > Given that , , , , and are all
> > optional in the language it didn't seem logical to make this a parse
> > error. I like to believe I'm correct in that interpretation.
> > (Incidentally, I also wrote the implementation. Incidentally, this was
> > tested against testcases written by Hixie himself.)
>
> Hmm, that's a pretty good point! ;-)
>
> ...so let's fix the spec (or rather, note it for when we'll solve the
> "big issue")

FYI, I've fixed it in Twintsam by testing for "head" in addition to "body" in the EOF case of the main phase. The spec could read (changes marked with ):

<<<
An end-of-file token

Generate implied end tags.
If there are more than two nodes on the stack of open elements, or if there are two nodes but the second node is not a head node or a body node, this is a parse error.

Otherwise, if the parser was originally created as part of the HTML fragment parsing algorithm, and there's more than one element in the stack of open elements, and the second node on the stack of open elements is not a head node or a body node, then this is a parse error. (fragment case)

Stop parsing.
>>>

Note that I've also changed the "fragment case", though I'm really not sure it should be changed that way too. At least it doesn't change anything in the available test cases (or rather it doesn't make Twintsam pass/fail more/less tests; but Twintsam is far from finished).

N.B.: If you're interested in how Twintsam handles EOF (and how it ensures every produced document has a head and a body), look for "ProcessEndOfFile" in

Keep in mind that the HtmlReader class is a System.Xml.XmlReader subclass and that it "generates tokens" (its goal is to "fix" the markup to produce well-formed XML). I'll soon add a tree-builder class to complement the HtmlReader and handle reparenting cases (title goes into the head, things inside a table but not in a cell are moved outside the table, etc.) I'm not yet sure it's even feasible, but let's try doing it.

--
Thomas Broyer

From annevk at opera.com Tue Sep 11 02:38:48 2007
From: annevk at opera.com (Anne van Kesteren)
Date: Tue, 11 Sep 2007 11:38:48 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

On Tue, 11 Sep 2007 10:00:55 +0200, Thomas Broyer wrote:
> FYI, I've fixed it in Twintsam by testing for "head" in addition to
> "body" in the EOF case of the main phase. The spec could read (changes
> marked with ):

FWIW, I would like the specification to reflect html5lib where we did away with insertion modes and turned them all into phases (as the note in the specification suggests).
I don't feel too strongly about it, but I think it would make the specification easier to read and maybe also more straightforward to implement.

--
Anne van Kesteren

From t.broyer at gmail.com Tue Sep 11 03:24:41 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Tue, 11 Sep 2007 12:24:41 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

2007/9/11, Anne van Kesteren:
> On Tue, 11 Sep 2007 10:00:55 +0200, Thomas Broyer
> wrote:
> > FYI, I've fixed it in Twintsam by testing for "head" in addition to
> > "body" in the EOF case of the main phase. The spec could read (changes
> > marked with ):
>
> FWIW, I would like the specification to reflect html5lib where we did away
> with insertion modes and turned them all into phases (as the note in the
> specification suggests). I don't feel too strongly about it, but I think
> it would make the specification easier to read and maybe also more
> straightforward to implement.

Well, having separate phases and insertion modes allows for switching from any phase back to the main phase without losing the insertion mode (for instance, I implemented the "general CDATA/RCDATA parsing algorithm" as an additional phase) and without having to deal with storing the "phase where you were in when you were switched to the XXX phase", which doesn't make the specification easier to read (YMMV).

There's such a "switch back to the attribute value state that you were in when you were switched into this state" in the tokenisation section which is a bit of a mess: why doesn't the "consume an entity" algorithm deal with the "if nothing is returned" case and the "entity in attribute value" and the "entity data state" just go away?
On the other hand, adapting the "global" EOF case in the main phase to always build head and body elements is trivial, at least for the head, since we have a "head element pointer". It's a bit less easy for the body because of the body/frameset duality, but it could be solved by just looking at the insertion mode: the insertion mode is never switched back to "before head", "in head" or "after head" (there are only "process as if we were in the XXX insertion mode" instructions), so if, at EOF, the insertion mode is one of these three values, it means the tree has no body or frameset element, and we can safely append a body element without attributes to the root node.

Proposed wording:

<<<
An end-of-file token:

Generate implied end tags.

If there are more than two nodes on the stack of open elements, or if there are two nodes but the second node is not a head node or a body node, this is a parse error.

Otherwise, if the parser was originally created as part of the HTML fragment parsing algorithm, and there's more than one element in the stack of open elements, and the second node on the stack of open elements is not a head node or a body node, then this is a parse error. (fragment case)

If the head element pointer is null, create an element node with the tag name "head" and append it to the first element in the stack of open elements (the html element).

If the insertion mode is one of "before head", "in head", "in head noscript" or "after head", create an element node with the tag name "body" and append it to the first element in the stack of open elements (the html element).

Stop parsing.
>>>

It could also be solved with "act as if a XXX token with the tag name YYY and no attribute had been seen and reprocess the current token" (which would be more accurate given that the argument for not generating a parse error is that head, body and html start and end tags are optional):

<<<
If the insertion mode is "before head", act as if a start tag token with the tag name "head" and no attribute had been seen and reprocess the current token.

Otherwise, if the insertion mode is "in head noscript", act as if an end tag token with the tag name "noscript" had been seen and reprocess the current token.

Otherwise, if the insertion mode is "in head" or "after head", act as if a start tag token with the tag name "body" and no attribute had been seen and reprocess the current token.
>>>

No need to duplicate the whole thing into the fifteen insertion modes with only small variations in four of them.

N.B.: there probably needs to be some special handling for the "fragment case", in which I suppose the head element shouldn't always be implied.

--
Thomas Broyer

From hsivonen at iki.fi Tue Sep 11 03:34:07 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Tue, 11 Sep 2007 13:34:07 +0300
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

On Sep 11, 2007, at 13:24, Thomas Broyer wrote:
> Well, having separate phases and insertion modes allows for switching
> from any phase back to the main phase without losing the insertion
> mode (for instance, I implemented the "general CDATA/RCDATA parsing
> algorithm" as an additional phase) and without having to deal with
> storing the "phase where you were in when you were switched to the XXX
> phase", which doesn't make the specification easier to read (YMMV).

I agree with Anne. I also flattened phases and modes and introduced a variable for remembering the phase the tree builder was in before switching to trailing end.
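
For readers following the thread, the first wording Thomas proposed (imply head and body at end of file, and only flag a parse error when something other than head or body is still open) might be sketched roughly as below. This is only an illustration under stated assumptions, not code from Twintsam or html5lib: the function name `process_eof`, the use of plain tag-name strings for nodes, and the error label are all hypothetical, and the "generate implied end tags" step and the fragment case are deliberately left out.

```python
# Insertion modes in which no body or frameset element can exist yet,
# as listed in the proposed wording.
HEAD_MODES = {"before head", "in head", "in head noscript", "after head"}


def process_eof(open_elements, head_pointer, insertion_mode, html_children):
    """Apply the proposed EOF rule and return the parse errors raised.

    open_elements  -- tag names on the stack of open elements, root first
    head_pointer   -- the head element pointer, or None if no head was seen
    insertion_mode -- current insertion mode as a string
    html_children  -- tag names already appended to the html element;
                      mutated in place when head/body are implied
    Assumes implied end tags have already been generated by the caller.
    """
    errors = []
    second = open_elements[1] if len(open_elements) > 1 else None
    # Parse error: more than two open nodes, or a second node that is
    # neither head nor body.
    if len(open_elements) > 2 or (second is not None and second not in ("head", "body")):
        errors.append("unexpected EOF")  # hypothetical error label
    # Imply the head element if none was ever created.
    if head_pointer is None:
        html_children.append("head")
    # Imply the body element if the parser never left the head-related modes.
    if insertion_mode in HEAD_MODES:
        html_children.append("body")
    return errors
```

With an empty document (`["html"]` on the stack, no head pointer, mode "before head") this appends both elements and reports no extra parse error, which is the behaviour the test cases discussed above expect.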
--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From hsivonen at iki.fi Tue Sep 11 07:04:22 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Tue, 11 Sep 2007 17:04:22 +0300
Subject: [imps] Validation result format for review
Message-ID: <25E94F4F-CE00-468E-9D55-634BF597BE95@iki.fi>

I'd like to enable the use of Validator.nu as a RESTful Web service. To this end, I have designed a Validator.nu-native XML response format:
http://wiki.whatwg.org/wiki/Validator.nu_XML_Output

I'd appreciate comments on the format--especially from people who can foresee wanting to write clients.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From hsivonen at iki.fi Tue Sep 11 12:12:33 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Tue, 11 Sep 2007 22:12:33 +0300
Subject: [imps] Validation result format for review
In-Reply-To:
References: <25E94F4F-CE00-468E-9D55-634BF597BE95@iki.fi>
Message-ID: <1A987CD4-E561-46CF-969C-5120766B1530@iki.fi>

On Sep 11, 2007, at 21:22, ryan wrote:
> On Sep 11, 2007, at 7:04 AM, Henri Sivonen wrote:
>
>> I'd like to enable the use of Validator.nu as a RESTful Web service.
>> To this end, I have designed a Validator.nu-native XML response
>> format:
>> http://wiki.whatwg.org/wiki/Validator.nu_XML_Output
>>
>> I'd appreciate comments on the format--especially from people who can
>> foresee wanting to write clients.
>
> I notice that you're reusing vocabulary from HTML, why not just use
> HTML?

I already offer class-annotated HTML and XHTML output (append &out=xhtml to the URI to get XHTML). Recently, I added both POSTing content and plain text output (append &out=text to the URI) intended to be dumpable to terminal and then human readable.

So far, I've observed that in a Web service context, people (n=2) prefer scraping plain text over scraping HTML or XHTML. This suggests to me that (X)HTML is too crufty for the purpose. (Am I right? Lachy? Philip?) However, the plain text format isn't really designed for safe scraping.
I am assuming that a minimally crufty custom XML format and a custom JSON format would be the best fits for the Web service scenario as they'd be more reliable than scraping the text output ad hoc and less crufty than (X)HTML.

Moreover, as a (perhaps silly) design principle, I have decided that the XML format should expose all features to the point that one could theoretically re-create the HTML front-end with the XML service. The ability to support the planned-but-unimplemented elaboration feature in XML is there mostly for completeness as it will be easy to throw in there once the feature exists for (X)HTML. I'm not planning on exposing the HTML elaboration in JSON. And based on IRC comments today, I may abandon this principle as far as the parse tree goes.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From hsivonen at iki.fi Wed Sep 12 06:14:22 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Wed, 12 Sep 2007 16:14:22 +0300
Subject: [imps] Another validation result format for review
Message-ID:

I'd like to enable the use of Validator.nu as a RESTful Web service. To this end, I have designed a Validator.nu-native JSON response format:
http://wiki.whatwg.org/wiki/Validator.nu_JSON_Output

I'd appreciate comments on the format--especially from people who can foresee wanting to write clients.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From t.broyer at gmail.com Wed Sep 12 09:10:17 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Wed, 12 Sep 2007 18:10:17 +0200
Subject: [imps] Another validation result format for review
In-Reply-To:
References:
Message-ID:

Hi Henri,

2007/9/12, Henri Sivonen:
> I'd like to enable the use of Validator.nu as a RESTful Web service.
> To this end, I have designed a Validator.nu-native JSON response format:
> http://wiki.whatwg.org/wiki/Validator.nu_JSON_Output

I just skimmed through your two docs and the first comment that came to mind is: why is he using types/subtypes rather than a few more types?
  type:info, no subtype => type:info
  type:info, subtype:warning => type:warning
  type:error, no subtype => type:error
  type:error, subtype:fatal => type:fatal-error
  type:non-document-error, no subtype => type:non-document-error
  type:non-document-error, subtype:io => type:io-error
  type:non-document-error, subtype:schema => type:schema-error
  type:non-document-error, subtype:internal => type:internal-error

My second comment is: it seems the "indeterminate" result is dependent upon a non-document-error message; couldn't they be merged? I.e. there's no type:non-document-error and, if a "non document error" happens, the result:indeterminate has specific properties related to the "non document error" which led to this state. Example in JSON:

  "result": {
    "type": "indeterminate",
    "errors": [
      { "type": "io", "message": "...", "url": "..." }
    ]
  }

> I'd appreciate comments on the format--especially from people who can
> foresee wanting to write clients.

I'm not such a person, but I thought you could nevertheless be interested by my comments ;-)

--
Thomas Broyer

From hsivonen at iki.fi Wed Sep 12 10:53:50 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Wed, 12 Sep 2007 20:53:50 +0300
Subject: [imps] Another validation result format for review
In-Reply-To:
References:
Message-ID: <4ED5E49C-E59A-4AF7-A862-0CD12A82BF6C@iki.fi>

On Sep 12, 2007, at 19:10, Thomas Broyer wrote:
> 2007/9/12, Henri Sivonen:
>> I'd like to enable the use of Validator.nu as a RESTful Web service.
>> To this end, I have designed a Validator.nu-native JSON response
>> format:
>> http://wiki.whatwg.org/wiki/Validator.nu_JSON_Output
>
> I just skimmed through your two docs and the first comment that came
> to mind is: why is he using types/subtypes rather than a few more
> types?
> type:info, no subtype => type:info
> type:info, subtype:warning => type:warning
> type:error, no subtype => type:error
> type:error, subtype:fatal => type:fatal-error
> type:non-document-error, no subtype => type:non-document-error
> type:non-document-error, subtype:io => type:io-error
> type:non-document-error, subtype:schema => type:schema-error
> type:non-document-error, subtype:internal => type:internal-error

Yeah, that's exactly the current internal flat taxonomy. The reason for the design is forward-compatible extensibility. The assumption is that the three main types will be cast in concrete, but subtypes may be added without breaking clients written to the current spec.

It looks cruftier in JSON than in XML, though. :-/

> My second comment is: it seems the "indeterminate" result is dependent
> upon a non-document-error message; couldn't they be merged? I.e.
> there's no type:non-document-error and, if a "non document error"
> happens, the result:indeterminate has specific properties related to
> the "non document error" which led to this state. Example in JSON:
> "result": {
>   "type": "indeterminate",
>   "errors": [
>     { "type": "io", "message": "...", "url": "..." }
>   ]
> }

The type of the result is completely redundant. It could be computed by the client from the top-level message types.

My initial design didn't have explicit results at all due to this redundancy. I introduced explicit results for two reasons:
1) To carry the same human-readable message that you get from the (X)HTML output.
2) To make it trivial for clients to query the result format for the overall result.

I hesitate to merge non-document-errors into results, because the results would have to take on locator features (at least url for IO errors) that messages already have.
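
To illustrate the two points made above (subtypes as a forward-compatible refinement of three fixed main types, and the overall result being derivable from the message types), a client might do something like the following. This is a sketch under assumptions rather than the documented Validator.nu behaviour: the function names are hypothetical, the messages are assumed to be dictionaries with a "type" key and an optional "subtype" key, and the result names "success" and "failure" are guesses, since only "indeterminate" is named in this thread.

```python
# Flat taxonomy as listed in the thread, keyed by (type, subtype).
FLAT_TYPES = {
    ("info", None): "info",
    ("info", "warning"): "warning",
    ("error", None): "error",
    ("error", "fatal"): "fatal-error",
    ("non-document-error", None): "non-document-error",
    ("non-document-error", "io"): "io-error",
    ("non-document-error", "schema"): "schema-error",
    ("non-document-error", "internal"): "internal-error",
}


def flat_type(message):
    """Collapse type/subtype into one flat name.

    An unknown subtype falls back to the main type, so a client written
    against today's list keeps working if new subtypes are added later.
    """
    main = message["type"]
    key = (main, message.get("subtype"))
    return FLAT_TYPES.get(key, FLAT_TYPES.get((main, None), main))


def overall_result(messages):
    """Derive the overall result from the top-level message types only.

    Assumed rule: any non-document-error makes the result indeterminate,
    otherwise any error means failure, otherwise success.
    """
    types = {m["type"] for m in messages}
    if "non-document-error" in types:
        return "indeterminate"
    if "error" in types:
        return "failure"
    return "success"
```

A client that only cares about pass/fail would call `overall_result` and ignore subtypes entirely, which is the extensibility argument for keeping the three main types fixed.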
However, if potential users of the Web service interface don't care about my human-readable characterization for the result (they differ for the HTML5 facet and for the generic facet; that's all), I'd be happy to zap the precomputed result altogether from the XML and JSON formats.

Do people care about the precomputed result and the associated UI-level message?

>> I'd appreciate comments on the format--especially from people who can
>> foresee wanting to write clients.
>
> I'm not such a person, but I thought you could nevertheless be
> interested by my comments ;-)

I am. Thank you.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From lachlan.hunt at lachy.id.au Tue Sep 18 07:39:32 2007
From: lachlan.hunt at lachy.id.au (Lachlan Hunt)
Date: Wed, 19 Sep 2007 00:39:32 +1000
Subject: [imps] Table Inspector Bug
Message-ID: <46EFE324.5060504@lachy.id.au>

(Resending to the correct mailing list address, sorry for the duplicate James)

Hi,

There's a strange bug in the Table Inspector [1]. I discovered this bug while inspecting a table for a TV Guide [2]. When there's a comment inside a , the associations between cells and headers get messed up.

Compare the results of the following 2 tables.

In Table 1, with the comment, using either the HTML4, HTML5 or Experimental algorithm:

* Cell A does not get associated with any headers
* Cell B is associated with headers X and ROW
* Cell C is associated with headers Y and ROW

In Table 2, without the comment, the cells are associated as expected.

* Cell A is associated with headers X and ROW
* Cell B is associated with headers Y and ROW
* Cell C is associated with headers Z and ROW

In both cases, the Smart Colspan algorithm behaves the same, except that it doesn't associate any cells with the header ROW.

Table 1:
X Y Z
ROW A B C
Table 2:
X Y Z
ROW A B C
[1] http://james.html5.org/tables/table_inspector.html [2] http://www.ebroadcast.com.au/tv/static/SydneyNight.html (Note: unfortunately, that page uses so many layout tables and scripts to generate its content, it seems to overload the table inspector and is not possible to analyse the table directly. It works if you serialise the DOM from the browser and remove the noise, such as font and script elements, and irrelevant attributes.) -- Lachlan Hunt http://lachy.id.au/ From westonruter at gmail.com Sun Sep 30 11:19:43 2007 From: westonruter at gmail.com (Weston Ruter) Date: Sun, 30 Sep 2007 11:19:43 -0700 Subject: [imps] XHTML 1.0 + Web Forms 2.0 DTDs Message-ID: I've modified the XHTML 1.0 Strict and Transitional DTDs to include the changes specified by the current draft of Web Forms 2.0. XHTML 1.0 Strict + Web Forms 2.0: http://webforms2.googlecode.com/svn/trunk/DTD/xhtml1-strict-wf2.dtd XHTML 1.0 Transitional + Web Forms 2.0: http://webforms2.googlecode.com/svn/trunk/DTD/xhtml1-transitional-wf2.dtd I'd appreciate any feedback or suggestions you may have. Weston -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.broyer at gmail.com Mon Sep 10 01:41:41 2007 From: t.broyer at gmail.com (Thomas Broyer) Date: Mon, 10 Sep 2007 10:41:41 +0200 Subject: [imps] Problem with the tree-construction test cases and implied body Message-ID: Hi all, I'm having small problems with the tree-construction test cases: it seems the expected parse errors are erroneous in some cases. Namely, in the fifth and seventh test from test1.dat ("" and ""), shouldn't there be a second parse error raised while processing the EOF token (in the "main" phase) due to the stack of open elements having two elements with second not being "body"? 
I'm hacking the EOF processing in Twintsam to always generate a head and body (the "Big issue" in the current draft); I now have the correct output but I generate 2 parse errors (missing doctype and unexpected EOF in head) while the tests expect just one (missing doctype). I'd say the tests (and html5lib) are wrong but maybe someone could enlighten me? or should we just ignore such errors until the "big issue" is solved? -- Thomas Broyer From annevk at opera.com Mon Sep 10 15:31:41 2007 From: annevk at opera.com (Anne van Kesteren) Date: Tue, 11 Sep 2007 00:31:41 +0200 Subject: [imps] Problem with the tree-construction test cases and implied body In-Reply-To: References: Message-ID: On Mon, 10 Sep 2007 10:41:41 +0200, Thomas Broyer wrote: > I'd say the tests (and html5lib) are wrong but maybe someone could > enlighten me? or should we just ignore such errors until the "big > issue" is solved? Given that , , , , and are all optional in the language it didn't seem logical to make this a parse error. I like to believe I'm correct in that interpretation. (Incidentally, I also wrote the implementation. Incidentally, this was tested against testcases written by Hixie himself.) -- Anne van Kesteren From t.broyer at gmail.com Mon Sep 10 23:21:06 2007 From: t.broyer at gmail.com (Thomas Broyer) Date: Tue, 11 Sep 2007 08:21:06 +0200 Subject: [imps] Problem with the tree-construction test cases and implied body In-Reply-To: References: Message-ID: 2007/9/11, Anne van Kesteren: > On Mon, 10 Sep 2007 10:41:41 +0200, Thomas Broyer wrote: > > I'd say the tests (and html5lib) are wrong but maybe someone could > > enlighten me? or should we just ignore such errors until the "big > > issue" is solved? > > Given that , , , , and are all > optional in the language it didn't seem logical to make this a parse > error. I like to believe I'm correct in that interpretation. > (Incidentally, I also wrote the implementation. 
Incidentally, this was > tested against testcases written by Hixie himself.) Hmm, that's a pretty good point! ;-) ...so let's fix the spec (or rather, note it for when we'll solve the "big issue") -- Thomas Broyer From t.broyer at gmail.com Tue Sep 11 01:00:55 2007 From: t.broyer at gmail.com (Thomas Broyer) Date: Tue, 11 Sep 2007 10:00:55 +0200 Subject: [imps] Problem with the tree-construction test cases and implied body In-Reply-To: References: Message-ID: 2007/9/11, Thomas Broyer: > 2007/9/11, Anne van Kesteren: > > Given that , , , , and are all > > optional in the language it didn't seem logical to make this a parse > > error. I like to believe I'm correct in that interpretation. > > (Incidentally, I also wrote the implementation. Incidentally, this was > > tested against testcases written by Hixie himself.) > > Hmm, that's a pretty good point! ;-) > > ...so let's fix the spec (or rather, note it for when we'll solve the > "big issue") FYI, I've fixed it in Twintsam by testing for "head" in addition to "body" in the EOF case of the main phase. The spec could read (changes marked with ): <<< An end-of-file token Generate implied end tags. If there are more than two nodes on the stack of open elements, or if there are two nodes but the second node is not a head node or a body node, this is a parse error. Otherwise, if the parser was originally created as part of the HTML fragment parsing algorithm, and there's more than one element in the stack of open elements, and the second node on the stack of open elements is not a head node or a body node, then this is a parse error. (fragment case) Stop parsing. >>> Note that I've also changed the "fragment case", though I'm really not sure it should be changed that way too. At least it doesn't change anything in the available test cases (or rather it doesn't make Twintsam pass/fail more/less tests; but Twintsam is far from finished). 
N.B.: If you're interested in how Twintsam handles EOF (and how it ensures every produced document has a head and a body), look for "ProcessEndOfFile" in Keep in mind that the HtmlReader class is a System.Xml.XmlReader subclass and that it "generates tokens" (its goal is to "fix" the markup to produce well-formed XML). I'll soon add a tree-builder class to complement the HtmlReader and handle reparenting cases (title goes into the head, things inside a table but not in a cell are moved outside the table, etc.) I'm not yet sure it's even feasible, but let's try doing it. -- Thomas Broyer From annevk at opera.com Tue Sep 11 02:38:48 2007 From: annevk at opera.com (Anne van Kesteren) Date: Tue, 11 Sep 2007 11:38:48 +0200 Subject: [imps] Problem with the tree-construction test cases and implied body In-Reply-To: References: Message-ID: On Tue, 11 Sep 2007 10:00:55 +0200, Thomas Broyer wrote: > FYI, I've fixed it in Twintsam by testing for "head" in addition to > "body" in the EOF case of the main phase. The spec could read (changes > marked with ): FWIW, I would like the specification to reflect html5lib where we did away with insertion modes and turned them all into phases (as the note in the specification suggests). I don't feel too strongly about it, but I think it would make the specification easier to read and maybe also more straightforward to implement. -- Anne van Kesterend From t.broyer at gmail.com Tue Sep 11 03:24:41 2007 From: t.broyer at gmail.com (Thomas Broyer) Date: Tue, 11 Sep 2007 12:24:41 +0200 Subject: [imps] Problem with the tree-construction test cases and implied body In-Reply-To: References: Message-ID: 2007/9/11, Anne van Kesteren: > On Tue, 11 Sep 2007 10:00:55 +0200, Thomas Broyer > wrote: > > FYI, I've fixed it in Twintsam by testing for "head" in addition to > > "body" in the EOF case of the main phase. 
The spec could read (changes > > marked with ): > > FWIW, I would like the specification to reflect html5lib where we did away > with insertion modes and turned them all into phases (as the note in the > specification suggests). I don't feel too strongly about it, but I think > it would make the specification easier to read and maybe also more > straightforward to implement. Well, having separate phases and insertion modes allows for switching from any phase back to the main phase without loosing the insertion mode (for instance, I implemented the "general CDATA/RCDATA parsing algorithm" as an additional phase) and without having to deal with storing the "phase where you were in when were switched to the XXX phase", which doesn't make the specification easier to read (YMMV). There's such a "switch back to the attribute value state that you were in when were switched into this state" in the tokenisation section which is a bit of a mess: why doesn't the "consume an entity" algorithm deal with the "if nothing is returned" case and the "entity in attribute value" and the "entity data state" just go away? On the other hand, adapting the "global" EOF case in the main phase to always build head and body elements is trivial (at least for the head, since we have a "head element pointer"; it's a bit less easier for the body because of the body/frameset duality, but it could be solved by just looking at the insertion mode: the insertion is never switched back to "before head", "in head" or "after head" ?there only are "process as if we were in the XXX insertion mode" instructions?, so if, at EOF, the insertion mode is one of these three values, it means the tree has no body or frameset element, and we can safely append a body element without attributes to the root node). Proposed wording: <<< End end-of-file token: Generate implied end tags. 
If there are more than two nodes on the stack of open elements, or if there are two nodes but the second node is not a head node or a body node, this is a parse error. Otherwise, if the parser was originally created as part of the HTML fragment parsing algorithm, and there's more than one element in the stack of open elements, and the second node on the stack of open elements is not a head node or a body node, then this is a parse error. (fragment case) If the head element pointer is null, create an element node with the tag name "head" and append it to the first element in the stack of open elements (the html element). If the insertion mode is one of "before head", "in head", "in head noscript" or "after head", create an element node with the tag name "body" and append it to the first element in the stack of open elements (the html element). Stop parsing. >>> It could also be solved with "act as if a XXX token with the tag name YYY and no attribute had been seen and reprocess the current token" (which would be more accurate given that the argument of not generating a parse error is that head, body and html start and end tags are optional): <<< If the insertion mode is "before head", act as if a start tag token with the tag name "head" and no attribute had been seen and reprocess the current token. Otherwise, if the insertion mode is "in head noscript", act as if an end tag token with the tag name "noscript" had been seen and reprocess the current token. Otherwise, if the insertion mode is "in head" or "after head", act as if a start tag token with the tag name "body" and no attribute had been seen and reprocess the current token. >>> No need to duplicate the whole thing into the fifteen insertion modes with only small variations in four of them. N.B.: there probably needs to be some special handling for the "fragment case", in which one I suppose the head element shouldn't always be implied. 
-- Thomas Broyer From hsivonen at iki.fi Tue Sep 11 03:34:07 2007 From: hsivonen at iki.fi (Henri Sivonen) Date: Tue, 11 Sep 2007 13:34:07 +0300 Subject: [imps] Problem with the tree-construction test cases and implied body In-Reply-To: References: Message-ID: On Sep 11, 2007, at 13:24, Thomas Broyer wrote: > Well, having separate phases and insertion modes allows for switching > from any phase back to the main phase without loosing the insertion > mode (for instance, I implemented the "general CDATA/RCDATA parsing > algorithm" as an additional phase) and without having to deal with > storing the "phase where you were in when were switched to the XXX > phase", which doesn't make the specification easier to read (YMMV). I agree with Anne. I also flattened phases and modes and introduced a variable for remembering the phase the tree builder was in before switching to trailing end. -- Henri Sivonen hsivonen at iki.fi http://hsivonen.iki.fi/ From hsivonen at iki.fi Tue Sep 11 07:04:22 2007 From: hsivonen at iki.fi (Henri Sivonen) Date: Tue, 11 Sep 2007 17:04:22 +0300 Subject: [imps] Validation result format for review Message-ID: <25E94F4F-CE00-468E-9D55-634BF597BE95@iki.fi> I'd like to enable the use of Validator.nu as a RESTful Web service. To this end, I have designed a Validator.nu-native XML response format: http://wiki.whatwg.org/wiki/Validator.nu_XML_Output I'd appreciate comments on the format--especially from people who can foresee wanting to write clients. 
-- Henri Sivonen hsivonen at iki.fi http://hsivonen.iki.fi/

From hsivonen at iki.fi Tue Sep 11 12:12:33 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Tue, 11 Sep 2007 22:12:33 +0300
Subject: [imps] Validation result format for review
In-Reply-To:
References: <25E94F4F-CE00-468E-9D55-634BF597BE95@iki.fi>
Message-ID: <1A987CD4-E561-46CF-969C-5120766B1530@iki.fi>

On Sep 11, 2007, at 21:22, ryan wrote:

> On Sep 11, 2007, at 7:04 AM, Henri Sivonen wrote:
>
>> I'd like to enable the use of Validator.nu as a RESTful Web service.
>> To this end, I have designed a Validator.nu-native XML response
>> format:
>> http://wiki.whatwg.org/wiki/Validator.nu_XML_Output
>>
>> I'd appreciate comments on the format--especially from people who can
>> foresee wanting to write clients.
>
> I notice that you're reusing vocabulary from HTML, why not just use
> HTML?

I already offer class-annotated HTML and XHTML output (append &out=xhtml to the URI to get XHTML). Recently, I added both POSTing content and plain text output (append &out=text to the URI) intended to be dumped to a terminal and read by a human.

So far, I've observed that in a Web service context, people (n=2) prefer scraping plain text over scraping HTML or XHTML. This suggests to me that (X)HTML is too crufty for the purpose. (Am I right? Lachy? Philip?) However, the plain text format isn't really designed for safe scraping.

I am assuming that a minimally crufty custom XML format and a custom JSON format would be the best fits for the Web service scenario, as they'd be more reliable than scraping the text output ad hoc and less crufty than (X)HTML.

Moreover, as a (perhaps silly) design principle, I have decided that the XML format should expose all features to the point that one could theoretically re-create the HTML front-end with the XML service.
The ability to support the planned-but-unimplemented elaboration feature in XML is there mostly for completeness as it will be easy to throw in there once the feature exists for (X)HTML. I'm not planning on exposing the HTML elaboration in JSON. And based on IRC comments today, I may abandon this principle as far as the parse tree goes.

-- Henri Sivonen hsivonen at iki.fi http://hsivonen.iki.fi/

From hsivonen at iki.fi Wed Sep 12 06:14:22 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Wed, 12 Sep 2007 16:14:22 +0300
Subject: [imps] Another validation result format for review
Message-ID:

I'd like to enable the use of Validator.nu as a RESTful Web service. To this end, I have designed a Validator.nu-native JSON response format:
http://wiki.whatwg.org/wiki/Validator.nu_JSON_Output

I'd appreciate comments on the format--especially from people who can foresee wanting to write clients.

-- Henri Sivonen hsivonen at iki.fi http://hsivonen.iki.fi/

From t.broyer at gmail.com Wed Sep 12 09:10:17 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Wed, 12 Sep 2007 18:10:17 +0200
Subject: [imps] Another validation result format for review
In-Reply-To:
References:
Message-ID:

Hi Henri,

2007/9/12, Henri Sivonen:
> I'd like to enable the use of Validator.nu as a RESTful Web service.
> To this end, I have designed a Validator.nu-native JSON response format:
> http://wiki.whatwg.org/wiki/Validator.nu_JSON_Output

I just skimmed through your two docs and the first comment that came to mind is: why is he using types/subtypes rather than a few more types?
type:info, no subtype => type:info
type:info, subtype:warning => type:warning
type:error, no subtype => type:error
type:error, subtype:fatal => type:fatal-error
type:non-document-error, no subtype => type:non-document-error
type:non-document-error, subtype:io => type:io-error
type:non-document-error, subtype:schema => type:schema-error
type:non-document-error, subtype:internal => type:internal-error

My second comment is: it seems the "indeterminate" result is dependent upon a non-document-error message; couldn't they be merged? I.e. there's no type:non-document-error and, if a "non document error" happens, the result:indeterminate has specific properties related to the "non document error" which led to this state. Example in JSON:

"result": {
  "type": "indeterminate",
  "errors": [
    { "type": "io", "message": "...", "url": "..." }
  ]
}

> I'd appreciate comments on the format--especially from people who can
> foresee wanting to write clients.

I'm not such a person, but I thought you could nevertheless be interested in my comments ;-)

-- Thomas Broyer

From hsivonen at iki.fi Wed Sep 12 10:53:50 2007
From: hsivonen at iki.fi (Henri Sivonen)
Date: Wed, 12 Sep 2007 20:53:50 +0300
Subject: [imps] Another validation result format for review
In-Reply-To:
References:
Message-ID: <4ED5E49C-E59A-4AF7-A862-0CD12A82BF6C@iki.fi>

On Sep 12, 2007, at 19:10, Thomas Broyer wrote:

> 2007/9/12, Henri Sivonen:
>> I'd like to enable the use of Validator.nu as a RESTful Web service.
>> To this end, I have designed a Validator.nu-native JSON response
>> format:
>> http://wiki.whatwg.org/wiki/Validator.nu_JSON_Output
>
> I just skimmed through your two docs and the first comment that came
> to mind is: why is he using types/subtypes rather than a few more
> types?
> type:info, no subtype => type:info
> type:info, subtype:warning => type:warning
> type:error, no subtype => type:error
> type:error, subtype:fatal => type:fatal-error
> type:non-document-error, no subtype => type:non-document-error
> type:non-document-error, subtype:io => type:io-error
> type:non-document-error, subtype:schema => type:schema-error
> type:non-document-error, subtype:internal => type:internal-error

Yeah, that's exactly the current internal flat taxonomy. The reason for the design is forward-compatible extensibility. The assumption is that the three main types will be cast in concrete, but subtypes may be added without breaking clients written to the current spec. It looks cruftier in JSON than in XML, though. :-/

> My second comment is: it seems the "indeterminate" result is dependent
> upon a non-document-error message; couldn't they be merged? I.e.
> there's no type:non-document-error and, if a "non document error"
> happens, the result:indeterminate has specific properties related to
> the "non document error" which led to this state. Example in JSON:
> "result": {
>   "type": "indeterminate",
>   "errors": [
>     { "type": "io", "message": "...", "url": "..." }
>   ]
> }

The type of the result is completely redundant. It could be computed by the client from the top-level message types. My initial design didn't have explicit results at all due to this redundancy. I introduced explicit results for two reasons:
1) To carry the same human-readable message that you get from the (X)HTML output.
2) To make it trivial for clients to query the result format for the overall result.

I hesitate to merge non-document-errors into results, because the results would have to take on locator features (at least url for IO errors) that messages already have.
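The forward-compatibility argument can be sketched from the client side. This is a minimal Python sketch, assuming each message arrives as a dictionary with a "type" key and an optional "subtype" key; the key names follow the type/subtype wording in this thread, while the `flat_type` helper and the dictionary shape are hypothetical and not taken from the wiki page. Known pairs map onto the flat names listed above, and an unrecognised subtype falls back to its main type instead of breaking the client.

```python
from typing import Optional

# Flat names for the (type, subtype) pairs listed in this thread.
FLAT_TYPES = {
    ("info", None): "info",
    ("info", "warning"): "warning",
    ("error", None): "error",
    ("error", "fatal"): "fatal-error",
    ("non-document-error", None): "non-document-error",
    ("non-document-error", "io"): "io-error",
    ("non-document-error", "schema"): "schema-error",
    ("non-document-error", "internal"): "internal-error",
}


def flat_type(message: dict) -> str:
    """Map a type/subtype message onto the flat taxonomy.

    A subtype this client does not know about degrades to the main type,
    which is the point of keeping the main types fixed.
    """
    main = message["type"]
    subtype: Optional[str] = message.get("subtype")
    return FLAT_TYPES.get((main, subtype), main)
```

A client written this way keeps working when a new subtype appears: it simply treats the message as a plain message of the main type.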
However, if potential users of the Web service interface don't care about my human-readable characterization for the result (they differ for the HTML5 facet and for the generic facet, that's all), I'd be happy to zap the precomputed result altogether from the XML and JSON formats.

Do people care about the precomputed result and the associated UI-level message?

>> I'd appreciate comments on the format--especially from people who can
>> foresee wanting to write clients.
>
> I'm not such a person, but I thought you could nevertheless be
> interested in my comments ;-)

I am. Thank you.

-- Henri Sivonen hsivonen at iki.fi http://hsivonen.iki.fi/

From lachlan.hunt at lachy.id.au Tue Sep 18 07:39:32 2007
From: lachlan.hunt at lachy.id.au (Lachlan Hunt)
Date: Wed, 19 Sep 2007 00:39:32 +1000
Subject: [imps] Table Inspector Bug
Message-ID: <46EFE324.5060504@lachy.id.au>

(Resending to the correct mailing list address, sorry for the duplicate, James)

Hi,

There's a strange bug in the Table Inspector [1]. I discovered this bug while inspecting a table for a TV Guide [2]. When there's a comment inside a , the associations between cells and headers get messed up. Compare the results of the following 2 tables.

In Table 1, with the comment, using either the HTML4, HTML5 or Experimental algorithm:

* Cell A does not get associated with any headers
* Cell B is associated with headers X and ROW
* Cell C is associated with headers Y and ROW

In Table 2, without the comment, the cells are associated as expected.

* Cell A is associated with headers X and ROW
* Cell B is associated with headers Y and ROW
* Cell C is associated with headers Z and ROW

In both cases, the Smart Colspan algorithm behaves the same, except that it doesn't associate any cells with the header ROW.

Table 1:
X Y Z
ROW A B C
Table 2:
X Y Z
ROW A B C
[1] http://james.html5.org/tables/table_inspector.html
[2] http://www.ebroadcast.com.au/tv/static/SydneyNight.html

(Note: unfortunately, that page uses so many layout tables and scripts to generate its content, it seems to overload the table inspector and is not possible to analyse the table directly. It works if you serialise the DOM from the browser and remove the noise, such as font and script elements, and irrelevant attributes.)

-- Lachlan Hunt http://lachy.id.au/

From westonruter at gmail.com Sun Sep 30 11:19:43 2007
From: westonruter at gmail.com (Weston Ruter)
Date: Sun, 30 Sep 2007 11:19:43 -0700
Subject: [imps] XHTML 1.0 + Web Forms 2.0 DTDs
Message-ID:

I've modified the XHTML 1.0 Strict and Transitional DTDs to include the changes specified by the current draft of Web Forms 2.0.

XHTML 1.0 Strict + Web Forms 2.0:
http://webforms2.googlecode.com/svn/trunk/DTD/xhtml1-strict-wf2.dtd

XHTML 1.0 Transitional + Web Forms 2.0:
http://webforms2.googlecode.com/svn/trunk/DTD/xhtml1-transitional-wf2.dtd

I'd appreciate any feedback or suggestions you may have.

Weston

From t.broyer at gmail.com Tue Sep 11 01:00:55 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Tue, 11 Sep 2007 10:00:55 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

2007/9/11, Thomas Broyer:
> 2007/9/11, Anne van Kesteren:
> > Given that , , , , and are all
> > optional in the language it didn't seem logical to make this a parse
> > error. I like to believe I'm correct in that interpretation.
> > (Incidentally, I also wrote the implementation. Incidentally, this was
> > tested against testcases written by Hixie himself.)
>
> Hmm, that's a pretty good point! ;-)
>
> ...so let's fix the spec (or rather, note it for when we'll solve the
> "big issue")

FYI, I've fixed it in Twintsam by testing for "head" in addition to "body" in the EOF case of the main phase. The spec could read (changes marked with ):

<<<
An end-of-file token

Generate implied end tags.

If there are more than two nodes on the stack of open elements, or if there are two nodes but the second node is not a head node or a body node, this is a parse error.

Otherwise, if the parser was originally created as part of the HTML fragment parsing algorithm, and there's more than one element in the stack of open elements, and the second node on the stack of open elements is not a head node or a body node, then this is a parse error. (fragment case)

Stop parsing.
>>>

Note that I've also changed the "fragment case", though I'm really not sure it should be changed that way too. At least it doesn't change anything in the available test cases (or rather it doesn't make Twintsam pass/fail more/less tests; but Twintsam is far from finished).
N.B.: If you're interested in how Twintsam handles EOF (and how it ensures every produced document has a head and a body), look for "ProcessEndOfFile" in
Keep in mind that the HtmlReader class is a System.Xml.XmlReader subclass and that it "generates tokens" (its goal is to "fix" the markup to produce well-formed XML). I'll soon add a tree-builder class to complement the HtmlReader and handle reparenting cases (title goes into the head, things inside a table but not in a cell are moved outside the table, etc.). I'm not yet sure it's even feasible, but let's try doing it.

-- Thomas Broyer

From annevk at opera.com Tue Sep 11 02:38:48 2007
From: annevk at opera.com (Anne van Kesteren)
Date: Tue, 11 Sep 2007 11:38:48 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

On Tue, 11 Sep 2007 10:00:55 +0200, Thomas Broyer wrote:
> FYI, I've fixed it in Twintsam by testing for "head" in addition to
> "body" in the EOF case of the main phase. The spec could read (changes
> marked with ):

FWIW, I would like the specification to reflect html5lib where we did away with insertion modes and turned them all into phases (as the note in the specification suggests). I don't feel too strongly about it, but I think it would make the specification easier to read and maybe also more straightforward to implement.

-- Anne van Kesteren

From t.broyer at gmail.com Tue Sep 11 03:24:41 2007
From: t.broyer at gmail.com (Thomas Broyer)
Date: Tue, 11 Sep 2007 12:24:41 +0200
Subject: [imps] Problem with the tree-construction test cases and implied body
In-Reply-To:
References:
Message-ID:

2007/9/11, Anne van Kesteren:
> On Tue, 11 Sep 2007 10:00:55 +0200, Thomas Broyer
> wrote:
> > FYI, I've fixed it in Twintsam by testing for "head" in addition to
> > "body" in the EOF case of the main phase.
> > The spec could read (changes marked with ):
>
> FWIW, I would like the specification to reflect html5lib where we did away
> with insertion modes and turned them all into phases (as the note in the
> specification suggests). I don't feel too strongly about it, but I think
> it would make the specification easier to read and maybe also more
> straightforward to implement.

Well, having separate phases and insertion modes allows for switching from any phase back to the main phase without losing the insertion mode (for instance, I implemented the "general CDATA/RCDATA parsing algorithm" as an additional phase) and without having to deal with storing the "phase where you were in when you were switched to the XXX phase", which doesn't make the specification easier to read (YMMV).

There's such a "switch back to the attribute value state that you were in when you were switched into this state" in the tokenisation section which is a bit of a mess: why doesn't the "consume an entity" algorithm deal with the "if nothing is returned" case and the "entity in attribute value" and the "entity data state" just go away?

On the other hand, adapting the "global" EOF case in the main phase to always build head and body elements is trivial (at least for the head, since we have a "head element pointer"; it's a bit less easy for the body because of the body/frameset duality, but it could be solved by just looking at the insertion mode: the insertion mode is never switched back to "before head", "in head" or "after head" --there are only "process as if we were in the XXX insertion mode" instructions--, so if, at EOF, the insertion mode is one of these three values, it means the tree has no body or frameset element, and we can safely append a body element without attributes to the root node).

Proposed wording:

<<<
An end-of-file token:

Generate implied end tags.
If there are more than two nodes on the stack of open elements, or if there are two nodes but the second node is not a head node or a body node, this is a parse error.

Otherwise, if the parser was originally created as part of the HTML fragment parsing algorithm, and there's more than one element in the stack of open elements, and the second node on the stack of open elements is not a head node or a body node, then this is a parse error. (fragment case)

If the head element pointer is null, create an element node with the tag name "head" and append it to the first element in the stack of open elements (the html element).

If the insertion mode is one of "before head", "in head", "in head noscript" or "after head", create an element node with the tag name "body" and append it to the first element in the stack of open elements (the html element).

Stop parsing.
>>>

It could also be solved with "act as if a XXX token with the tag name YYY and no attributes had been seen and reprocess the current token" (which would be more accurate, given that the argument for not generating a parse error is that the head, body and html start and end tags are optional):

<<<
If the insertion mode is "before head", act as if a start tag token with the tag name "head" and no attributes had been seen and reprocess the current token.

Otherwise, if the insertion mode is "in head noscript", act as if an end tag token with the tag name "noscript" had been seen and reprocess the current token.

Otherwise, if the insertion mode is "in head" or "after head", act as if a start tag token with the tag name "body" and no attributes had been seen and reprocess the current token.
>>>

No need to duplicate the whole thing into the fifteen insertion modes with only small variations in four of them.

N.B.: there probably needs to be some special handling for the "fragment case", in which I suppose the head element shouldn't always be implied.
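The "act as if ... and reprocess the current token" alternative can be sketched as a small loop. This is a minimal Python sketch under stated assumptions: tokens are plain (kind, tag name) tuples, and the `NEXT_MODE` table is a simplified stand-in for what a real tree builder would do with each implied token (for instance, it skips the intermediate "after head" step when a body start tag is seen "in head"). The function names are hypothetical and come from neither the draft nor any implementation.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token kind, tag name)


def implied_token_at_eof(insertion_mode: str) -> Optional[Token]:
    """Token to act as if seen before reprocessing EOF, or None."""
    if insertion_mode == "before head":
        return ("StartTag", "head")
    if insertion_mode == "in head noscript":
        return ("EndTag", "noscript")
    if insertion_mode in ("in head", "after head"):
        return ("StartTag", "body")
    return None


# Assumption of this sketch: the insertion mode reached after the tree
# builder has handled each implied token.
NEXT_MODE = {
    ("before head", ("StartTag", "head")): "in head",
    ("in head noscript", ("EndTag", "noscript")): "in head",
    ("in head", ("StartTag", "body")): "in body",
    ("after head", ("StartTag", "body")): "in body",
}


def implied_tokens_until_eof_handled(insertion_mode: str) -> List[Token]:
    """Reprocess EOF until no further token needs to be implied."""
    implied: List[Token] = []
    while True:
        token = implied_token_at_eof(insertion_mode)
        if token is None:
            return implied
        implied.append(token)
        insertion_mode = NEXT_MODE[(insertion_mode, token)]
```

Starting from "before head", the loop implies a head start tag and then a body start tag before EOF is finally handled, so the same rule serves all four affected insertion modes without being duplicated in each.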
-- Thomas Broyer