From andy at entai.co.uk Fri Jul 11 14:27:48 2008
From: andy at entai.co.uk (Andrew Sidwell)
Date: Fri, 11 Jul 2008 22:27:48 +0100
Subject: [imps] Extending the tree builder test format to cover test annotations
Message-ID: <4877D054.1060705@entai.co.uk>

I've been writing some tree construction tests to get better coverage on
a C HTML5 parser, and since I've been writing them fairly methodically, I
have been annotating each one with what it intends to test. I've done
this like:

#data
#errors
#comments
This tests that comments in "after after body" are appended to the Document object.
#document
|
|
|
|
|

So I propose adding a #comments section to tests that should just be
skipped over by tools unless the test fails.

Opinions? Anything that lets me annotate tests would be helpful; I'm not
too fussed about syntax.

a.

From ryan at theryanking.com Fri Jul 11 18:06:30 2008
From: ryan at theryanking.com (Ryan King)
Date: Fri, 11 Jul 2008 18:06:30 -0700
Subject: [imps] content type sniffing - unknown type
Message-ID: <00256E09-A249-44A4-A629-A69F54453D31@theryanking.com>

I'm working on a content type sniffing implementation based on the
current spec, that will eventually make it into html5lib (it's part of a
separate project for now).

Anyway, in "2.7.4 Content-Type sniffing: unknown type", I think there
are a few things flipped around. Where it says "Examine the
index_stream-th byte of the byte stream as follows:", I think it should
actually be referring to the index_pattern-th byte of the pattern.

The way I understand the algorithm is like this:

walk through the pattern
   if we're at a WS byte
     consume all the whitespace
   else
     do the 'and' operation with the mask and test it against
     pattern[index_pattern]
if we made it through without a mis-match, return the given type.

Implementing it this way has yielded the expected results (i.e., the
examples given in the comments work).
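
A minimal Python sketch of the matching loop described above, for
illustration only. Everything named here is an assumption rather than
something taken from the spec or from html5lib: the WS marker, the set of
bytes treated as whitespace, the function names, and the two table rows,
which are made up to exercise the code and are not copied from the spec's
table. It also assumes that "pattern length" counts only the literal
bytes, so a WS entry may match zero bytes of the stream.

```python
WS = None  # hypothetical marker meaning "skip any run of whitespace here"
WHITESPACE = frozenset(b"\t\n\x0c\r ")  # assumed whitespace bytes

# Illustrative rows only: (mask, pattern, sniffed type).
ROWS = [
    # "<HTML" in any letter case, optionally preceded by whitespace.
    ((WS, 0xFF, 0xDF, 0xDF, 0xDF, 0xDF),
     (WS, 0x3C, 0x48, 0x54, 0x4D, 0x4C), "text/html"),
    # "%PDF-", matched exactly.
    ((0xFF, 0xFF, 0xFF, 0xFF, 0xFF),
     (0x25, 0x50, 0x44, 0x46, 0x2D), "application/pdf"),
]


def matches(stream, mask, pattern):
    """Return True if the start of `stream` matches one table row."""
    # Skip the row when the stream is shorter than the pattern's literal bytes.
    literal_bytes = sum(1 for b in pattern if b is not WS)
    if len(stream) < literal_bytes:
        return False
    index_stream = 0
    for index_pattern, expected in enumerate(pattern):
        if expected is WS:
            # Consume all the whitespace at this point in the stream.
            while (index_stream < len(stream)
                   and stream[index_stream] in WHITESPACE):
                index_stream += 1
        else:
            if index_stream >= len(stream):
                return False
            # AND the stream byte with the mask, compare with the pattern byte.
            if (stream[index_stream] & mask[index_pattern]) != expected:
                return False
            index_stream += 1
    return True


def sniff_unknown(stream, rows=ROWS):
    """Return the type of the first matching row, or None if none match."""
    for mask, pattern, media_type in rows:
        if matches(stream, mask, pattern):
            return media_type
    return None
```

With these rows, sniff_unknown(b"  \n<html>") and sniff_unknown(b"<HtMl>")
both give "text/html", because the 0xDF mask clears the ASCII case bit
before the comparison.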

-ryan

From ryan at theryanking.com Fri Jul 11 18:16:55 2008
From: ryan at theryanking.com (Ryan King)
Date: Fri, 11 Jul 2008 18:16:55 -0700
Subject: [imps] content type sniffing - stream length
Message-ID:

"2.7.4 Content-Type sniffing: unknown type" says:

2. If pattern length is smaller than stream length then skip this row.

...which doesn't seem to make sense, since the streams will likely be 512
bytes long, and the patterns are a handful of bytes each. I suspect that
the intent was the opposite:

2. If stream length is smaller than pattern length then skip this row.

-ryan

From ian at hixie.ch Fri Jul 11 19:48:36 2008
From: ian at hixie.ch (Ian Hickson)
Date: Sat, 12 Jul 2008 02:48:36 +0000 (UTC)
Subject: [imps] content type sniffing - unknown type
In-Reply-To: <00256E09-A249-44A4-A629-A69F54453D31@theryanking.com>
References: <00256E09-A249-44A4-A629-A69F54453D31@theryanking.com>
Message-ID:

On Fri, 11 Jul 2008, Ryan King wrote:
>
> I'm working on a content type sniffing implementation based on the
> current spec, that will eventually make it into html5lib (it's part of a
> separate project for now).
>
> Anyway, in "2.7.4 Content-Type sniffing: unknown type", I think there
> are a few things flipped around. Where it says "Examine the
> index_stream-th byte of the byte stream as follows:", I think
> it should actually be referring to the index_pattern-th
> byte of the pattern.

Fixed.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From ian at hixie.ch Fri Jul 11 19:50:13 2008
From: ian at hixie.ch (Ian Hickson)
Date: Sat, 12 Jul 2008 02:50:13 +0000 (UTC)
Subject: [imps] content type sniffing - stream length
In-Reply-To:
References:
Message-ID:

On Fri, 11 Jul 2008, Ryan King wrote:
>
> "2.7.4 Content-Type sniffing: unknown type" says:
>
> 2. If pattern length is smaller than stream length then skip this row.
>
> ..which doesn't seem to make sense, since the streams will likely be 512
> bytes long, and the patterns are a handful of bytes each. I suspect that
> the intent was the opposite:
>
> 2. If stream length is smaller than pattern length then skip this row.

Fixed.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
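
On the #comments proposal from the first message in this thread: a rough
Python sketch of how a test tool might carry such a section along and
show it only when a test fails. This is not html5lib's actual test
harness. The function names, the set of recognised headings, and the
sample test are all made up for illustration, and the sketch assumes a
line equal to one of the headings always starts a new section.

```python
HEADINGS = {"#data", "#errors", "#comments", "#document"}


def parse_tests(text):
    """Split a test file into dicts mapping section name to its text."""
    tests, current, section = [], None, None
    for line in text.splitlines():
        if line == "#data":
            # Each "#data" heading starts a new test.
            current = {}
            tests.append(current)
        if line in HEADINGS and current is not None:
            section = line[1:]
            current[section] = []
        elif current is not None and section is not None:
            current[section].append(line)
    return [{name: "\n".join(lines).rstrip("\n") for name, lines in t.items()}
            for t in tests]


def report(test, actual_tree):
    """Return "PASS", or "FAIL" plus the test's annotation if it has one."""
    if actual_tree == test["document"]:
        return "PASS"  # the #comments text is ignored on success
    note = test.get("comments")
    return "FAIL" + (": " + note if note else "")


# Hypothetical sample test, not taken from any real test file.
SAMPLE = """#data
<p>x
#errors
#comments
Checks that text ends up inside the p element.
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "x"
"""
```

A tool that does not know about #comments would need to skip the section
rather than treat it as part of #errors, which is the compatibility
question the proposal raises.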