[whatwg] [MIME Sniffing] Editorial feedback

Adam Barth w3c at adambarth.com
Wed Sep 28 16:23:25 PDT 2011


I've taken all of your suggestions, except as noted below.  Thanks for
your detailed feedback.

Adam


On Mon, Sep 26, 2011 at 2:27 PM, timeless <timeless at gmail.com> wrote:
>> Otherwise, if the octets in s starting at pos match any of the sequences of octets in the first column of the following table, then the user agent MUST follow the steps given in the corresponding cell in the second column of the same row. |
>
> What's the stray `|` character at the end of that doing?
>
> The ToC feels double spaced, is that normal?
>
> Would you mind quoting your attributes in source? Things like
> class=no-num or href=#web-data scare me. It's easier if you just quote
> all attributes :)
>
> Also, I generally recommend `<span ...>x</span> ` over `<span ...>x
> </span>` <- i.e. trailing space outside of span (see toc)
>
>> <p>Many web servers supply incorrect Content-Type header fields with their HTTP
>
> Can you mark up `Content-Type` in something which results in roughly
> "typewriter" font?
>
> s/user agents/User Agents/ as in:
>> responses.  In order to be compatible with these servers, user agents consider
>
>> Without a clear specification of how to "sniff" the media type, each user agent implementor was forced to reverse engineer the behavior of the other user agents and to develop
>
> s/the other/other/ -- there are some UAs who were ignored when the
> sniffing of a given UA was developed :)
>
>> their own algorithm
>
> I'm not sure if `algorithm` here belongs in singular or plural, I got
> distracted :)
>
>> an HTTP response to be interpreted as one media type but some user agents interpret the responses as another media type.
>
> s/responses/response/ (agreement with first part)
>
>> However, if a user agent does interpret a low-privilege media type, such as image/gif, as a high-privilege media type, such as text/html, the user agent has created a privilege escalation vulnerability in the server.
>
> s/, the user agent/, then the user agent/
>
>
>
> I believe abarth has addressed the above.
>
>> This document describes a content sniffing algorithm that carefully balances the compatibility needs of user agent implementors with the security constraints.
>
> `the security constraints` is problematic, I don't think `the`
> references anything
> so either drop `the`, or provide a reference :/
>
>> and metrics collected from implementations deployed to a sizable number of users .
>
> s/ ././

There's actually a reference that goes there.  I just haven't figured
out how to do references yet.

>> (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("MUST", "SHOULD", "MAY", etc)
>
> s/etc/etc./g
>
> "official-type" should probably be given some styling -- preferably
> not the same styling as "Content-Type"
>
>> (Such messages are invalid according to RFC2616.
>
> s/./.)/
>
> The rfcs should be href references of some sort :)

Yeah, I need to crack the references problem at some point.  :)

>> For octets received via HTTP, the Content-Type HTTP header field, if present, indicates the media type. Let the official-type be the media type indicated by the HTTP Content-Type header field, if present. If the Content-Type header field is absent or if its value cannot be interpreted as a media type (e.g. because its value doesn't contain a U+002F SOLIDUS ('/') character), then there is no official-type. (Such messages are invalid according to RFC2616.
>
>> If an HTTP response contains multiple Content-Type header fields, the User Agent MUST use the textually last Content-Type header field to the official-type. For example, if the last Content-Type header field contains the value "foo", then there is no official media type because "foo" cannot be interpreted as a media type (even if the HTTP response contains another Content-Type header field that could be interpreted as a media type).
>
> The for example part here applies to the previous paragraph, the
> sentence needs to be moved to the paragraph before the instruction for
> multiple header fields.

It's an example that combines both rules.
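
As a rough illustration of how the two rules combine (a sketch only, not
text from the spec): the function name official_type and its list-of-values
argument are hypothetical, and dropping parameters and checking only for
the solidus are simplifying assumptions.

```python
from typing import Optional, Sequence


def official_type(content_type_values: Sequence[str]) -> Optional[str]:
    """Return the official-type, or None when there is no official-type.

    `content_type_values` holds the values of every Content-Type header
    field in the response, in textual order (hypothetical input shape).
    """
    if not content_type_values:
        return None  # Content-Type header field is absent
    # Only the textually last Content-Type header field is consulted.
    last = content_type_values[-1]
    # Assumption: parameters such as "; charset=utf-8" are dropped here.
    media_type = last.split(";", 1)[0].strip().lower()
    # Assumption: a missing U+002F SOLIDUS is the only failure case checked.
    if "/" not in media_type:
        return None
    return media_type
```

With the values ["text/html", "foo"] this yields no official-type, because
only the last field ("foo") is considered, even though an earlier field
could have been interpreted as a media type.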

>> FTP RFC0959
>
> Is there a reason for the leading 0?
>
>> Comparisons between media types, as defined by MIME specifications, are done in an ASCII case-insensitive manner. [RFC2046]
>
> You need to somehow note that this is merely a note about mime
> equivalence and doesn't relate to how the spec works.

I'm not sure I understand.  It's in green and labeled as a "note".

>> If the official-type ends in "+xml", or if it is either "text/xml" or "application/xml", then let the sniffed-type be the official-type and abort these steps.
>
> Please mark up `sniffed-type` and `official-type`
>
>> If the official-type is an image type supported by the User Agent (e.g., "image/png", "image/gif", "image/jpeg", etc), then jump to the "images" section below.
>
> s/etc//
>
>> If none of the first n octets are binary data octets then let the sniffed-type be "text/plain" and abort these steps.
>> Binary Data Byte Ranges
>
> You don't actually define a `binary data octet` as any item within the
> ranges defined in the `binary data byte ranges`.
>
>> If the first octets match one of the octet sequences in the "pattern" column of the table in the "unknown type" section below, ignoring any rows whose cell in the "security" column says "scriptable" (or "n/a"), then let the sniffed-type be the type given in the corresponding cell in the "sniffed type" column on that row and abort these steps.
>
> If you could make `"unknown type" section` a link to the section, that
> would be helpful.
>
>> For each row in the table below:
>> If the row has no "WS" octets:
>
> I know that "WS" appears in the table below, but it hasn't been
> defined yet, and I don't want to guess what it means (whitespace?) --
> I guessed wrong for the other one.
>
>> If the row has a "WS" octet or a "_>" octet:
>
>> "WS" means "whitespace", and allows insignificant whitespace to be skipped when sniffing for a type signature.
>
> Oh, so that's where you hid the definition -- way too late :)
>
>> "_>" means "space-or-bracket", and allows HTML tag names to terminate with either a space or a greater than sign.
>
> Oh _ doesn't mean underscore
>
> Please put those definitions before their use, not way below their use :(

I'm tempted to just rename them to be less semantic.  They're just
symbols that don't mean anything, really.

>> If the octets of the masked-data matches the given pattern octets exactly, then let the sniffed-type be the type given in the cell of the third column in that row and abort these steps.
>
> s/matches/match/
>
>> LOOP: If index-stream points beyond the end of the octet stream, then this row doesn't match and skip this row.
>
> Please style `LOOP`
>
>> If the index-pattern-th octet of the pattern is a normal hexadecimal octet and not a "WS" octet or a "_>" octet:
>
> s/or a/nor a/
> s/not/neither/
>
>
>> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then increment only the index-stream to the next octet in the octet stream.
>
> If you could style the 0xXX items in something <tt>-ish, that'd be appreciated.
> ... And if you could style the names (ASCII TAB, etc.) in something,
> that'd also be appreciated.

That's a lot of editing!  I'm not sure that buys us much.
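
For readers following the matching steps quoted above, a minimal sketch of
the two ideas involved: skipping the whitespace octets and comparing masked
octets against a pattern. The names skip_ws and matches_masked are
hypothetical, and the sketch assumes a "WS" can only appear at the start of
a row, which is simpler than the algorithm in the draft.

```python
# The whitespace octets listed in the quoted step: TAB, LF, FF, CR, space.
WS_OCTETS = {0x09, 0x0A, 0x0C, 0x0D, 0x20}


def skip_ws(stream: bytes, index: int = 0) -> int:
    """Advance index past any run of whitespace octets."""
    while index < len(stream) and stream[index] in WS_OCTETS:
        index += 1
    return index


def matches_masked(stream: bytes, start: int, mask: bytes, pattern: bytes) -> bool:
    """AND each stream octet with the mask and compare against the pattern.

    Assumes mask and pattern have the same length.
    """
    window = stream[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False  # stream ended before the pattern could match
    masked = bytes(b & m for b, m in zip(window, mask))
    return masked == pattern
```

A mask octet of 0xDF clears the ASCII case bit, so the pattern "<HTML" with
mask FF DF DF DF DF matches "<html" as well; a mask of all FF keeps the
comparison case-sensitive, as in the "<?xml" row.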

>> If the first n octets match the signature for MP4 (as define in ), then let the sniffed-type be video/mp4 and abort these steps.
>
> s/define/defined/
>
> -- The markup you're using failed to generate a visible-reference,
> could you get the tool to generate an XXX when it fails? :)
>
>> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case sensitivity and lack of trailing _>)
>
> s/sensitivity/sensitivity [mask = FF instead of DF]/
>
>> A JPEG SOI marker followed by a octet of another marker.
>
> s/a octet/an octet/
>
> -- the table doesn't currently handle .SWF; in the past, that has been a problem
> http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml

That is intentional.  Sniffing SWF is bad times.

>> If n is less than 4, then the sequence does not match the signature for MP4 and abort these steps.
>
> `and` doesn't work; s/ and/;/ ?
>
> In all previous cases, the form was `let foo and abort these steps`;
> here it's `then <statement of truth> and`.
>
> The fix is probably to move to "return TRUTH/FALSE value and abort
> these steps" (or let state-determined-truth-value-be TRUTH/FALSE value
> and ...).

Hum...  I see the problem.

>> For each i from 2 to box-size/4 - 1 (inclusive):
>
> If you could put `box-size/4 - 1` into some markup to indicate that
> it's a math section, that'd be helpful.

I put it in <code>.  I'm not sure that's the prettiest, but we can iterate.

>> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34 (the ASCII string "mp4"), then the sequence does match the signature for MP4 and abort these steps.
>
> And here for `4*i` and `4*i + 2`
>
> I think you need s/If octets/If any octets/, otherwise, it's ambiguous
> between `any` and `all`.
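
One way to read the quoted MP4 steps with the "return a truth value and
abort" phrasing suggested above is sketched here. It covers only the steps
quoted in this thread; the function name is hypothetical, and treating
box-size as octets 0 through 3 read as a big-endian integer, plus the
truncation and multiple-of-4 checks, are assumptions rather than text from
this message.

```python
def matches_mp4_signature(sequence: bytes) -> bool:
    """Return True if the sequence matches the MP4 signature (sketch)."""
    n = len(sequence)
    if n < 4:
        return False  # too short to match; return instead of "and abort"
    # Assumption: box-size is octets 0-3 as a big-endian unsigned integer.
    box_size = int.from_bytes(sequence[0:4], "big")
    # Assumption: a truncated box, or one whose size is not a multiple
    # of 4, does not match.
    if n < box_size or box_size % 4 != 0:
        return False
    # For each i from 2 to box-size/4 - 1 (inclusive).
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 (inclusive) spell "mp4".
        if sequence[4 * i:4 * i + 3] == b"mp4":
            return True
    return False
```

Written this way, a match at any i is enough, which is the "any" reading
asked about above.
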
>
>> 7 Images
> ...
>>     Otherwise, let the sniffed-type be the official-type and abort these steps.
>
> I'd rather otherwise be step 3 instead of part of the bulleted list
> inside step 2

:)

>> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D, 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump back to the previous step (the step labeled loop start) in the overall algorithm in this section.
>
> `loop start` should be a link to the LOOP label and preferably have
> the same case as the LOOP label.
>
>> Return to step 2 in these substeps.
>
> It'd be nice if this was a link to an anchor in the right part of the steps.
>
>> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be "application/rss+xml" and abort these steps.
>
> s/and/or/ ??

`and` is correct.  I've made it strong.  It's got to have both qualities
before we'll change the type.
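
In code terms the condition is a conjunction. A tiny sketch, with a
hypothetical function name and the flags passed in as integers:

```python
from typing import Optional


def rss_type_if_both(rdf_flag: int, rss_flag: int) -> Optional[str]:
    """Return the RSS media type only when both flags are set."""
    if rdf_flag == 1 and rss_flag == 1:
        return "application/rss+xml"
    return None  # one flag alone is not enough to change the type
```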

Adam

