[whatwg] [MIME Sniffing] Editorial feedback

Mon Sep 26 14:27:17 PDT 2011

> Otherwise, if the octets in s starting at pos match any of the sequences of octets in the first column of the following table, then the user agent MUST follow the steps given in the corresponding cell in the second column of the same row. |

What's the stray `|` character at the end of that doing?

The ToC feels double-spaced. Is that normal?

Would you mind quoting your attributes in source? Things like
class=no-num or href=#web-data scare me. It's easier if you just quote
all attributes :)

Also, I generally recommend `<span ...>x</span> ` over
`<span ...>x </span>`, i.e. keep the trailing space outside of the span
(see the ToC).

> <p>Many web servers supply incorrect Content-Type header fields with their HTTP

Can you mark up `Content-Type` in something which results in roughly
"typewriter" font?

s/user agents/User Agents/ as in:
> responses.  In order to be compatible with these servers, user agents consider

> Without a clear specification of how to "sniff" the media type, each user agent implementor was forced to reverse engineer the behavior of the other user agents and to develop

s/the other/other/ -- there are some UAs who were ignored when the
sniffing of a given UA was developed :)

> their own algorithm

I'm not sure if `algorithm` here belongs in singular or plural, I got
distracted :)

> an HTTP response to be interpreted as one media type but some user agents interpret the responses as another media type.

s/responses/response/ (agreement with first part)

> However, if a user agent does interpret a low-privilege media type, such as image/gif, as a high-privilege media type, such as text/html, the user agent has created a privilege escalation vulnerability in the server.

s/, the user agent/, then the user agent/

I believe abarth has addressed the above.

> This document describes a content sniffing algorithm that carefully balances the compatibility needs of user agent implementors with the security constraints.

`the security constraints` is problematic. I don't think `the`
references anything, so either drop `the` or provide a reference :/

> and metrics collected from implementations deployed to a sizable number of users .

s/ ././

> (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("MUST", "SHOULD", "MAY", etc)

s/etc/etc./g

"official-type" should probably be given some styling -- preferably
not the same styling as "Content-Type"

> (Such messages are invalid according to RFC2616.

s/./.)/

The RFCs should be href references of some sort :)

> For octets received via HTTP, the Content-Type HTTP header field, if present, indicates the media type. Let the official-type be the media type indicted by the HTTP Content-Type header field, if present. If the Content-Type header field is absent or if its value cannot be interpreted as a media type (e.g. because its value doesn't contain a U+002F SOLIDUS ('/') character), then there is no official-type. (Such messages are invalid according to RFC2616.

> If an HTTP response contains multiple Content-Type header fields, the User Agent MUST use the textually last Content-Type header field to the official-type. For example, if the last Content-Type header field contains the value "foo", then there is no official media type because "foo" cannot be interpreted as a media type (even if the HTTP response contains another Content-Type header field that could be interpreted as a media type).

The "For example" part here applies to the previous paragraph; that
sentence needs to be moved up, before the instruction for multiple
header fields.
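
To make the reading of those two quoted paragraphs concrete, here is a
minimal sketch of how the official-type could be derived. The function
name, the list-of-strings input, the dropping of parameters after `;`,
and the lowercasing are all assumptions made for illustration; none of
them come from the draft.

```python
from typing import Optional, Sequence


def official_type(content_type_values: Sequence[str]) -> Optional[str]:
    """Return the official-type, or None when there is no official-type.

    content_type_values holds the values of every Content-Type header
    field in the response, in textual order (hypothetical input shape).
    """
    if not content_type_values:
        return None  # header field absent: no official-type

    # Only the textually last header field is considered.
    last = content_type_values[-1]

    # Assumption: parameters such as "; charset=utf-8" are not part of
    # the media type itself, so they are dropped here.
    media_type = last.split(";", 1)[0].strip()

    # A value without U+002F SOLIDUS cannot be interpreted as a media type.
    if "/" not in media_type:
        return None

    # Assumption: lowercase so later comparisons are case-insensitive.
    return media_type.lower()
```

With this reading, `["text/html", "foo"]` yields no official-type even
though an earlier header field was usable, which is the case the
"For example" sentence describes.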

> FTP RFC0959

Is there a reason for the leading 0?

> Comparisons between media types, as defined by MIME specifications, are done in an ASCII case-insensitive manner. [RFC2046]

You need to somehow note that this is merely a note about MIME
equivalence and doesn't relate to how the spec works.

> If the official-type ends in "+xml", or if it is either "text/xml" or "application/xml", then let the sniffed-type be the official-type and abort these steps.

Please mark up `sniffed-type` and `official-type`

> If the official-type is an image type supported by the User Agent (e.g., "image/png", "image/gif", "image/jpeg", etc), then jump to the "images" section below.

s/etc//

> If none of the first n octets are binary data octets then let the sniffed-type be "text/plain" and abort these steps.
> Binary Data Byte Ranges

You don't actually define a `binary data octet` as any item within the
ranges defined in the `binary data byte ranges`.
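
A sketch of the definition I would expect to see spelled out. The
ranges below are NOT quoted anywhere in this mail; they are an
assumption, taken from how later versions of the MIME Sniffing
specification define a binary data byte, and should be checked against
the draft's own table. The function names are hypothetical.

```python
# Assumed "binary data byte ranges" (inclusive); verify against the draft.
BINARY_DATA_RANGES = [
    (0x00, 0x08),
    (0x0B, 0x0B),
    (0x0E, 0x1A),
    (0x1C, 0x1F),
]


def is_binary_data_octet(octet: int) -> bool:
    """An octet is a binary data octet if it falls in any listed range."""
    return any(low <= octet <= high for low, high in BINARY_DATA_RANGES)


def has_no_binary_data(first_octets: bytes) -> bool:
    """True when none of the first n octets is a binary data octet,
    i.e. the case in which the quoted step picks "text/plain"."""
    return not any(is_binary_data_octet(b) for b in first_octets)
```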

> If the first octets match one of the octet sequences in the "pattern" column of the table in the "unknown type" section below, ignoring any rows whose cell in the "security" column says "scriptable" (or "n/a"), then let the sniffed-type be the type given in the corresponding cell in the "sniffed type" column on that row and abort these steps.

If you could make `"unknown type" section` a link to the section, that
would be helpful.

> For each row in the table below:
> If the row has no "WS" octets:

I know that "WS" appears in the table below, but it hasn't been
defined yet, and I don't want to guess what it means (whitespace?) --
I guessed wrong for the other one.

> If the row has a "WS" octet or a "_>" octet:

> "WS" means "whitespace", and allows insignificant whitespace to be skipped when sniffing for a type signature.

Oh, so that's where you hid the definition -- way too late :)

> "_>" means "space-or-bracket", and allows HTML tag names to terminate with either a space or a greater than sign.

Oh _ doesn't mean underscore

Please put those definitions before their use, not way below their use :(

> If the octets of the masked-data matches the given pattern octets exactly, then let the sniffed-type be the type given in the cell of the third column in that row and abort these steps.

s/matches/match/

> LOOP: If index-stream points beyond the end of the octet stream, then this row doesn't match and skip this row.

Please style `LOOP`

> If the index-pattern-th octet of the pattern is a normal hexadecimal octet and not a "WS" octet or a "_>" octet:

s/or a/nor a/
s/not/neither/

> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then increment only the index-stream to the next octet in the octet stream.

If you could style the 0xXX items in something <tt>-ish, that'd be appreciated.
... And if you could style the names (ASCII TAB, etc.) in something,
that'd also be appreciated.

> If the first n octets match the signature for MP4 (as define in ), then let the sniffed-type be video/mp4 and abort these steps.

s/define/defined/

-- The markup you're using failed to generate a visible-reference,
could you get the tool to generate an XXX when it fails? :)

> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case sensitivity and lack of trailing _>)

s/sensitivity/sensitivity [mask = FF instead of DF]/
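
For readers wondering why the mask matters: ANDing an ASCII letter with
DF clears the 0x20 bit and so folds lowercase onto uppercase, while FF
leaves the octet untouched. A minimal sketch, assuming the mask and the
pattern have the same length and ignoring the "WS" and "_>" handling
entirely (the function name is hypothetical):

```python
def masked_match(data: bytes, mask: bytes, pattern: bytes) -> bool:
    """Compare the leading octets of data, ANDed with mask, to pattern."""
    if len(mask) != len(pattern):
        raise ValueError("mask and pattern must have the same length")
    if len(data) < len(pattern):
        return False  # not enough octets to match
    # zip stops at the shortest input, so extra data octets are ignored.
    return all((d & m) == p for d, m, p in zip(data, mask, pattern))
```

So `masked_match(b"<?XML", b"\xff" * 5, b"<?xml")` is false, whereas a
DF mask with the pattern `b"HTML"` matches both `b"html"` and `b"HTML"`.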

> A JPEG SOI marker followed by a octet of another marker.

s/a octet/an octet/

-- the table doesn't currently handle .SWF; in the past, that has been a problem
http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml

> If n is less than 4, then the sequence does not match the signature for MP4 and abort these steps.

`and` doesn't work; s/ and/;/ ?

In all previous cases, the form was `let foo and abort these steps`;
here it's `then <statement of truth> and`.

The fix is probably to move to "return TRUTH/FALSE value and abort
these steps" (or "let state-determined-truth-value be TRUTH/FALSE value
and ...").

> For each I from 2 to box-size/4 - 1 (inclusive):

If you could put `box-size/4 - 1` into some markup to indicate that
it's a math section, that'd be helpful.

> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34 (the ASCII string "mp4"), then the sequence does match the signature for MP4 and abort these steps.

And here for `4*i` and `4*i + 2`

I think you need s/If octets/If any octets/, otherwise, it's ambiguous
between `any` and `all`.
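
Here is how I read the MP4 steps once they return a truth value and use
`any`. Only the `n < 4` check, the loop bounds, and the "mp4"
comparison come from the quoted text. How box-size is computed, the
length and multiple-of-4 checks, and the "ftyp" check are assumptions
on my part, and the function name is hypothetical.

```python
def matches_mp4_signature(sequence: bytes) -> bool:
    """Return True if the sequence matches the MP4 signature (sketch)."""
    n = len(sequence)
    if n < 4:
        return False

    # Assumption: box-size is the first four octets, big-endian unsigned.
    box_size = int.from_bytes(sequence[0:4], "big")

    # Assumption: the whole box must be available and 4-octet aligned.
    if n < box_size or box_size % 4 != 0:
        return False

    # Assumption: octets 4..7 must be the ASCII string "ftyp".
    if sequence[4:8] != b"ftyp":
        return False

    # i runs from 2 to box-size/4 - 1 inclusive; range() excludes its end.
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 inclusive is a three-octet slice.
        if sequence[4 * i : 4 * i + 3] == b"mp4":
            return True  # ANY matching position is enough

    return False
```

A 20-octet box whose brands are "isom", four zero octets, then "mp41"
only matches at i = 4, which is why `any` rather than `all` has to be
the intended meaning.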

> 7 Images
...
>     Otherwise, let the sniffed-type be the official-type and abort these steps.

I'd rather the `Otherwise` clause be step 3 instead of part of the
bulleted list inside step 2.

> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D, 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump back to the previous step (the step labeled loop start) in the overall algorithm in this section.

`loop start` should be a link to the LOOP label and preferably have
the same case as the LOOP label.

> Return to step 2 in these substeps.

It'd be nice if this were a link to an anchor at the right step.

> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be "application/rss+xml" and abort these steps.

s/and/or/ ??