[whatwg] [html5] Pre-Last Call Comments

Wed Jun 3 08:41:23 PDT 2009

2009/6/3 Ian Hickson <ian at hixie.ch>:
> On Sun, 5 Apr 2009, Giovanni Campagna wrote:
>>

(skipping all "done")

>> - From section 2.4.2 I don't understand if boolean attributes with
>> invalid values represent "true" or "false". In addition, I don't
>> understand if an empty value is false (as in XHTML1.0) or true (as in
>> HTML4, because of the minimized syntax).
>> From my experience, I expect that the empty string (which is
>> equivalent to not specifying the attribute at all) is false, and any
>> other value is true.
>
> The spec says "The presence of a boolean attribute on an element
> represents the true value, and the absence of the attribute represents the
> false value"; is that not clear?

Now it is clear, thanks.
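
As I now read the quoted sentence, only the presence of the attribute
matters and its value is never consulted. A minimal sketch of that
reading, assuming attributes are held in a plain mapping (the helper
name boolean_attribute_is_true is hypothetical, not from the spec):

```python
from typing import Mapping


def boolean_attribute_is_true(attributes: Mapping[str, str], name: str) -> bool:
    """Return True when the attribute is present, whatever its value.

    Assumption: `attributes` maps attribute names to their string values,
    and an absent attribute is simply a missing key.
    """
    return name in attributes


# The value is ignored: an empty or odd value still counts as present.
print(boolean_attribute_is_true({"disabled": ""}, "disabled"))
print(boolean_attribute_is_true({"disabled": "false"}, "disabled"))
print(boolean_attribute_is_true({}, "disabled"))
```

Under this reading the empty string is true, which differs from what I
expected in my original comment.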

>
>> - In 2.4.4.3 (and maybe in other places) I would prefer [A|E]BNF
>> instead of the prose description of a floating point number.
>
> It's not obvious to me that this would be any clearer.

Compare:
Optionally, a U+002D HYPHEN-MINUS ("-") character.
A series of one or more characters in the range U+0030 DIGIT ZERO (0)
to U+0039 DIGIT NINE (9).
Optionally:
A single U+002E FULL STOP (".") character.
A series of one or more characters in the range U+0030 DIGIT ZERO (0)
to U+0039 DIGIT NINE (9).
Optionally:
Either a U+0065 LATIN SMALL LETTER E character or a U+0045 LATIN
CAPITAL LETTER E character.
Optionally, a U+002D HYPHEN-MINUS ("-") character or U+002B PLUS SIGN
("+") character.
A series of one or more characters in the range U+0030 DIGIT ZERO (0)
to U+0039 DIGIT NINE (9).

to: "-"? DIGIT+ ( "." DIGIT+ )? ( ( "e" | "E" ) ( "-" | "+" )? DIGIT+ )?

I find the latter definitely easier to understand.
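
To show how compact the grammar form is, here is a rough sketch of a
validity check that follows the prose description above, with both the
fraction and the exponent optional. It only checks the syntax of a
valid number and is not the spec's parsing algorithm, which also has to
handle bogus input; the names VALID_FLOAT and is_valid_float are my own.

```python
import re

# An optional "-", one or more digits, an optional "." followed by
# digits, and an optional exponent with an optional sign.
# [0-9] is used instead of \d so only U+0030..U+0039 are accepted.
VALID_FLOAT = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_valid_float(text: str) -> bool:
    """Return True if the whole string matches the grammar."""
    return VALID_FLOAT.fullmatch(text) is not None


for sample in ["-12.5e+3", "1e5", "1.", ".5", "+1"]:
    print(sample, is_valid_float(sample))
```

The last three samples are rejected: the grammar requires digits on both
sides of the full stop and does not allow a leading plus sign.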

>
>> I'm also not sure that the normative algorithm is needed.
>
> You mean for parsing? How else would you know how to parse it? In some of
> the cases the algorithms don't accept any erroneous content at all, but
> in many cases we have to define how you handle bogus data, and I don't see
> how to do that any other way.

Ok, then.

>> I've also searched IEEE, IETF, ECMA, ISO and ANSI for another normative
>> version of the syntax and processing, but I've found none. If you think
>> that it is important to have it specified completely, you may submit an
>> ID, so future technologies won't need to rewrite it again.
>
> I'm not sure to what you refer. I certainly wouldn't want anyone reusing
> most of these definitions; many are the result of years of bugs causing
> legacy content to depend on weird quirks.

Signed/unsigned integers and floats are concepts used by many
technologies, which use much the same syntax (expressed above). A
single RFC for everyone would avoid having to rewrite the syntax and
processing every time a float is used, but I agree that it may require
more work.

>
>> Also, don't rely on styles alone, use different words for identifiers
>> and prose. This includes also the Note following, where no styles are
>> applied and it is difficult to understand that "year year" is not a
>> typo but rather is the year numbered "year".
>
> I made the note use "y", but in general I find using anything but "year"
> here gets really ugly.

You still have this problem in section 2.4.5 Dates and Times. You can
use "year" as an identifier, but repeating it next to the same noun
used in its general sense makes the text hard to understand, especially
without the visual clue of the italics (for example, if you turned off
stylesheets or are using a non-visual medium).

>> - Can't CSS3 Color simply be referenced in 2.4.6?
>> This way, implementors could have body[bgcolor] { background-color:
>> attr(bgcolor,color,white); } in the default CSS instead of using HTML5
>> specific rules.
>
> The rules for parsing a legacy color value are very constrained and don't
> match CSS, no.

Outside rgb/hsl and their alpha counterparts, is there anything that
CSS3 Color can't do compatibly with HTML colours?
If not, using CSS Color completely would allow more features and
simpler implementations.
I see HTML5 currently also supports CSS2 and SVG colours.

>
>> - In 2.4.9 a valid hash reference must be equal to an ID, name is
>> supported only for backward compatibility.
>
> No, <map> uses name="".

Is that the case even in *conforming documents*? I hope not, as name
was deprecated in all versions of XHTML, and was not even supposed to
work in XML.

>
>> - Section 2.6 is superfluous: handling of application cache is specified
>> in the appropriate section, handling of HTTP requests and caches is
>> defined in RFC2616, handling of cookie is defined in the appropriate RFC
>> (I don't remember the number), handling of about:blank is in the
>> proposed about-uri-scheme ID. In addition, serialized queue-based
>> handling of resources should not be mandated by the HTML5 specification
>> (can't UAs be multi-threaded?)
>
> Section 2.6 (fetching) is needed to define how the fetching algorithms
> (HTTP, etc.) fit into the event loop mechanism and the storage mutex.

IMHO, it applies only to one class of HTML user agents, namely web
browsers (which require script serializability), and therefore should
be moved later in the document.

>> - Rewriting 2.6.1 without the HTTP word is definitely better. Browsers
>> are not required to support HTTP, AFAIK. You can write "a GET method"
>> (because GET is anyway an English word), "a response code" (most
>> protocols have response codes) and "metadata" (instead of headers, that
>> SMTP, POP, FTP don't support)
>
> I think that would be far less clear.

What about "a GET request", "a server response code", and "response
metadata" or "response headers"?

>
>> - 2.6.2 should be implied by the HTTP-over-TLS RFC
>
> Apparently implying it isn't good enough, given current implementations.

I wasn't aware of that (I have always seen certificate errors treated
as hard errors that prevent navigation).

>> - Still in section 2.7.1, why is the algorithm a violation of RFC2616?
>> Because it is case insensitive? Because it allows spaces? Because it
>> does not imply ISO-8859-1 if no charset is explicit? Because it does not
>> imply ASCII for text/* mime types?
>
> Because it means not blindly honouring Content-Type.

Finding a charset requires knowing the Content-Type?

>
>> - Why don't you add "<?xml" to the sniffing table?
>
> I'll leave this up to Adam.

Ok

>
>> - In section 2.8, "x-x-big5" is not a different encoding than "big5",
>> it rather seems an alias (and as such should be submitted to IANA)
>
> Agreed; if anyone would like to volunteer to do this that would be very
> helpful.

Ok

>
>> - Later in the same section, I don't understand why you don't support
>> those encodings, if the encoding declaration is explicit in the protocol
>> layer or is allowed by a different specification. For example, XML
>> allows EBCDIC-based encodings.
>
> UTF-32 is widely misimplemented. EBCDIC isn't widely supported. Generally
> speaking we're trying to reduce encoding proliferation.

Ok

>
>> In addition, I don't understand why supporting UTF-32 or EBCDIC means a
>> change to the algorithms, which are defined in terms of Unicode code
>> points (very similar to UTF-32 characters)
>
> Supporting UTF-32 or EBCDIC would mean changes to the character encoding
> sniffing algorithms.

Ok then. I see that EBCDIC is rarely used, and I agree that UTF-32
requires a lot of code, because of endianness.

>
>> - In section 2.9.1, I completely don't understand the part about DOM
>> attributes of type HTMLElement, especially the subpart about setting.
>
> I'm not sure how to clarify it... What don't you understand? Or rather,
> what _do_ you understand?

Honestly, I cannot remember what the problem was (maybe you changed it
in the meantime). Now I understand that it is a content attribute
containing an ID, and a DOM attribute containing (a reference to) the
corresponding HTML element.

>> - In section 2.9.5, instead of defining DOMStringMap only for EcmaScript,
>> use explicit indexing operations in the IDL, add to them the [NameGetter] /
>> [NameSetter] / [NameDeleter] attributes, and add a [NoIndexingOperation]
>> to the whole interface.
>
> Why?

Firstly, it is now section 2.9.6. Secondly, if you write something like

  [NoIndexingOperation]
  interface DOMStringMap {
    [NameCreator][NameSetter] void setString(in DOMString name, in DOMString value);
    [NameDeleter] void removeString(in DOMString name);
    [NameGetter] DOMString getString(in DOMString name);
  };

and then you say that the EcmaScript binding uses the standard [[Get]]
and [[Put]] operations, you get the current behaviour but you also
allow DOMStringMap to be used outside ES.

>
>> - In section 2.9.6 you discourage use of hasFeature. Firstly, if an
>> implementation says true and it is not compliant, it is not a spec bug,
>> it is an implementation bug.
>
> This isn't much comfort to authors.

It is. Implementation bugs can be reported, hopefully fixed, worked
around and even avoided if you ask the user to change browser. Spec
bugs instead are a problem for everyone.

>
>> Secondly, to allow implementation granularity, you could define more
>> features (for example HTML 5.0, XHTML 5.0, HTMLCanvas2D 5.0, HTMLSection
>> 5.0, HTMLDatagrid 5.0, HTMLMediaObject 5.0 etc.)
>
> Why not rely on the features themselves instead? The whole hasFeature()
> idea is deeply flawed, IMHO.

You can just check once for hasFeature("XHTML","5.0") rather than
checking every time whether "createElementNS" exists and whether calling
it with the XHTML namespace results in the correct interface.

>
>> - In section 3.2.1, seems that interfaces other than Document and
>> HTMLDocument should be exposed by the object only if different
>> namespaces are found in the document. This is not true: SVG UAs for
>> example must always expose the SVGDocument interface on Document.
>
> What SVG requires is defined by SVG; the spec here is just saying that
> HTML5 isn't attempting to push the other specs away.

The spec says: "Document objects must also implement the
document-level interface of any other namespaces found in the document
that the UA supports."
This should be made clearer, IMHO.

>> - document.lastModified should return null or the empty string if the
>> last modification date is not known (what if the document was really
>> last modified on January 1st 1970?)
>
> This was changed to match implementations.

Ok

>> and I don't understand why CSS1Compat vs BackCompat if the quirks are
>> limited to parsing
>
> The names were invented by Microsoft long ago.

Ok then.

>
>> - Why do we have both document.charset and document.characterSet?
>
> I'd rather have neither, but implementations have both.

Ok.

>
>> - In section 3.2.4, about title in the author-only text, remember that
>> Document always implements SVGDocument and HTMLDocument.
>
> Yes?

I meant that there is always an SVGDocument interface at the same point
in the class hierarchy. But it is clearly specified "in SVG documents".
My fault.

>
>> - What on earth does "incumbent" mean? (about document.body)
>
> It's the one currently holding the office of "the body element", as
> opposed to the one that's about to replace it.

Do you mind changing it to "the current body element"?

>
>> - Is it necessary to have that mess of property indexing on HTMLDocument
>> (that, by the way, may be implemented along with other language specific
>> interfaces)? Just drop it altogether: existing browsers will continue to
>> implement it, but new browsers won't, and new sites won't use it either.
>
> The idea is to define what it takes to write a browser that supports
> legacy pages, which is more or less what browsers do now, so
> unfortunately, we can't drop a feature just because we don't like it.

At least, can you push it down with document.all and document.bgColor?

>
>> - Named elements is defined twice: once before the algorithm, and once
>> after
>
> I can only find one definition for HTMLDocument; can you elaborate?

"The names of the supported named properties at any moment consist of
the values of the name content attributes of all the applet, embed,
form, iframe, img, and fallback-free object elements in the Document
that have name content attributes, and the values of the id content
attributes of all the applet and fallback-free object elements in the
Document that have id content attributes, and the values of the id
content attributes of all the img elements in the Document that have
both name content attributes and id content attributes." and
"Named elements with the name name, for the purposes of the above
algorithm, are those that are either:
applet, embed, form, iframe, img, or fallback-free object elements
that have a name content attribute whose value is name, or
applet or fallback-free object elements that have an id content
attribute whose value is name, or
img elements that have an id content attribute whose value is name,
and that have a name content attribute present also."
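
To illustrate why I read these as one definition stated twice, here is a
rough sketch in which the set of supported names is derived from the
"named elements" predicate alone. It is a simplification under my own
assumptions: the Element class, the fallback_free flag and both function
names are hypothetical, and details such as empty attribute values are
ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

NAME_TAGS = {"applet", "embed", "form", "iframe", "img", "object"}
ID_TAGS = {"applet", "object"}


@dataclass
class Element:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    fallback_free: bool = True  # assumption: only meaningful for object


def is_named_element(el: Element, name: str) -> bool:
    """Second quoted paragraph: is `el` a named element with this name?"""
    if el.tag == "object" and not el.fallback_free:
        return False
    if el.tag in NAME_TAGS and el.attrs.get("name") == name:
        return True
    if el.tag in ID_TAGS and el.attrs.get("id") == name:
        return True
    # img elements match by id only when a name attribute is also present.
    return el.tag == "img" and el.attrs.get("id") == name and "name" in el.attrs


def supported_names(elements: Iterable[Element]) -> Set[str]:
    """First quoted paragraph, expressed through the predicate above."""
    names: Set[str] = set()
    for el in elements:
        for key in ("name", "id"):
            value = el.attrs.get(key)
            if value is not None and is_named_element(el, value):
                names.add(value)
    return names
```

If the sketch is a fair reading, the first paragraph could simply say
"the names for which at least one named element exists".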

>
>> - In section 3.3.3.7, instead of defining the syntax of style
>> attributes, reference <http://www.w3.org/TR/css-style-attr>
>
> That draft is not actively maintained, so it's not clear that it is a good
> draft to reference yet.

This should be reported to the CSSWG, not repeated in a different specification.

>
>> - In section 4.2.5.3, a document may have a default language even if it
>> doesn't have a content-language http-equiv, if it has a Content-Language
>> HTTP header.
>
> No, the Content-Language HTTP header doesn't set the default language.

From the HTTPbis ID part 3, I read "The entity-header field
"Content-Language" describes the natural language(s) of the intended
audience for the enclosed entity.".
I expect that the intended audience speaks the same language as the document.

>> - Section 4.2.7 should be completely delegated to CSSOM
>
> This section defines the interface to CSSOM; why would CSSOM define the
> HTML behaviour?

I cannot remember exactly why, probably it has changed since I wrote
my email. Definitely I agree that HTML should define reflecting
attributes.

>
>> - Noscript should be allowed in XML, just without the complexity (and
>> simply treated as display:none if scripting is enabled)
>
> Why?

To allow easier migration from HTML to XML, and to present content
when scripting is disabled, if "normal" content relies on scripting
for user interaction (which is bad by design, but unavoidable in many
complex web applications)

>> - I completely cannot understand 4.4.10.2
>
> Assuming you mean the section that is now 4.4.11.2 Distinguishing
> site-wide headings from page headings, could you elaborate? What is the
> first problem?

Sorry, I forgot what I meant. Section 4.4.11.2 as it stands currently is clear.

>
>> - I would like to disagree with the man who disagreed with the other man
>> who disagreed with Ian Hickson (who said that things that are impossible
>> just take longer) (section about <q>)
>
> Not sure if this is a joke or a request to change the spec. :-)

It was a joke, but I meant that the spec should not use self-referential
examples.

>
>> - Section 4.8.3 still refers to the Window Object specification, which I
>> think has been superseded by HTML5
>
> Yeah, I have a note about fixing this in the source. This will be fixed
> in due course.

Ok

>
>> - classid is not a conforming attribute for object, and yet it is used
>> in the algorithm to find a plugin. AFAIK, classid is only used by IE
>> (along with COM) so I don't think it is a problem to drop it completely.
>
> Actually it's used by a number of implementations much as described by the
> algorithm.

If the classid attribute is used, it should be made conforming,
because it means it is useful for something (see also the thread about
classid in public-html)

>> - In section 4.10.4, the table about which attributes apply to the
>> various input types overflows in Opera 9.64 (1280x768 being the
>> resolution, 12pt the font size) and it is very hard to read
>
> Not sure how to improve this. There's a lot of data here.

How about splitting the table or reducing the font size?

>> - In 4.10.4.1.5 I expect that not even the user is able to see the password
>
> If the user agent is able to hide the password from everyone but the user,
> that would be a conforming implementation (and a far more useful one than
> today's), so I disagree.

Agreed, although it is a somewhat theoretical situation, and I would
never have expected something like that from you :)

>> I hope that this will help someone
>
> Indeed, thanks!

I will proceed with the next part as soon as possible; I just hope the
spec will remain stable between the time I send the email and the time
you read it.

> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>

Giovanni