[whatwg] Comments on Web Forms 2.0
Henri Sivonen
hsivonen at iki.fi
Sun Aug 22 07:32:50 PDT 2004
On Aug 17, 2004, at 16:37, Ian Hickson wrote:
> On Tue, 13 Jul 2004, Henri Sivonen wrote:
>>> 2.5. Extensions to file upload controls
>>
>>> * UAs should use the list of acceptable types in constructing a filter
>>> for a file picker, if one is provided to the user.
>>
>> That feature is not likely to be reliably implementable considering that
>> real-world systems do not have comprehensive ways of mapping between file
>> system type data and MIME types.
>
> I am told modern systems do, now.
Which modern systems?
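To make the objection concrete, here is a minimal Python sketch of how a UA might turn an accept list into a picker filter using the standard `mimetypes` module. `build_picker_filter` is a hypothetical helper, not any UA's actual code. The module's tables come partly from whatever system files are present, so results differ across hosts, and wildcards such as `image/*` or unregistered types map to nothing at all.

```python
import mimetypes
from typing import Optional


def build_picker_filter(accept_types: list[str]) -> Optional[list[str]]:
    """Hypothetical helper: map accepted MIME types to file-name patterns.

    Returns None when any accepted type cannot be mapped, in which case a
    UA would have to fall back to showing all files.
    """
    patterns: list[str] = []
    for mime_type in accept_types:
        # guess_all_extensions() consults Python's built-in table plus any
        # system files (e.g. /etc/mime.types) it finds, so output varies.
        extensions = mimetypes.guess_all_extensions(mime_type)
        if not extensions:
            # Wildcards like "image/*" and unregistered types map to nothing.
            return None
        patterns.extend("*" + ext for ext in extensions)
    return sorted(set(patterns))


print(build_picker_filter(["image/png"]))
print(build_picker_filter(["image/*"]))
```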
>>> For text input controls it specifies the maximum length of the input, in
>>> terms of numbers of characters. For details on counting string lengths,
>>> see [CHARMOD].
>>
>> Should UAs use NFC for submissions?
>
> I don't know, should they?
I am inclined to think that NFC SHOULD be used in order to accommodate
transitional systems that treat Unicode as "wide ASCII". For example, a
server-side system written in PHP4 may not have Unicode normalization
facilities available to it and might send the data to Mozilla later. If
a UA had posted content in NFD to the server and the server naïvely
sent the content to the OS X version of Mozilla, text in common
European languages would break in an ugly way.
I would hesitate to make NFC a MUST, though, because I don't know
whether small devices can hold the data that is needed in order to
carry out Unicode normalization. Requiring desktop apps to normalize
shouldn't be a big deal. At least OS X and GNOME provide normalization
facilities, and ICU can be thrown in as a cross-platform solution.
In any case, robust server-side systems should not trust that the input
is in a particular normalization form and should normalize the data
themselves. The point is accommodating systems that are not robust.
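A small Python sketch of why the normalization form matters, both for "wide ASCII" back ends and for character-counted limits such as maxlength. The two helpers are hypothetical illustrations; in particular, counting code points is an assumption about a naive UA, not necessarily what [CHARMOD] prescribes.

```python
import unicodedata

# "café" as a decomposing input method or file system might produce it (NFD):
# the final "é" is "e" followed by U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", "caf\u00e9")
nfc = unicodedata.normalize("NFC", nfd)

# Both render identically, but a byte-oriented ("wide ASCII") back end that
# compares or stores raw UTF-8 sees two different strings.
print(nfd == nfc)          # False
print(len(nfd), len(nfc))  # 5 4


def normalize_for_submission(value: str) -> str:
    """Hypothetical UA step: convert a control's value to NFC before encoding."""
    return unicodedata.normalize("NFC", value)


def fits_maxlength(value: str, maxlength: int) -> bool:
    """Hypothetical check that counts code points, as a naive UA might."""
    return len(value) <= maxlength


print(fits_maxlength(nfd, 4), fits_maxlength(normalize_for_submission(nfd), 4))
# False True
```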
>>> To prevent an attribute from being processed in this way, put a
>>> non-breaking zero-width space character (U+FEFF) at the start of the
>>> attribute.
>>
>> Isn't the use of that char as anything but the BOM deprecated or at least
>> considered harmful?
>
> Arguably, it _is_ a BOM here.
>
> I'm not overly fond of this either, but it's the only solution I could
> find that was relatively harmless (the BOM can always be dropped at the
> start of strings)
Exactly. Which is why tools used for generating the page might drop it
on the server!
Actually, I am distributing one such tool myself. Is the tool broken?
http://iki.fi/hsivonen/php-utf8/
> and yet did the job. Better suggestions are welcome though.
My immediate thought is ZWNJ, but I'm not sure if using it is a good
idea.
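The failure mode described above can be sketched in a few lines of Python. Both functions are hypothetical: `ua_processes` models the quoted opt-out rule in the simplest possible way, and `server_cleanup` models a server-side UTF-8 tool that drops a leading BOM. The attribute value is made up for illustration.

```python
BOM = "\ufeff"   # U+FEFF ZERO WIDTH NO-BREAK SPACE, a.k.a. the byte order mark
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER, the alternative floated above


def ua_processes(value: str, marker: str = BOM) -> bool:
    """Hypothetical UA rule: a leading marker opts the attribute out."""
    return not value.startswith(marker)


def server_cleanup(value: str) -> str:
    """Hypothetical server-side tool that drops a leading BOM, as UTF-8
    cleanup code commonly does."""
    return value[len(BOM):] if value.startswith(BOM) else value


authored = BOM + "example-value"  # hypothetical attribute value
print(ua_processes(authored))                  # False: opted out
print(ua_processes(server_cleanup(authored)))  # True: the opt-out is lost

# This particular cleanup leaves a ZWNJ marker alone.
alt = ZWNJ + "example-value"
print(ua_processes(server_cleanup(alt), marker=ZWNJ))  # False
```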
>>> Note that a string containing the codepoint's value itself (for example,
>>> the six-character string "U+263A" or the seven-character string
>>> "&#9786;") is not considered to be human readable and must not be used
>>> as a transliteration.
>>
>> Do you expect UAs that already do this to change their behavior with
>> the legacy submission types?
>
> We can hope.
FWIW, there may be CMS input form handlers that expect the prohibited
behavior. I have been involved in developing one myself. (Not that I
recommend relying on such things. Obviously, UTF-8 is the way to go.)
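For reference, Python's codec error handlers can reproduce both behaviors. The `xmlcharrefreplace` handler yields the kind of numeric character references that existing UAs send and that the draft prohibits; this is an illustration, not how any particular UA is implemented.

```python
text = "I \u263a forms"  # contains U+263A WHITE SMILING FACE

# Unencodable characters become numeric character references: the kind of
# "transliteration" the draft prohibits.
legacy = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(legacy)  # b'I &#9786; forms'

# One practical problem: the server cannot tell this apart from a user who
# literally typed the seven characters "&#9786;".
typed = "I &#9786; forms".encode("iso-8859-1")
print(legacy == typed)  # True

# A plain lossy fallback, and the lossless option.
print(text.encode("iso-8859-1", errors="replace"))  # b'I ? forms'
print(text.encode("utf-8"))
```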
>>> which has a root element named "submission", with no prefix, defining a
>>> default namespace uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32.
>>
>> I think that is an inappropriate attempt to micromanage the syntactic
>> details that are in the realm of a lower-level spec. I think the
>> submission format should either allow all the syntactic sugar that comes
>> with Namespaces in XML or be layered directly on top of XML 1.0 without
>> namespace support.
>
> The reason it is micromanaged is to make it possible to use either a pure
> XML 1.0 parser _or_ an XML 1.0 with namespaces parser on the server side
> without getting into any complications.
I was able to guess that that was the rationale behind the requirement.
But why is the ability to use a namespace-unaware XML processor a
requirement? The only reason I can come up with is that PHP4 is borked
by default but widely used.
Processing namespaced XML with tools that don't support namespaces is
clueless and just plain wrong. If tools that don't support namespaces
are to be accommodated, wouldn't the natural way be to spec that the
elements are not in a namespace and the namespace processing layer is
not used? That way you wouldn't endorse behavior that is clueless and
just plain wrong.
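The tension can be seen with Python's `xml.dom.minidom`, which exposes both the raw qualified name (`tagName`) and the namespace identity (`namespaceURI`, `localName`). The prefixed document is an illustrative variant that the draft disallows; a namespace-aware consumer treats it as the same element, whereas code matching on raw names does not.

```python
from xml.dom import minidom

NS = "uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32"

# The form the draft requires (no prefix, default namespace) and a prefixed
# variant that Namespaces in XML treats as the same element.
default_ns = f'<submission xmlns="{NS}"/>'
prefixed = f'<s:submission xmlns:s="{NS}"/>'


def names(doc: str) -> tuple[str, tuple[str, str]]:
    root = minidom.parseString(doc).documentElement
    # tagName: the raw name a namespace-unaware tool matches on.
    # (namespaceURI, localName): the identity a namespace-aware tool uses.
    return root.tagName, (root.namespaceURI, root.localName)


print(names(default_ns))
print(names(prefixed))
```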
I can see three problems with namespacelessness:
1) The current best practice for dispatching on the type of an XML
document is dispatching on the namespace. If there was no namespace,
one would have to fall back on dispatching on the content type. This is
not a real problem with this particular vocabulary because this
vocabulary has a distinct content type from the start.
2) You couldn't mix the vocabulary with other vocabularies using
namespaces. This is a theoretical problem but probably not a real one,
because the vocabulary is limited to a specific case of client-server
interaction. Besides, the way you limit the use of namespaces in the
current spec language would also preclude creative augmentations to the
submission vocabulary.
3) You intend to submit the spec to a consortium that shall not be
named and you know the powers that be in the consortium that shall not
be named would veto any spec that builds directly on top of XML 1.0
without the namespace layer in between.
So of the three problems only the last one is significant and it is a
political problem and not a technical one. Sadly, political problems
may be more difficult to overcome than technical problems.
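A minimal sketch of the dispatching described in point 1, using Python's `xml.etree.ElementTree`. The handler names are hypothetical, and the media type `application/x-www-form+xml` is an assumption about the submission format's content type.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SUBMISSION_NS = "uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32"

# Hypothetical handler registries; the media type string is an assumption.
HANDLERS_BY_NS = {SUBMISSION_NS: "handle_submission"}
HANDLERS_BY_TYPE = {"application/x-www-form+xml": "handle_submission"}


def root_namespace(data: bytes) -> Optional[str]:
    """Return the root element's namespace URI, or None if it has none."""
    tag = ET.fromstring(data).tag  # ElementTree spells names "{uri}local"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def dispatch(data: bytes, content_type: str) -> Optional[str]:
    ns = root_namespace(data)
    if ns is not None:
        # Best practice: the namespace identifies the vocabulary.
        return HANDLERS_BY_NS.get(ns)
    # Without a namespace, the content type is the only remaining signal.
    return HANDLERS_BY_TYPE.get(content_type)
```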
>>> but must include a BOM.
>>
>> I think that is not a legitimate requirement when UTF-8 is used.
>
> Why not?
It is a requirement that applies to the XML serialization, but the
requirement is not present in the XML spec. The requirement would mean
that you could not use any arbitrary but conforming XML serializer.
The use of the BOM as a UTF-8 signature is a Microsoftism that was only
allowed in XML 1.0 second edition, because fighting Microsoft text
editors would have been futile. Still, if you pick a non-Microsoft XML
serializer off the shelf, chances are it does not emit a BOM in the
UTF-8 mode.
Is there a good reason to limit the use of arbitrary but conforming
off-the-shelf XML serializers?
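A quick Python check of the point about off-the-shelf serializers, using `xml.etree.ElementTree` as the stock serializer. The document is a simplified stand-in (no namespace) for a submission payload; prepending the BOM is the extra step a "must include a BOM" rule would force on top of the serializer.

```python
import codecs
import xml.etree.ElementTree as ET

root = ET.Element("submission")
ET.SubElement(root, "field", name="q").text = "caf\u00e9"

# A stock, conforming serializer: UTF-8 output without a BOM.
plain = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(plain.startswith(codecs.BOM_UTF8))  # False

# Meeting a "must include a BOM" rule needs an extra, non-standard step.
with_bom = codecs.BOM_UTF8 + plain

# Conforming parsers accept both, so the BOM adds nothing for the reader.
print(ET.fromstring(plain).find("field").text)
print(ET.fromstring(with_bom).find("field").text)
```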
>>> UAs may use either CDATA blocks, entities, or both in escaping the
>>> contents of attributes and elements, as appropriate.
>>
>> In order not to imply that this spec could restrict the ways characters
>> are escaped, that sentence should be a note rather than part of the
>> normative prose. (Of course, only the pre-defined entities are
>> available. Then there are NCRs.)
>
> This spec _could_ restrict the ways characters are escaped. It needs to
> not be a note so that the "may" has normative value. No?
It could restrict the escaping in the same sense that the HTTP spec
could restrict how you choose TCP sequence numbers.
In general, please see section 4.3 of RFC 3470.
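The escaping choice is invisible after parsing, which is why it arguably belongs to the XML layer rather than to the form spec. In this Python sketch (the element name `v` is arbitrary), predefined entities, a CDATA section, and NCRs all yield the same character data.

```python
import xml.etree.ElementTree as ET

# Three serializations of the same character data "<b> & c".
variants = [
    "<v>&lt;b&gt; &amp; c</v>",    # predefined entities
    "<v><![CDATA[<b> & c]]></v>",  # CDATA section
    "<v>&#60;b&#62; &#38; c</v>",  # numeric character references
]

# Parse each variant and collect the distinct text values.
texts = {ET.fromstring(doc).text for doc in variants}
print(texts)  # {'<b> & c'}
```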
--
Henri Sivonen
hsivonen at iki.fi
http://iki.fi/hsivonen/