[whatwg] Comments on Web Forms 2.0
Ian Hickson
ian at hixie.ch
Fri Aug 27 02:25:46 PDT 2004
On Sun, 22 Aug 2004, Henri Sivonen wrote:
> > > >
> > > > 2.5. Extensions to file upload controls
> > >
> > > > * UAs should use the list of acceptable types in constructing a filter
> > > > for a file picker, if one is provided to the user.
> > >
> > > That feature is not likely to be reliably implementable considering that
> > > real-world systems do not have comprehensive ways of mapping between file
> > > system type data and MIME types.
> >
> > I am told modern systems do, now.
>
> Which modern systems?
Windows, Mac, Gnome, etc.
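A picker filter is only as good as the platform's MIME-to-extension table. The sketch below uses Python's `mimetypes` module as a stand-in for an OS type registry. The function name `picker_filter` is hypothetical, and wildcard handling is an assumption. Unknown types contribute nothing to the filter, so a sparse table yields a filter that is too narrow.

```python
import mimetypes

def picker_filter(accept: str) -> list[str]:
    """Turn a comma-separated list of MIME types into glob patterns for a
    file picker. Types the local table does not know add no patterns."""
    mimetypes.init()  # loads the built-in table plus any system mime.types files
    patterns: set[str] = set()
    for mime in (t.strip().lower() for t in accept.split(",")):
        if not mime:
            continue
        if mime.endswith("/*"):
            # Wildcard such as image/*: scan the whole table for that major type.
            major = mime[:-1]  # e.g. "image/"
            exts = [ext for ext, t in mimetypes.types_map.items() if t.startswith(major)]
        else:
            exts = mimetypes.guess_all_extensions(mime)
        patterns.update("*" + ext for ext in exts)
    return sorted(patterns)

print(picker_filter("image/png, text/plain"))
```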
> > > > For text input controls it specifies the maximum length of the input, in
> > > > terms of numbers of characters. For details on counting string lengths,
> > > > see [CHARMOD].
> > >
> > > Should UAs use NFC for submissions?
> >
> > I don't know, should they?
>
> I am inclined to think that NFC SHOULD be used in order to accommodate
> transitional systems that treat Unicode as "wide ASCII". For example, a
> server-side system written in PHP4 may not have Unicode normalization
> facilities available to it and might send the data to Mozilla later. If a UA
> had posted content in NFD to the server and the server naïvely sent the
> content to the OS X version of Mozilla, text in common European languages
> would break in an ugly way.
>
> I would hesitate to make NFC a MUST, though, because I don't know whether small
> devices can hold the data that is needed in order to carry out Unicode
> normalization. Requiring desktop apps to normalize shouldn't be a big deal. At
> least OS X and Gnome provide normalization facilities and ICU can be thrown in
> as a cross-platform solution.
>
> In any case, robust server-side systems should not trust that the input is in
> a particular normalization form and should normalize the data themselves. The
> point is accommodating systems that are not robust.
Ok, NFC and SHOULD it is.
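To make the NFD/NFC difference concrete, here is a minimal sketch using Python's `unicodedata`. The helper `normalize_submission` is hypothetical; it is the step either a UA or a robust server would apply. It also shows that the two forms differ in length, which matters when a maximum length is counted in characters.

```python
import unicodedata

def normalize_submission(value: str) -> str:
    """Hypothetical UA/server step: put a submitted value into NFC."""
    return unicodedata.normalize("NFC", value)

decomposed = "e\u0301"   # NFD: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = normalize_submission(decomposed)

print(len(decomposed), len(composed))              # 2 1
print(composed == "\u00e9")                        # True
print(unicodedata.is_normalized("NFC", composed))  # True
```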
> > > > To prevent an attribute from being processed in this way, put a
> > > > non-breaking zero-width space character (U+FEFF) at the start of
> > > > the attribute.
> > >
> > > Isn't the use of that char as anything but the BOM deprecated or at
> > > least considered harmful?
> >
> > Arguably, it _is_ a BOM here.
> >
> > I'm not overly fond of this either, but it's the only solution I could
> > find that was relatively harmless (the BOM can always be dropped at
> > the start of strings).
>
> Exactly. Which is why tools used for generating the page might drop it
> on the server!
That's fine. When put at the start of the string, it should be dropped.
> Actually, I am distributing one such tool myself. Is the tool broken?
> http://iki.fi/hsivonen/php-utf8/
It depends. If it drops the BOM in the middle of the string, then yes.
I expect this to be used so that you first output the attribute with this
"BOM", then the user-derived string, then the rest of the document:
...
print("<input value=\"\xEF\xBB\xBF"); // U+FEFF, encoded as UTF-8
print(htmlspecialchars($data));
print("\">");
...
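The distinction between dropping the BOM at the start of a string and dropping it anywhere can be sketched as follows. This mirrors the output pattern above in Python. `html.escape` stands in for the escaping step, and the three function names are hypothetical.

```python
import html

BOM = "\ufeff"

def render_input(data: str) -> str:
    # The marker goes first inside the attribute, before the user-derived text.
    return '<input value="' + BOM + html.escape(data, quote=True) + '">'

def drop_bom_at_start(text: str) -> str:
    # Acceptable: only a U+FEFF at the very start of the string is removed.
    return text[1:] if text.startswith(BOM) else text

def drop_bom_everywhere(text: str) -> str:
    # Broken: also deletes the marker in the middle of the output.
    return text.replace(BOM, "")

page = BOM + render_input('user "text"')  # leading BOM plus the one in the attribute
print(drop_bom_at_start(page).count(BOM))    # 1: attribute marker survives
print(drop_bom_everywhere(page).count(BOM))  # 0: attribute marker lost
```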
> My immediate thought is ZWNJ, but I'm not sure if using it is a good
> idea.
I think that would be worse than the BOM.
> > > > Note that a string containing the codepoint's value itself (for
> > > > example, the six-character string "U+263A" or the seven-character
> > > > string "&#9786;") is not considered to be human readable and must
> > > > not be used as a transliteration.
> > >
> > > Do you expect UAs that already do this to change their behavior with
> > > the legacy submission types?
> >
> > We can hope.
>
> FWIW, there may be CMS input form handlers that expect the prohibited
> behavior. I have been involved in developing one myself. (Not that I
> recommend relying on such things. Obviously, UTF-8 is the way to go.)
Yeah. Google, for one. I've also seen login forms where people typed in
characters not in the form's submission set, and thus got a username that
was not the one they thought it was, so when they switched to another UA
that did things differently, it broke. It's madness.
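The ambiguity described above can be reproduced with Python's `xmlcharrefreplace` codec error handler. It emits a numeric character reference for characters the target charset lacks, which is the kind of substitution the spec forbids as a transliteration. The handler is only a stand-in for illustration; no claim is made that any particular browser uses it.

```python
text = "smile \u263a"       # contains U+263A WHITE SMILING FACE
typed = "smile &#9786;"     # a user who literally typed the reference

# Fallback for characters outside ISO-8859-1: emit a numeric character reference.
legacy_a = text.encode("iso-8859-1", "xmlcharrefreplace")
legacy_b = typed.encode("iso-8859-1", "xmlcharrefreplace")
print(legacy_a)               # b'smile &#9786;'
print(legacy_a == legacy_b)   # True: the server cannot tell them apart

# Submitting as UTF-8 keeps the two inputs distinct.
print(text.encode("utf-8"))   # b'smile \xe2\x98\xba'
```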
> > > > which has a root element named "submission", with no prefix,
> > > > defining a default namespace
> > > > uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32.
> > >
> > > I think that is an inappropriate attempt to micromanage the
> > > syntactic details that are in the realm of a lower-level spec. I
> > > think the submission format should either allow all the syntactic
> > > sugar that comes with Namespaces in XML or be layered directly on
> > > top of XML 1.0 without namespace support.
> >
> > The reason it is micromanaged is to make it possible to use either a
> > pure XML 1.0 parser _or_ an XML 1.0 with namespaces parser on the
> > server side without getting into any complications.
>
> I was able to guess that that was the rationale behind the requirement.
> But why is the ability to use a namespace-unaware XML processor a
> requirement? The only reason I can come up with is that PHP4 is borked
> by default but widely used.
There are various people using non-namespace-aware parsers. I don't really
want to force namespace-aware parsing when the document is guaranteed to
have only one namespace anyway.
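Why the "no prefix, default namespace" rule helps both kinds of parser can be shown with Python's `xml.parsers.expat`. With no `namespace_separator`, it behaves as a plain XML 1.0 parser; with one, it applies Namespaces in XML. The `field` child element is hypothetical. The prefixed variant shows what a namespace-unaware parser would see if prefixes were allowed.

```python
import xml.parsers.expat

NS = "uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32"

# Hypothetical documents: the "field" child element is illustrative only.
UNPREFIXED = f'<submission xmlns="{NS}"><field name="q">hello</field></submission>'
PREFIXED = f'<s:submission xmlns:s="{NS}"><s:field name="q">hello</s:field></s:submission>'

def root_name(doc: str, namespace_aware: bool) -> str:
    """Return the root element name as the chosen kind of parser reports it."""
    if namespace_aware:
        # Namespaces in XML: expat reports "URI<separator>localname".
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # Plain XML 1.0: the raw name, prefix and all; xmlns is just an attribute.
        parser = xml.parsers.expat.ParserCreate()
    names: list[str] = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(doc, True)
    return names[0]

for doc in (UNPREFIXED, PREFIXED):
    print(root_name(doc, False), "|", root_name(doc, True))
```

With the unprefixed form, both parsers agree on the local name `submission`. With a prefix, the namespace-unaware view changes to `s:submission`, even though the namespaced view does not.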
> Processing namespaced XML with tools that don't support namespaces is
> clueless and just plain wrong. If tools that don't support namespaces
> are to be accommodated, wouldn't the natural way be to spec that the
> elements are not in a namespace and the namespace processing layer is
> not used? That way you wouldn't endorse behavior that is clueless and
> just plain wrong.
It's actually more the other way around. This is a non-namespaced
document, but to accommodate people who are going to be using it in
namespace-aware environments, possibly merging it into other documents,
etc, it makes sense to actually give it a namespace.
For example, the same data format is later used for seeding forms. If on
the server you stack the data into a huge XML file containing other data
too, it would make sense to be able to yank out that namespaced subtree and
use it for preseeding as well.
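Yanking the namespaced subtree out of a larger server-side document might look like the sketch below, using `xml.etree.ElementTree`. The `archive`, `user` and `field` element names are hypothetical. `register_namespace("", NS)` makes the serializer emit the namespace as the default, so the extracted subtree again has no prefix.

```python
import xml.etree.ElementTree as ET

NS = "uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32"
# Serialize this namespace as the default namespace, so no prefix appears.
ET.register_namespace("", NS)

# Hypothetical server-side store: "archive", "user" and "field" are illustrative.
STORE = (
    "<archive><user>bob</user>"
    f'<submission xmlns="{NS}"><field name="q">hello</field></submission>'
    "</archive>"
)

root = ET.fromstring(STORE)
sub = root.find(f"{{{NS}}}submission")  # locate the subtree by namespace, not position
seed = ET.tostring(sub, encoding="unicode")
print(seed)
```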
> 1) The current best practice for dispatching on the type of an XML
> document is dispatching on the namespace. If there was no namespace, one
> would have to fall back on dispatching on the content type. This is not
> a real problem with this particular vocabulary because this vocabulary
> has a distinct content type from the start.
It does during submission. But when the data is flying about after
submission, who knows.
> 2) You couldn't mix the vocabulary with other vocabularies using
> namespaces. This is a theoretical problem but probably not a real one,
> because the vocabulary is limited to a specific case of client-server
> interaction.
It's only limited _if_ it doesn't have a namespace.
Also, it is later used for preseeding forms.
> Besides, the way you limit the use of namespaces in the current spec
> language would also preclude creative augmentations to the submission
> vocabulary.
Well, extensions would be non-compliant, yes. But at least there is a
clear mechanism for experimentation.
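Henri's first point, dispatching on the namespace with the content type as a fallback, can be sketched as follows. The function `classify` and its return labels are hypothetical. The media type string is an assumption and should be replaced by whatever the draft actually defines.

```python
from typing import Optional
import xml.etree.ElementTree as ET

SUBMISSION_NS = "uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32"
# Assumed media type for the submission format; substitute the one the draft defines.
SUBMISSION_TYPE = "application/x-www-form+xml"

def classify(body: bytes, content_type: Optional[str] = None) -> str:
    root = ET.fromstring(body)
    if root.tag.startswith("{"):
        # Namespaced root: the namespace alone identifies the vocabulary.
        ns = root.tag[1:].split("}", 1)[0]
        return "form-submission" if ns == SUBMISSION_NS else "unknown:" + ns
    # No namespace: only a content type that travelled with the data can help.
    return "form-submission" if content_type == SUBMISSION_TYPE else "unknown"

print(classify(f'<submission xmlns="{SUBMISSION_NS}"/>'.encode()))
print(classify(b"<submission/>"))
```

Once the data leaves the HTTP request, the content type is often gone, while the namespace travels with the document.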
> > > > but must include a BOM.
> > >
> > > I think that is not a legitimate requirement when UTF-8 is used.
> >
> > Why not?
>
> It is a requirement that applies to the XML serialization, but the
> requirement is not present in the XML spec. The requirement would mean
> that you could not use any arbitrary but conforming XML serializer.
>
> The use of the BOM as a UTF-8 signature is a Microsoftism that was only
> allowed in XML 1.0 second edition, because fighting Microsoft text
> editors would have been futile. Still, if you pick a non-Microsoft XML
> serializer off the shelf, chances are it does not emit a BOM in the
> UTF-8 mode.
>
> Is there a good reason to limit the use of arbitrary but conforming
> off-the-shelf XML serializers?
I guess that makes sense. And the BOM isn't really needed anyway. Ok, I've
made it optional for UTF-8.
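Making the BOM optional costs a receiver nothing, because a conforming parser accepts UTF-8 input either way. This can be checked with `xml.etree.ElementTree`, whose serializer also emits no BOM in UTF-8 mode. The single-element document is a minimal placeholder.

```python
import codecs
import xml.etree.ElementTree as ET

NS = "uuid:d10e4fd6-2c01-49e8-8f9d-0ab964387e32"
body = f'<submission xmlns="{NS}"/>'.encode("utf-8")

# The parser accepts UTF-8 input with or without the signature.
tags = [ET.fromstring(data).tag for data in (body, codecs.BOM_UTF8 + body)]
print(tags[0] == tags[1])

# ElementTree's serializer writes an XML declaration but no BOM.
out = ET.tostring(ET.Element("submission"), encoding="utf-8", xml_declaration=True)
print(out.startswith(codecs.BOM_UTF8))
```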
> > > > UAs may use either CDATA blocks, entities, or both in escaping the
> > > > contents of attributes and elements, as appropriate.
> > >
> > > In order not to imply that this spec could restrict the ways
> > > characters are escaped, that sentence should be a note rather than
> > > part of the normative prose. (Of course, only the pre-defined
> > > entities are available. Then there are NCRs.)
> >
> > This spec _could_ restrict the ways characters are escaped. It needs
> > to not be a note so that the "may" has normative value. No?
>
> It could restrict the escaping in the same sense that the HTTP spec could
> restrict how you choose TCP sequence numbers.
>
> In general, please see section 4.3 of RFC 3470.
Yes, indeed. That's why WF2 specifically _doesn't_ restrict this.
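The choice between predefined entity references, CDATA sections and numeric character references is invisible once the XML has been parsed. That is why the choice belongs to the serialization layer. The sketch below demonstrates this with `xml.etree.ElementTree`; the `field` element name is hypothetical.

```python
import xml.etree.ElementTree as ET

variants = [
    "<field>a &lt; b &amp;&amp; c</field>",   # predefined entity references
    "<field><![CDATA[a < b && c]]></field>",  # CDATA section
    "<field>a &#60; b &#38;&#38; c</field>",  # numeric character references
]
texts = {ET.fromstring(v).text for v in variants}
print(texts)  # {'a < b && c'}
```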
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'