[whatwg] Comments on Web Forms 2.0

Tue Nov 16 09:47:59 PST 2004

On Mon, 6 Sep 2004, Henri Sivonen wrote:
> On Aug 27, 2004, at 12:25, Ian Hickson wrote:
> > On Sun, 22 Aug 2004, Henri Sivonen wrote:
> > > > > > 
> > > > > > 2.5. Extensions to file upload controls
> > > > > 
> > > > > > * UAs should use the list of acceptable types in constructing 
> > > > > > a filter for a file picker, if one is provided to the user.
> > > > > 
> > > > > That feature is not likely to be reliably implementable 
> > > > > considering that real-world systems do not have comprehensive 
> > > > > ways of mapping between file system type data and MIME types.
> > > > 
> > > > I am told modern systems do, now.
> > > 
> > > Which modern systems?
> > 
> > Windows, Mac, Gnome, etc.
> 
> I was under the impression (unsubstantiated; haven't checked recently) 
> that the mappings are comprehensive only for the likes of PDF and JPEG 
> but are not comprehensive for the likes of OpenOffice.org or Lotus 
> files.

Oh, well, sure. Nobody has a _comprehensive_ list. That's one of the 
reasons the above is only a "should" -- there "may exist valid reasons in 
particular circumstances to ignore" this requirement. Such as the UA not 
having the information.
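
As a rough illustration of why this is only a "should", here is a minimal
sketch in Python of how a UA might turn a list of acceptable types into a
file picker filter. KNOWN_EXTENSIONS and picker_filter are hypothetical
names used for illustration only, and the table is deliberately
incomplete; a real UA would presumably ask the platform for its mappings.

```python
# Hypothetical, deliberately incomplete mapping from MIME types to
# file name extensions; a real UA would query the platform instead.
KNOWN_EXTENSIONS = {
    "image/jpeg": [".jpg", ".jpeg"],
    "application/pdf": [".pdf"],
}


def picker_filter(accept):
    """Return a sorted list of extensions for a comma-separated list of
    MIME types, or None when no usable filter can be built."""
    extensions = set()
    for mime_type in accept.split(","):
        mime_type = mime_type.strip().lower()
        if not mime_type:
            continue
        known = KNOWN_EXTENSIONS.get(mime_type)
        if known is None:
            # No mapping for this type: ignore the "should" and show
            # everything rather than hide files the page would accept.
            return None
        extensions.update(known)
    return sorted(extensions) or None
```

In this sketch a single unknown type disables the filter entirely, which
is one possible reading of "valid reasons to ignore"; other policies are
equally defensible.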

> > > Actually, I am distributing one such tool myself. Is the tool broken?
> > > http://iki.fi/hsivonen/php-utf8/
> > 
> > It depends. If it drops the BOM in the middle of the string, then yes.
> 
> It does. My reasoning was that the BOM could only occur in the middle of 
> a string as an artifact left there when concatenating strings that start 
> with the BOM.

This is incorrect: U+FEFF is a valid character in its own right (albeit 
deprecated in favour of U+2060) and is only the BOM if found at the start 
of a string.
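
A minimal sketch of that distinction in Python, assuming the text has
already been decoded to a Unicode string; strip_leading_bom is a
hypothetical helper name, not something taken from the tool above:

```python
BOM = "\ufeff"


def strip_leading_bom(text):
    """Remove one U+FEFF at the very start of the string, if present.

    Any U+FEFF found later in the string is an ordinary character and is
    left untouched.
    """
    if text.startswith(BOM):
        return text[len(BOM):]
    return text
```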

> > I expect this to be used so that you first output the attribute with 
> > this "BOM", then the user-derived string, then the rest of the 
> > document:
> > 
> >    ...
> >    print("<input value=\"\xFEFF");
> >    print(escape(data));
> >    print("\">");
> >    ...
> 
> However, if the document is built using SAX or the DOM, the attribute 
> value as a whole exists as a string object at some point. Arguably, in 
> that case what you have is a string that starts with the BOM. Would it 
> be OK to drop the BOM?

At some point you have the data as its own string. Strip any BOM at that 
point. Then add a BOM and the string to your DOM. Don't strip it then. :-)
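
The order of operations above can be sketched in Python as follows. This
is only an illustration under stated assumptions: input_markup is a
hypothetical function name, the markup is built by string concatenation
rather than through SAX or the DOM, and the standard library's
html.escape stands in for whatever escaping the serializer performs.

```python
from html import escape

BOM = "\ufeff"


def input_markup(data):
    """Build an <input> element whose value is U+FEFF followed by data."""
    # 1. The user-derived data exists as its own string here, so this is
    #    the point at which a leading BOM may be stripped.
    if data.startswith(BOM):
        data = data[len(BOM):]
    # 2. Add the BOM and the string to the attribute value. Nothing is
    #    stripped from the combined value after this point.
    value = BOM + data
    return '<input value="' + escape(value, quote=True) + '">'
```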

> > > I was able to guess that that was the rationale behind the 
> > > requirement. But why is the ability to use a namespace-unaware XML 
> > > processor a requirement? The only reason I can come up with is that 
> > > PHP4 is borked by default but widely used.
> > 
> > There are various people using non-namespace-aware parsers.
> 
> But if they are using them with namespaced documents, what they are 
> doing is not right.

The documents don't _need_ to be namespaced, they are all in one namespace 
and don't contain any content that could ever be from other namespaces. 
The only reason to use namespaces at all is, in fact, to support users of 
namespace-aware parsers.

Which is in fact what I said, hehe:

> > It's actually more the other way around. This is a non-namespaced 
> > document, but to accommodate people who are going to be using it in 
> > namespace-aware environments, possibly merging it into other 
> > documents, etc, it makes sense to actually give it a namespace.
> > 
> > For example, the same data format is later used for seeding forms. If 
> > on the server you stack the data into a huge XML file containing other 
> > data too, it would make sense to be able to just yank out that 
> > namespaced subtree and just use it for preseeding too.
> 
> It would make sense to note that the constraint on the namespace 
> declaration does not apply when the data is flowing from the server to 
> the browser. That way, the random server-side developer would not have 
> to worry whether his/her serializer puts the namespace declaration only 
> on the root element without prefixes. (I realize the form seeding 
> section already implies this, but it wouldn't hurt to note it 
> explicitly.)

Ok, added:

   While this section restricts the exact features of XML that a UA may 
   use, these restrictions do not apply to the files used when seeding a 
   form with initial values. 

Is that ok?

> That still leaves the burden of adhering to a special syntactic rule to 
> browser implementors and desperate integrators who have to emulate form 
> submissions. However, when you're integrating with a system that is not 
> cooperative, chances are the system isn't using a proper namespace-aware 
> XML processor, either. :-/

Indeed.

> (The MIME type asymmetry in submission and seeding might raise some 
> eyebrows but is probably realistic.)

"application/x-www-form+xml" is one of the MIME types covered by section 
6.2 (it's an XML MIME type because it ends in "+xml").
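
As a sketch of the "+xml" check in Python (is_xml_mime_type is a
hypothetical name, and treating text/xml and application/xml as XML MIME
types as well is an assumption here, not something quoted from section
6.2):

```python
def is_xml_mime_type(mime_type):
    """Return True if the type, ignoring parameters and case, is XML."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    essence = mime_type.split(";", 1)[0].strip().lower()
    # Assumption: the two generic XML types count alongside any "+xml" type.
    return essence.endswith("+xml") or essence in ("text/xml", "application/xml")
```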

> > > Besides, the way you limit the use of namespaces in the current spec 
> > > language would also preclude creative augmentations to the 
> > > submission vocabulary.
> > 
> > Well, extensions would be non-compliant, yes. But at least there is a 
> > clear mechanism for experimentation.
> 
> Actually, the spec doesn't say what the recipient is supposed to do when 
> encountering unrecognized elements or attributes.

The spec doesn't say what the recipient is supposed to do with 
_recognised_ elements or attributes, either. What servers do is pretty 
much up to the servers, not much we can do about it. (Similarly, there's 
no spec, to my knowledge, that really says what servers are supposed to do 
with application/x-www-form-urlencoded or multipart/form-data content.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'