[whatwg] Comments on Web Forms 2.0

Mon Sep 6 11:04:42 PDT 2004

On Aug 27, 2004, at 12:25, Ian Hickson wrote:

> On Sun, 22 Aug 2004, Henri Sivonen wrote:
>>>>>
>>>>> 2.5. Extensions to file upload controls
>>>>
>>>>>     * UAs should use the list of acceptable types in constructing
>>>>>       a filter for a file picker, if one is provided to the user.
>>>>
>>>> That feature is not likely to be reliably implementable considering
>>>> that real-world systems do not have comprehensive ways of mapping
>>>> between file system type data and MIME types.
>>>
>>> I am told modern systems do, now.
>>
>> Which modern systems?
>
> Windows, Mac, Gnome, etc.

I was under the impression (unsubstantiated; haven't checked recently)
that the mappings reliably cover common formats such as PDF and JPEG
but do not cover less common ones such as OpenOffice.org or Lotus
files.
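
One way to check this on a given system is to ask a mapping table for the
file name extensions it knows for each type in an accept list. A minimal
sketch in Python, assuming its standard mimetypes table (which is not the
table a native file picker on Windows, Mac or Gnome would consult); the
function name and the third MIME type are made up for illustration:

```python
import mimetypes


def extensions_for_accept(accept: str) -> dict[str, list[str]]:
    """Map each MIME type in a comma-separated accept list to known extensions."""
    result = {}
    for mime_type in accept.split(","):
        mime_type = mime_type.strip().lower()
        if mime_type:
            # An empty list means the table has no mapping for this type,
            # so a file picker filter could not be built from it.
            result[mime_type] = mimetypes.guess_all_extensions(mime_type)
    return result


# The last type is hypothetical and stands in for a format the table lacks.
table = extensions_for_accept(
    "application/pdf, image/jpeg, application/x-hypothetical-format"
)
for mime_type, extensions in table.items():
    print(mime_type, extensions)
```

A type that comes back with an empty list is one for which the filter
would have to fall back to showing all files.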

>> Actually, I am distributing one such tool myself. Is the tool broken?
>> http://iki.fi/hsivonen/php-utf8/
>
> It depends. If it drops the BOM in the middle of the string, then yes.

It does. My reasoning was that the BOM could only occur in the middle 
of a string as an artifact left there when concatenating strings that 
start with the BOM.
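
To make the difference concrete, here is a minimal sketch in Python (not
the PHP code of the tool itself; both function names are made up for
illustration) of dropping every U+FEFF versus dropping only a leading one:

```python
BOM = "\ufeff"  # U+FEFF, used as the byte order mark


def strip_all_boms(text: str) -> str:
    """Remove every U+FEFF, including ones in the middle of the string."""
    return text.replace(BOM, "")


def strip_leading_bom(text: str) -> str:
    """Remove a single U+FEFF only if it is the first character."""
    return text[1:] if text.startswith(BOM) else text


# Two strings that each start with a BOM, concatenated.
joined = BOM + "abc" + BOM + "def"

cleaned_everywhere = strip_all_boms(joined)
cleaned_at_start = strip_leading_bom(joined)
```

The first function is the behaviour described above; the second leaves a
U+FEFF in the middle of the string untouched.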

> I expect this to be used so that you first output the attribute with
> this "BOM", then the user-derived string, then the rest of the document:
>
>    ...
>    print("<input value=\"\xFEFF");
>    print(escape(data));
>    print("\">");
>    ...

However, if the document is built using SAX or the DOM, the attribute 
value as a whole exists as a string object at some point. Arguably, in 
that case what you have is a string that starts with the BOM. Would it 
be OK to drop the BOM?
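
A minimal sketch of that situation, assuming Python's xml.dom.minidom as
the DOM and made-up function names: the marker and the user data are a
single string object before anything is serialized, so a filter that
strips a leading U+FEFF from strings would see it as an ordinary BOM.

```python
from xml.dom.minidom import Document

MARKER = "\ufeff"  # U+FEFF, the "BOM" placed at the start of the value


def build_input_markup(data: str) -> str:
    """Build an input element whose value starts with U+FEFF and serialize it."""
    doc = Document()
    element = doc.createElement("input")
    # With a DOM, the marker and the data form one string at this point.
    element.setAttribute("value", MARKER + data)
    return element.toxml()


def drop_leading_marker(text: str) -> str:
    """Hypothetical string filter that removes a leading U+FEFF."""
    return text[1:] if text.startswith(MARKER) else text


markup = build_input_markup("user data")
filtered_value = drop_leading_marker(MARKER + "user data")
```

If such a filter runs on attribute values before serialization, the
marker never reaches the output.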

>> I was able to guess that that was the rationale behind the requirement.
>> But why is the ability to use a namespace-unaware XML processor a
>> requirement? The only reason I can come up with is that PHP4 is borked
>> by default but widely used.
>
> There are various people using non-namespace-aware parsers.

But if they are using them with namespaced documents, what they are 
doing is not right.

> It's actually more the other way around. This is a non-namespaced
> document, but to accommodate people who are going to be using it in
> namespace-aware environments, possibly merging it into other documents,
> etc, it makes sense to actually give it a namespace.
>
> For example, the same data format is later used for seeding forms. If
> on the server you stack the data into a huge XML file containing other
> data too, it would make sense to be able to just yank out that
> namespaced subtree and just use it for preseeding too.

It would make sense to note that the constraint on the namespace 
declaration does not apply when the data is flowing from the server to 
the browser. That way, the random server-side developer would not have 
to worry whether his/her serializer puts the namespace declaration only 
on the root element without prefixes. (I realize the form seeding 
section already implies this, but it wouldn't hurt to note it 
explicitly.)
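
To illustrate why the serializer's choice should not matter in that
direction, here is a minimal sketch using Python's xml.etree.ElementTree
as the namespace-aware processor. The namespace URI and the element names
are placeholders, not the ones from the spec:

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/formdata"  # placeholder namespace URI

# Declaration only on the root element, without prefixes.
default_ns = f'<submission xmlns="{NS}"><field name="a">1</field></submission>'

# The same data as a serializer that prefers prefixes might write it.
prefixed = (
    f'<f:submission xmlns:f="{NS}">'
    f'<f:field name="a">1</f:field>'
    f"</f:submission>"
)


def qualified_tags(xml_text: str) -> list[str]:
    """Return the namespace-qualified name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


same_to_aware_parser = qualified_tags(default_ns) == qualified_tags(prefixed)
```

A namespace-aware recipient sees the same element names in both cases;
only a namespace-unaware one, which looks at the literal tag text, can
tell them apart.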

That still leaves the burden of adhering to a special syntactic rule on
browser implementors and on desperate integrators who have to emulate
form submissions. However, when you're integrating with a system that is
not cooperative, chances are the system isn't using a proper
namespace-aware XML processor, either. :-/

(The MIME type asymmetry in submission and seeding might raise some 
eyebrows but is probably realistic.)

>> Besides, the way you limit the use of namespaces in the current spec
>> language would also preclude creative augmentations to the submission
>> vocabulary.
>
> Well, extensions would be non-compliant, yes. But at least there is a
> clear mechanism for experimentation.

Actually, the spec doesn't say what the recipient is supposed to do 
when encountering unrecognized elements or attributes.

-- 
Henri Sivonen
hsivonen at iki.fi
http://iki.fi/hsivonen/