[whatwg] URL: file: URLs
Boris Zbarsky
bzbarsky at MIT.EDU
Sun Oct 28 10:51:38 PDT 2012
On 10/27/12 3:35 PM, Anne van Kesteren wrote:
> This is covered as we do this for all URLs currently with a "relative
> scheme" (http/ws/...). I know you indicated this as potentially
> problematic
Let's have that fight separately. ;)
>> 2) file:// URIs are parsed as a "no authority" URL in Gecko. Quoting the
>> IDL comment:
...
> The parser in the specification should handle these in the same way.
The same way as the comment I quoted? Or the same as something else?
> I have not introduced a "no authority" concept however. The parser in
> the specification also preserves the host as other user agents seem to
> preserve it.
Well, the Gecko parser preserves the host at this stage assuming the URI
was correctly formatted with a host. Again:
blah://foo/bar => blah://foo/bar
The interesting things happen when you have 0, 1, or 3 slashes between
':' and "foo". The handling of "foo" after this point is a separate issue.
>> 4) For "no authority" URLs, including file://, on Windows and OS/2 only, if
>> what looks like authority section looks like a drive letter, it's treated as
>> part of the path. For example, "file://c:/" is treated as the filename
>> "c:\". "Looks like a drive letter" is defined as "ASCII letter (any case),
>> followed by a ':' or '|' and then followed by end of string or '/' or '\\'".
>> I'm not sure why this is checking for '\\' again, honestly. ;)
>
> Is this part of URL parsing or part of doing something with the
> resulting URL?
In Gecko, it's part of URL parsing. More precisely, it's part of the
normalization performed as part of constructing a "URL" object from a
string. Since this is also how we parse URLs, it's effectively all part
of the package.
But note that it would be a bit odd if file://c:/ claimed to have a host
of "c" with a default port or some such...
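For concreteness, the "looks like a drive letter" test from item 4 could be sketched roughly as below. This is illustrative Python, not Gecko's actual code. The function name, and the assumption that it is handed whatever follows "file://", are made up for the sketch. It also assumes the '\\' in the quoted definition means a single backslash.

```python
import string


def looks_like_drive_letter(rest: str) -> bool:
    """Hypothetical helper: `rest` is the text that follows "file://".

    Rule as quoted in item 4: an ASCII letter (any case), followed by
    ':' or '|', followed by end of string or '/' or a backslash.
    """
    if len(rest) < 2:
        return False
    if rest[0] not in string.ascii_letters:
        return False
    if rest[1] not in ":|":
        return False
    # Either the string ends right after the separator, or a slash follows.
    return len(rest) == 2 or rest[2] in "/\\"


print(looks_like_drive_letter("c:/"))      # the file://c:/ case
print(looks_like_drive_letter("foo/bar"))  # an ordinary host-like authority
```

Under this sketch "c:/", "C|" and "c:\dir" all count as drive letters, while "foo/bar" and "c:x" do not.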
>> 5) When parsing a "no authority" URL (including file://), and when item 4
>> above does not apply, it looks like Gecko skips everything after "file://"
>> up until the next '/', '?', or '#' char before parsing path stuff.
>
> So the host is dropped?
In Gecko, I believe so, yes. I'm not saying this is desirable; just
that it's what Gecko does.
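A rough sketch of the skipping behavior described in item 5, again in illustrative Python rather than anything taken from Gecko. The function name is hypothetical, and the sketch assumes the drive-letter case from item 4 has already been ruled out by the caller.

```python
def skip_authority(rest: str) -> str:
    """Hypothetical helper: `rest` is the text that follows "file://".

    Drops everything up to (but not including) the next '/', '?' or '#',
    and returns what remains for path/query/fragment parsing.
    """
    for i, ch in enumerate(rest):
        if ch in "/?#":
            return rest[i:]
    # No delimiter at all: nothing is left after the skipped section.
    return ""


print(skip_authority("foo/bar"))  # the "foo" host is dropped
```

So for "file://foo/bar" the part handed on to path parsing would be "/bar", which matches the "host is dropped" reading above.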
>> 6) On Windows and OS/2, when dynamically parsing a path for a "no
>> authority" URL (not sure whether this is actually web-exposed, fwiw...)
>> Gecko will do something involving looking for a path that's only an ASCII
>> letter followed by ':' or '|' followed by end of string.
...
>> 7) When doing URI equality comparisons
...
>> 8) When actually resolving a file:// URL
> These points do not seem to be about parsing, correct?
Well, point 6 is about parsing, sort of.
7 and 8 are not, though at some point we'll need to define equality
comparisons anyway.
-Boris