[whatwg] Conformance requirements for IRIs
hsivonen at iki.fi
Mon Apr 17 09:14:14 PDT 2006
In WA 1.0 and WF 2.0 some values are required to be IRIs and some
values are required to be IRI references. I'm confused about what
exactly this means in terms of conformance checking. (WF 2.0 does say
something about processing in a browser, though.)
First, I was amazed to learn that for pure non-infoset-augmenting
validation, the xsd:anyURI datatype does not mean anything useful
beyond token, and that it is not exactly an IRI reference.
I started to suspect that just about every string can indeed be
considered a sort of IRI reference, since it can be munged into an
IRI reference, so there's nothing to check.
Then I found
which provides a fascinating number of enforcement options. I could
write a custom datatype wrapper for it, but I don't know which
options to use.
I'd appreciate some guidance on which enforcement options to use.
(E.g. should knowledge of the http scheme be used? Should security
issues be flagged as non-conforming? Should "SHOULD" violations be
flagged as non-conforming? Etc.)
(This is the first time I venture into the world of IRIs. I have
intuitively thought that they are trouble, so I have knowingly
avoided minting non-URI IRIs myself.
I suspected that bad stuff happens with IRIs containing decomposed
character sequences. (These can be found in the URI form due to
HFS+-backed Apache setups.) Now that I've read the RFC, I think it is
a very bad idea to allow decomposed characters in IRIs, and equally
bad that the RFC does not require percent-encoding of character
sequences that are not invariant under NFC.
This may have relevance to how the WF 2.0 url input works. That is,
it probably SHOULD (MUST?) NOT percent-decode URIs that would result
in IRIs that are not invariant under NFC.)
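
To make the NFC point concrete, here is a minimal Python sketch of the
check I have in mind. The function names are hypothetical and not taken
from any spec or library. It also decodes every percent escape,
including reserved characters such as %2F, which a real implementation
would probably want to leave alone.

```python
import unicodedata
from urllib.parse import unquote


def is_nfc_invariant(text: str) -> bool:
    """Return True if NFC normalization would leave the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


def decode_for_display(uri: str) -> str:
    """Percent-decode a URI only when the result is invariant under NFC.

    Hypothetical helper: falls back to the original percent-encoded form
    when decoding would expose decomposed character sequences, or when
    the escapes are not valid UTF-8.
    """
    try:
        decoded = unquote(uri, errors="strict")
    except UnicodeDecodeError:
        # Escapes that are not valid UTF-8 are left untouched.
        return uri
    return decoded if is_nfc_invariant(decoded) else uri


# Precomposed U+00E9 is NFC-invariant, so this one gets decoded.
print(decode_for_display("http://example.org/caf%C3%A9"))
# "e" followed by U+0301 is not NFC-invariant, so the URI form is kept.
print(decode_for_display("http://example.org/cafe%CC%81"))
```

The second URI is the kind of thing an HFS+-backed server can produce:
the path contains "e" plus a combining acute accent rather than the
precomposed character, so it stays percent-encoded.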
hsivonen at iki.fi