[whatwg] several messages about XML syntax and HTML5
bhawkeslewis at googlemail.com
Mon Dec 18 15:42:28 PST 2006
Henri Sivonen wrote:
> Search engines should not list ill-formed application/xhtml+xml at
> all, because a user following the link would see the YSoD.
Ah, but what about XHTML 1.0 served as text/html, which is in a weird
twilight zone where it is neither "HTML" nor quite the same as
"text/html (non-standard)"? (But then I suppose one could argue such
XHTML doesn't need to be well-formed either. Maybe just labelling all
such documents as "HTML compatible" would be better.)
> However, in cases of slightly broken text/html, the user could still find the
> page useful. The search engines are in the business of providing
> results that users find useful, so search engines should list
> slightly broken text/html documents.
I don't follow this. How can search engines distinguish between
"slightly broken text/html" and very broken text/html? How can search
engines prejudge how a given breakage will affect how the user wants to
use the page (as a blind user, as a microformats user, as a minority
browser user, etc.)?
> The point is that you shouldn't show users something that they
> don't understand or care about.
What, like ads? ;) Or, more seriously, like the information about the
sizes of pages offered by Google search? My guess (and I admit it's only
that) is that "39k" means nothing to an average user, even the ones on
dial-up who might care. Anyhow, this all prejudges what users care
about. If I'm an ordinary user, it's handy to know a page may not be
working because it's broken, not because of some flaw in my browser. And
a /lot/ of pages on the web don't work. Understanding might be a
problem, but that's true of most of the stuff on search engines. The
non-technical users I talk to can't understand the difference between
the address bar, the search bar, and the search input on their homepage.
> Google, Yahoo and MSN aren't in the business of enforcing a
> standards-compliance agenda.
Nothing I said implied they were. The apparent absence of validity
warnings from Google's Accessible Search may be more surprising, but I
think the chance of any of them implementing such warnings in their main
search results is zero, regardless of the merits of the case either way.
(It would be /way/ too embarrassing since many, if not most, of those
companies' own webpages don't validate.) I just don't think the
particular argument against it put forward earlier in this thread (about
it scaring users away from Google search) stands up.
> On the contrary, they compete on how well they can
> rank the relevance of search results even in the absence of the
> supposedly search-engine-helping semantic markup.
Generally true, though some important aspects of valid markup do help
search engines; e.g. the requirement of an ALT attribute for IMG
provides search engines with additional text data.
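To illustrate how ALT text becomes extra indexable text, here is a minimal sketch using Python's standard html.parser module. The class and function names (AltTextCollector, extract_alt_text) are hypothetical and chosen for illustration. This is not how any particular search engine actually works; it only shows that the ALT values are easy to pull out alongside the rest of a page's text.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Hypothetical helper: collect the alt text of every <img> element."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so IMG/ALT match too.
        if tag == "img":
            for name, value in attrs:
                # Skip missing (None) or empty alt values.
                if name == "alt" and value:
                    self.alt_texts.append(value)

    # A self-closing <img ... /> goes to handle_startendtag, whose default
    # implementation calls handle_starttag, so it is covered as well.


def extract_alt_text(html):
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.alt_texts


page = '<p>Logo: <IMG SRC="logo.png" ALT="WHATWG logo"> and <img src="x.gif" /></p>'
print(extract_alt_text(page))  # ['WHATWG logo']
```

The second image has no ALT attribute, so it contributes nothing; that missing text is exactly what a requirement for ALT would supply.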