From hsivonen at iki.fi Mon Sep 1 00:17:33 2008
From: hsivonen at iki.fi (Henri Sivonen)
Date: Mon, 1 Sep 2008 10:17:33 +0300
Subject: [whatwg] Creative Commons Rights Expression Language
In-Reply-To: <0561279B-9DCC-45BE-9A13-EA54AF205C35@w3.org>
References: <57221E38FB4DD54C946CE654959A554D1AAB6CF343@GVW0436EXB.americas.hpqcorp.net>
 <57221E38FB4DD54C946CE654959A554D1AAB903BA3@GVW0436EXB.americas.hpqcorp.net>
 <48AF59C4.9050604@adida.net>
 <4E35935B-AB9B-4D41-8AA3-4A0BED8B2A08@iki.fi>
 <48B07A3F.3080304@adida.net>
 <48B1333F.1010708@danbri.org>
 <48B61143.8020004@adida.net>
 <1cb725390808280531s25259ef3q81aeacdd5f2a2774@mail.gmail.com>
 <0561279B-9DCC-45BE-9A13-EA54AF205C35@w3.org>
Message-ID:

On Sep 1, 2008, at 06:20, Karl Dubost wrote:

> On 29 August 2008 at 23:04, Henri Sivonen wrote:
>> Also, having more metadata leads to UI clutter and data entry
>> fatigue that alienates users. In the past, I worked on a content
>> repository project that failed because (among other things) the
>> content upload UI asked for an insane amount (a couple of
>> screenfuls back then; probably a screenful today) of metadata when
>> it didn't occur to system specifiers to invest in full text search.
>> More metadata isn't better. Instead, systems should ask for the
>> least amount of metadata that can possibly work (when the metadata
>> must be entered by humans as opposed to being captured by machines
>> like EXIF data). See also
>> http://www.w3.org/QA/2008/08/the-digital-stakhanovite
>
> hehe. This was a-good-try-but-mischaracterization-from-the-ministry-
> of-truth

That was uncalled for.

> to associate this article with the rants on metadata :) Let's clarify.

It's an excellent article. Thank you for writing it.

> What I explain in the article is not the volume of metadata, but the
> volume of items and the context of usage.
>
> 1. Extract anything you can from the data itself (exif, iptc, xmp,
> modifications, date)

Yes.
It's sad how some systems ask the user for a title when the title is
already in an HTML or PDF file but it never occurred to the specifiers
of the system that files can actually be parsed. It's even sadder to
ask the user for keywords, because it never occurred to the specifiers
of a system that full-text search has been invented.

> 2. Give a possibility in the UI to modify or add data.

Even the *possibility* to add costs UI real estate, so specifiers of a
system should be very, very careful in what possibilities they offer.

> In a business environment, you might have to give metadata about a
> work. I do it in my every day job. I give titles to my emails, I put
> comments in my cvs commits, etc. etc. These are all constraints. Not
> adding the data would still work technically.

Sure. However, writing a string that appears in mailbox list view or
in a list view of commits is the baseline of user-entered metadata.
Everything else is something *more*.

Just because something happens in a business setting where people can
be fired doesn't mean that more metadata is better. I've seen metadata
fail even in the military where they thought they could *order* people
to enter metadata (and where they have a more elaborate punishment
structure than in an ordinary working environment).

> Having a UI cluttered with fields to enter is not a failure of
> metadata, it is a failure of the project in the social and business
> constraints of the project.

It's definitely a failure of the project in the social and business
constraints. The reason for failure was a line of thought that went
something like this: Metadata is good. Therefore, let's have more of
it. Let's model what can be said about the domain. We are in a
position to require people to enter the metadata.

The process didn't try to seriously find out what the real must-have
hard social and business constraints were.

My point is that "metadata is useful" isn't the whole story.
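
To make the "files can actually be parsed" point concrete, here is a
minimal sketch of pulling the title out of an uploaded HTML file
instead of asking the user for it. It is only an illustration using
Python's standard html.parser module; the names TitleExtractor and
extract_title are made up for this example and do not come from any
system discussed in this thread.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; return None when nothing was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html_text):
    """Return the document title, or None if the file has no usable one."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

An upload form could call extract_title() first and show a title field
only when it returns None, so the user is asked only for what the
machine could not capture.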
-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From ddailey at zoominternet.net Mon Sep 1 04:04:55 2008
From: ddailey at zoominternet.net (ddailey)
Date: Mon, 1 Sep 2008 07:04:55 -0400
Subject: [whatwg] Creative Commons Rights Expression Language
References: <57221E38FB4DD54C946CE654959A554D1AAB6CF343@GVW0436EXB.americas.hpqcorp.net><57221E38FB4DD54C946CE654959A554D1AAB903BA3@GVW0436EXB.americas.hpqcorp.net><48AF59C4.9050604@adida.net><4E35935B-AB9B-4D41-8AA3-4A0BED8B2A08@iki.fi><48B07A3F.3080304@adida.net>
 <48B1333F.1010708@danbri.org><48B61143.8020004@adida.net><1cb725390808280531s25259ef3q81aeacdd5f2a2774@mail.gmail.com>
 <0561279B-9DCC-45BE-9A13-EA54AF205C35@w3.org>
Message-ID:

Sorry for joining in naively to a conversation I've not been
following, but reading Karl's remarks on the facilitation of metadata
entry for users, some discussions in the vicinity of the recent
SVGOpen that concerned usability, accessibility, and metadata made me
think the following (that I suppose is rather outside the realm of
HTML):

Suppose the user or author (since in an app the distinction is blurred
somewhat) is building something like a graph (in the discrete math
sense), an image repository, or even a diagram (though the categories
of content here are heterogeneous, making the argument a bit more
tenuous) using a GUI web app (like inkscape for diagrams or
http://srufaculty.sru.edu/david.dailey/svg/graphs30.svg for graphs).

Let's say there are n basic entities (like graphs or images) for which
metadata is required. Let us furthermore assume the metadata
description language is of order 0, 1, 2, 3 or 4 * and that the minimum
number of user operations required to complete the metadata
description for a single entity is bounded above by k.

We then may plot a user performance function that estimates the
probability, p, that users will actually succeed in entering data (as
a function perhaps of not only n and k, but of the user's investment
in the process).
Clearly as n and k grow and as the user's investment in the process
declines, so does p. We are interested, through the interface, in
maximizing p.

I have a hunch (in math it is called a conjecture, but in CHI it is
more like a hunch) that not only how, but also when, this conversation
between user and software takes place affects the probability. For
example if an artist were using Inkscape to draw SVG, then mandating a
conversation about metadata each time a curve or gradient is completed
is likely to drive users to AutoCAD for their diagrams, even if wine
is served.

In certain cases, it makes most sense to build that conversation as an
"exit interview". If we will have k phrases to enter (using a grammar
of graph theoretic phrases) for each of n objects, then we may wish to
build a very comfortable GUI to facilitate that for all the affected
entities upon closing the app:

  Dear user, you have just completed a schematic drawing for the Intel
  i-Chore 42x processor, would you now like to
    a) save
    b) enter appropriate metadata
    c) save and enter data
    d) drink wine.

The notion is that a GUI enabling such could, if it were viewed as a
stage or mode of development,
  a) rely on the visualization of the opus as thus far created,
  b) be appropriately rich to the order of the metadata description
     language, and
  c) make the data entry process unbundled from the creation process,
     hence allowing diversification of the assignments of tasks to
     workers (e.g. the familiar phrase of the assessment revolt of
     2028: "let the bureaucrats do the bureaucracy!").

That isn't to say that we should not also facilitate the entry of data
at each stage of the drawing process, with a sub-interface of the
master metadata editor, but given the complexity that some metadata
editors may have to convey, the nature of the conversation between
user and software may not be allowed to remain entirely casual (that
is, wine may need to be upgraded to tequila).
/fwiw
David

(by the way, an Intellectual Property/provenance description language
such as the library and visual rights communities work with might be
an interesting overlay for the web, provided both free and corporate
models (together with ample graph theory) are included)

* define the order of a metadata description language as
  0 if it consists of simple non-delimited strings,
  1 if it consists of delimited strings (with a single delimiter),
  2 if the delimiters are parentheses (required to match),
  3 if the delimiters act like parentheses of multiple flavors as in
    XML, and
  4 if the language is fully graph theoretic (parenthesized strings
    plus cross linkages -- footnotes).

----- Original Message -----
From: "Karl Dubost"
To: "Henri Sivonen"
Cc: "Ben Adida" ; "Paul Prescod" ; "Ian Hickson" ; "WHAT-WG"
Sent: Sunday, August 31, 2008 11:20 PM
Subject: Re: [whatwg] Creative Commons Rights Expression Language

On 29 August 2008 at 23:04, Henri Sivonen wrote:
> Also, having more metadata leads to UI clutter and data entry fatigue
> that alienates users. In the past, I worked on a content repository
> project that failed because (among other things) the content upload UI
> asked for an insane amount (a couple of screenfuls back then; probably a
> screenful today) of metadata when it didn't occur to system specifiers to
> invest in full text search. More metadata isn't better. Instead, systems
> should ask for the least amount of metadata that can possibly work (when
> the metadata must be entered by humans as opposed to being captured by
> machines like EXIF data). See also
> http://www.w3.org/QA/2008/08/the-digital-stakhanovite

hehe. This was
a-good-try-but-mischaracterization-from-the-ministry-of-truth to
associate this article with the rants on metadata :)

Let's clarify.

What I explain in the article is not the volume of metadata, but the
volume of items and the context of usage.

1. Extract anything you can from the data itself (exif, iptc, xmp,
modifications, date)
2.
Give a possibility in the UI to modify or add data.

In a business environment, you might have to give metadata about a
work. I do it in my everyday job. I give titles to my emails, I put
comments in my CVS commits, etc. etc. These are all constraints. Not
adding the data would still work technically.

For my own personal photos, I don't (want/have) time to put plenty of
metadata. And that's fine. I do, though, add bulk metadata at a regular
pace, for location (e.g. all these selected photos were taken in
Taiwan), with the help of GUI tools. Yes, tools save my life.

Having a UI cluttered with fields to enter is not a failure of
metadata, it is a failure of the project in the social and business
constraints of the project.

-- 
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool

From ian at hixie.ch Mon Sep 1 20:38:14 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 03:38:14 +0000 (UTC)
Subject: [whatwg] [editorial] Tokeniser "tag name" state order
In-Reply-To: <48573D83.7090604@entai.co.uk>
References: <48573D83.7090604@entai.co.uk>
Message-ID:

On Tue, 17 Jun 2008, Andrew Sidwell wrote:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenisation.html#tag-name0
>
> The "tag name" state has the "EOF" entry in a weird place -- in other
> states, "EOF" comes before "Anything else", but in this one it comes
> between "U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL
> LETTER Z" and "U+002F SOLIDUS (/)". Putting it in the same place as it
> is in other states would be nice.

Fixed.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From ian at hixie.ch Mon Sep 1 20:48:39 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 03:48:39 +0000 (UTC)
Subject: [whatwg] vtab as an NCR expansion
In-Reply-To: <451CA8D0-E5C8-49BC-9096-6AC188BD88AA@iki.fi>
References: <451CA8D0-E5C8-49BC-9096-6AC188BD88AA@iki.fi>
Message-ID:

On Wed, 18 Jun 2008, Henri Sivonen wrote:
>
> Is it intentional that the vtab change didn't cause a change to vtab
> treatment when expanding NCRs?

Yes; why would it cause a change? The character still passes through,
it's just not treated as whitespace. (It's not like U+000D, which is
converted to U+000A.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From ian at hixie.ch Mon Sep 1 20:50:26 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 03:50:26 +0000 (UTC)
Subject: [whatwg] Any "other" end tag in after head
In-Reply-To: <3FA51B37-2F60-43ED-AFEB-AA4C52D59D10@iki.fi>
References: <3FA51B37-2F60-43ED-AFEB-AA4C52D59D10@iki.fi>
Message-ID:

On Wed, 18 Jun 2008, Henri Sivonen wrote:
>
> After head talks about any "other" end tag, but has no definitions for
> end tags but "other". Is that intentional?

It was, for consistency, but I've changed it.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From ian at hixie.ch Mon Sep 1 21:04:07 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 04:04:07 +0000 (UTC)
Subject: [whatwg] Comment parsing
In-Reply-To:
References:
Message-ID:

Included below are some e-mails regarding how to parse comments. They
point out inconsistencies between browsers and the spec. These
inconsistencies were known when the spec was written. Browsers aren't
consistent with each other either.
I'd rather leave the parser spec stable here for a while to see if we
can converge on that (as far as I can tell it represents a good
compromise along the axes of compatibility, security, implementation
ease, and maintainability).

If browsers, when they implement HTML5, find that they cannot get good
enough compatibility with the current spec text, then we should change
the spec at that point.

On Thu, 26 Jun 2008, Adam Barth wrote:
>
> Recently, I've been testing how browser parsers handle unterminated
> . Internet Explorer 7, Firefox 3, Safari 3.1, and Opera 9.5
> agree on the following cases:
>
> http://crypto.stanford.edu/~abarth/research/html5/comments/open-textarea.html
> http://crypto.stanford.edu/~abarth/research/html5/comments/open-script.html
> http://crypto.stanford.edu/~abarth/research/html5/comments/open-style.html
>
> Essentially, they treat the as
> an alternate comment terminator to the usual -->
>
> http://crypto.stanford.edu/~abarth/research/html5/comments/strange-ending.html
>
> In Internet Explorer 7 and Opera 9.5, if the document later contains the
> usual comment terminator, then that character sequence terminates the
> comment instead:
>
> http://crypto.stanford.edu/~abarth/research/html5/comments/strange-ending-with-real-ending.html
> http://crypto.stanford.edu/~abarth/research/html5/comments/strange-ending-with-later-comment.html
>
> Firefox 3 and Safari 3.1 do not appear to exhibit this behavior.
>
> (Interestingly, the syntax highlighter in vim suggests the document will
> be parsed as in Firefox and Safari, no doubt contributing to author
> confusion.)

On Fri, 27 Jun 2008, Adam Barth wrote:
>
> Ian explained to me on IRC that IE and Opera are consuming the entire
> document as a comment and reparsing for > (i.e., --!> is not treated
> specially).
> That is supported by the following test case:
>
> http://crypto.stanford.edu/~abarth/research/html5/comments/bang-gt.html
>
> Safari and Firefox contain explicit code for detecting --!> (as
> demonstrated by the above test case). In Safari, the code was
> introduced in
>
> http://trac.webkit.org/changeset/4103
>
> In Firefox, the code was introduced in
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=110544
>
> As far as I can tell, neither checkin explains why this behavior was
> added.

On Fri, 27 Jun 2008, Maciej Stachowiak wrote:
>
> Hyatt's comment on the WebKit checkin says it was to match other
> browsers (presumably Mozilla).

On Fri, 27 Jun 2008, Adam Barth wrote:
>
> It looks like Mozilla is planning to change their behavior to match the
> HTML5 spec in this regard. See the patch in
> .

On Tue, 15 Jul 2008, Jim Jewett wrote:
>
> That's too bad; I would rather that the spec supported "--!>" while
> parsing (though not for authoring).
>
> *I* see it mostly on fairly old pages -- generally in archives, or other
> places where the original author cannot make a change.
>
> I notice these pages because I remember a time (err, not this decade)
> when I wrote most of my own comments that way, because it was
> recommended by about half the tutorials, it worked on the browsers I
> could check with (Lynx, and I think Mosaic and early Netscape) -- and it
> seemed more consistent because of the symmetry. (It also allowed the
> use of "-->" for arrow, but I don't see a good way to compatibly support
> that.)
>
> Having a later "-->" turn "--!>" recognition off seems to silently break
> a fair portion of these older pages, because that is often from a later
> comment, so that a middle portion of the document is lost.
>
> Letting any ">" end the comment may or may not be better still. I do
> remember that Opera found that strictly enforcing the SGML requirements
> was a loss, though I don't remember the details. (Something like
> counting parity on double-hyphens.)
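
For readers who want to see the difference the "--!>" question makes,
here is a deliberately simplified sketch. It is not the HTML5 tokeniser
and not any browser's actual code; find_comment_end and its accept_bang
flag are hypothetical names used only for illustration. It assumes the
comment body starts at index `start` and looks only for the literal
terminators discussed above.

```python
def find_comment_end(text, start, accept_bang=False):
    """Return the index just past the comment terminator, or None.

    With accept_bang=False only "-->" ends the comment. With
    accept_bang=True, "--!>" is also accepted and whichever terminator
    occurs first wins. This is an illustrative simplification, not the
    behaviour of any particular parser.
    """
    candidates = []
    pos = text.find("-->", start)
    if pos != -1:
        candidates.append((pos, pos + len("-->")))
    if accept_bang:
        pos = text.find("--!>", start)
        if pos != -1:
            candidates.append((pos, pos + len("--!>")))
    if not candidates:
        return None
    # The earliest terminator wins; return the position after it.
    return min(candidates)[1]
```

Given the input "<!-- a --!> visible <!-- b --> tail" with start=4, the
strict mode runs on to the later "-->" and leaves only " tail", which is
the "middle portion of the document is lost" case; with
accept_bang=True the comment ends at "--!>" and " visible <!-- b --> tail"
remains.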

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From ian at hixie.ch Mon Sep 1 21:36:19 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 04:36:19 +0000 (UTC)
Subject: [whatwg] Define Authoring Requirements for Duplicate Attributes
In-Reply-To: <486DF535.1010004@lachy.id.au>
References: <486DF535.1010004@lachy.id.au>
Message-ID:

On Fri, 4 Jul 2008, Lachlan Hunt wrote:
>
> In the Writing HTML Documents section, under Attributes, the spec
> should state that attribute names need to be unique for each element,
> and that duplicate attributes are an error. Currently, this is only
> stated in the parsing algorithm.

I've tried to fix this. It's complicated by the serialisation of
non-HTML element attributes.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From ian at hixie.ch Mon Sep 1 22:06:22 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 05:06:22 +0000 (UTC)
Subject: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]
In-Reply-To:
References:
Message-ID:

On Wed, 30 Jul 2008, Øistein E. Andersen wrote:
>
> The current table seems to cover the mappings between different common
> compatible 8-bit encodings as implemented in IE7, yes. The table at
> gives a bit more detail, most
> of which is better kept outside HTML5 itself. However, the following
> observations can be made:
>
> 1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252.
> IE7, on the other hand, simply ignores the high bit (as it does for
> a few other 7-bit encodings, by the way). Perhaps this
> alias could be dropped from the other browsers.

Ignoring the high bit seems like a dangerous security bug; replacing
every byte that has the high bit set with U+FFFD seems unnecessarily
drastic.
I've made the spec go with the Opera/Firefox/Safari behaviour here.

> 2. Firefox and Opera seem to sniff for text/plain; charset=ISO-8859-1 (as per HTML5),
> whereas Safari seems to do the same for text/plain; charset=ISO-8859-11
> instead [Version 3.1.2 (5525.20.1)]. Bug?

I believe so.

> 3. For certain character sets, different browsers map to different, but visually
> similar Unicode characters. Sometimes, one mapping is old/outdated,
> but this is not always the case.

Not sure what I can do about that.

> 4. Delete (0x7F) and the C1 range (0x80--0x9F) are handled quite inconsistently;
> different browsers do different things for the same encoding, and the same
> browser gives analogous encodings different treatment.
>
> (For the early ISO-8859-* encodings, the IANA registry points to RFC 1345,
> which effectively maps 0x7F--0x9F to U+7F--U+9F, but does not really
> seem to regard this feature as an essential part of the character set:
>
>    the charset is often coded with both
>    graphical and control character sets. If the coded character set is
>    a 96-character set, it is tabled with the relevant GL set (normally
>    ISO-IR-6) and with ISO 6429 as C0 and C1
>
> As for the Windows-* encodings, Microsoft documentation treats bytes
> in this range as unassigned unless they are mapped to graphical characters,
> whereas Microsoft products return the underlying byte value in this case.)

I think the HTML5 spec does what is necessary here, but it may be that
the encodings specs are vague still.

> 5. IE handles KOI8-U as KOI8-RU, whereas Safari does the opposite. The former
> is probably more reasonable (assuming that letters are more important than
> line-drawing characters), but neither is actually correct given that the encodings
> are, strictly speaking, incompatible. This issue will of course look a bit different
> if it can be shown that documents containing the letter ??/?? (only in KOI8-RU)
> are frequently mislabelled as KOI8-U.

I guess we'll see what feedback we get on this when testing begins.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From hsivonen at iki.fi Tue Sep 2 00:12:41 2008
From: hsivonen at iki.fi (Henri Sivonen)
Date: Tue, 2 Sep 2008 10:12:41 +0300
Subject: [whatwg] vtab as an NCR expansion
In-Reply-To:
References: <451CA8D0-E5C8-49BC-9096-6AC188BD88AA@iki.fi>
Message-ID: <67F45187-560E-4B92-AB15-F2AC6CD03F8A@iki.fi>

On Sep 2, 2008, at 06:48, Ian Hickson wrote:

> On Wed, 18 Jun 2008, Henri Sivonen wrote:
>>
>> Is it intentional that the vtab change didn't cause a change to vtab
>> treatment when expanding NCRs?
>
> Yes; why would it cause a change? The character still passes
> through, it's just not treated as whitespace. (It's not like
> U+000D, which is converted to U+000A.)

NCR expansion turns every other non-XML character into REPLACEMENT
CHARACTER.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

From ian at hixie.ch Tue Sep 2 00:25:17 2008
From: ian at hixie.ch (Ian Hickson)
Date: Tue, 2 Sep 2008 07:25:17 +0000 (UTC)
Subject: [whatwg] vtab as an NCR expansion
In-Reply-To: <67F45187-560E-4B92-AB15-F2AC6CD03F8A@iki.fi>
References: <451CA8D0-E5C8-49BC-9096-6AC188BD88AA@iki.fi>
 <67F45187-560E-4B92-AB15-F2AC6CD03F8A@iki.fi>
Message-ID:

On Tue, 2 Sep 2008, Henri Sivonen wrote:
> On Sep 2, 2008, at 06:48, Ian Hickson wrote:
> > On Wed, 18 Jun 2008, Henri Sivonen wrote:
> > >
> > > Is it intentional that the vtab change didn't cause a change to vtab
> > > treatment when expanding NCRs?
> >
> > Yes; why would it cause a change? The character still passes through,
> > it's just not treated as whitespace. (It's not like U+000D, which is
> > converted to U+000A.)
>
> NCR expansion turns every other non-XML character into REPLACEMENT
> CHARACTER.

Hm ok, done.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _..
\   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

From maikmerten at googlemail.com Tue Sep 2 11:05:38 2008
From: maikmerten at googlemail.com (Maik Merten)
Date: Tue, 02 Sep 2008 20:05:38 +0200
Subject: [whatwg] Query supported formats for media elements
Message-ID: <48BD8072.8020403@googlemail.com>

Hello,

I'm trying to find out how to determine if a given media format is
supported by a media-element implementation. The motivation is to
replace e.g.