[whatwg] A few hints on html5

Ian Hickson ian at hixie.ch
Tue Dec 23 23:38:41 PST 2008

On Tue, 16 Dec 2008, Calogero Alex Baldacchino wrote:
> About the cross-document messaging
> Let's consider the following scenario. A somewhat productivity suite (or 
> any sort of "web applications collection") is made up of a few different 
> top-level/auxiliary browsing contexts - let's call each one a "module" - 
> eventually from different origins, and exploits cross-document 
> communications to some extent, i.e. to delegate some computations or 
> some shareable communications with a remote server; each module is 
> independent and can instantiate the proper auxiliary module(s).
> Here we are: as far as the modules are instantiated as auxiliary 
> browsing contexts of one other module (i.e. through a call to 
> 'window.open()'), communications are easily established, but what if any 
> module is instantiated by the user as a separate top-level browsing 
> context, i.e. opening a new tab or window and recalling the module 
> document from a bookmark? I'd suggest the following:
> - a mechanism is established to get access, without any restriction, to 
> every browsing context for which the user agent can individuate a 
> non-empty, non-null, non-undefined name attribute, at least with the 
> capability to let "cross-origin" access to the postMessage() methods. 
> For instance, the specifications could clearly state that the Window 
> open() method must return an existing window reference with the 
> specified name when invoked with an empty string or null as URL 
> argument, with no security restriction (security restrictions should 
> apply just to the returned window object properties). When more than one 
> browsing context share the same name, actual "rules for choosing a 
> browsing context given a browsing context name" should apply to choose a 
> first result, without checking if current browsing context is allowed to 
> navigate that browsing context; it might be helpful to get instead a 
> list of all browsing contexts with the same name, obtained as follows: a 
> Window object is created as a pseudo unit of browsing contexts, so that 
> each browsing context is reachable both by invoking the XXX4() method 
> and by accessing the frames property; each browsing context is wrapped 
> in a Window object with 1)accessible postMessage() methods, calling the 
> wrapped window ones, 2)an accessible parent attribute referring to the 
> grouping Window object, 3)a self attribute referring to the wrapped 
> object, accessible if access to the wrapped object is allowed by 
> security restrictions, 4) access denied, without any exception/error 
> arising, to any other method/attribute; the first member of the group 
> (i.e. the object returned by calling XXX4(0) on the grouping Window) is 
> the wrapper for a Window object determined by the rules for choosing a 
> browsing context given a browsing context name (i.e. the most recently 
> opened, or focused, or the most related with the open() method caller 
> browsing context) and is returned.

That seems like an exceedingly high level of complexity to address a very 
odd corner case. I would recommend instead using shared workers to 
communicate between these windows -- after all, it is likely that such an 
application would need a shared worker anyway to handle things like 
synchronising shared databases with the server.

Anyway, going through a shared worker you could negotiate a MessagePort 
communication channel between the two end points.
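
A minimal sketch of that negotiation, not taken from the spec: the pairing
logic below could run inside a shared worker and hand each of two windows
one end of a fresh MessageChannel. The name createBroker, the "room" key and
the { type: "peer" } message shape are assumptions made for illustration,
and the channel factory is passed in so the logic can be exercised outside a
browser.

```javascript
// Hypothetical pairing logic for a shared worker: the first window to ask
// for a given room waits; the second one triggers creation of a channel
// whose two ports are transferred, one to each window.
function createBroker(makeChannel) {
  const waiting = new Map(); // room name -> port of the window that is waiting

  return function join(clientPort, room) {
    const peerPort = waiting.get(room);
    if (peerPort === undefined) {
      waiting.set(room, clientPort);
      return false; // nobody to pair with yet
    }
    waiting.delete(room);
    const { port1, port2 } = makeChannel();
    // The second argument is the transfer list: each window finds its end
    // of the channel in event.ports[0] of the message it receives.
    peerPort.postMessage({ type: "peer" }, [port1]);
    clientPort.postMessage({ type: "peer" }, [port2]);
    return true;
  };
}

// Inside a shared worker script this might be wired up as follows. The
// guard keeps the file loadable where "onconnect" does not exist.
if (typeof self !== "undefined" && "onconnect" in self) {
  const join = createBroker(() => new MessageChannel());
  self.onconnect = (event) => {
    const port = event.ports[0];
    port.onmessage = (msg) => join(port, msg.data.room);
  };
}
```

Once both windows hold their port they can talk to each other directly,
without the worker relaying every message.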

Also, note that the spec doesn't disallow browsers from doing what you say 
with respect to giving all named frames access to all other named frames, 
but it does allow user agents to limit it. This is because we received 
requests from user agents asking to be allowed to limit it, in order to 
handle a set of legacy applications that otherwise cause havoc in 
multiple-window environments.

> - optionally, a few "postMessageToAll()" methods (with about the same 
> arguments of the postMessage() ones) could be considered to let any 
> browsing context to communicate, through its own Window interface, 
> either to any other browsing context (eventually allowing communications 
> from current browsing context as source, see below), or to every 
> browsing contexts constrained by the same name (passed as, let's say, 
> first argument), or to every browsing contexts with the same domain 
> (specified, let's say, as the second argument).

That's an interesting idea, probably something to consider in a future 
version based on our experience with what is currently specified.
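
For comparison, a rough sketch of what script can already do with what is
currently specified, assuming the caller holds a reference to each window
and knows its origin. The function name and the { win, origin } record shape
are hypothetical and not part of any specification.

```javascript
// Hypothetical helper: post one message to every known window, using each
// window's own origin as the targetOrigin argument of postMessage().
function postMessageToAll(targets, message) {
  let posted = 0;
  for (const { win, origin } of targets) {
    try {
      win.postMessage(message, origin);
      posted += 1;
    } catch (err) {
      // An unusable reference or a malformed origin should not stop the
      // remaining windows from being tried.
    }
  }
  return posted; // number of postMessage() calls that did not throw
}
```

The caller has to collect the window references itself (for example from
window.open()), which is exactly the limitation the quoted proposal is
trying to lift.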

> Let's consider another scenario. A site (perhaps a blog) embeds content 
> from a forum (or any social network), and uses script code to connect to 
> the remote server and keep its content up to date, but also to notify 
> the user about any changes in other contents the remote server holds as 
> subscribed (this scenario can be extended to mail notifications in the 
> previous example of a productivity suite, or to a groupware). When the 
> user navigates other documents from the site in different browsing 
> contexts, each one is aware of the others (perhaps establishing a 
> connection through a call to postMessageToAll, or by getting a reference 
> by name); to avoid increasing the number of connections per server, any 
> successive document navigated as a standalone browsing context (after 
> the first or after a certain number) won't connect to the remote server, 
> but will communicate with the document having an active remote 
> connection. That is: the first navigated document maintains a remote 
> connection and receives notifications as remote events; if it is fully 
> active, the notifications are shown to the user, otherwise a message is 
> sent to any other known document capable to handle the notification, 
> hoping one is fully active; the first document becoming fully active 
> handles the messages and notifies to the other documents that any 
> required operation has been performed; when the remote events handling 
> document(s) are to become no more active (i.e. they unload), a message 
> is sent to the remaining documents so they can decide (somehow) who's 
> the next "dispatcher".
> The above could be realized with a few eventsource elements in the documents,
> each one with a proper list of event sources and one or more MessageEvent
> listeners on the corresponding Window object, which could "manually" handle
> the switching operations (i.e. calling the appropriate element's
> addEventSource()/removeEventSource() and creating the appropriate event to
> dispatch to the eventsource listeners from the received messages); however,
> most of the work could be automated in the following manner:
> - let the MessageEvent instances hold one attribute for the remote event URL
> and one for the remote event type;
> - let's provide appropriate methods to set those attributes and to post 
> a message constrained this way (a pair of initMessageEvent and 
> postMessageEvent variants should be enough);
> - let the RemoteEventTarget list of event sources hold, for each source, 
> an attribute (to be set when adding a source and optionally referred to 
> on removal) identifying one of three states: remote-only, for exclusively 
> remote messaging; local-only, for cross-document messaging (mainly 
> thought for Window objects, optionally for other elements); both-sides 
> to handle a scenario like the above described one; a proper attribute 
> could be thought for eventsource elements, such to be coupled with the 
> src attribute;
> - let Window objects have a default action for message events, 
> inspecting the event for remote url and event type and, if found, 
> forwarding the event - with proper modification, or creating a new 
> appropriate event - to each RemoteEventTarget waiting for that remote 
> source, with a "both" state, present in the active document; optionally 
> a non remote cross-document message could be considered to be dispatched 
> to the active document RemoteEventTargets waiting for cross-document 
> messages (see below);
> - optionally, for security aim, it could be established that before 
> accepting a remote event dispatched as a cross-document event, a 
> connection is made to the remote source to get a session id as the first 
> streamed event data, which must match the cross-document message data 
> (the remote events originating server should be capable to identify the 
> user instantiating the communication through different 
> documents/browsing-contexts, i.e. by the mean of a log-in procedure, 
> session-ids, cookies, or a combination of those methods, and so generate 
> and send the same "session-communication-id");

That's definitely something best done using a shared worker.
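
As a rough illustration only: the worker would hold the single remote
connection and fan each notification out to every connected document. The
class name, the "bye" message a document sends before unloading, and the
wiring described in the comments are all assumptions of this sketch.

```javascript
// Hypothetical hub for a shared worker: one remote connection, many
// documents. Each document is represented by its MessagePort.
class NotificationHub {
  constructor() {
    this.ports = new Set();
  }

  // Would be called from the worker's connect handler with event.ports[0].
  add(port) {
    this.ports.add(port);
  }

  // Would be called from each port's message handler; a document is
  // assumed to announce that it is unloading with { type: "bye" }.
  handleMessage(port, data) {
    if (data && data.type === "bye") {
      this.ports.delete(port);
    }
  }

  // Would be called once per remote event by whatever code owns the
  // single server connection. Returns the number of documents notified.
  broadcast(notification) {
    for (const port of this.ports) {
      port.postMessage(notification);
    }
    return this.ports.size;
  }
}
```

With this arrangement no document needs to be elected as "dispatcher": the
worker outlives any individual document for as long as at least one of them
remains connected.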

> Listing an expected source of cross-document events as a remote source 
> (that is taking advantage of the RemoteEventTarget interface), specially 
> on Window objects, could be helpful to improve cross-document security, 
> since posting a message would fail not only if the recipient origin does 
> not match the message targetOrigin, but also if the message origin is 
> not registered as a valid source; anyway, the possibility to get 
> messages from any other Window should be preserved availing the 
> registration of the string "*" (this should be the default value of a 
> default source if no source is registered, with a state of "both"; when 
> the first source is registered the default source is removed; if the "*" 
> string is added to a list comprehending valid absolute URLs, the state 
> should discriminate the acceptance of a message - i.e. a simple 
> cross-document message is accepted from any source while a 
> cross-document message "enveloping" a remote event is accepted from just 
> one source, and dispatched if an eventsource is waiting for messages 
> from such source).

It's not clear to me what problem you are trying to solve here.

> Furthermore, the EventListener interface could be derived to give the 
> opportunity to list all sources a listener is able to handle, as tuples 
> consisting of: 1)the expected target origin (either the listener owner 
> document origin, or "*"), 2)the expected source origin (the event origin 
> attribute), 3) the expected remote source origin (for remote events 
> dispatched as cross-document messages), 4)the accepted state; all 4 
> components of the tuple could be optional (but not missing: null or the 
> empty string should be valid values), but at least one of the first 
> three should be neither null nor the empty string, otherwise the 
> whole tuple should be ignored (and discarded from the list); if the 
> state component is not expressed, it should default to the value of 
> "both" (this being either a string or a numerical value, accessible as a 
> constant: to be defined); if no tuple is listed, a default tuple is 
> created with the components: ("*","*","*",<both_value>); the tuple 
> ("","","http://example.server.src",<remote_only_value>) should accept 
> events just from the indicated remote server, while the tuple 
> ("","","http://example.server.src",<both_value>) should be treated as if 
> the first two components were equal to "*", meaning any remote event 
> originating at "http://example.server.src" can be accepted as a 
> cross-document re-dispatched one, and the tuple 
> ("","","http://example.server.src",<local_only_value>) should be illegal 
> (as if the 3rd component were "" or null); if the state component is 
> "remote-only" the first two components, if expressed, should be ignored 
> (or the second be legal only if matching the third, since we are dealing 
> with remote events accepted just from the remote server), if it is 
> "local-only" the 3rd component should be ignored (and deemed as 
> mistaken, but without entering an error state, since the user agent 
> should never dispatch cross-document re-dispatched remote events to a 
> listener waiting for "local-only" cross-document messages, despite the 
> remote component of the tuple – that is, the third one – otherwise 
> that should be deemed as illegal), while if it is "both" the 3rd 
> component, if not expressed should be defaulted to the string "*".

I'm sorry, that sentence is too long for me to understand what you are 

> The tuples list on a message listener should simplify the URLs checking 
> inside the listener (which could be thought to make the same operations 
> consistently in different contexts, so no checking could be needed at 
> all), and thus improving the overall security. In such an environment a 
> cross-document message is delivered if and only if: a) the target origin 
> matches the document domain or is the "*" string, and b) the message 
> origin is listed as remote source for the Window object (or is expressly 
> allowed by the "*" string), and c) a listener is actually waiting a 
> message of that type from that source and with the corresponding 
> connection state (either "local-only" or "both"); a remote (streamed) 
> event is delivered if: a) the event source is registered to a 
> RemoteEventTarget, and b) a listener on such RemoteEventTarget is 
> actually waiting for an event of that type from the same source. This 
> should also enable a precise listener selection.

Again, I'm not sure I really understand what problem you are trying to 
solve with this proposal, so it is hard for me to evaluate its merits.

> Allowing a RemoteEventTarget inside a document to list a source as 
> "local-only" and receive messages from other documents instead of remote 
> servers (through out a default action defined for Window objects 
> receiving messages, which could not handle it directly) could be an 
> alternative (and to some extent a redundant) way to allow cross-document 
> messaging at a Document level instead of a browsing context level, maybe 
> suitable for some scenarios (or maybe just desirable, as far as a 
> certain grade of redundancy in an API might be desirable). A 
> "originForResponse" and a "sourceForResponse" attributes could be 
> considered, for the MessageEvent interface, in order to allow a certain 
> capability of syndication and communications distribution and switching 
> among collaborating documents (i.e. the listener checks for those 
> attributes and answers in a manner such as: 
> 'event.sourceForResponse.postMessage(message_str, 
> event.originForResponse)' – the postMessage methods should accept two 
> arguments to set those attributes). If a post method were provided to 
> allow communications from a single source to any existing browsing 
> context (i.e. "postMessageToAll()"), the "targetOrigin" argument should 
> be absent and it should be clearly stated in the specs that such a 
> method must call the proper post method on each existing browsing 
> context, passing the proper URL as the targetOrigin argument.

With shared workers and the MessageChannel infrastructure, it isn't clear 
that there is a real need for this extension.

On Tue, 16 Dec 2008, Calogero Alex Baldacchino wrote:
> The Window interface open method accepts a "features" argument for 
> historical (and backward compatibility) reasons, which, as stated, has 
> no actual effect. I was considering the opportunity, instead, of 
> maintaining the old functionality as an alternative and redundant 
> implementation of the "make application state". That could work this 
> way: any browser feature set disabled in the features string is disabled 
> and not shown in the newly opened window, BUT, a somewhat element, 
> clearly being part of the browser application, is provided to let the 
> user enable any hidden feature (either altogether, or one by one), so to 
> reset the "normal" application condition; when a browser interface 
> component is hidden, any related key binding is "freed" from usual 
> capture, and redirected to the window active document, so that a "full 
> standalone" behaviour is transparently shown to the user (the "reset 
> element" should never be disabled), while when that component is 
> re-enabled its normal behaviour is re-established; if the application is 
> going full-screen the user is clearly advised about this and allowed to 
> block the operation (in the case the operation is allowed, the "reset 
> element" should become floating and maybe half-transparent -- I was 
> thinking on a possible, future 2D or even 3D web based game...).

This is all quite possible, but does not require anything of the spec, as 
it is purely a user interface issue and is not required for 

> Current draft provides a few overloaded methods (like postMessage() 
> variants) differing for the number, type and order of their attributes. 
> A first concern could arise on the choice to overload functions in IDL 
> interfaces, since any of the possible supported/supportable script 
> language could not provide such a feature, making implementation more 
> difficult--

JavaScript is our primary target, so concerns over other languages are 
somewhat more academic at this point.

> --; however, this could be a minor concern, both since a script with 
> C-like syntax (as most are) usually let functions be overloaded, one way 
> or another, and because a different kind of language, not providing 
> such, could overcome the problem by defining methods with slightly 
> different names and binding them to the appropriate interface (but this 
> would lead maybe to a longer learning period and to possible, successive 
> even greater difficulties whether such names would clash with future 
> standard names). Maybe the parameters order and number could be another 
> concern, since a script language could (like JavaScript does) allow 
> functions overloading by varying the number of passed arguments, without 
> caring about arguments types, and leaving to the inner code any checking 
> and choice of what to do (that's closer to a C++ function declaration 
> with default arguments, than to a "full" overload); this is not a real 
> problem, but perhaps a little improvement in current specs might result 
> from changing the arguments order so that the arguments list of an 
> overloaded method's two variants, when compared, is equal for the first 
> 'x' arguments, where 'x' is the length of the shortest list, since this 
> could reduce the translation work the script engine must do before 
> calling the underlying implementation (i.e., it could be a slightly 
> easier casting of the arguments to their correspondent native types, 
> without any previous checking for the right type, before calling the 
> interface native implementation - the point is: a check is likely to be 
> done by the casting routine(s), so couldn't it be avoided before 
> casting?). Furthermore, any language missing the overload semantics 
> could expose just one method with the whole list of possible arguments, 
> corresponding to the idl declared method with the longer list, and I 
> think that defining idl methods with some care for arguments order would 
> be a neater choice.

If you have any specific methods in mind here, that would be good to know. 
Many of the overloads are cases somewhat beyond our control now.
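
To make the point about argument counts concrete, here is a sketch of how a
JavaScript function can tell two call shapes apart by length alone. The two
shapes, (message, targetOrigin) and (message, ports, targetOrigin), are
assumed here for illustration rather than quoted from the draft, and the
function name is hypothetical.

```javascript
// Hypothetical normaliser: JavaScript has no real overloading, so a single
// function inspects how many arguments it was given and sorts them out.
function normalizePostArgs(...args) {
  if (args.length === 2) {
    const [message, targetOrigin] = args;
    return { message, ports: [], targetOrigin };
  }
  if (args.length === 3) {
    // The optional argument sits in the middle, so the last argument has
    // a different meaning than in the two-argument form.
    const [message, ports, targetOrigin] = args;
    return { message, ports: Array.from(ports), targetOrigin };
  }
  throw new TypeError("expected 2 or 3 arguments, got " + args.length);
}
```

If the optional argument came last, as the quoted text suggests, the shorter
argument list would be a prefix of the longer one and the position of each
argument would never change.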

> Current browsers provide facilities to parse xml code (either the 
> DOMParser object or a DOM Load and Save Parser). All fail with html "tag 
> soup", so if for any reason a somewhat string of html code must be 
> parsed to manipulate its DOM representation before taking any action, a 
> workaround must be found (i.e. calling 
> document.implementation.createHTMLDocument() and somehow inserting the 
> string into such fake document, then getting the DOM structure - this 
> could be quite unreliable too, as a parsing alternative, if any script 
> code in that string were executed). Since one of the goals of html 5 
> specifications is the definition of a standard parser, with a standard 
> parse error management, maybe the opportunity of exposing an 
> html-specific parser (skipping script execution) through the DOM might 
> be considered.

Does Document.innerHTML do this acceptably?
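
A sketch of the kind of thing I have in mind, written against
createHTMLDocument() and the body element's innerHTML rather than
Document.innerHTML itself; that substitution, the function name and the
decision to pass the DOMImplementation in as a parameter are assumptions of
this example.

```javascript
// Hypothetical helper: parse an HTML string into a separate document so
// that the resulting DOM can be inspected before anything is inserted
// into the page the user is looking at.
function parseIntoDetachedDocument(html, domImplementation) {
  const doc = domImplementation.createHTMLDocument("");
  // innerHTML goes through the fragment algorithm, so script elements in
  // the string are parsed into the tree but not executed.
  doc.body.innerHTML = html;
  return doc;
}

// In a browser this would be called as:
//   parseIntoDetachedDocument("<p>tag <b>soup", document.implementation)
```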

> Current draft states a script element set through the innerHTML property 
> is not executed at all, while it is when added by calling 
> document.write() (what about insertAdjacentHTML()?).

The spec disables it for anything inserted via the fragment algorithm (so 
all of innerHTML, outerHTML, and insertAdjacentHTML).

> However, I think that allowing script execution in the former case would 
> make the innerHTML property a truly live one, with some possible 
> benefit: i.e. it could be a way to insert new script elements into the 
> document head section from outside the head element (i.e. from an event 
> listener on an eventsource, to dynamically change a web application 
> behaviour by appending new markup to the head.innerHTML string).

We can't change this; we're constrained by legacy content.

> The HTMLDocument interface presents several variants for the open 
> method, with very different "meaning" and purpose. Sincerely, I don't 
> think it's a very nice idea to expose functions with the very same name 
> but performing so much different operations on the same interface.

I would be the first to agree with you if we had a choice, but 
unfortunately that's just the way browsers work, and pages depend on it.

> If those methods were thought about as need for backward compatibility 
> purpose, maybe they could be moved to a third interface (called, i.e., 
> HTMLBwdCompliantDocument), as well as any other property thought for 
> backward/cross-browsing compliance and/or being deprecated, stating any 
> object implementing the former interface must also implement the latter. 
> Maybe the same could be done with other interfaces, to maintain a full 
> compatibility with HTML 2 DOM (perhaps in this case the "secondary" 
> interface implementation could be not mandatory). Such process could be 
> suitable to deprecate any method/attribute/interface before conclusively 
> obsoleting it, in future specifications.

We'll never be able to remove them from implementations (and thus 
specifications), unfortunately. Today's markup will always be relevant, 
even if only to archeologists millennia hence.

> Let me come back to the non-JS scripts question. Let's assume that a 
> script engine exists for a somewhat script language "SL", is compatible 
> with the browser plug-in architecture and supports a technology such as 
> liveconnect to gain access to any DOM interface and give back 
> information about the actual script context. Such an engine could be 
> embedded into the document as an object descendant of the head element, 
> and a proper meta tag could bind the "SL" mime-type to that object: this 
> would be specially suitable for event handler content attributes, while 
> a script element could hold a proper set of attributes to recall a 
> specific engine (i.e. some attributes corresponding to a classid, a 
> codebase and a bypass mode, the latter specifying whether the plugged-in 
> script engine must be preferred to a native one, or not). Some special 
> restriction could be applied to such a script engine, such as running 
> separate processes for any independent script context, asking the user 
> for permission when a plug-in is required for scripting, requiring the 
> engine neither attempts to directly access the network (this would be 
> exclusive duty of the networking task source), nor to gain access to any 
> other running process or system library but what allowed for 
> communicating with the user agent or for proper execution, and 
> establishing a testing and certification mechanism (eventually optional) 
> to verify the fulfilment of such requirements (this might work very fine 
> if a standard plug-in architecture were defined and universally 
> adopted). So doing we'd have defined a pluggable script engine 
> architecture, which could be the base for a future cross language script 
> interaction architecture (providing the script contexts isolation is not 
> violated), or a part of a future, more complex and complete, COM/CORBA 
> (or the alike) based architecture.

I don't really understand what you are proposing here.

> It's been reported that people are asking for non string messaging, but 
> a few constraints should be considered. First, no access is granted to 
> the network physical layer, so the API should take it as a black box and 
> make the most conservative choices, in order to keep communications as 
> reliable as possible: this leads to a need for a string serialization of 
> structured data, which could be done either at the DOM level or by the 
> networking task source. Furthermore, the message might be handled by a 
> piece of code written in a language other than the one generating it, so 
> a DOM level data serialization might be a good solution for both a 
> client-server and a cross-document messaging (thus the actual string 
> "nature" of a message content could be preserved), and consequently a 
> whole object serialization should be avoided for anything but DOM 
> elements, unless it is thought HTML 5 DOM must define a complete set of 
> interfaces for data structures which are neither document, nor browsing 
> strictly related (I don't feel to agree with such an idea, because that 
> could mean to put hands over a range of things which are in the scope of 
> a script language grammar and semantics, more than in the scope of a 
> DOM). This means, i.e., programmer should not assume an ECMAScript Array 
> object would carry on its prototype full range of properties and methods 
> (this should not happen at all, according to me).

This is now defined for window.postMessage and MessagePort.postMessage. Is 
the definition ok? If so, we'll probably use the same mechanism for 
workers. For WebSocket I expect we'll just use JSON, which is similar 
though not identical. (Maybe by then JSON will have been fixed to actually 
support things like NaN and Infinity, and will have well-defined parsing 
rules. We can hope.) However, we won't be adding this to WebSocket for 
some time, since we need implementation experience first.
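
To illustrate the NaN and Infinity gap mentioned above, a small round trip
through JSON as implemented by the standard JSON object; the variable and
function names are only for the example.

```javascript
// NaN and Infinity have no JSON representation: JSON.stringify() writes
// them out as null, so the values are lost on the way back in.
const original = { ratio: NaN, limit: Infinity, count: 3 };
const wire = JSON.stringify(original);
const roundTripped = JSON.parse(wire);

// Going the other way, the bare tokens are rejected by JSON.parse().
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(wire);                     // {"ratio":null,"limit":null,"count":3}
console.log(roundTripped.ratio);       // null
console.log(parsesAsJson("NaN"));      // false
console.log(parsesAsJson("Infinity")); // false
```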

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
