[whatwg] Comments on the definition of a valid e-mail address
Tab Atkins Jr.
jackalmage at gmail.com
Sun Aug 23 13:00:53 PDT 2009
Thanks for doing this work, Aryeh! It's really awesome!
On Sun, Aug 23, 2009 at 2:41 PM, Aryeh Gregor<Simetrical+w3c at gmail.com> wrote:
> Beyond that, although it's safe to say that quoted-string or
> domain-literal or even entirely invalid addresses are extraordinarily
> rare, there are *some* real people who do use them. Unless something
> is so completely invalid that it's obviously impossible that any mail
> server would even try to send it anywhere, you're probably going to be
> cutting out some small number of users.
Unless you avoid validating *entirely*, there's virtually always going
to be some subset of theoretically valid addresses that you'll flag as
invalid.
> So why not have the spec say that in the case of e-mail addresses, the
> browser may warn the user, but should permit them to submit the
> address anyway? If the user is willing to override the warning, then
> it's likely that they personally know that the e-mail address works,
> e.g., because they use it.
I'd be okay with this.
> Alternatively, you could just loosen the restrictions even further,
> and only ban input that doesn't contain an @ sign. (Or that doesn't
> match ^[^@]+@[^@]+\.[^@]+$, or whatever.) Or just don't ban anything
> at all, like with type=tel. type=email differs from most of the other
> types with validity constraints (like month, number, etc.) in that the
> difference between valid and invalid values is a purely pragmatic
> question (what will actually work?) that the user can often answer
> better than the application. It doesn't seem like a good idea for the
> standard to tell users that the e-mail addresses they've actually been
> using are invalid.
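For concreteness, here is a minimal Python sketch of the loose check
quoted above. The pattern is copied from the quoted message; the names
LOOSE_EMAIL and looks_like_email are made up for illustration, and this
is not a proposal for what the spec should say.

```python
import re

# Pattern taken verbatim from the quoted message: a non-empty part with no "@",
# then "@", then a non-empty part, a literal ".", and another non-empty part.
LOOSE_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def looks_like_email(value: str) -> bool:
    """Return True if value passes the loose check (hypothetical helper)."""
    return LOOSE_EMAIL.match(value) is not None
```

Note that this accepts quoted-string local parts and domain literals
that contain a dot, but rejects dotless domains such as user@localhost.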
Unlike type=tel, emails have a relatively simple format which *very
nearly everyone* uses. I agree that if an email works but is one of
those crazy formats it's probably not a good idea to bar them from
using it, but in practice that's exactly what happens right now with
email validation scripts. If type=email doesn't validate at all,
people will still just continue to use their broken homebrew
validators on both the client side and the server side.
It's possible that a token validation step would be sufficient, but I
suspect not. Probably just a slight loosening of the allowed format,
informed by actual data such as what you gathered, would work fine,
possibly augmented by your suggestion of making type=email flag
'invalid' addresses but not actually prevent them from being
submitted.
Would you mind sharing these 200 or so that don't validate? Obviously
there are privacy concerns, but I think it would be sufficient to just
replace every alpha character with 'x' and every numeric with '0', or
some similar information-removing transformation. None of them fail
validation because of the letters or numbers used, so that would still
give us the information we need without revealing stuff we don't.
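A rough sketch of the kind of masking I mean, in Python. The function
name mask_address is hypothetical, and I'm assuming that everything
other than letters and digits (dots, plus signs, quotes, brackets, the
@ itself) should pass through untouched, since that's the part that
matters for validation.

```python
def mask_address(address: str) -> str:
    """Replace every letter with 'x' and every digit with '0'.

    All other characters are kept as-is, so the structure of the
    address survives while the identifying content does not.
    """
    masked = []
    for ch in address:
        if ch.isdigit():
            masked.append("0")
        elif ch.isalpha():
            masked.append("x")
        else:
            masked.append(ch)
    return "".join(masked)
```

For example, mask_address("John.Doe+w3c@mail2.example.com") gives
"xxxx.xxx+x0x@xxxx0.xxxxxxx.xxx". The lengths of each part are still
visible, so if that's considered too revealing the runs could be
collapsed as well.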