[whatwg] Comments on the definition of a valid e-mail address

Sun Aug 23 13:00:53 PDT 2009

Thanks for doing this work, Aryeh!  It's really awesome!

On Sun, Aug 23, 2009 at 2:41 PM, Aryeh Gregor<Simetrical+w3c at gmail.com> wrote:
> Beyond that, although it's safe to say that quoted-string or
> domain-literal or even entirely invalid addresses are extraordinarily
> rare, there are *some* real people who do use them.  Unless something
> is so completely invalid that it's obviously impossible that any mail
> server would even try to send it anywhere, you're probably going to be
> cutting out some small number of users.

Unless you avoid validating *entirely*, there's virtually always going
to be some subset of theoretically valid addresses that you'll flag as
invalid, though.

> So why not have the spec say that in the case of e-mail addresses, the
> browser may warn the user, but should permit them to submit the
> address anyway?  If the user is willing to override the warning, then
> it's likely that they personally know that the e-mail address works,
> e.g., because they use it.

I'd be okay with this.

> Alternatively, you could just loosen the restrictions even further,
> and only ban input that doesn't contain an @ sign.  (Or that doesn't
> match ^[^@]+@[^@]+\.[^@]+$, or whatever.)  Or just don't ban anything
> at all, like with type=tel.  type=email differs from most of the other
> types with validity constraints (like month, number, etc.) in that the
> difference between valid and invalid values is a purely pragmatic
> question (what will actually work?) that the user can often answer
> better than the application.  It doesn't seem like a good idea for the
> standard to tell users that the e-mail addresses they've actually been
> using are invalid.

Unlike telephone numbers (type=tel), email addresses have a relatively
simple format which *very nearly everyone* uses.  I agree that if an
address works but is in one of those crazy formats it's probably not a
good idea to bar people from using it, but in practice that's exactly
what happens right now with email validation scripts.  If type=email
doesn't validate at all, people will just continue to use their broken
homebrew validators on both the client side and the server side.

It's possible that a token validation step would be sufficient, but I
suspect not.  Probably just a slight loosening of the allowed format,
informed by actual data such as what you gathered, would work fine,
possibly augmented by your suggestion of making type=email flag
'invalid' addresses but not actually prevent them from being
submitted.
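
To make the "flag but don't block" idea concrete, here's a rough sketch
in Python.  It uses the loose pattern from Aryeh's mail
(^[^@]+@[^@]+\.[^@]+$) purely as a stand-in; the function name and the
shape of the return value are made up for illustration and are not
anything the spec defines.

```python
import re

# Loose pattern quoted earlier in the thread: something, an @ sign,
# something, a dot, something, with no other @ signs anywhere.
LOOSE_EMAIL = re.compile(r"[^@]+@[^@]+\.[^@]+")


def check_email(value: str) -> dict:
    """Hypothetical helper: warn about odd-looking addresses but never block them."""
    looks_ok = LOOSE_EMAIL.fullmatch(value) is not None
    # "submit" is always True: the user may override the warning.
    return {"submit": True, "warn": not looks_ok}


if __name__ == "__main__":
    for candidate in ["user@example.com", "user@localhost", '"a@b"@example.com']:
        print(candidate, check_email(candidate))
```

Note that the quoted-string address in the last example trips the
warning even though it is theoretically valid, which is exactly the
case where letting the user submit anyway matters.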

Would you mind sharing the 200 or so addresses that don't validate?
Obviously there are privacy concerns, but I think it would be
sufficient to just replace every letter with 'x' and every digit with
'0', or some similar information-removing transformation.  None of
them fail validation because of the letters or numbers used, so that
would still give us the information we need without revealing stuff we
don't.
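
Something like the following would do it.  This is just a sketch in
Python; the function name is made up, and it assumes str.isalpha() and
str.isdigit() are an acceptable definition of "letter" and "digit"
(they also cover non-ASCII characters).

```python
def mask_address(address: str) -> str:
    """Replace every letter with 'x' and every digit with '0'.

    Punctuation such as '@', '.', '+', quotes and brackets is left
    untouched, so the structure that caused validation to fail survives.
    """
    masked = []
    for ch in address:
        if ch.isalpha():
            masked.append("x")
        elif ch.isdigit():
            masked.append("0")
        else:
            masked.append(ch)
    return "".join(masked)


if __name__ == "__main__":
    print(mask_address("john.doe42@mail-1.example.org"))
```

The lengths of the individual parts are preserved too, which may or may
not be more than you want to reveal.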

~TJ