[whatwg] Comments on the definition of a valid e-mail address
Tab Atkins Jr.
jackalmage at gmail.com
Sun Aug 23 13:00:53 PDT 2009
Thanks for doing this work, Aryeh! It's really awesome!
On Sun, Aug 23, 2009 at 2:41 PM, Aryeh Gregor <Simetrical+w3c at gmail.com> wrote:
> Beyond that, although it's safe to say that quoted-string or
> domain-literal or even entirely invalid addresses are extraordinarily
> rare, there are *some* real people who do use them. Unless something
> is so completely invalid that it's obviously impossible that any mail
> server would even try to send it anywhere, you're probably going to be
> cutting out some small number of users.
Unless you avoid validating *entirely*, there's virtually always going
to be some subset of theoretically valid addresses that you'll flag as
invalid, though.
> So why not have the spec say that in the case of e-mail addresses, the
> browser may warn the user, but should permit them to submit the
> address anyway? If the user is willing to override the warning, then
> it's likely that they personally know that the e-mail address works,
> e.g., because they use it.
I'd be okay with this.
> Alternatively, you could just loosen the restrictions even further,
> and only ban input that doesn't contain an @ sign. (Or that doesn't
> match ^[^@]+@[^@]+\.[^@]+$, or whatever.) Or just don't ban anything
> at all, like with type=tel. type=email differs from most of the other
> types with validity constraints (like month, number, etc.) in that the
> difference between valid and invalid values is a purely pragmatic
> question (what will actually work?) that the user can often answer
> better than the application. It doesn't seem like a good idea for the
> standard to tell users that the e-mail addresses they've actually been
> using are invalid.
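For concreteness, here is a minimal Python sketch of the "only ban input that doesn't match ^[^@]+@[^@]+\.[^@]+$" option. The pattern is the one quoted above; the function name looks_like_email is hypothetical and only illustrates how little this check actually enforces.

```python
import re

# The loose pattern from the thread: exactly one "@", at least one
# character before it, and a "." somewhere after it.
LOOSE_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def looks_like_email(address: str) -> bool:
    """Hypothetical helper: True if the address passes the loose check."""
    return LOOSE_EMAIL.fullmatch(address) is not None


print(looks_like_email("user@example.com"))        # True
print(looks_like_email("user@localhost"))          # False: no "." after "@"
print(looks_like_email('"john@doe"@example.com'))  # False: two "@" signs
```

Note that even this permissive rule rejects a quoted local part containing "@", which is exactly the kind of rare-but-real address Aryeh mentions.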
Unlike type=tel, e-mail addresses have a relatively simple format that *very
nearly everyone* uses. I agree that if an address works but is in one of
those crazy formats, it's probably not a good idea to bar people from
using it, but in practice that's exactly what happens right now with
e-mail validation scripts. If type=email doesn't validate at all,
people will just continue to use their broken homebrew
validators on both the client side and the server side.
It's possible that a token validation step would be sufficient, but I
suspect not. Probably just a slight loosening of the allowed format,
informed by actual data such as what you gathered, would work fine,
possibly augmented by your suggestion of making type=email flag
'invalid' addresses but not actually prevent them from being
submitted.
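A rough sketch of the "flag but don't prevent" behaviour, assuming the loose pattern above stands in for whatever data-informed format the spec eventually settles on. The function check_email_field, its user_overrode parameter, and the warning text are all hypothetical; this is not how any browser actually implements type=email.

```python
import re
from typing import Optional, Tuple

# Placeholder for the spec's eventual (slightly loosened) format.
LOOSE_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def check_email_field(value: str, user_overrode: bool = False) -> Tuple[bool, Optional[str]]:
    """Return (may_submit, warning).

    Matching addresses submit silently. Anything else gets a warning,
    but it can still be submitted once the user confirms it.
    """
    if LOOSE_EMAIL.fullmatch(value):
        return True, None
    warning = "This doesn't look like a typical e-mail address."
    return user_overrode, warning


print(check_email_field("user@example.com"))                    # (True, None)
print(check_email_field("user@localhost"))                      # blocked until confirmed
print(check_email_field("user@localhost", user_overrode=True))  # submitted, still flagged
```

The design point is that the warning never disappears, so the page still learns the value is unusual, while the user remains the final judge of whether the address works.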
Would you mind sharing the 200 or so addresses that don't validate? Obviously
there are privacy concerns, but I think it would be sufficient to just
replace every alpha character with 'x' and every numeric with '0', or
some similar information-removing transformation. None of them fail
validation because of the letters or numbers used, so that would still
give us the information we need without revealing stuff we don't.
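The information-removing transformation described above could look something like this Python sketch. It assumes only ASCII letters and digits need hiding; every other character (@, dots, quotes, brackets, and so on) is kept, since those are what decide validity. The name scrub_address is hypothetical.

```python
import string

# Map every ASCII letter to "x" and every ASCII digit to "0".
_SCRUB = str.maketrans(
    string.ascii_letters + string.digits,
    "x" * len(string.ascii_letters) + "0" * len(string.digits),
)


def scrub_address(address: str) -> str:
    """Hide identifying letters/digits, keep the structural punctuation."""
    return address.translate(_SCRUB)


print(scrub_address('"john.doe"@example.com'))  # "xxxx.xxx"@xxxxxxx.xxx
print(scrub_address("user42@[192.0.2.1]"))      # xxxx00@[000.0.0.0]
```

Because the transformation preserves length and punctuation, a scrubbed address should pass or fail a pattern like the loose one above exactly as the original does, as long as the pattern never distinguishes between particular letters or digits.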
~TJ