Validating an e-mail address

This post will be short and sweet, and filled with nice tech on regexes, or regular expressions. Regex is a language of its own. It exists to parse complicated strings, making it easy for us developers to validate strings or to extract relevant information from them.

You have a problem. You use regex to solve it. Now you have two problems.

What is an e-mail address? Is there a specification on how it is supposed to be formed? Indeed there is. There are a couple of them but in general these matter:

RFCs are made by us engineers and are validated and refined by a group of individuals who know their stuff. So these officially state what an e-mail address is supposed to look like, and in general you can state the following:

<some characters up to 64 in length>@<some characters up to 255 in length>, and the total is not allowed to exceed 256 characters in length. Which means our regular expression does not need to be 3.7k. It simply needs to be the following: ^(?=.{1,256}$).{1,64}@.{1,255}$

What does it mean though?

Let's break it down step by step.

1. The . means any character except a newline character, and a newline is not allowed in e-mail addresses.
2. The {1,64} means one to 64 characters, and likewise {1,255} means one to 255 characters.
3. The @ means a literal @ character.
4. The ^ and $ anchor the match to the whole string, and the lookahead (?=.{1,256}$) makes sure you do not overstep the total limit of 256. A quantifier on a group, as in (...){1,256}, would not do this: it repeats the group up to 256 times rather than capping its length.
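
To make the length rules concrete, here is a minimal Python sketch of the check described above: up to 64 characters before the @, up to 255 after it, and at most 256 in total. The names EMAIL_RE and looks_like_email are made up for this example, and expressing the total cap as a lookahead is an assumption of the sketch, not the only way to do it. You could just as well check len(address) separately.

```python
import re

# Length-only pattern: 1-64 characters, a literal @, then 1-255 characters.
# The lookahead (?=.{1,256}$) caps the total length at 256 characters.
EMAIL_RE = re.compile(r"^(?=.{1,256}$).{1,64}@.{1,255}$")


def looks_like_email(address: str) -> bool:
    """Return True if the address fits the loose, length-based pattern."""
    # fullmatch (rather than match) also rejects a trailing newline,
    # which "$" alone would tolerate in Python.
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", "no-at-sign", "a" * 65 + "@example.com"]:
        print(repr(candidate[:20]), looks_like_email(candidate))
```

Using re.fullmatch is a deliberate choice here: in Python, $ also matches just before a trailing newline, so re.match with the same pattern would accept "user@example.com\n".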

Now a domain may or may not contain multiple . characters, so in this case it is fine. This will accept every e-mail address that is valid. Is there a caveat? Yes, there is: it is too loose. There are rules on when spaces may be used and when they may not, and you cannot have an unclosed " character, for example. In general, though, I feel this regex will solve 99% of the cases out there.
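
To see how loose this is in practice, here is a small Python sketch. It feeds a length-only pattern a few strings that break the rules mentioned above, and all of them get through. LOOSE_EMAIL_RE and the sample strings are made up for illustration, and the lookahead form of the pattern is an assumption of this sketch.

```python
import re

# Length-only pattern: 1-64 characters, @, 1-255 characters, 256 at most in total.
LOOSE_EMAIL_RE = re.compile(r"^(?=.{1,256}$).{1,64}@.{1,255}$")

# Strings that are not well-formed addresses but still fit the length rules.
not_really_valid = [
    "user name@example.com",  # unquoted space in the local part
    '"unclosed@example.com',  # opening quote that is never closed
    "a@b@c",                  # a second, unquoted @
]

# Collect every bad string that the loose pattern lets through.
accepted = [s for s in not_really_valid if LOOSE_EMAIL_RE.fullmatch(s)]
print(accepted)
```

Every entry ends up in accepted, which is the trade-off: the pattern never rejects a valid address for its shape, but it does not reject much else either.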
