CompileNix's Blog - Regex: Powerful But Dangerous


The plural of "regex" is "production bugs"

Well, even though the statement is quite provocative, I think there's some truth to it.

Regex is powerful, but it also quickly becomes a footgun. Over- and under-matching happen to everyone who works with it, no matter how long they have used it or how well they know it.
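To make over- and under-matching concrete, here is a small Python sketch using the standard `re` module. The tasks ("extract quoted strings", "accept a first name") and the patterns are hypothetical examples chosen for illustration, not taken from any real codebase.

```python
import re

# Hypothetical task: extract the quoted strings from a line of text.
line = 'say "hello" and "goodbye"'

# Over-matching: the greedy ".*" runs from the first quote to the LAST one.
greedy = re.findall(r'"(.*)"', line)

# A lazy quantifier (".*?") stops at the nearest closing quote instead.
lazy = re.findall(r'"(.*?)"', line)

# Hypothetical task: accept a person's first name.
# Under-matching: an ASCII-only character class rejects perfectly valid names.
NAME = re.compile(r"[A-Za-z]+")
names = ["Alice", "Zoë", "O'Brien", "Anne-Marie"]
accepted = [n for n in names if NAME.fullmatch(n)]

print(greedy)    # ['hello" and "goodbye']
print(lazy)      # ['hello', 'goodbye']
print(accepted)  # ['Alice']
```

Even the "fixed" lazy version breaks as soon as the input contains an escaped quote such as `\"`, which is exactly the kind of case that slips through review.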

In my opinion, ad-hoc tasks are the most suitable use case. The more an expression becomes part of larger program logic, and the more complex the structures it has to handle (e.g., "validate an email address for me"), the more problematic its use can become.
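As an illustration of how an "email regex" tends to go wrong, here is a sketch in Python. `NAIVE_EMAIL` is a hypothetical hand-rolled pattern of the kind that ends up in production code, not a standardized expression; the validity labels follow the RFC 5322 dot-atom rules for the local part.

```python
import re

# Hypothetical hand-rolled "email" pattern. NOT a standard expression.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._]+@[A-Za-z0-9]+\.[A-Za-z]{2,}")

# Candidate address -> whether it is actually a valid address.
CASES = {
    "alice@example.com": True,
    "bob+newsletter@example.com": True,  # "+" is allowed in the local part
    "carol@mail.example.co.uk": True,    # multi-label domains are allowed
    "dave..x@example.com": False,        # unquoted consecutive dots are not
}

def wrong_verdicts(pattern, cases):
    """Return the addresses where the pattern's verdict disagrees with reality."""
    return [addr for addr, valid in cases.items()
            if bool(pattern.fullmatch(addr)) != valid]

print(wrong_verdicts(NAIVE_EMAIL, CASES))
# ['bob+newsletter@example.com', 'carol@mail.example.co.uk', 'dave..x@example.com']
```

Three out of four verdicts are wrong: two valid addresses are rejected (under-matching) and one invalid address is accepted (over-matching), even though the pattern looks reasonable at first glance.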

For more complex structures, there are other tools, such as a proper parser.
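A minimal sketch of the difference, in Python: the standard `re` module cannot match arbitrarily deep nesting, so a regex only "sees" one level of parentheses, while a tiny recursive-descent parser handles any depth and reports unbalanced input. The s-expression-like input format and the `tokenize`/`parse`/`parse_all` functions are hypothetical, written only for this example.

```python
import re

text = "(add 1 (mul 2 3))"

# A regex only finds the innermost group; the outer structure is lost.
innermost = re.findall(r"\(([^()]*)\)", text)  # ['mul 2 3']

def tokenize(src):
    # Regex is fine for the flat part: split into "(", ")" and bare atoms.
    return re.findall(r"\(|\)|[^\s()]+", src)

def parse(tokens, pos=0):
    """Parse one expression starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = parse(tokens, pos)  # recursion handles nesting
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1  # skip the closing ")"
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return tok, pos + 1  # a bare atom

def parse_all(src):
    tokens = tokenize(src)
    tree, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse_all(text))  # ['add', '1', ['mul', '2', '3']]
```

The split of responsibilities is deliberate: a regex does the simple, flat tokenizing, and the parser takes care of the structure.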

Learning how to build parsers is still on my list, both out of interest and out of necessity.

Regex is fundamentally very simple and easy to learn; it just looks like character salad. However, I'm not sure who (as in: which target audience) should really master Regex.

In production code, Regex should almost never be used, as the risk of having overlooked something is often too high. And for things like form validation, there should be (or maybe already are) openly standardized expressions (email, DNS name, etc.).

Reaching for Regex to break down or validate addresses, names, or other loosely defined structures only seems obvious to those who have invested enough time and energy in learning Regex. But I think few people do that.

In this regard, my opinion on Regex is similar to my opinion on C or ASM: very powerful, great that it exists, very much worth learning, high footgun potential. Don't use it in critical prod environments unless you have good knowledge of it and a good reason to do so.