lookaheads (and lookbehinds) in JavaScript regular expressions
Regular expressions (regex) are tough. It always takes me a few minutes until I understand what a particular regular expression does. But nevertheless, there's no question about their usefulness.
Today, I had my Sunday morning coffee and worked myself through the slide deck "What's new in ES2018" by Benedikt Meurer and Mathias Bynens.
There is so much useful information in these slides. Besides new language features such as async iteration, object spread properties and named capture groups in regular expressions, they cover regular expression lookaheads (and the upcoming lookbehinds).
Regular expression lookaheads cross my path occasionally, but I never had to use them. As their counterpart, lookbehinds, is going to be in the language too, I decided to read some documentation and finally learn what regex lookaheads and lookbehinds are.
Since publishing this post, lookahead and lookbehind assertions made it into all the major browser engines! Browser support information is included in this post.
With lookaheads, you can define patterns that only match when they're followed or not followed by another pattern.
The MDN article about regular expressions describes two different types of lookaheads in regular expressions.
Positive and negative lookaheads:
- x(?=y): positive lookahead (matches 'x' when it's followed by 'y')
- x(?!y): negative lookahead (matches 'x' when it's not followed by 'y')
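To make the two definitions concrete, here is a minimal sketch that runs both patterns against the strings 'xy' and 'xz'. The variable names are made up for illustration only.

```javascript
// Positive lookahead: match "x" only when "y" comes right after it.
const positiveLookahead = /x(?=y)/;
// Negative lookahead: match "x" only when "y" does NOT come right after it.
const negativeLookahead = /x(?!y)/;

const lookaheadResults = {
  positiveOnXy: positiveLookahead.test('xy'), // true
  positiveOnXz: positiveLookahead.test('xz'), // false
  negativeOnXy: negativeLookahead.test('xy'), // false
  negativeOnXz: negativeLookahead.test('xz'), // true
};

console.log(lookaheadResults);
```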
Oh well... x(?=y): that's a tricky syntax. What confused me initially is that I usually use () for capturing or non-capturing groups in JavaScript regular expressions.
Let's look at an example of a captured group:
const regex = /\w+\s(\w+)\s\w+/;
regex.exec('eins zwei drei');
// ['eins zwei drei', 'zwei']
//                      /\
//                      ||
//                captured group
//                defined with
//                    (\w+)
The regular expression above captures a word (zwei in this case) that is surrounded by spaces and another word on both ends.
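For comparison, a non-capturing group uses the (?:...) syntax. The following sketch runs the same pattern once with a capturing group and once with a non-capturing group; the variable names are illustrative.

```javascript
const sentence = 'eins zwei drei';

// Capturing group: the middle word is stored at index 1 of the result.
const capturing = /\w+\s(\w+)\s\w+/.exec(sentence);

// Non-capturing group (?:...): groups the pattern without storing it.
const nonCapturing = /\w+\s(?:\w+)\s\w+/.exec(sentence);

console.log(capturing.length);    // 2
console.log(capturing[1]);        // 'zwei'
console.log(nonCapturing.length); // 1
console.log(nonCapturing[0]);     // 'eins zwei drei'
```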
Let's look at a typical example that you'll find when you read about lookaheads in JavaScript regular expressions.
// use positive regex lookahead
const regex = /Max(?= Mustermann)/;
regex.exec('Max Mustermann');
// ['Max']
regex.exec('Max Müller');
// null
This example matches Max whenever it is followed by a space and Mustermann; otherwise, it doesn't match and exec returns null. The interesting part is that the match only contains Max and not the pattern defined in the lookahead ((?= Mustermann)).
This exclusion can seem weird after working with regular expressions, but when you think about it, that's the difference between lookaheads and groups. Using lookaheads, you can test strings against patterns without including them in the resulting match.
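The difference is easiest to see side by side. In this small sketch, the same surname is matched once with a capturing group and once with a lookahead; the variable names are illustrative.

```javascript
const fullName = 'Max Mustermann';

// A capturing group consumes the surname: it is part of the overall match.
const withGroup = /Max( Mustermann)/.exec(fullName);

// A lookahead only checks that the surname follows: it is not consumed.
const withLookahead = /Max(?= Mustermann)/.exec(fullName);

console.log(withGroup[0]);         // 'Max Mustermann'
console.log(withGroup[1]);         // ' Mustermann'
console.log(withLookahead[0]);     // 'Max'
console.log(withLookahead.length); // 1
```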
Browser support for lookaheads (first supporting version per browser): 1 | 1 | 12 | 1 | 1 | 1 | 1 | 1.5 | 1
The "Max Mustermann" example is not very useful, though. Let's dive into positive and negative lookaheads with a real-world use case.
Positive regular expression lookaheads in JavaScript
Let's assume you have a long string of Markdown that includes a list of people and their food preferences. How would you figure out which people are vegan when everything's just a long string?
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;
// use positive regex lookahead
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
//                |----|  |-----------|
//                /                  \
//     one or more                    \
//     word characters     positive lookahead
//     but as few as       => followed by "(vegan)"
//     possible
let result = regex.exec(people);
while(result) {
console.log(result[1]);
result = regex.exec(people);
}
// Result:
// Billa
// Fred
Let's have a quick look at the regular expression and try to phrase it in plain language.
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
Match any dash followed by one whitespace character, followed by one or more word characters (A-Za-z0-9_, as few as possible), followed by a whitespace character, but only when all of that is followed by the pattern "(vegan)".
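If your runtime supports String.prototype.matchAll (ES2020), the exec loop can be written more compactly. This is only a sketch of an alternative, not a replacement for the loop above; the names veganList, veganRegex and vegans are made up for this example.

```javascript
const veganList = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// Same pattern as above; matchAll requires the global (g) flag.
const veganRegex = /-\s(\w+?)\s(?=\(vegan\))/g;

// Each match is an array like the one exec returns; index 1 holds the name.
const vegans = Array.from(veganList.matchAll(veganRegex), (match) => match[1]);

console.log(vegans); // ['Billa', 'Fred']
```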
Negative/negating regex lookaheads in JavaScript
On the other hand, how would you figure out who is not vegan?
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;
// use negative regex lookahead
const regex = /-\s(\w+)\s(?!\(vegan\))/g;
//                |---|  |-----------|
//                /                 \
//     one or more                   \
//     word characters    negative lookahead
//                        => not followed by "(vegan)"
let result = regex.exec(people);
while(result) {
console.log(result[1]);
result = regex.exec(people);
}
// Result:
// Bob
// Francis
// Elli
Let's have a quick look at the regular expression and try to phrase it in words, too.
const regex = /-\s(\w+)\s(?!\(vegan\))/g;
Match any dash followed by one whitespace character, followed by one or more word characters (A-Za-z0-9_), followed by a whitespace character (which includes line breaks), but only when all of that is not followed by the pattern "(vegan)".
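The detail that \s also matches line breaks is what lets Francis, who has no food preference, show up in the result. The sketch below compares the pattern with a variant that uses literal spaces instead; the helper namesOf and the other names are hypothetical and only exist for this example.

```javascript
const dietList = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;

// \s matches spaces and line breaks, so "- Francis\n" is a candidate.
const withWhitespaceClass = /-\s(\w+)\s(?!\(vegan\))/g;
// A literal space never matches the line break after "Francis".
const withLiteralSpace = /- (\w+) (?!\(vegan\))/g;

// Hypothetical helper: collect the first capture group of every match.
const namesOf = (text, pattern) =>
  Array.from(text.matchAll(pattern), (match) => match[1]);

const notVeganWithClass = namesOf(dietList, withWhitespaceClass);
const notVeganWithSpace = namesOf(dietList, withLiteralSpace);

console.log(notVeganWithClass); // ['Bob', 'Francis', 'Elli']
console.log(notVeganWithSpace); // ['Bob', 'Elli']
```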
Lookbehinds work the same way but for leading patterns. Lookaheads consider the patterns after the matching part whereas lookbehinds consider the patterns before.
Browser support for lookbehinds (first supporting version per browser): 62 | 62 | 79 | 78 | 78 | 16.4 | 16.4 | 8.0 | 62
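If you have to support older engines, a runtime check can help. The following is a rough sketch of feature detection; supportsLookbehind is a made-up name. It relies on the RegExp constructor throwing a SyntaxError for unsupported syntax, whereas a regex literal with a lookbehind would already fail while the script is parsed.

```javascript
// Hypothetical helper: returns true when the engine understands lookbehinds.
function supportsLookbehind() {
  try {
    // The constructor parses the pattern at runtime, so an unsupported
    // lookbehind throws here instead of breaking the whole script.
    new RegExp('(?<=a)b');
    return true;
  } catch (error) {
    return false;
  }
}

console.log(supportsLookbehind());
```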
When we flip the example strings around and adjust the regular expression to use lookbehinds, everything still works.
const people = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;
// use positive regex lookbehind
const regex = /(?<=\(vegan\))\s(\w+)/g;
//             |------------|  |---|
//             /                   \__
//     positive lookbehind            \
//     => following "(vegan)"      one or more
//                                 word characters
let result = regex.exec(people);
while(result) {
console.log(result[1]);
result = regex.exec(people);
}
// Result:
// Billa
// Fred
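Lookbehinds also come in a negative flavor, (?<!y)x, which matches 'x' only when it is not preceded by 'y'. The example above doesn't use it, so treat the following as a sketch that answers the "who is not vegan?" question for the flipped list; the variable names are made up.

```javascript
const flippedList = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;

// Negative lookbehind: the whitespace must NOT be preceded by "(vegan)".
const notVeganRegex = /(?<!\(vegan\))\s(\w+)/g;

const notVegan = Array.from(
  flippedList.matchAll(notVeganRegex),
  (match) => match[1]
);

console.log(notVegan); // ['Bob', 'Francis', 'Elli']
```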
Side note: I usually recommend RegExr for fiddling with regular expressions, but it doesn't support lookbehinds yet.
Additional resources
If you're interested in more cutting-edge features, have a look at Mathias' and Benedikt's slides on new features coming to JavaScript. There is way more exciting stuff to come.
To remember the syntax for lookaheads and lookbehinds, I created a quick cheat sheet about it.