There is a Unicode mode in JavaScript regular expressions

Published at: Jul 23 2017
Updated at: Feb 09 2022
Reading time: 2min

This post is part of my Today I learned series in which I share all my web development learnings.

Unicode is such an interesting topic, and it feels like there are new things to discover every day. Today was one of these days. I was reading a blog post and came across the u flag. I hadn't seen this regular expression flag before, so I ended up reading Axel's chapter on the topic in "Exploring ES6".

So what's this u flag?

In JavaScript, we've got the "problem" that strings are represented in UTF-16, which means that not every character fits into a single 16-bit code unit. Characters with code points above U+FFFF (which includes most emoji) need two. This leads to surprising length values for certain strings, and it becomes tricky when you deal with surrogate pairs.

In short: a surrogate pair is two UTF-16 code units that together represent a single code point.
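
To make the code unit vs. code point difference concrete, here is a minimal sketch using only built-in string methods. The variable names are just for illustration.

```javascript
const smiley = '\u{1F60A}'; // "😊", code point U+1F60A

// `length` counts UTF-16 code units, not characters.
const codeUnits = smiley.length; // 2

// The string iterator walks code points, so spreading yields one entry.
const codePoints = [...smiley].length; // 1

// The two halves of the surrogate pair, as hex strings.
const high = smiley.charCodeAt(0).toString(16); // "d83d"
const low = smiley.charCodeAt(1).toString(16);  // "de0a"

// codePointAt() reassembles the pair into the original code point.
const codePoint = smiley.codePointAt(0).toString(16); // "1f60a"

console.log({ codeUnits, codePoints, high, low, codePoint });
```

So one visible character can occupy two slots in the string, and anything that works per code unit only ever sees one half of it.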

Should the period (.) in a regular expression match a character that needs two code units, then? This is where the u flag comes into play.

Let's have a look at an example:

const emoji = '\u{1F60A}'; // "smiling face with smiling eyes" / "😊"
emoji.length;              // 2 -> it's a surrogate pair
/^.$/.test(emoji);         // false -> . matches only one code unit
/^.$/u.test(emoji);        // true  -> . matches the whole code point

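Unicode mode (the u flag, as in /.../u) also enables code point escape sequences such as \u{1F42A} in regular expressions, which helps when dealing with surrogate pairs. Without the flag, the braces are not parsed as a code point escape, so you have to spell out both halves of the surrogate pair.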

const camel = '\u{1F42A}';  // "🐪"
/\u{1F42A}/.test(camel);    // false -> not a code point escape without u
/\uD83D\uDC2A/.test(camel); // true  -> the surrogate pair spelled out
/\u{1F42A}/u.test(camel);   // true
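
The same code unit vs. code point switch shows up in a few other places in a pattern. The following is a small sketch, not an exhaustive list, of how character classes, quantifiers, and a global dot behave with and without the flag. The variable names are only for illustration.

```javascript
const face = '\u{1F60A}'; // "😊"
const twice = face + face;

// Character classes: without u the class contains two lone surrogates,
// so it can only ever match one code unit between ^ and $.
const classWithoutU = /^[😊]$/.test(face); // false
const classWithU = /^[😊]$/u.test(face);   // true

// Quantifiers: without u, {2} applies to the low surrogate only.
const quantWithoutU = /^😊{2}$/.test(twice); // false
const quantWithU = /^😊{2}$/u.test(twice);   // true

// A global dot counts code units without u and code points with u.
const countWithoutU = twice.match(/./g).length; // 4
const countWithU = twice.match(/./gu).length;   // 2

console.log({ classWithoutU, classWithU, quantWithoutU, quantWithU, countWithoutU, countWithU });
```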

Unicode mode makes regular expressions work on code points instead of code units. Read Axel's book chapter or Mathias Bynens' article on the topic if you want to learn more. Have fun!

About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.