String.prototype.normalize for safer string comparison

Published at: Apr 17 2017
Updated at: Feb 07 2022
Reading time: 1min

This post is part of my Today I learned series in which I share all my web development learnings.

Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps with making string comparisons more reliable.

Let's me show you a quick example:

// pick a random word with a German Umlaut
const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber';  // displayed as 'über'
console.log(alikeWord.length);   // 5

console.log(word === alikeWord); // false

As you see, strings that look identical can consist of different code points and units. alikeWord makes use of a Combining Diacritical Mark to generate the German Umlaut ü – specifically, it uses COMBINING DIAERESIS.

But here's the catch: the Umlaut ü also has its own Unicode codepoint. Here we have two ways to display the same glyph making a string comparison tricky.

To solve this issue you can use normalize to normalize strings.

const word = 'über';       // displayed as 'über'
console.log(word.length);  // 4

const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length);              // 4

console.log(word === alikeWord); // true

If you enjoyed this article...

Join 6.1k readers and learn something new every week with Web Weekly.

Reply to this post and share your thoughts via good old email.

Stefan standing in the park in front of a green background

About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.

String.prototype.normalize for safer string comparison

About Stefan Judis

Related Topics

Related Articles