String.prototype.normalize for safer string comparison
Written by Stefan Judis
This post is part of my Today I learned series in which I share all my web development learnings.
Today I discovered the String.prototype.normalize method. If you're dealing with user-generated content, it helps make string comparisons more reliable.
Let me show you a quick example:
// pick a random word with a German Umlaut
const word = 'über'; // displayed as 'über'
console.log(word.length); // 4
const alikeWord = 'u\u0308ber'; // displayed as 'über'
console.log(alikeWord.length); // 5
console.log(word === alikeWord); // false
As you can see, strings that look identical can consist of different code points and code units. alikeWord makes use of a Combining Diacritical Mark to generate the German Umlaut ü (specifically, COMBINING DIAERESIS, U+0308).
But here's the catch: the Umlaut ü also has its own Unicode code point (U+00FC). That gives us two ways to encode the same glyph, which makes string comparison tricky.
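To make the difference visible, here is a minimal sketch that lists the code points of both spellings. The helper name toCodePoints is my own invention for illustration, not a built-in; it only relies on string iteration and codePointAt.

```javascript
// List the code points of a string as U+XXXX labels.
// `toCodePoints` is a hypothetical helper name, not a built-in.
function toCodePoints(str) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...str].map(
    (char) => 'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const precomposed = '\u00fcber'; // 'ü' as a single code point
const combined = 'u\u0308ber'; // 'u' followed by COMBINING DIAERESIS

console.log(toCodePoints(precomposed)); // [ 'U+00FC', 'U+0062', 'U+0065', 'U+0072' ]
console.log(toCodePoints(combined)); // [ 'U+0075', 'U+0308', 'U+0062', 'U+0065', 'U+0072' ]
```

Both strings render as 'über', but the second one carries an extra code point, which is why its length is 5.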
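To solve this issue, you can use normalize to convert strings to a common form before comparing them. Called without arguments, it uses the NFC form, which composes u and the combining mark into the single code point for ü.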
const word = 'über'; // displayed as 'über'
console.log(word.length); // 4
const alikeWord = 'u\u0308ber'.normalize(); // displayed as 'über'
console.log(alikeWord.length); // 4
console.log(word === alikeWord); // true
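When you don't control either side of the comparison, it is probably safer to normalize both strings. Here is a small sketch of how that could look; isSameText is a hypothetical helper name, and defaulting to 'NFC' simply mirrors what normalize does when called without arguments.

```javascript
// Compare two strings after normalizing both to the same Unicode form.
// `isSameText` is a hypothetical helper, not part of any library.
// `form` accepts the values normalize() supports: 'NFC', 'NFD', 'NFKC', 'NFKD'.
function isSameText(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

console.log(isSameText('\u00fcber', 'u\u0308ber')); // true
console.log(isSameText('\u00fcber', 'uber')); // false
```

Note that normalization only unifies different encodings of the same characters. It does not strip the diacritic, so 'über' and 'uber' remain different strings.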