
Once a year, the folks behind the HTTP Archive release the Web Almanac.

The Web Almanac is a comprehensive report on the state of the web, backed by real data and trusted web experts. The 2021 edition is comprised of 24 chapters spanning aspects of page content, user experience, publishing, and distribution.

It's a very interesting (and also massive!) read written by very smart people. I highly recommend checking it out yourself. I learned quite a bit from the 2021 edition.

Below, you'll find my Web Almanac highlights, summarized from a Twitter mega-thread. 🙈

Every section below was a single tweet. If you want to share or comment on something, start at the top of the thread and work your way down.

The median webpage loads 70 KB of CSS. The "top scoring site" loaded over 60 MB of CSS. 😲😆 60!

Not every page was so constrained: the page with the greatest CSS weight loaded 64,628 KB. The biggest mobile CSS weight seems positively svelte in comparison: only 17,823 KB.

Graph showing the CSS transfer size, with median values of 66–71 KB.

Haha, I love that! There's also a new high score for the number of loaded external stylesheets: 2368!

2368 – The largest number of external stylesheets loaded by a page.

I didn't realize that Font Awesome is still so popular. Over 30% of the scanned sites include `fa-` classes in stylesheets.

Most popular classnames with `fa-` at 32%.

Oh wow, the majority of length declarations use px. I barely use pixels for anything.

70% of font-size declarations are in pixels. 😲

70% of length units are using px.

Table showing CSS properties and their used values. 69% of font declarations are in px.

Hex is the primary way to define colors, with rgba() in third place.

Side note: if you use rgba(), you can drop the "a" these days; rgb() accepts an alpha value, too. 🙈

Graph showing that #rrggbb and #rgb are the most popular color definition values, followed by rgba() in third place with 14%.
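To make the side note concrete: since CSS Color Level 4, `rgb()` accepts an alpha value, so `rgba()` is only a legacy alias. Here's a small TypeScript sketch that turns a `#rgb` or `#rrggbb` hex color plus an alpha value into the modern space-separated `rgb()` syntax. The `hexToRgb` helper is made up for illustration; it's not from any library.

```typescript
// Hypothetical helper: converts #rgb / #rrggbb plus an alpha value (0–1)
// into the CSS Color Level 4 `rgb(r g b / a)` syntax, so no `rgba()` needed.
function hexToRgb(hex: string, alpha = 1): string {
  let digits = hex.replace(/^#/, "");
  // Expand the shorthand #rgb form to #rrggbb.
  if (digits.length === 3) {
    digits = digits.split("").map((d) => d + d).join("");
  }
  if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
    throw new Error(`Unsupported hex color: ${hex}`);
  }
  const r = parseInt(digits.slice(0, 2), 16);
  const g = parseInt(digits.slice(2, 4), 16);
  const b = parseInt(digits.slice(4, 6), 16);
  // Leave out the alpha part when the color is fully opaque.
  return alpha === 1 ? `rgb(${r} ${g} ${b})` : `rgb(${r} ${g} ${b} / ${alpha})`;
}

console.log(hexToRgb("#ff6600", 0.5)); // rgb(255 102 0 / 0.5)
console.log(hexToRgb("#fff")); // rgb(255 255 255)
```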

One site loaded over 300 MB worth of images via CSS.

I just love these stats! 💙

314,386 – The heaviest total weight of images loaded via CSS, in KB.

Huh, substantially more sites use prefers-reduced-motion than prefers-color-scheme.

I guess reduced motion is included in modern CSS libs? And dark mode isn't as common as I thought? 🤔

7% of sites use `prefers-color-scheme`

Haha, what's CSS Ruby? 🤔

In addition to directionality and logical features, CSS also offers internationalization support via CSS Ruby, a collection of properties used to affect the layout of interlinear annotation, which are short runs of text alongside the base text. Its usage is vanishingly small: only 8,157 desktop pages and 9,119 mobile pages were found to be using it (less than 0.1% of all pages analyzed).

Here we go; only 3% of pages use CSS-in-JS.

While the topic of “CSS in JS” is good for at least a Twitter flame war or two, its use in the wild continues to be very small. This year, we found that about 3% of pages are using some form of CSS-in-JS, up from 2% in 2020. Furthermore, nearly all of it comes from libraries built for the purpose, and more than half of that usage is from the Styled Components library.

14% of pages ship webkit-transition instead of -webkit-transition. :D

Most popular unknown CSS properties: 14% – webkit-transition, 14% – font-smoothing, 9% – tap-highlight-color
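Without the leading hyphen, `webkit-transition` is just an unknown property that browsers drop. Here's a rough TypeScript sketch of a check for vendor prefixes that lost their hyphen. The `findMissingHyphenPrefixes` helper is hypothetical, and it scans raw CSS text with a regex instead of a real CSS parser, so expect false positives and misses:

```typescript
// Vendor names that should appear as `-webkit-`, `-moz-`, and so on.
const VENDOR_PREFIXES = ["webkit", "moz", "ms", "o"];

// Hypothetical lint: flags declarations such as `webkit-transition: ...`
// that were probably meant to be `-webkit-transition`.
function findMissingHyphenPrefixes(css: string): string[] {
  const pattern = new RegExp(
    // A property name right after `{`, `;`, whitespace, or a line start
    // that begins with a vendor name but has no leading `-`.
    `(?:^|[{;\\s])((?:${VENDOR_PREFIXES.join("|")})-[a-z-]+)\\s*:`,
    "gm",
  );
  const hits: string[] = [];
  for (const match of css.matchAll(pattern)) {
    hits.push(match[1]);
  }
  return hits;
}

const css = ".a { webkit-transition: opacity 1s; -webkit-transition: opacity 1s; transition: opacity 1s; }";
console.log(findMissingHyphenPrefixes(css)); // [ 'webkit-transition' ]
```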

Nothing to see here – just the normal statistic showing that we all ship a lot of JS. 🙈 It's 420 KB+ per page at the 50th percentile.

👇 This is transferred bytes, so the amount of JS is way higher after decompressing...

Distribution of loaded JS bytes: 427 KB at the 50th percentile.
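To get a feeling for the difference between transferred and decompressed bytes, here's a minimal Node.js sketch in TypeScript using the built-in `zlib` module. The repetitive script is made up and compresses far better than real-world bundles, so treat the ratio as an illustration only:

```typescript
import { gzipSync } from "node:zlib";

// A made-up, highly repetitive JavaScript payload. Real bundles compress
// much less dramatically than this toy input.
const source = "function add(a, b) { return a + b; }\n".repeat(1000);

// What the browser has to parse after decompressing...
const rawBytes = Buffer.byteLength(source, "utf8");
// ...versus roughly what travels over the network with gzip.
const gzippedBytes = gzipSync(source).length;

console.log(`raw: ${rawBytes} bytes, gzipped: ${gzippedBytes} bytes`);
console.log(`ratio: ${(rawBytes / gzippedBytes).toFixed(1)}x`);
```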

Only 4% ship type="module" for scripts. :/

Fun fact: Angular dropped IE11 support last month.

4% of pages load scripts via `type="module"`.

Wait what? There's a SourceMap HTTP header?

0.1% of mobile pages use the SourceMap header.

Ready for the "boring is beautiful" and "the web is not as cutting edge as it always seems" screenshot?

Here we go. 👇

84% use jQuery and 8% are built with React.

Adoption of the top libraries – jQuery 84% and React at 8%

~20% of sites don't define the lang attribute. :/

If you don't define it, let's fix that. ;)

Out of the pages scanned, 19.6% on desktop, and 18.6% on mobile, specified no lang attribute, even though the Web Content Accessibility Guidelines (WCAG) requires that a page language is defined and “programmatically accessible”. Languages can be specified in different ways, including an xml:lang attribute, which we didn’t check for, so there might still be hope for some of the pages scanned.
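If you want to check your own pages, here's a rough TypeScript sketch. The `hasLangAttribute` helper is hypothetical and uses regular expressions instead of an HTML parser, so treat it as a quick heuristic, not a validator. Following the quote, it accepts `xml:lang` as well as `lang`:

```typescript
// Rough heuristic (hypothetical helper, regex-based rather than a real
// HTML parser): does the <html> start tag declare a non-empty
// `lang` or `xml:lang` attribute?
function hasLangAttribute(html: string): boolean {
  const htmlTag = /<html\b([^>]*)>/i.exec(html);
  if (!htmlTag) return false;
  const attributes = htmlTag[1];
  // Require whitespace before the name so `data-lang` doesn't count,
  // and at least one value character so `lang=""` doesn't count either.
  return /(?:^|\s)(?:xml:)?lang\s*=\s*(["']?)[^"'\s>]+\1/i.test(attributes);
}

console.log(hasLangAttribute('<!doctype html><html lang="en"><head></head></html>')); // true
console.log(hasLangAttribute("<html><head></head></html>")); // false
```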

Haha, I haven't seen this friend for a loooooong time. 🙈

Old-school HTML conditional comment for IE8

The median number of distinct HTML elements per page is 31! 🎉 That's way higher than I expected.

Element distribution: 10th percentile: 21, 50th percentile: 31, 90th percentile: 42

Huh, 10% of pages ship <base>? 😲

From looking at the desktop pages, base is a popular element, with 10.4% of pages having one. But do they have only one? There are 5,908 more base elements than pages, so we can only conclude at least some pages have more than one base element. Who said developers were great at following directions? We would also recommend people validate their HTML using the W3C-provided Markup Validation Service.

Fun fact: 0% use <button type="text"> :D 😆

distribution of button types with 0% of type="text" (that's very funny)

Big numbers first: almost all sites out there include third-party resources. This isn't surprising, because third-party resources include tracking scripts, libraries served from CDNs, video players, and so on.

94% of mobile sites use at least one third-party resource.

I would have expected "tracking" to be way higher. Third-party requests are mainly ads, "unknown", and libraries served from CDNs.

Third-party requests by type: 25% ads, 19% unknown, 15% cdn, 11% social.

That's a big one. Google's everywhere basically...

Google takes 8 of the top 15 most-used third parties (including the top 6 spots!), and no one else comes close. Google is a market leader in Analytics, Fonts, Ads, Accounts, Tag Managers, and Video to name but a few. A staggering 62.7% of mobile websites use Google Analytics, and almost as many use Google Fonts, with Ads, Accounts and Tag Manager usage not far behind in the 42%-49% range.

Top 3 3rd parties: Google Analytics, Google Fonts, Google Ads

If YouTube resources are embedded, they lead to a median blocking time of 1.6 seconds. That's quite something. 😲

YouTube's impact on the main thread: 1.6 s blocking time at the 50th percentile, 4.5 s blocking time at the 90th percentile.

TIL: there's a timing-allow-origin header. 😲 It enables the Resource Timing API for third-party requests.

Last year we looked at the prevalence of the timing-allow-origin header, which allows the Resource Timing API to be used on third-party requests. Without this HTTP header, the information available to on-page performance monitoring tools for third-party requests is restricted for security and privacy reasons. However, for static requests, third parties that allow this header enable greater transparency into the loading performance of their resources.
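Here's a rough TypeScript sketch of what that restriction looks like on the page. Per the Resource Timing spec, cross-origin entries that fail the timing-allow check report `0` for detailed timestamps like `requestStart`. The `hasDetailedTiming` helper and the sample entries are made up; in a browser, you'd pass it real entries from `performance.getEntriesByType("resource")`:

```typescript
// The subset of PerformanceResourceTiming fields this sketch looks at.
interface ResourceTimingLike {
  name: string;
  duration: number;
  requestStart: number;
}

// Heuristic: cross-origin resources served without a matching
// Timing-Allow-Origin header report 0 for detailed timestamps, so a zero
// requestStart suggests only the overall duration is available.
function hasDetailedTiming(entry: ResourceTimingLike): boolean {
  return entry.requestStart > 0;
}

const entries: ResourceTimingLike[] = [
  // Made-up third-party script served with `Timing-Allow-Origin: *`.
  { name: "https://cdn.example.com/lib.js", duration: 180, requestStart: 40 },
  // Made-up third-party script served without the header.
  { name: "https://ads.example.com/tag.js", duration: 320, requestStart: 0 },
];

for (const entry of entries) {
  console.log(entry.name, hasDetailedTiming(entry) ? "detailed" : "duration only");
}
```

A third party opts in by sending the header with its responses, for example `Timing-Allow-Origin: *` or a specific origin such as `Timing-Allow-Origin: https://example.com`.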

A robots.txt file isn't required (if it's missing, crawlers are free to crawl all pages), and 16% of sites don't ship one (my site doesn't 🙈).

robots.txt status codes: 200 – 81%, 404 – 16%

Huh, there's a file size limit for robots.txt files. 🙈

Most robots.txt files are fairly small, weighing between 0-100 kb. However, we did find over 3,000 domains that have a robots.txt file size over 500 KiB which is beyond Google’s max limit. Rules after this size limit will be ignored.
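Google documents the limit as 500 kibibytes (KiB). Here's a minimal TypeScript sketch that checks whether a robots.txt file's UTF-8 size exceeds it; the `exceedsGoogleRobotsLimit` helper and the sample files are made up for illustration:

```typescript
// Google's documented robots.txt size limit: 500 KiB.
const GOOGLE_ROBOTS_LIMIT_BYTES = 500 * 1024;

// Hypothetical helper: measures the file's size in UTF-8 bytes, which is
// what the limit applies to, rather than its length in characters.
function exceedsGoogleRobotsLimit(robotsTxt: string): boolean {
  const bytes = new TextEncoder().encode(robotsTxt).length;
  return bytes > GOOGLE_ROBOTS_LIMIT_BYTES;
}

const small = "User-agent: *\nDisallow: /admin/\n";
// A made-up file with 50,000 generated rules (about 700 KB).
const huge = "User-agent: *\n" + "Disallow: /x/\n".repeat(50_000);

console.log(exceedsGoogleRobotsLimit(small)); // false
console.log(exceedsGoogleRobotsLimit(huge)); // true
```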

Only 65% of pages include an h1. 😲

Distribution and usage of `h` elements: h1 65%, h2 71%, h3 61%

On the topic of inaccessible links: "click here!" is not a great link text, and 16% of pages include links like these. :/

16% of pages use non-descriptive link texts.

We're not getting better at shipping sites with fewer detectable contrast issues. :/

Only 22% of pages ship without detectable contrast issues, and that hasn't improved over the last three years.
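Contrast is measurable: WCAG 2.x defines it as a ratio of relative luminances, and level AA requires at least 4.5:1 for normal-size text. Here's a TypeScript sketch of that formula; the function names are hypothetical, and colors are given as 0–255 sRGB channel triples:

```typescript
type Rgb = [number, number, number];

// WCAG 2.x relative luminance of an sRGB color with 0–255 channels.
function relativeLuminance([r, g, b]: Rgb): number {
  const [R, G, B] = [r, g, b].map((channel) => {
    const c = channel / 255;
    // Undo the sRGB transfer curve to get a linear channel value.
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color.
function contrastRatio(a: Rgb, b: Rgb): number {
  const [lighter, darker] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA threshold for normal-size text.
function passesAA(foreground: Rgb, background: Rgb): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}

console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1)); // 21.0
```

For example, `#767676` (118, 118, 118) on white comes out at roughly 4.54:1 and passes, while `#777777` (119, 119, 119) lands at about 4.48:1 and fails.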

Speaking of HTML fundamentals: 33% of input fields have no accessible name (e.g., one provided by a label).

What surprised me is that placeholder is on the list. It doesn't make an input field more accessible, or does it?

Where inputs get their accessible names from: no accessible name 33%, relatedElement: label 27%, placeholder: 25%

Accessible video/audio is pretty much nonexistent. :/

0.02% of desktop websites with an 'audio' element have at least one accompanying 'track' element

0.5% of desktop websites with a "video" element have at least one accompanying "track" element

Almost 30% of pages include a role="button" somewhere. 😲 This is super high!

It would be so great if there were an HTML element for that, right?

#justUseAButton 🙈

Top 10 most common ARIA roles: button 29%, navigation 22%, presentation 21%, dialog 20%, search 18%

Here's the major player in all Web Almanac statistics: WordPress powers 33% of all scanned sites. That's huge!

It probably affects the jQuery usage, TTFB, Web Vitals metrics, everything...

Top 5 CMS by rank: WordPress: 33%, Joomla: 1.9%, Drupal: 1.8%, Wix: 1.6%, Squarespace: 1%

Oh boy, 641 million emails, 428 million passwords and 149 million phone numbers were involved in data breaches in 2021. 😲

Top 10 breached data classes: Emails 641 million, Passwords 428 million, Names 369 million, Locations 173 million, Phone numbers 149 million

22% of requests on mobile ship with HSTS (HTTP Strict Transport Security)? Wow! HSTS tells browsers to always use HTTPS for a site.

22.2% of requests on mobile have an HSTS header.
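A site opts in by sending a `Strict-Transport-Security` response header, for example `Strict-Transport-Security: max-age=31536000; includeSubDomains`. Here's a minimal TypeScript sketch that parses such a header value. The `parseHsts` name and result shape are hypothetical, and it skips some of the validation RFC 6797 asks for, such as rejecting duplicate directives:

```typescript
interface HstsPolicy {
  maxAge: number | null; // seconds; null if missing or invalid
  includeSubDomains: boolean;
  preload: boolean; // not part of RFC 6797, used by browser preload lists
}

function parseHsts(headerValue: string): HstsPolicy {
  const policy: HstsPolicy = { maxAge: null, includeSubDomains: false, preload: false };
  for (const part of headerValue.split(";")) {
    const [rawName, rawValue] = part.split("=", 2).map((s) => s.trim());
    // Directive names are case-insensitive.
    const name = rawName.toLowerCase();
    if (name === "max-age" && rawValue !== undefined) {
      // The value may be quoted, e.g. max-age="600".
      const value = rawValue.replace(/^"|"$/g, "");
      policy.maxAge = /^\d+$/.test(value) ? Number(value) : null;
    } else if (name === "includesubdomains") {
      policy.includeSubDomains = true;
    } else if (name === "preload") {
      policy.preload = true;
    }
  }
  return policy;
}

console.log(parseHsts("max-age=31536000; includeSubDomains"));
// { maxAge: 31536000, includeSubDomains: true, preload: false }
```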

More sites use CSP (Content Security Policy)! 👍 With all this third-party code running on sites, this is important so you don't end up mining crypto because an npm package was hacked. 🙈

We see more and more websites starting to use CSP with 9.3% of websites on mobile using CSP now compared to 7.2% last year. upgrade-insecure-requests continues to be the most frequent CSP used. The high adoption rate for this policy is likely because of the same reasons mentioned last year; it is an easy, low-risk, policy that helps in upgrading all HTTP requests to HTTPS and also helps to block mixed content being used on the page. frame-ancestors is a close second, which helps one define valid parents that may embed a page.
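Both directives from the quote fit into a single header. Here's a TypeScript sketch that serializes a directive map into a `Content-Security-Policy` header value. The `buildCsp` helper is hypothetical; the directive names and the semicolon-separated syntax come from the CSP spec:

```typescript
// Hypothetical shape: each key is a CSP directive, each value its source
// list (an empty array for directives without a value).
type CspDirectives = Record<string, string[]>;

function buildCsp(directives: CspDirectives): string {
  return Object.entries(directives)
    .map(([name, sources]) => [name, ...sources].join(" "))
    .join("; ");
}

const header = buildCsp({
  // Rewrite http:// subresource URLs to https:// before fetching them.
  "upgrade-insecure-requests": [],
  // Only allow same-origin pages to embed this page in a frame.
  "frame-ancestors": ["'self'"],
});

console.log(`Content-Security-Policy: ${header}`);
// Content-Security-Policy: upgrade-insecure-requests; frame-ancestors 'self'
```

One caveat: `frame-ancestors` is ignored when the policy is delivered via a `<meta>` element, so send it as an HTTP header.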

And that's the end of the Twitter thread. 🙈 If you like this information, I send a weekly newsletter, too!


About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.
