Adactio: Tags—metrics

Invisible success – Eric Bailey

Big, flashy things get noticed. Quiet, boring things don’t.

There isn’t much infrastructure in place to quantify the constant, silent, boring, predictable, round-the-clock passive successes of this aspect of design systems after the initial effort is complete.

A lack of bug reports, accessibility issues, design tweaks, etc. are all objectively great, but there are no easy datapoints you can measure here.

Fidinpamp

If you’re a fan of gratuitous initialisms, you’ll love Google’s core web vitals. Just get a load of the obfuscation in the important-sounding metrics like CLS, FCP, LCP, and more.

To be fair to Google, this is a problem in the web performance world in general. Practioners prefer to talk about TTFB rather than “time to first byte” even though both contain exactly the same number of syllables.

The big news in the web performance community this month is the arrival of a new initialism. INP sounds like one of those pseudo-scientific psychologic profiles but it’s meant to stand for Interaction to Next Paint (even if they were to swear off pointless initialisms, you’d still have to pry Pointless Capitalisation from Google’s cold dead hands).

This new metric is a welcome one. It’s replacing first input delay. Sorry, First Input Delay, or FID, one of the few web vital initialisms that can be spoken as a word, making it a true acronym (fortunately fid’s successor, inp, also works as an acronym).

First Input Delay has long outstayed its welcome. It was always an outlier in the core web vitals. It didn’t seem to measure anything actually useful. I know it sounds like it’s measuring the delay until the user can interact with a web page, but when you dive into what it actually does, it’s a mess:

FID measures the time from when a user first interacts with a page (that is, when they click a link, tap on a button, or use a custom, JavaScript-powered control) to the time when the browser is actually able to begin processing event handlers in response to that interaction.

See that word “begin” in there? It’s doing a lot of work. First Input Delay doesn’t measure the lag between the user interaction and the browser response; it only measures the lag between the user interaction and the browser beginning to respond. The actual response could take ages, but that lag doesn’t get measured. Unlike the other core web vitals, this metric is very far removed from what actually matters to the user’s experience.

What the fid where they thinking? How the fid did this measurement ever get included in core web vitals in the first place?

Well, feel free to take what I’m about to say as pure gossip, but I have my sources, I trust ’em, and no, I’m not going to reveal ’em…

It’s because of AMP.

Remember Google AMP? An acronym so pointless they eventually just forgot it ever stood for anything?

The AMP project ended up doing incredible damage to Google’s developer relations. By colluding with the search team to privilege the appearance of AMP pages in the top news carousel, Google effectively blackmailed the entire publishing industry into using their format.

In the end, it didn’t work. It was a shit format. All they did was foster resentment and animosity:

AMP seems to have faded away. Most publishers have started dropping support, and even Google doesn’t seem to care much anymore.

It turns out that Google search wasn’t the only team infected by AMP. The core web vitals team also had to play ball.

Originally they had a genuinely useful metric for measuring the lag between input and response. But guess which pages did terribly? That’s right: AMP pages.

Rather than ship an actually-useful measurement, the core web vitals team instead had to include the broken First Input Delay, brainchild of a certain someone on the AMP team.

Now it all makes sense.

So good riddance to FID. Welcome to INP. And here’s hoping it won’t be much longer till we’re finally burying AMP.

PageSpeed Insights bookmarklet

I’m a little obsessed with web performance. I like being able to check a page’s core web vitals quickly and easily.

Four years ago, I made a Lighthouse bookmarklet. Whatever web page you were on, when you clicked on the bookmarklet you’d get the Lighthouse results for that page. Handy!

It doesn’t work anymore. This is probably because Google are in the loop. Four years is pretty good innings for anything involving that company.

I kid (mostly). Lighthouse itself is still going strong, despite being a Google product. But the bookmarklet needs updating.

Rather than just get Lighthouse results, I figured that the full PageSpeed Insights results would be even better. If your website is in the Chrome UX report, you get to see those CrUX details too.

So here’s the updated bookmarklet:

PageSpeed Insights

Drag that up to your desktop browser’s bookmarks toolbar. Press it whenever you want to test the page you’re on.

The Illusion Of Developer “Productivity” Opens The Door To Snake Oil – Codemanship’s Blog

There are a lot of astute observations in here.

Adoption. — Ethan Marcotte

Ethan highlights a classic case of the McNamara Fallacy—measuring adoption of design system components.

How normal am I?

A fascinating interactive journey through biometrics using your face.

Can you count on what you measure? | Clearleft

One of my favourite episodes of the Clearleft podcast is on measuring design. This post from Chris is a complements that episode in a sensible and practical style.

What gets measured gets done. You are what you measure. Measurement eliminates argument. If you work in an environment that puts store in these oft-quoted business adages then I urge you to take a moment to challenge your calculations. Let’s review our metrics to ensure they can stand up and be counted.

My Challenge to the Web Performance Community — Philip Walton

I’ve noticed a trend in recent years—a trend that I’ve admittedly been part of myself—where performance-minded developers will rebuild a site and then post a screenshot of their Lighthouse score on social media to show off how fast it is.

Mea culpa! I should post my CrUX reports too.

But I’m going to respectfully decline Phil’s advice to use any of the RUM analytics providers he recommends that require me to put another script element on my site. One third-party script is one third-party script too many.

Doing the right thing for the wrong reasons

I remember trying to convince people to use semantic markup because it’s good for accessibility. That tactic didn’t always work. When it didn’t, I would add “By the way, Google’s searchbot is indistinguishable from a screen-reader user so semantic markup is good for SEO.”

That usually worked. It always felt unsatisfying though. I don’t know why. It doesn’t matter if people do the right thing for the wrong reasons. The end result is what matters. But still. It never felt great.

It happened with responsive design and progressive enhancement too. If I couldn’t convince people based on user experience benefits, I’d pull up some official pronouncement from Google recommending those techniques.

Even AMP, a dangerously ill-conceived project, has one very handy ace in the hole. You can’t add third-party JavaScript cruft to AMP pages. That’s useful:

Beleaguered developers working for publishers of big bloated web pages have a hard time arguing with their boss when they’re told to add another crappy JavaScript tracking script or bloated library to their pages. But when they’re making AMP pages, they can easily refuse, pointing out that the AMP rules don’t allow it. Google plays the bad cop for us, and it’s a very valuable role.

AMP is currently dying, which is good news. Google have announced that core web vitals will be used to boost ranking instead of requiring you to publish in their proprietary AMP format. The really good news is that the political advantage that came with AMP has also been ported over to core web vitals.

Take user-hostile obtrusive overlays. Perhaps, as a contientious developer, you’ve been arguing for years that they should be removed from the site you work on because they’re so bad for the user experience. Perhaps you have been met with the same indifference that I used to get regarding semantic markup.

Well, now you can point out how those annoying overlays are affecting, for example, the cumulative layout shift for the site. And that number is directly related to SEO. It’s one thing for a department to over-ride UX concerns, but I bet they’d think twice about jeopardising the site’s ranking with Google.

I know it doesn’t feel great. It’s like dealing with a bully by getting an even bigger bully to threaten them. Still. Needs must.

Weighing up UX

You can listen to an audio version of Weighing up UX.

This is the month of UX Fest 2021—this year’s online version of UX London. The festival continues with masterclasses every Tuesday in June and a festival day of talks every Thursday (tickets for both are still available). But it all kicked off with the conference part last week: three back-to-back days of talks.

I have the great pleasure of hosting the event so not only do I get to see a whole lot of great talks, I also get to quiz the speakers afterwards.

Right from day one, a theme emerged that continued throughout the conference and I suspect will continue for the rest of the festival too. That topic was metrics. Kind of.

See, metrics come up when we’re talking about A/B testing, growth design, and all of the practices that help designers get their seat at the table (to use the well-worn cliché). But while metrics are very useful for measuring design’s benefit to the business, they’re not really cut out for measuring user experience.

People have tried to quantify user experience benefits using measurements like NetPromoter Score, which is about as useful as reading tea leaves or chicken entrails.

So we tend to equate user experience gains with business gains. That makes sense. Happy users should be good for business. That’s a reasonable hypothesis. But it gets tricky when you need to make the case for improving the user experience if you can’t tie it directly to some business metric. That’s when we run into the McNamara fallacy:

Making a decision based solely on quantitative observations (or metrics) and ignoring all others.

The way out of this quantitative blind spot is to use qualitative research. But another theme of UX Fest was just how woefully under-represented researchers are in most organisations. And even when you’ve gone and talked to users and you’ve got their stories, you still need to play that back in a way that makes sense to the business folks. These are stories. They don’t lend themselves to being converted into charts’n’graphs.

And so we tend to fall back on more traditional metrics, based on that assumption that what’s good for user experience is good for business. But it’s a short step from making that equivalency to flipping the equation: what’s good for the business must, by definition, be good user experience. That’s where things get dicey.

Broadly speaking, the talks at UX Fest could be put into two categories. You’ve got talks covering practical subjects like product design, content design, research, growth design, and so on. Then you’ve got the higher-level, almost philosophical talks looking at the big picture and questioning the industry’s direction of travel.

The tension between these two categories was the highlight of the conference for me. It worked particularly well when there were back-to-back talks (and joint Q&A) featuring a hands-on case study that successfully pushed the needle on business metrics followed by a more cautionary talk asking whether our priorities are out of whack.

For example, there was a case study on growth design, which emphasised the importance of A/B testing for validation, immediately followed by a talk on deceptive dark patterns. Now, I suspect that if you were to A/B test a deceptive dark pattern, the test would validate its use (at least in the short term). It’s no coincidence that a company like Booking.com, which lives by the A/B sword, is also one of the companies sued for using distressing design patterns.

Using A/B tests alone is like using a loaded weapon without supervision. They only tell you what people do. And again, the solution is to make sure you’re also doing qualitative research—that’s how you find out why people are doing what they do.

But as I’ve pondered the lessons from last week’s conference, I’ve come to realise that there’s also a danger of focusing purely on the user experience. Hear me out…

At one point, the question came up as to whether deceptive dark patterns were ever justified. What if it’s for a good cause? What if the deceptive dark pattern is being used by an organisation actively campaigning to do good in the world?

In my mind, there was no question. A deceptive dark pattern is wrong, no matter who’s doing it.

(There’s also the problem of organisations that think they’re doing good in the world: I’m sure that every talented engineer that worked on Google AMP honestly believed they were acting in the best interests of the open web even as they worked to destroy it.)

Where it gets interesting is when you flip the question around.

Suppose you’re a designer working at an organisation that is decidedly not a force for good in the world. Say you’re working at Facebook, a company that prioritises data-gathering and engagement so much that they’ll tolerate insurrectionists and even genocidal movements. Now let’s say there’s talk in your department of implementing a deceptive dark pattern that will drive user engagement. But you, being a good designer who fights for the user, take a stand against this and you successfully find a way to ensure that Facebook doesn’t deploy that deceptive dark pattern.

Yay?

Does that count as being a good user experience designer? Yes, you’ve done good work at the coalface. But the overall business goal is like a deceptive dark pattern that’s so big you can’t take it in. Is it even possible to do “good” design when you’re inside the belly of that beast?

Facebook is a relatively straightforward case. Anyone who’s still working at Facebook can’t claim ignorance. They know full well where that company’s priorities lie. No doubt they sleep at night by convincing themselves they can accomplish more from the inside than without. But what about companies that exist in the grey area of being imperfect? Frankly, what about any company that relies on surveillance capitalism for its success? Is it still possible to do “good” design there?

There are no easy answers and that’s why it so often comes down to individual choice. I know many designers who wouldn’t work at certain companies …but they also wouldn’t judge anyone else who chooses to work at those companies.

At Clearleft, every staff member has two levels of veto on client work. You can say “I’m not comfortable working on this”, in which case, the work may still happen but we’ll make sure the resourcing works out so you don’t have anything to do with that project. Or you can say “I’m not comfortable with Clearleft working on this”, in which case the work won’t go ahead (this usually happens before we even get to the pitching stage although there have been one or two examples over the years where we’ve pulled out of the running for certain projects).

Going back to the question of whether it’s ever okay to use a deceptive dark pattern, here’s what I think…

It makes no difference whether it’s implemented by ProPublica or Breitbart; using a deceptive dark pattern is wrong.

But there is a world of difference in being a designer who works at ProPublica and being a designer who works at Breitbart.

That’s what I’m getting at when I say there’s a danger to focusing purely on user experience. That focus can be used as a way of avoiding responsibility for the larger business goals. Then designers are like the soldiers on the eve of battle in Henry V:

For we know enough, if we know we are the kings subjects: if his cause be wrong, our obedience to the king wipes the crime of it out of us.

Numbers

Core web vitals from Google are the ingredients for an alphabet soup of exlusionary intialisms. But once you get past the unnecessary jargon, there’s a sensible approach underpinning the measurements.

From May—no, June—these measurements will be a ranking signal for Google search so performance will become more of an SEO issue. This is good news. This is what Google should’ve done years ago instead of pissing up the wall with their dreadful and damaging AMP project that blackmailed publishers into using a proprietary format in exchange for preferential search treatment. It was all done supposedly in the name of performance, but in reality all it did was antagonise users and publishers alike.

Core web vitals are an attempt to put numbers on user experience. This is always a tricky balancing act. You’ve got to watch out for the McNamara fallacy. Harry has already started noticing this:

A new and unusual phenomenon: clients reluctant (even refusing) to fix performance issues unless they directly improve Vitals.

Once you put a measurement on something, there’s a danger of focusing too much on the measurement. Chris is worried that we’re going to see tips’n’tricks for gaming core web vitals:

This feels like the start of a weird new era of web performance where the metrics of web performance have shifted to user-centric measurements, but people are implementing tricky strategies to game those numbers with methods that, if anything, slightly harm user experience.

The map is not the territory. The numbers are a proxy for user experience, but it’s notoriously difficult to measure intangible ideas like pain and frustration. As Laurie says:

This is 100% the downside of automatic tools that give you a “score”. It’s like gameification. It’s about hitting that perfect score instead of the holistic experience.

And Ethan has written about the power imbalance that exists when Google holds all the cards, whether it’s AMP or core web vitals:

Google used its dominant position in the marketplace to force widespread adoption of a largely proprietary technology for creating websites. By switching to Core Web Vitals, those power dynamics haven’t materially changed.

We would do well to remember:

When you measure, include the measurer.

But if we’re going to put numbers to user experience, the core web vitals are a pretty good spread of measurements: largest contentful paint, cumulative layout shift, and first input delay.

(If you prefer using initialisms, remember that CFP is Certified Financial Planner, CLS is Community Legal Services, and FID is Flame Ionization Detector. Together they form CWV, Catholic War Veterans.)

Global performance insights for your site | Lighthouse Metrics

I hadn’t come across this before—run Lighthouse tests on your pages from six different locations around the world at once.

Cloudflare’s privacy-first Web Analytics is now available for everyone

Cloudfare’s alternative to Google Analytics is now available—for free—regardless of whether your a Cloudflare customer or not:

Being privacy-first means we don’t track individual users for the purposes of serving analytics. We don’t use any client-side state (like cookies or localStorage) for analytics purposes. Cloudflare also doesn’t track users over time via their IP address, User Agent string, or any other immutable attributes for the purposes of displaying analytics — we consider “fingerprinting” even more intrusive than cookies, because users have no way to opt out.

The Core Web Vitals hype train

Goodhart’s Law applied to Google’s core web vitals:

If developers start to focus solely on Core Web Vitals because it is important for SEO, then some folks will undoubtedly try to game the system.

Personally, my beef with core web vitals is that they introduce even more uneccessary initialisms (see, for example, Harry’s recent post where he uses CWV metrics like LCP, FID, and CLS—alongside TTFB and SI—to look at PLPs, PDPs, and SRPs. I mean, WTF?).

We need more inclusive web performance metrics | Filament Group, Inc.

Good point. When we talk about perceived performance, the perception in question is almost always visual. We should think more inclusively than that.

Designing for Progressive Disclosure by Steven Hoober

Progressive disclosure interface patterns categorised and evaluated:

popups,
drawers,
mouseover popups (just say no!),
accordions,
tabs,
new pages,
scrolling,
scrolling sideways.

I really like the hypertext history invoked in this article.

The piece finishes with a great note on the MacNamara fallacy:

Everyone thinks metrics let us measure results. But, actually, they don’t. They measure only what they are measuring. Engagement, for example, is not something that can be measured, so we use an analogue for it. Time on page. Or clicks.

We often end up measuring what is quick, cheap, and easy to measure. Therefore, few organizations regularly conduct usability testing or customer-satisfaction surveys, but lots use analytics.

Even today, organizations often use clicks as a measure of engagement. So, all too often, they design user interfaces to generate clicks, so the system can measure them.

Web bloat

Pages are often designed so that they’re hard or impossible to read if some dependency fails to load. On a slow connection, it’s quite common for at least one depedency to fail.

Fire up Reader Mode and read this excellent article informed by data from using a typically slow connection in rural USA today. Two findings are:

A large fraction of the web is unusable on a bad connection. Even on a good (0% packetloss, no ping spike) dialup connection, some sites won’t load.

Some sites will use a lot of data!

CrUX.RUN

This is so useful! Get instant results from Google’s Chrome User Experience Report without having to wait (or pay) for BigQuery.

Here’s an example of my site’s metrics over the last few months, complete with nice charts.

Page Speed Benchmarks | SpeedCurve

This is going to be so useful for client work at Clearleft—instant snapshots of performance metrics across industries and regions!

For example, we’ve been working a lot with the travel sector, and now we can call up these benchmarks without having to generate a whole bunch of Web Page Test results ourselves.

See Tammy’s blog post for me details.

Performance Budgets, Pragmatically – CSS Wizardry

Smart advice from Harry on setting performance budgets:

They shouldn’t be aspirational, they should be preventative … my suggestion for setting a budget for any trackable metric is to take the worst data point in the past two weeks and use that as your limit

Thursday, April 18th, 2024

Tuesday, March 26th, 2024

Thursday, February 22nd, 2024

Saturday, September 30th, 2023

Monday, March 20th, 2023

Wednesday, July 13th, 2022

Tuesday, April 12th, 2022

Thursday, October 7th, 2021

Thursday, June 10th, 2021

Monday, June 7th, 2021

Tuesday, April 20th, 2021

Monday, December 14th, 2020

Monday, November 16th, 2020

Tuesday, July 7th, 2020

Friday, May 8th, 2020

Monday, February 24th, 2020

Saturday, February 15th, 2020

Friday, February 14th, 2020

Friday, January 10th, 2020