We've been using the same basic methodology since our first set of browser speed tests, but we've changed the tools, and a few of the specifics, along the way.
For a long time, we tested web browsers on one editor's main laptop: a ThinkPad T61p, with a 2.0 GHz Intel Centrino Duo processor and 2 GB of RAM. At first we tested on Windows XP, but lately it's been a fresh Windows 8 installation. Since another editor took over the tests, we've moved up to a more powerful desktop machine, with an i7 processor overclocked to 3.8 GHz and 12 GB of RAM (plus a wired internet connection, to eliminate the unreliability of Wi-Fi).
Start-Up and Page Load Times
For the time-based tests, we measure "cold" start-up: the first launch just after a reboot, before the browser has run. We also usually test a browser's ability to load multiple tabs at once, these days including Lifehacker, Bing, Hulu, Amazon, Wikipedia, Facebook, MSN, YouTube, and eBay. Each test is timed from the perspective of human perception, with the timer started at the moment the mouse is clicked for that task (i.e., launching the browser from the taskbar or opening all nine tabs at once from the bookmark bar). For cold starts, I stop the timer when the browser window shows up, not when the page shows up, since many people will start using the browser (clicking bookmarks, typing in an address) before their home page even appears. For the tab loading test, I wait until every page has loaded and the wheel on each tab has stopped spinning. Each measurement is run at least three times on each browser, and the results are averaged. Any obvious outlier (as in, 2.8 seconds, 3.2 seconds, 7.9 seconds) is thrown out and replaced with a fresh run before averaging.
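To make the averaging step concrete, here is a minimal Python sketch. It is an illustration rather than the script we use: our timings are taken by hand, and "obvious outlier" is a judgment call. The function name `average_of_runs`, the `measure` callback, and the rule that an outlier is any run more than twice the median are all assumptions made for this example.

```python
from statistics import mean, median
from typing import Callable


def average_of_runs(
    measure: Callable[[], float],
    runs: int = 3,
    outlier_factor: float = 2.0,
    max_retries: int = 5,
) -> float:
    """Average `runs` timings, re-running any that look like obvious outliers.

    `measure` is a hypothetical callback that performs one timed run and
    returns the elapsed seconds. The outlier rule (more than `outlier_factor`
    times the median) is an assumption, not a rule stated in the article.
    """
    times = [measure() for _ in range(runs)]
    for _ in range(max_retries):
        mid = median(times)
        # Indexes of runs that are far slower than the typical run.
        bad = [i for i, t in enumerate(times) if t > outlier_factor * mid]
        if not bad:
            break
        # Throw out each outlier and replace it with a fresh measurement.
        for i in bad:
            times[i] = measure()
    return mean(times)
```

With the example timings above, the 7.9-second run is more than twice the 3.2-second median, so it is discarded and measured again; the average is then taken over the three remaining values.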
JavaScript and CSS
JavaScript and CSS performance is an increasingly important metric for browsers, as webapps become more powerful and more of our work moves into the cloud. Early on, we relied on a JavaScript benchmark from the Celtic Kane site and a downloadable CSS test. Since then, we've moved to Mozilla's Dromaeo testing suite. Mozilla itself will admit that no test, theirs included, is perfect: browser makers can "write to the test," and today's metric might not provide an accurate picture of what makes tomorrow's browser feel so snappy. Still, Dromaeo combines tests from the two leading JavaScript proponents, Google and Apple, and if Mozilla wrote it to artificially pump up Firefox's performance, their secret scheme has yet to show itself.
Memory Usage
To measure memory, we've recently adopted Google Chrome's about:memory function. As more browsers adopt multi-process architecture for stability and security, Windows' measurement of the memory they actually use has become inaccurate. Chrome's tool reports not only on Chrome itself but also on every other browser we've tested. As with Mozilla, if Google were writing the page to secretly make other browsers' memory use look worse, it's not paying off, and it seems like a fairly straightforward reporting tool. When measuring Chrome's own memory usage, I make sure to subtract the memory used by the about:memory tab itself, to ensure it doesn't artificially increase Chrome's score. I also let each browser sit for a while and wait for its memory usage to level out before measuring, since most of them will keep accruing more and more memory for the first few minutes they're open, especially with extensions installed and tabs open.
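The two adjustments described above, leaving out the about:memory tab and waiting for usage to level out, can be sketched in Python. This is only an illustration under stated assumptions: we read the numbers off the about:memory page by hand, and the function names, the process labels, the 5 MB tolerance, and the three-sample window are all hypothetical choices for this example.

```python
from typing import Dict, List


def chrome_total_mb(process_mb: Dict[str, float],
                    reporter_tab: str = "about:memory") -> float:
    """Sum Chrome's per-process memory, leaving out the reporting tab itself.

    `process_mb` maps a process label (hypothetical labels, copied by hand
    from the about:memory page) to its memory use in megabytes.
    """
    return sum(mb for name, mb in process_mb.items() if name != reporter_tab)


def has_leveled_out(samples_mb: List[float],
                    window: int = 3,
                    tolerance_mb: float = 5.0) -> bool:
    """Return True once the last `window` readings stay within `tolerance_mb`.

    Both thresholds are assumptions chosen for illustration only.
    """
    if len(samples_mb) < window:
        return False
    recent = samples_mb[-window:]
    return max(recent) - min(recent) <= tolerance_mb
```

For example, with made-up readings of 100, 140, 160, 162, and 163 MB taken a minute apart, `has_leveled_out` returns True only once the last three readings are within 5 MB of each other, which is the point at which a measurement would be recorded.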
Extensions
Very few Lifehacker readers are likely using Firefox, Chrome, or Opera without an extension or two on board. To measure the impact of extensions on memory use, we've recently run our memory tests with five representative and (more or less) cross-browser extensions for Chrome, Firefox, and Opera:
Xmarks bookmark sync
LastPass password manager
Gmail-based mail checkers: Gmail checker for Firefox, Google Mail Checker for Chrome, and Gmail Checker on Opera
InvisibleHand bargain hunter
Obviously, these extensions aren't the same on Firefox, Chrome, and Opera, but they're about as similar as we could get, and extremely popular for the functions they provide.
Any questions on how we do our tests? Suggestions on ways we could do better? Drop them here in the comments.