Journal tags: signup

Preventing automated sign-ups

The Session goes through periods of getting spammed with automated sign-ups. I’m not sure why. It’s not like they do anything with the accounts. They’re just created and then they sit there (until I delete them).

In the past I’ve dealt with them in an ad-hoc way. If the sign-ups were all coming from the same IP addresses, I could block them. If the sign-ups showed some pattern in the usernames or emails, I could use that to block them.

Recently though, there was a spate of sign-ups that didn’t have any patterns, all coming from different IP addresses.

I decided it was time to knuckle down and figure out a way to prevent automated sign-ups.

I knew what I didn’t want to do. I didn’t want to put any obstacles in the way of genuine sign-ups. There’d be no CAPTCHAs or other “prove you’re a human” shite. That’s the airport security model: inconvenience everyone to stop a tiny number of bad actors.

The first step I took was the bare minimum. I added two form fields—called “wheat” and “chaff”—that are randomly generated every time the sign-up form is loaded. There’s a connection between those two fields that I can check on the server.

Here’s how I’m generating the fields in PHP:

$saltstring = 'A string known only to me.';
$wheat = base64_encode(openssl_random_pseudo_bytes(16));
$chaff = password_hash($saltstring.$wheat, PASSWORD_BCRYPT);

See how the fields are generated from a combination of random bytes and a string of characters never revealed on the client? To keep it from going stale, this string (the salt) includes something related to the current date.

Now when the form is submitted, I can check to see if the relationship holds true:

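// Treat missing or non-string fields the same as a failed check.
$wheat = $_POST['wheat'] ?? '';
$chaff = $_POST['chaff'] ?? '';
if (!is_string($wheat) || !is_string($chaff) || !password_verify($saltstring.$wheat, $chaff)) {
    // Spammer!
}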
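
For anyone who wants to try the idea, here's a minimal sketch of the whole round trip, including a date-based salt. The helper names (dated_salt, make_wheat_and_chaff, wheat_matches_chaff) are made up for illustration, and the choices to use random_bytes and to accept yesterday's salt as well as today's are my assumptions, not a description of the code running on The Session.

```php
<?php
// Hypothetical helper: combine the secret string with a UTC date so the salt changes daily.
function dated_salt(string $secret, int $daysAgo = 0): string
{
    return $secret . gmdate('Y-m-d', time() - $daysAgo * 86400);
}

// Generate the two hidden field values for a freshly loaded form.
function make_wheat_and_chaff(string $secret): array
{
    // random_bytes() is used here as an assumption; any source of random bytes would do.
    $wheat = base64_encode(random_bytes(16));
    // Note: bcrypt only looks at the first 72 bytes of its input,
    // so the secret needs to be short enough that the wheat still counts.
    $chaff = password_hash(dated_salt($secret) . $wheat, PASSWORD_BCRYPT);
    return [$wheat, $chaff];
}

// Check the relationship between the submitted fields.
function wheat_matches_chaff(string $secret, array $post): bool
{
    $wheat = $post['wheat'] ?? null;
    $chaff = $post['chaff'] ?? null;
    if (!is_string($wheat) || !is_string($chaff)) {
        return false;
    }
    // Try today's salt, then yesterday's, so a form loaded just before midnight still passes.
    foreach ([0, 1] as $daysAgo) {
        if (password_verify(dated_salt($secret, $daysAgo) . $wheat, $chaff)) {
            return true;
        }
    }
    return false;
}
```

On the form handler you'd call wheat_matches_chaff($saltstring, $_POST) and reject the sign-up if it returns false.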

That’s just the first line of defence. After thinking about it for a while, I came to the conclusion that it wasn’t enough to just generate some random form field values; I needed to generate random form field names.

Previously, the names for the form fields were easily-guessable: “username”, “password”, “email”. What I needed to do was generate unique form field names every time the sign-up page was loaded.

First of all, I create a one-time password:

$otp = base64_encode(openssl_random_pseudo_bytes(16));

Now I generate form field names by hashing that random value with known strings (“username”, “password”, “email”) together with a salt string known only to me.

$otp_hashed_for_username = md5($saltstring.'username'.$otp);
$otp_hashed_for_password = md5($saltstring.'password'.$otp);
$otp_hashed_for_email = md5($saltstring.'email'.$otp);

Those are all used for form field names on the client, like this:

<input type="text" name="<?php echo $otp_hashed_for_username; ?>">
<input type="password" name="<?php echo $otp_hashed_for_password; ?>">
<input type="email" name="<?php echo $otp_hashed_for_email; ?>">

(Remember, the name—or the ID—of the form field makes no difference to semantics or accessibility; the accessible name is derived from the associated label element.)

The one-time password also becomes a form field on the client:

<input type="hidden" name="otp" value="<?php echo $otp; ?>">

When the form is submitted, I use the value of that form field along with the salt string to recreate the field names:

$otp_hashed_for_username = md5($saltstring.'username'.$_POST['otp']);
$otp_hashed_for_password = md5($saltstring.'password'.$_POST['otp']);
$otp_hashed_for_email = md5($saltstring.'email'.$_POST['otp']);

If those form fields don’t exist, the sign-up is rejected.

As an added extra, I leave hidden honeypot form fields named “username”, “password”, and “email”. If any of those fields are filled out, the sign-up is rejected.
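
Put together, the two server-side checks (the hashed field names must exist, and the honeypots must stay empty) could look something like this. It's a sketch under assumptions: FIELD_LABELS, hashed_field_names, and extract_signup are hypothetical names, and returning null for a rejected sign-up is just one way of signalling it.

```php
<?php
// The plain names double as honeypot field names and as hash inputs.
const FIELD_LABELS = ['username', 'password', 'email'];

// Recreate the per-request field names from the salt and the one-time password.
function hashed_field_names(string $salt, string $otp): array
{
    $names = [];
    foreach (FIELD_LABELS as $label) {
        $names[$label] = md5($salt . $label . $otp);
    }
    return $names;
}

// Return the submitted values keyed by plain label, or null if the sign-up should be rejected.
function extract_signup(string $salt, array $post): ?array
{
    // Honeypots: a real person never sees these fields, so they should be empty.
    foreach (FIELD_LABELS as $label) {
        if (isset($post[$label]) && $post[$label] !== '') {
            return null;
        }
    }

    $otp = $post['otp'] ?? null;
    if (!is_string($otp) || $otp === '') {
        return null;
    }

    // Every hashed field name has to be present in the submission.
    $values = [];
    foreach (hashed_field_names($salt, $otp) as $label => $name) {
        if (!isset($post[$name]) || !is_string($post[$name])) {
            return null;
        }
        $values[$label] = $post[$name];
    }
    return $values;
}
```

Calling extract_signup($saltstring, $_POST) then gives you either the three values under their familiar names or null, in which case the sign-up is rejected.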

I put that code live and the automated sign-ups stopped straight away.

It’s not entirely foolproof. It would be possible to create an automated sign-up system that grabs the names of the form fields from the sign-up form each time. But this puts enough friction in the way to make automated sign-ups a pain.

You can view source on the sign-up page to see what the form fields are like.

I used the same technique on the contact page to prevent automated spam there too.

Spamduffing

Running The Session and Huffduffer is immensely rewarding …most of the time. There are occasions when the actions of a few bad apples make it a real pain in the bum.

Yes, I’m talking about SEO spammers.

Huffduffer tends to get it worse than The Session, but even then it’s fairly manageable—just a sign-up or two here or there. This weekend though, there was a veritable spam tsunami. I was up late on Friday night playing a constant game of whack-a-mole with thousands of spam postings by newly-created accounts. (I’m afraid I inadvertently may have deleted some genuine new accounts in the trawl; if you signed up for Huffduffer last Friday and can’t access your account now, I’m really, really sorry.)

Normally these spam SEO accounts would have some pattern to them—either they’d be from the same block of IP addresses or they’d have similar emails. But these all looked different enough to thwart any quick fixes. I knew I’d be spending my Saturday writing some spam-blocking code.

Most “social” websites have a similar sign-up flow: you fill in a form with your details (including your email address), and then you have to go to your email client to click a link to verify that you are indeed who you claim to be. The cynical side of me thinks that this is mostly to verify that you’re providing a genuine email address so that the site can send you marketing crap.

Neither Huffduffer nor The Session includes that second step of confirming your email address. The only reason for providing your email address is so that you can reset your password if you ever forget it.

I’ve always felt that making a new user break out of the sign-up flow to go check their email was a bit shit. It also strikes me as following the same logic as CAPTCHAs (which I hate): “Because of the bad actions of a minority, we’re going to punish the majority by making them prove to us that they’re human.” It’s such a machine-centric way of thinking.

But after the splurge of spam on Huffduffer, I figured I’d have no choice but to introduce that extra step. Just as I was about to start coding, I thought to myself “No, this is wrong. There must be another way.”

I thought a bit more about the problem. The issue wasn’t so much about spam sign-ups per se. Like I said, there’s always been a steady trickle and it isn’t too onerous to find them and delete them. The problem was the sheer volume of spam posts in a short space of time.

I ended up writing some different code with this logic:

  1. When someone posts to Huffduffer, check to see if they’ve posted at least ten items in the past;
  2. If they have, grab the timestamps for the last ten posts;
  3. Calculate the cumulative elapsed time between those ten posts;
  4. If it’s less than 100 seconds (i.e. an average of one post every ten seconds), delete the user …and delete everything they’ve ever posted.
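
The timing check in those steps can be sketched as a small function. This is an illustration rather than Huffduffer's actual code: the name looks_like_spam_burst and its parameters are hypothetical, timestamps are assumed to be Unix seconds, and the deleting is left to whatever calls it.

```php
<?php
// Returns true when the most recent $window posts were made in under $minSeconds.
function looks_like_spam_burst(array $timestamps, int $window = 10, int $minSeconds = 100): bool
{
    // Step 1: not enough posting history to judge.
    if (count($timestamps) < $window) {
        return false;
    }

    // Step 2: take the timestamps of the last $window posts.
    sort($timestamps);
    $recent = array_slice($timestamps, -$window);

    // Step 3: the gaps between consecutive posts add up to newest minus oldest.
    $elapsed = $recent[$window - 1] - $recent[0];

    // Step 4: too many posts in too little time.
    return $elapsed < $minSeconds;
}
```

A caller would run this each time someone posts and, if it returns true, delete the user and their posts. Ten posts only have nine gaps between them, so the “one post every ten seconds” figure is a rough average rather than an exact one.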

It worked. I watched as new spam sign-ups began to hammer the site with spam postings …only to self-destruct when they hit the critical mass of posts over time.

I’m still getting SEO spammers signing up but now they’re back to manageable levels. I’m glad that I didn’t end up having to punish genuine new users of Huffduffer for the actions of a few SEO marketing bottom-feeders.

Testing Huffduffer’s sign-up

Ever since I launched Huffduffer, one of the features that really caught people’s attention was the sign-up form.

I have to admit, I didn’t really think it was that revolutionary an idea. All I was trying to do was make the sign-up process a little friendlier and if web standards have taught us anything, it’s that there’s nothing inherent in the presentation of any element, much less forms. So I made the form more conversational and less blocky and rigid.

Well, it turns out that people love it. I’ve received bucketloads of Twitter messages and emails from people telling me how much they enjoyed the sign-up process.

But amongst all the positive comments I saw about the sign-up form when Huffduffer launched, I saw some armchair UX practitioners wondering about the usability of this somewhat unorthodox approach to forms. Fair point. Without user testing, how can I really know if the mad-libs approach is really working?

Now, it happens that Luke W. likes the Huffduffer sign-up form, as evidenced by a recent chat he had with Jared.

SpoolCast: Moving Beyond Static Forms with Luke Wroblewski on Huffduffer

If anyone knows anything about the usability of web forms, it’s Luke. He literally wrote the book on it.

Not content with simply expressing a liking for the Huffduffer-style of human-friendly form presentation, he decided to put it to the test with Vast.com:

After seeing the Huffduffer form in action, I was curious how it would perform against a traditional form. Would people be more inclined to complete it because of the narrative format? Or would the unfamiliar presentation format confuse people? Thanks to Ron Kurti and the team at Vast.com, I now have some early answers.

Ron and his team ran some A/B testing online that compared a traditional Web form layout with a narrative “Mad Libs” format. In Vast.com’s testing, Mad Libs style forms increased conversion across the board by 25-40%.

That seems to be a statistically-significant result, even accounting for Cennydd’s reality-check on A/B testing.

It’ll be interesting to see if this is the start of a trend. If nothing else, it’s a way of getting designers to think about the presentation of common human-computer interactions, such as signing up to a new website.

Sign up and log in

It’s common practice for sign-up forms to include duplicate fields for either password or email, where the user has to type the same thing twice. I deliberately avoided this on the Huffduffer sign-up form. Not long after Huffduffer launched, I was asked about this omission on Get Satisfaction and I defended my position there, citing the audience demographic.

I still think I made the right decision although, in retrospect, I’ve changed my position completely from when I said, “I can see more value in a ‘confirm your password’ field than a ‘confirm your email address’ field.” Thinking about it, getting a correct email address is more important. If a password is entered incorrectly, it can always be reset as long as the site can send a reset link to a valid email address. But if an email address is entered incorrectly, the site has no way of helping a user in difficulty.

Here’s an interesting scripted approach to avoiding duplicate email fields:

The last thing you see before you submit is your own email address.

Sign-up is something that a user should only ever experience once on a site. But the log-in process can be one of the most familiar actions that a user performs. A common convention for log-in forms is a “remember me” checkbox. I have one of those on the Huffduffer log-in page, labelled with “remember me on this Turing machine” (hey, I thought it was cute).

Here’s a question from 37 Signals:

Has the time come to kill the “Remember me” check box and just assume that people using shared computers will simply logout?

There are a lot of arguments, both for and against, in the comments. It prompted me to think about this use case on Huffduffer and I’ve decided to keep the checkbox but I’ve now made it checked by default. I think that while there are very good reasons why somebody wouldn’t want a permanent cookie set on the machine they’re using (many of the use cases are mentioned in the comments to that 37 Signals post), the majority of people find it convenient.

It always pays to think about default states in UI. Good defaults are important:

Defaults are arguably the most important design decisions you’ll ever make as a software developer. Choose good defaults, and users will sing the praises of your software and how easy it is to use. Choose poor defaults, and you’ll face down user angst over configuration, and probably a host of tech support calls as well.