Journal tags: spam

Preventing automated sign-ups

The Session goes through periods of getting spammed with automated sign-ups. I’m not sure why. It’s not like they do anything with the accounts. They’re just created and then they sit there (until I delete them).

In the past I’ve dealt with them in an ad-hoc way. If the sign-ups were all coming from the same IP addresses, I could block them. If the sign-ups showed some pattern in the usernames or emails, I could use that to block them.

Recently though, there was a spate of sign-ups that didn’t have any patterns, all coming from different IP addresses.

I decided it was time to knuckle down and figure out a way to prevent automated sign-ups.

I knew what I didn’t want to do. I didn’t want to put any obstacles in the way of genuine sign-ups. There’d be no CAPTCHAs or other “prove you’re a human” shite. That’s the airport security model: inconvenience everyone to stop a tiny number of bad actors.

The first step I took was the bare minimum. I added two form fields—called “wheat” and “chaff”—that are randomly generated every time the sign-up form is loaded. There’s a connection between those two fields that I can check on the server.

Here’s how I’m generating the fields in PHP:

$saltstring = 'A string known only to me.';
$wheat = base64_encode(openssl_random_pseudo_bytes(16));
$chaff = password_hash($saltstring.$wheat, PASSWORD_BCRYPT);

See how the fields are generated from a combination of random bytes and a string of characters never revealed on the client? To keep it from going stale, this string (the salt) includes something related to the current date.

Now when the form is submitted, I can check to see if the relationship holds true:

if (!password_verify($saltstring.$_POST['wheat'], $_POST['chaff'])) {
    // Spammer!
}
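
As a rough sketch, the date-dependent salt could work something like this. The function names, the `Y-m-d` date format, and the choice to also accept yesterday's salt (so that a form loaded just before midnight still verifies) are assumptions for illustration, not the code running on The Session. It also swaps in PHP's built-in `random_bytes()` for `openssl_random_pseudo_bytes()`.

```php
<?php
// Hypothetical helper: combine the secret string with the day's date.
// bcrypt only reads the first 72 bytes of its input, so the secret
// needs to be short enough that the wheat still fits inside that limit.
function daily_salt(string $secret, DateTimeImmutable $day): string {
    return $secret . $day->format('Y-m-d');
}

// Generate the pair of values to embed in the sign-up form.
function make_wheat_and_chaff(string $secret, DateTimeImmutable $now): array {
    $wheat = base64_encode(random_bytes(16));
    $chaff = password_hash(daily_salt($secret, $now) . $wheat, PASSWORD_BCRYPT);
    return [$wheat, $chaff];
}

// Check a submitted pair against today's salt, then yesterday's.
function wheat_matches_chaff(string $secret, string $wheat, string $chaff, DateTimeImmutable $now): bool {
    foreach ([$now, $now->modify('-1 day')] as $day) {
        if (password_verify(daily_salt($secret, $day) . $wheat, $chaff)) {
            return true;
        }
    }
    return false;
}
```

With this arrangement a wheat and chaff pair scraped from the form stops working after a day or two, so a spammer can't harvest one pair and reuse it indefinitely.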

That’s just the first line of defence. After thinking about it for a while, I came to the conclusion that it wasn’t enough to just generate some random form field values; I needed to generate random form field names.

Previously, the names for the form fields were easily-guessable: “username”, “password”, “email”. What I needed to do was generate unique form field names every time the sign-up page was loaded.

First of all, I create a one-time password:

$otp = base64_encode(openssl_random_pseudo_bytes(16));

Now I generate form field names by hashing that random value with known strings (“username”, “password”, “email”) together with a salt string known only to me.

$otp_hashed_for_username = md5($saltstring.'username'.$otp);
$otp_hashed_for_password = md5($saltstring.'password'.$otp);
$otp_hashed_for_email = md5($saltstring.'email'.$otp);

Those are all used for form field names on the client, like this:

<input type="text" name="<?php echo $otp_hashed_for_username; ?>">
<input type="password" name="<?php echo $otp_hashed_for_password; ?>">
<input type="email" name="<?php echo $otp_hashed_for_email; ?>">

(Remember, the name—or the ID—of the form field makes no difference to semantics or accessibility; the accessible name is derived from the associated label element.)

The one-time password also becomes a form field on the client:

<input type="hidden" name="otp" value="<?php echo $otp; ?>">

When the form is submitted, I use the value of that form field along with the salt string to recreate the field names:

$otp_hashed_for_username = md5($saltstring.'username'.$_POST['otp']);
$otp_hashed_for_password = md5($saltstring.'password'.$_POST['otp']);
$otp_hashed_for_email = md5($saltstring.'email'.$_POST['otp']);

If those form fields don’t exist, the sign-up is rejected.

As an added extra, I leave hidden honeypot form fields named “username”, “password”, and “email”. If any of those fields are filled out, the sign-up is rejected.
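
The server-side checks described above might be pulled together along these lines. This is a minimal sketch: `field_name()` and `read_signup()` are hypothetical helpers, and passing the submitted data in as an array (rather than reading `$_POST` directly) is an assumption made so the logic can be tried out in isolation.

```php
<?php
// Hypothetical helper: derive the per-request name for one logical field.
function field_name(string $salt, string $field, string $otp): string {
    return md5($salt . $field . $otp);
}

// Returns the submitted values keyed by logical field name,
// or null if the submission looks automated.
function read_signup(array $post, string $salt): ?array {
    $otp = $post['otp'] ?? '';
    if (!is_string($otp) || $otp === '') {
        return null;
    }
    $values = [];
    foreach (['username', 'password', 'email'] as $field) {
        // Honeypot: the plainly named field must be left empty.
        if (($post[$field] ?? '') !== '') {
            return null;
        }
        // The real value lives under the hashed name.
        $hashed = field_name($salt, $field, $otp);
        if (!isset($post[$hashed]) || !is_string($post[$hashed])) {
            return null;
        }
        $values[$field] = $post[$hashed];
    }
    return $values;
}
```

On a live page this would be called as `read_signup($_POST, $saltstring)`, with a `null` result treated as a rejected sign-up.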

I put that code live and the automated sign-ups stopped straight away.

It’s not entirely foolproof. It would be possible to create an automated sign-up system that grabs the names of the form fields from the sign-up form each time. But this puts enough friction in the way to make automated sign-ups a pain.

You can view source on the sign-up page to see what the form fields are like.

I used the same technique on the contact page to prevent automated spam there too.

Spamduffing

Running The Session and Huffduffer is immensely rewarding …most of the time. There are occasions when the actions of a few bad apples make it a real pain in the bum.

Yes, I’m talking about SEO spammers.

Huffduffer tends to get it worse than The Session, but even then it’s fairly manageable—just a sign-up or two here or there. This weekend though, there was a veritable spam tsunami. I was up late on Friday night playing a constant game of whack-a-mole with thousands of spam postings by newly-created accounts. (I’m afraid I inadvertently may have deleted some genuine new accounts in the trawl; if you signed up for Huffduffer last Friday and can’t access your account now, I’m really, really sorry.)

Normally these spam SEO accounts would have some pattern to them—either they’d be from the same block of IP addresses or they’d have similar emails. But these all looked different enough to thwart any quick fixes. I knew I’d be spending my Saturday writing some spam-blocking code.

Most “social” websites have a similar sign-up flow: you fill in a form with your details (including your email address), and then you have to go to your email client to click a link to verify that you are indeed who you claim to be. The cynical side of me thinks that this is mostly to verify that you’re providing a genuine email address so that the site can send you marketing crap.

Neither Huffduffer nor The Session includes that second step of confirming your email address. The only reason for providing your email address is so that you can reset your password if you ever forget it.

I’ve always felt that making a new user break out of the sign-up flow to go check their email was a bit shit. It also strikes me as following the same logic as CAPTCHAs (which I hate): “Because of the bad actions of a minority, we’re going to punish the majority by making them prove to us that they’re human.” It’s such a machine-centric way of thinking.

But after the splurge of spam on Huffduffer, I figured I’d have no choice but to introduce that extra step. Just as I was about to start coding, I thought to myself “No, this is wrong. There must be another way.”

I thought a bit more about the problem. The issue wasn’t so much about spam sign-ups per se. Like I said, there’s always been a steady trickle and it isn’t too onerous to find them and delete them. The problem was the sheer volume of spam posts in a short space of time.

I ended up writing some different code with this logic:

  1. When someone posts to Huffduffer, check to see if they’ve posted at least ten items in the past;
  2. If they have, grab the timestamps for the last ten posts;
  3. Calculate the cumulative elapsed time between those ten posts;
  4. If it’s less than 100 seconds (i.e. an average of one post every ten seconds), delete the user …and delete everything they’ve ever posted.
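
The heart of that logic can be sketched as a single function. The name `is_posting_too_fast()` and its parameters are hypothetical, and it assumes the timestamps of a user's posts are available as Unix times; the actual deletion step is left out.

```php
<?php
// Hypothetical helper: $timestamps holds the Unix times of a user's posts.
function is_posting_too_fast(array $timestamps, int $window = 10, int $minSeconds = 100): bool {
    if (count($timestamps) < $window) {
        return false; // not enough history to judge
    }
    sort($timestamps);
    $recent = array_slice($timestamps, -$window);
    // Adding up the gaps between consecutive posts gives the same
    // figure as subtracting the earliest time from the latest.
    $elapsed = $recent[$window - 1] - $recent[0];
    return $elapsed < $minSeconds;
}
```

Running this check each time someone posts means a genuine user who has posted ten items over weeks or months never trips it, while a script firing off posts every few seconds does so on its tenth post.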

It worked. I watched as new spam sign-ups began to hammer the site with spam postings …only to self-destruct when they hit the critical mass of posts over time.

I’m still getting SEO spammers signing up but now they’re back to manageable levels. I’m glad that I didn’t end up having to punish genuine new users of Huffduffer for the actions of a few SEO marketing bottom-feeders.

Spam of the Gods

Stephen Hawking has been quoted recently urging caution about the prospect of first contact with an extra-terrestrial civilisation:

We only have to look at ourselves to see how intelligent life might develop into something we wouldn’t want to meet.

This isn’t the first time that such reservations have been raised.

Both of the Voyager spacecraft are carrying golden records: snapshots and time capsules of our planet’s culture. It’s a project with such a long timeline that it makes the clock of the Long Now look like a disposable gadget in comparison. As well as carrying instructions on how to decode the record (ingeniously using the fundamental transition of a hydrogen atom as the base unit of time), the records also have a map inscribed upon them. This is the same illustration that was included on the Pioneer plaques.

The map consists of fourteen lines converging on a central point. The length and angle of each line correspond to the position of a pulsar relative to Earth. Those fourteen beacons point to one position in the galaxy: our home planet.

The responsibility for deciding the contents of the golden record fell to Carl Sagan. I highly recommend listening to this account by Sagan’s widow Ann Druyan of how the golden record may just contain the encoded patterns of love itself:

Carl Sagan And Ann Druyan’s Ultimate Mix Tape on Huffduffer

Many people at the time were upset that the pulsar map was included on the Voyager record, for the same reasons that Hawking is giving today: we are effectively hanging a sign around our neck that reads free food here.

I was talking about this with Tantek at South by Southwest this year and he had to admit that, with his Schneier-esque security hat on, those people have a point. What you really want to do, he said, is point to a drop-off box instead: a nearby uninhabited star-system that we can monitor from Earth. That way, if we ascertain that the alien civilisation is friendly, we can go and greet them, but if they are hostile, we can simply lie low.

In fact, in Sagan’s book Contact—where the shoe is on the other foot and we are the alien civilisation responding to a message—this is exactly what happens. The origin point we are given is the Vega system, which turns out not to be the home of any alien civilisation but merely a way station: a routing point in the galactic network.

There may well be a galactic RFC for , which the Pioneer and Voyager probes have flagrantly disregarded. What is an alien civilisation to make of a message that effectively states:

Dear Friend,
Although you may be apprehensive as we have not met before, I come to you with great hope. I am a probe from an abundant planet that has recently acquired spacefaring technology. Please contact me at your earliest convenience so that we may transfer knowledge.
I await your response,
Third planet from an insignificant star

It’s clearly designed to lure in the gullible of the galaxy.

Carl Sagan, my hero, looks like nothing more than a galactic spammer.

Blame

If I’m on the tube listening to my iPod—because, y’know, that’s exactly the kind of situation for which the iPod was invented—and somebody steals said iPod, which is illegal, is that my fault?

If I publish my email address online (because, y’know, I actually want people to be able to get in touch with me quickly and conveniently) and it gets harvested by scum-sucking spammers who send unsolicited commercial email, which is illegal, is that my fault?

If I utter my date of birth or my mother’s maiden name—because, y’know, I don’t believe that information should be a state secret—and somebody uses that information to “steal my identity”, which is illegal, is that my fault?

If you answered yes to any of the above, I would like to remind you of something said at last year’s South by Southwest:

If I’ve learned anything from hanging out with the Eastern European dissident crowd, it’s make no decision out of fear.