5 Ways to Protect Web Forms from Spam - Without a CAPTCHA

If you’ve got a web contact form, you’ve probably received your share of spam through it. Maybe you’ve protected it with a CAPTCHA. Unfortunately, CAPTCHAs can be hard to complete and can put users off.

Here are five things you can try to increase user engagement and reduce the spam sent through a web form, without having to use a CAPTCHA.

How does contact form spam work, and how do CAPTCHAs prevent it?

Automated crawlers (bots) trawl the web looking for contact forms. These are pretty easy to identify: just look for a form with name and email fields, and probably a message field too. After finding a form, the bot puts some data in all these fields and submits it.

CAPTCHAs work because they are hard for bots to fill out but “easy” for humans. Although, as you’ve surely experienced, they’re sometimes hard for humans too. Google’s reCAPTCHA is the one you’ll most likely recognise.

reCAPTCHA in action. Is the top middle a statue or McDonald’s Grimace in a tutu?

Their difficulty can lead (potential) customers to give up on the form entirely because the CAPTCHA is too annoying.

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

Luckily, bots exhibit a few behaviours that let you identify them automatically. We’ll look at them on a conceptual level. There are some code examples along the way, but they should be easy to adapt to your language of choice.

Bots are fast

Bots are designed to be quick. They can find, complete and submit a form in just a few seconds – much faster than a human normally would. You can take advantage of this by rejecting input that arrives too quickly.

You can implement this with a two-pronged approach, adding protection on both the frontend and the backend.

On the backend, record the time that the page with the contact form was loaded. When the form is submitted, check that it has been more than ten seconds (or some other interval you choose) since the page was loaded. You can store this in the session or a signed cookie. For example:

import time

# record the page load time in seconds since the epoch
request.session['page_load_time'] = int(time.time())
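If you’d rather use a signed cookie than the session, the idea is to sign the timestamp with a server-side secret so that a bot can’t simply send back an old timestamp of its own. Here is a minimal sketch using only Python’s standard library (hmac and hashlib). The function names, the "timestamp:signature" cookie format and SECRET_KEY are hypothetical; if you’re using Django, its built-in set_signed_cookie and get_signed_cookie are probably a better fit.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical server-side secret; in practice, load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"


def sign_timestamp(timestamp: int, key: bytes = SECRET_KEY) -> str:
    """Return 'timestamp:signature', suitable for storing in a cookie."""
    message = str(timestamp).encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{timestamp}:{signature}"


def read_signed_timestamp(value: str, key: bytes = SECRET_KEY) -> Optional[int]:
    """Return the timestamp if the signature is valid, otherwise None."""
    try:
        timestamp_str, signature = value.split(":", 1)
        timestamp = int(timestamp_str)
    except ValueError:
        # Malformed cookie: missing separator or non-numeric timestamp
        return None
    expected = hmac.new(key, timestamp_str.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking timing information
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    return timestamp
```

A cookie that is missing, malformed or tampered with comes back as None, which you can treat the same as a submission that arrived too quickly.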

Then, in the code that accepts the form data, check that more than ten seconds have elapsed before allowing the form processing to proceed.

# get the previous page load time for the user.
# Default to the current time if it isn't set, so a submission
# with no recorded page load is treated as too fast and ignored
page_load_time = request.session.get('page_load_time', time.time())

# check if at least ten seconds have elapsed
data_valid = time.time() - page_load_time >= 10

if data_valid:
    send_email()

You should return a response that indicates success, so that the bot doesn’t know that the submission has failed.

You can add similar checks on the frontend. Record the current time on page load, then check the elapsed time on submit.

On page load:

let pageLoadTime

window.onload = function() {
    pageLoadTime = Date.now()
}

And on form submit:

// assumes the contact form is the first <form> on the page
const form = document.querySelector('form')

form.addEventListener('submit', event => {
    // check if less than ten seconds have elapsed since the page loaded
    if (Date.now() - pageLoadTime < 10000) {
        // if so, prevent the form from being submitted
        event.preventDefault()
    }
});

The effectiveness of the frontend technique is limited: many bots don’t run JavaScript, so this code may never execute. We’ll look at how to exploit the bots’ lack of JavaScript later.

Of course, this technique has its drawbacks. If your users are fast typists, they may be able to read the page and fill in the form in under ten seconds, so consider adjusting the time threshold to account for this. Conversely, bots that wait before submitting won’t be caught at all. Still, I have seen good results on forms from implementing this simple technique alone.

Bots have excellent vision

Humans can normally only interact with things they can see on screen. Bots, on the other hand, can see and even fill in form fields that are positioned off screen. We can exploit this with a honeypot: use CSS to place a form field way off to the side of the screen. Bots don’t know it’s invisible and will mindlessly put some data in. On the backend, if we see data in this field, we can reject the form.

<div style="position: absolute; left: -10000px;" aria-hidden="true">
    <input type="text" name="comments" tabindex="-1" value="">
</div>

Here we use the name comments for the text field, so it looks like it could be valid. We use aria-hidden so that screen readers will ignore the div and its content, and setting tabindex to -1 means that the field can’t be tabbed into.

On the backend, just check it using a method like this:

# a missing field is treated the same as an empty one
if request.POST.get('comments', '') != '':
    raise ValueError('Comments must be blank')

You’d probably want to handle this type of request differently from the example above, though. To the bot, it should look like the post went through successfully, so raising an exception might not be the best approach.
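One way to do that is to skip sending the email but return exactly the same response as a genuine submission. Here is a minimal, framework-agnostic sketch; the function names, the SUCCESS_MESSAGE text and the send_email callback are hypothetical, and form_data is assumed to be a dict-like copy of the posted fields.

```python
SUCCESS_MESSAGE = "Thanks for your message!"


def honeypot_triggered(form_data: dict) -> bool:
    """Return True if the off-screen 'comments' field contains anything."""
    # Humans can't see the field, so any value at all suggests a bot.
    # A missing field is treated the same as an empty one.
    return form_data.get("comments", "") != ""


def process_submission(form_data: dict, send_email) -> str:
    """Forward genuine messages; silently drop ones that hit the honeypot."""
    if not honeypot_triggered(form_data):
        send_email(form_data)
    # Same response either way, so the bot believes the post succeeded.
    return SUCCESS_MESSAGE
```

Because both paths return the same message, the bot gets no signal that would help it adapt.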

Bots are stupid

I mentioned earlier that defending against bots using JavaScript might not be effective, as many bots are stupid and won’t execute JavaScript. We can take advantage of this with a sort of reverse honeypot. The page loads with an empty field, and JavaScript then puts a secret value into it, which the backend validates.

For example, just use a hidden field (come up with a better name though):

<input type="hidden" name="not-a-honeypot" value="" id="id-not-a-honeypot">

Then populate it with a value.

window.onload = function() {
    document.getElementById('id-not-a-honeypot').value = 'secret-value'
}

In the backend, validate this value.

# a missing field fails the check, just like a wrong value
if request.POST.get('not-a-honeypot') != 'secret-value':
    raise ValueError('Secret value is incorrect')

You could make this more secure by storing a random string in the session when the contact page loads, injecting it into the page using your templating language of choice, and then checking that the string matches when the form is posted. Don’t go too crazy with this, though: if a bot does support JavaScript, it will be able to defeat even the most secure form protected using this method.
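Here is a minimal sketch of that random-string variant, assuming a dict-like session (a plain dict stands in for request.session). The function names and the form_token key are hypothetical. The template would embed the returned token in the page’s JavaScript in place of 'secret-value'.

```python
import hmac
import secrets
from typing import Optional


def issue_form_token(session: dict) -> str:
    """Generate a random token, remember it in the session and return it
    so the template can inject it into the page's JavaScript."""
    token = secrets.token_urlsafe(32)
    session["form_token"] = token
    return token


def form_token_valid(session: dict, submitted: Optional[str]) -> bool:
    """Check the value the JavaScript copied into the hidden field."""
    expected = session.get("form_token")
    if expected is None or submitted is None:
        return False
    # Constant-time comparison; encode so non-ASCII input can't raise TypeError
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A bot that doesn’t run JavaScript submits the field empty, so the check fails, while a fresh token per visit stops anyone from hard-coding a single value.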

Bots don’t know who they are

This tip applies if your form has separate First Name and Last Name fields (rather than a single Name field). I’ve noticed that some bots enter the same full name into both fields. For example, First Name is John Smith and Last Name is also John Smith.

Once you know about this, it’s obviously easy to fix:

# check if the first and last names match
# and contain a space (i.e. a full name entered in both fields)
first_name = request.POST.get('first_name', '')
last_name = request.POST.get('last_name', '')
if ' ' in first_name and first_name == last_name:
    raise ValueError("First and last names shouldn't match")
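The check above only catches exact duplicates. As a variation (an assumption, not something observed above), you could normalise case and whitespace before comparing, in case some bots vary capitalisation or spacing. The function name is hypothetical.

```python
def names_look_duplicated(first_name: str, last_name: str) -> bool:
    """Return True if a full name (containing a space) was entered in both
    fields, ignoring differences in case and extra whitespace."""
    # Collapse runs of whitespace and trim the ends, then compare case-insensitively
    first = " ".join(first_name.split()).casefold()
    last = " ".join(last_name.split()).casefold()
    return " " in first and first == last
```

Requiring a space keeps the check from flagging genuine single-word names that happen to match, such as a first and last name that are both "John".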

Bots are short with words (or overly verbose)

I’ve noticed that a lot of spam submissions have just a single word in their message. Either that, or a novel pitching a service. You can probably safely discard a form post if the message contains just a single word.

# check if the message is just a single word
# (split() with no arguments splits on any whitespace, including newlines)
if len(request.POST.get('message', '').split()) <= 1:
    raise ValueError('The message is too short')

And depending on the type of messages you receive, you might be able to discard messages longer than a certain length. Take a look at the messages you’re receiving and you’ll probably find that legitimate ones aren’t very long. Use your own data to guide the decision here.
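One way to let your own data choose the cut-off is to take a high percentile of the lengths of messages you know are legitimate. Here is a minimal sketch using Python’s statistics module; the function names and the 99th-percentile default are assumptions, and statistics.quantiles needs at least two sample messages.

```python
import statistics


def suggest_max_length(legitimate_messages: list, percentile: int = 99) -> int:
    """Suggest a maximum message length: the given percentile (1-99) of the
    lengths of messages you already know are legitimate."""
    lengths = [len(m) for m in legitimate_messages]
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    cut_points = statistics.quantiles(lengths, n=100)
    return int(cut_points[percentile - 1])


def message_length_ok(message: str, max_length: int) -> bool:
    """Reject single-word messages and messages longer than max_length."""
    return len(message.split()) > 1 and len(message) <= max_length
```

Recompute the threshold occasionally as more genuine messages arrive, and be aware that a high percentile still rejects a small share of legitimate long messages.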

Conclusion

So those are five methods of preventing spam in web forms without using a CAPTCHA. They’re not perfect, but neither is CAPTCHA. All of them can be subverted by a real user with a real browser – but then again, so can CAPTCHA. Hopefully they inspire some other ideas for protecting your forms while keeping them easy for your users to complete.

About Tera Shift

Tera Shift Ltd is a software and data consultancy. We help companies with solutions for development, data services, analytics, project management, and more. Our services include:

  • Working with companies to build best-practice teams
  • System design and implementation
  • Data management, sourcing, ETL and storage
  • Bespoke development
  • Process automation

We can also advise on how custom solutions can help your business grow, by using your data in ways you hadn’t thought possible.

About the author

Ben Shaw (B. Eng) is the Director of Tera Shift Ltd. He has over 15 years’ experience in Software Engineering, across a range of industries. He has consulted for companies ranging in size from startups to major enterprises, including some of New Zealand’s largest household names.

Email ben@terashift.co.nz