Spam and the benefits of upgrading to reCAPTCHA v3

Apart from being an incredibly tasty and convenient foodstuff (and a highly effective bait for carp fishing), spam is also a really annoying digital ploy. The two are very different, but one is said to be responsible for the name of the other. There is a famous Monty Python sketch in which a couple are airlifted into a cafe, only to discover that the menu consists mainly of Spam, so no matter what they order, it’s coming with Spam … you get the point. Fast forward fifty years and we’re still being served a generous helping every single day.

So what is spam and why does it happen?

Even though most spam looks harmless enough, at its core there is usually a scheme to convince people to part with their money or personal details. An unsolicited bulk email is sent out to hundreds of thousands of email addresses, and its content is often designed to trick the recipient into taking some kind of action.

This could be sending money abroad to a stranger in need, clicking on a link to a website that promises the secret to making megabucks for a small fee or, on a more sinister note, following a fake company email to a counterfeit website that demands a payment before your internet services are cut off. Scary stuff, right?

Thankfully, these kinds of emails are usually poorly written, flagged as dangerous by your email client and fairly obvious in their dishonest intentions to the average internet user. Scams like these are known as phishing. Other types of spam try to get you to download malicious software to your computer or phone, which you’ll then need to pay to have removed, or are simply designed to confirm that your email address exists before it is sold on to other spammers. As I said, it’s all for financial gain, and on some level it must work, otherwise the people who do this would not waste their time.

But who does this kind of thing?

This is a good question, and I suppose the answer is anyone who’s looking to make some dishonest money and knows how to do it. Spambots do most of the work. In short, spambots are computer programs created to harvest email addresses from the web, automatically create email accounts and send emails en masse.

So if you have an email address, you’ll be getting spammed. Thankfully, most email clients these days are great at weeding spam out and redirecting it to those folders you never feel the need to look at - Gmail is especially good at this. So when it comes to your inbox, it’s much less of a concern. But what if you have a contact form on a website? How do you protect that from spam?

Contact forms on websites can also be targeted by spambots (forum spambots), whose aim is to push as many submissions through your form as possible. One problem is that this can affect the performance of your website, because the server has to do a lot of extra work to process all those submissions. Another is that contact forms usually forward submissions to an admin email address from your website's default email address, so the messages often slip past the spam filter and show up in your main inbox. Ideally, you’d want a form to reject spam before it is submitted, so how can we go about that?

Contact forms and spam prevention

Contact forms are one of the most common features of modern websites. These forms should be easily accessible to anyone, which means they are also easily accessible to spambots.

To help fight against spam, websites rely on a mechanism called CAPTCHA - Completely Automated Public Turing test to tell Computers and Humans Apart. CAPTCHAs were invented in 2000 and are basically little puzzles that a human can solve easily but computer programs should have trouble with.

You’ll probably recognise the image below. It is an example of the original reCAPTCHA: type in the words to prove you are human and your contact form will be submitted.

These were great as long as you could actually read the distorted text. And prior to computers learning to read text in images or decipher sounds as words, this was pretty resilient. However, this was soon compromised and needed to be updated.

Google’s reCAPTCHA v2

Google’s reCAPTCHA v2 was released in 2014 and became one of the most popular CAPTCHA solutions. We are sure you will have seen this before:

The problem with reCAPTCHA v2

Most people will solve Google’s reCAPTCHA v2 by clicking the correct images. reCAPTCHA v2 also allows visually impaired people to solve it via a text or audio challenge. The audio challenge plays spoken words, which you must then type into a text field, as seen below:

Over the years, the rise of mobile phone usage and Artificial Intelligence paved the way for improved speech recognition technology. Converting speech to text became possible for a computer, with even Google offering a solution in this space with their Google Cloud Speech-to-Text.

These days, reCAPTCHA v2 can be bypassed by computer scripts using a fairly simple process:

Try to submit a form, get a popup from reCAPTCHA v2 to solve a visual puzzle
Click the “headphones” icon until you get an Audio challenge
Feed the audio through a Speech-to-text tool
Add the solved text into the form, submit

This vulnerability has been explored publicly more than once.

Another way to beat CAPTCHAs is to use CAPTCHA farm services. This relies on workers in low-cost countries to solve CAPTCHAs for a low fee - yes, these really exist!

The need for reCAPTCHA v3

Faced with the shortcomings of v2, Google released reCAPTCHA v3 in 2018 with a rather ironic twist: almost no human interaction is required. Instead, in Google’s own words:

reCAPTCHA uses an advanced risk analysis engine and adaptive challenges to keep malicious software from engaging in abusive activities on your website. Meanwhile, legitimate users will be able to log in, make purchases, view pages, or create accounts and fake users will be blocked.

reCAPTCHA’s risk-based bot algorithms apply continuous machine learning that factors in every customer and bot interaction to overcome the binary heuristic logic of traditional challenge-based bot detection technologies.

This indicates that Google has a secret algorithm that evaluates each interaction with a particular page or form and gives it a score between 0.0 and 1.0. A higher score means a higher probability that the interaction came from a human; a low score means it’s most likely bot/spam activity.
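In practice, the score reaches your server as part of a JSON response from Google's siteverify endpoint. The Python sketch below shows one way a server might request that response and decide whether to trust it. The function names, the `contact_form` action label and the 0.5 cut-off are assumptions made for illustration; the endpoint and the response fields (`success`, `score`, `action`) are the ones Google documents.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verification(secret_key: str, token: str) -> dict:
    """POST the token received from the browser to Google and return the parsed JSON."""
    body = urllib.parse.urlencode({"secret": secret_key, "response": token}).encode()
    # Passing data= makes urlopen send a POST request.
    with urllib.request.urlopen(SITEVERIFY_URL, data=body, timeout=5) as reply:
        return json.loads(reply.read().decode("utf-8"))


def looks_human(result: dict, expected_action: str, min_score: float = 0.5) -> bool:
    """Accept only a valid token, issued for the expected action, scoring at least min_score."""
    if not result.get("success"):
        return False
    if result.get("action") != expected_action:
        return False
    # min_score is an assumed cut-off; tune it against your own traffic.
    return float(result.get("score", 0.0)) >= min_score
```

A form handler could then call `looks_human(fetch_verification(secret, token), "contact_form")` and only send the email when it returns `True`.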

Google does not publish exactly what it measures, but signals that are likely to help it detect humans include:

Your IP address and how many pages/forms it visits
If you are logged in to a Google account in your browser
Mouse and keyboard activity
How fast the form was filled out
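Since Google keeps its algorithm secret, the following is only a toy illustration of the last signal in that list, the time taken to fill out a form. It is not reCAPTCHA's actual logic; the function name and the three-second minimum are hypothetical.

```python
def filled_too_fast(rendered_at: float, submitted_at: float, min_seconds: float = 3.0) -> bool:
    """Flag a submission that arrived sooner than a person could plausibly have typed it.

    Both timestamps are seconds since the epoch, recorded on the server.
    """
    elapsed = submitted_at - rendered_at
    # A negative elapsed time (clock skew or a forged timestamp) is flagged as well.
    return elapsed < min_seconds
```

A real system would treat this as one weak hint among many rather than a verdict on its own, since fast typists and browser autofill can both produce very short fill times.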

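reCAPTCHA v3 itself never interrupts the visitor with a puzzle. It simply returns the score, and it is up to the website to decide what to do when the score is low, for example rejecting the submission or falling back to a challenge similar to the previous version.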
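What happens at a given score is something a site can tune. A minimal sketch of such a policy follows, assuming hypothetical thresholds of 0.7 and 0.3 and hypothetical outcome names; none of these values come from Google.

```python
def decide(score: float, allow_at: float = 0.7, reject_below: float = 0.3) -> str:
    """Map a reCAPTCHA v3 score (0.0 to 1.0) to what the site does with a form submission."""
    if score >= allow_at:
        return "accept"
    if score < reject_below:
        return "reject"
    # Middle band: neither clearly human nor clearly bot, so ask for extra proof,
    # for example an email confirmation or a v2-style challenge.
    return "verify"
```

Keeping a middle band means borderline visitors get a second chance instead of being silently dropped, at the cost of a little friction for them.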

What if my website is being spammed?

If your site is being spammed despite the presence of a CAPTCHA, you may need to look at upgrading to the latest version. We are getting more and more reports from our clients that v2 is failing and that there is a need to upgrade to v3. I’m sure that one day these reports will also come in for v3, but for now it’s the best defence available to stop these pesky spammers in their tracks!

8th July 2021

Author

Mani Gaspar

Developer

Mani has been a part of the Gecko team since 2009, providing expertise in coding and other technical aspects of our projects. He was based in Edinburgh in his early Gecko years before moving back home to Portugal and making everyone else jealous of the blue sky and sunshine he gets to enjoy.

Gecko Agency (Edinburgh & Chester)

hello@wearegecko.co.uk
t: 0131 240 3390
