In recent years, we have been gradually subjected to a series of tests to prove our humanity when surfing the internet. The button that was once simple, direct, and at times even friendly, stating "I'm not a robot," has given way to new tools designed to detect machines. Grids of images showing traffic lights, crosswalks, fire hydrants, boats, bicycles, and whatever else designers can imagine began to challenge not our ability to identify them, but our patience.
The test, named CAPTCHA, an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart, has become a constant. In the early 2000s, distorted images of text or letters were enough to stop most spam bots.
But over the years, after Google acquired reCAPTCHA, a system developed at Carnegie Mellon University, and used it to help digitize Google Books, the text had to be increasingly distorted to defeat optical character recognition programs – programs that humans taking CAPTCHA tests were inadvertently helping to improve.
Using machine learning, the system learned to identify a greater number of distorted letters, so random words were added to the test box. Advances in machine learning for basic text, image, and speech recognition have made distinguishing between humans and machines ever harder; some argue that the algorithms are now actually better at these tasks than humans are. As the system became more vulnerable to bots, the authentication formats had to keep changing for access control to remain effective.
With artificial intelligence and machine learning in play, any test has only temporary effectiveness, a fact that soon became clear to experts. A few years ago, Google's machine-learning algorithms solved distorted-text CAPTCHAs with 99.8% accuracy, while humans managed only 33%.
To err is human
With machines evolving rapidly, new versions of the tests followed with additional validations, such as the various versions of reCAPTCHA, which analyze data and user behavior, allowing some humans to proceed with a single click on the "I'm not a robot" button while others must undergo further tests. In more recent versions, the validation process requires no user interaction at all: the system itself runs several automatic CAPTCHA checks and assigns a score to each access. A low score indicates that the visitor is likely a bot and triggers additional verification steps.
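The scoring approach described above can be sketched in a few lines. The thresholds and function name below are illustrative assumptions, not the API of any real CAPTCHA service; real systems compute the score from behavioral signals and let site owners tune the cutoffs.

```python
# Hypothetical sketch of score-based, interaction-free CAPTCHA gating.
# Assumption: the score runs from 0.0 (likely bot) to 1.0 (likely human),
# and the thresholds 0.7 and 0.3 are placeholder values a site might choose.

def access_decision(score: float) -> str:
    """Map a bot-likelihood score to an access-control action."""
    if score >= 0.7:
        return "allow"      # confident the visitor is human
    elif score >= 0.3:
        return "challenge"  # ambiguous: fall back to an interactive test
    else:
        return "block"      # likely automated traffic

print(access_decision(0.9))  # → allow
print(access_decision(0.5))  # → challenge
```

In practice the "challenge" branch is what users experience as being asked to solve an extra puzzle, while high-scoring visitors never see a test at all.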
We've reached the point where making software harder for bots to access also makes it harder for many people to use. Researchers have experimented with classifying images of people by facial expression, gender, and ethnicity, and with tests based on childhood rhymes common in the user's region. There are game-like CAPTCHAs, tests that require users to rotate objects to certain angles or move puzzle pieces into the correct position, and other creative ideas such as using cameras or augmented-reality devices in an interactive proof of humanity.
The problem with many of these tests is not necessarily that bots are highly intelligent, or that some humans are not, but that humanity is extremely diverse in language, culture, training, and experience. Perhaps the solution is simple: a CAPTCHA built on the common mistakes people make when clicking buttons, identifying images, or analyzing text. After all, to err is human.