In 2021, over 40% of website traffic was generated by bots rather than humans. While this statistic may seem alarming, many bots play a useful role in how the internet functions, for example by sending us push notifications about discounts and promotions. However, almost 28% of all website traffic comes from bad bots that engage in malicious activities such as spamming, stealing personal information, and deploying malware. How a bot is deployed determines whether it is good or bad.
The Threat of Large Language Models (LLMs)
With the emergence of accessible generative AI, such as ChatGPT, it is becoming increasingly difficult to differentiate between bots and humans. These systems are getting better at reasoning: GPT-4 reportedly scored in the top 10% of test takers on a simulated bar exam, and bots can even defeat CAPTCHA tests. This development raises the possibility of a critical mass of bots on the internet, which could pose a dire problem for the integrity of consumer data.
The Need for High-Quality Data in Market Research
Companies spend approximately $90 billion annually on market research to decipher trends, customer behavior, and demographics. Even so, innovation failure rates are high, with 80% of consumer packaged goods and 75% of new grocery products failing. The data that product creators rely on may increasingly be riddled with AI-generated responses that do not reflect the thoughts and feelings of real consumers. This could lead to a world where businesses lack reliable input to inform, validate, and inspire their best ideas, pushing failure rates even higher.
How AI Can Help Combat Bad Bots
While humans are excellent at bringing reason to data, we are unable to distinguish between bots and humans at scale. The nascent threat of LLMs will soon outpace the manual processes through which we identify bad bots. However, AI can also be the answer to this problem: a layered approach built on deep learning or machine learning models can separate out low-quality data, with good bots carrying out the checks. This technology is well suited to detecting subtle patterns that humans can easily miss or fail to understand.
By creating a scoring system, researchers can flag common signs of bot activity, such as a high spam probability, gibberish responses, and skipped recall questions. Researchers can set guardrails for each of these factors and assign points to each trait, then compile a composite score and eliminate low-quality data before it moves on to the next layer of checks. Factors like time to response, repetition, and insightfulness can also help characterize responses and weed out the lowest-quality ones.
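As a rough illustration of such a point system, the sketch below assigns points for a few of the traits mentioned above and sums them into a composite score. Everything specific here is an assumption made for the example: the `Response` fields, the point values, the two-second guardrail, and the simple vowel-ratio and repetition heuristics. A production system would more likely rely on trained models than on hand-set rules.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One survey answer (hypothetical fields, for illustration only)."""
    text: str
    seconds_to_answer: float
    skipped_recall: bool


# Hypothetical guardrail: answers faster than this look suspicious.
MIN_SECONDS = 2.0


def looks_like_gibberish(text: str) -> bool:
    """Crude heuristic: almost no vowels among the letters."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return True
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < 0.2


def is_repetitive(text: str) -> bool:
    """Crude heuristic: fewer than half of the words are distinct."""
    words = text.lower().split()
    return len(words) >= 4 and len(set(words)) / len(words) < 0.5


def composite_score(r: Response) -> int:
    """Sum the points for each suspicious trait (higher means worse)."""
    score = 0
    if looks_like_gibberish(r.text):
        score += 2
    if is_repetitive(r.text):
        score += 1
    if r.seconds_to_answer < MIN_SECONDS:
        score += 1
    if r.skipped_recall:
        score += 2
    return score


def filter_responses(responses, max_score=2):
    """Keep only responses at or below the allowed composite score."""
    return [r for r in responses if composite_score(r) <= max_score]
```

Responses that pass this first filter would then move on to the next layer of checks; the `max_score` cutoff is the knob a researcher would tune per study.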
To ensure high-quality data, it is crucial to continually moderate and ingest examples of both good and bad data, and to establish trends by analyzing signals such as the length of responses and the count of adjectives. This methodology can be scaled across regions to identify high-risk markets where manual intervention may be necessary. By fighting nefarious AI with good AI, a virtuous flywheel starts to spin: the system gets smarter as the models ingest more data. This results in ongoing improvements in data quality and gives companies confidence that their market research can support better strategic decisions.
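The regional analysis could be sketched along the following lines, assuming each moderated response arrives as a `(region, text, is_low_quality)` tuple. The tiny `ADJECTIVES` set is a hypothetical stand-in for a real part-of-speech tagger, and the 30% risk threshold is likewise an arbitrary value chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for a part-of-speech tagger.
ADJECTIVES = {"good", "bad", "great", "fresh", "cheap", "sweet", "soft"}


def features(text):
    """Return the word count and adjective count for one response."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return {
        "length": len(words),
        "adjectives": sum(w in ADJECTIVES for w in words),
    }


def market_summary(rows):
    """Aggregate per-region trends from (region, text, is_low_quality) rows."""
    by_region = defaultdict(list)
    for region, text, low in rows:
        by_region[region].append((features(text), low))

    summary = {}
    for region, items in by_region.items():
        summary[region] = {
            "mean_length": mean(f["length"] for f, _ in items),
            "mean_adjectives": mean(f["adjectives"] for f, _ in items),
            "low_quality_share": sum(low for _, low in items) / len(items),
        }
    return summary


def high_risk_markets(summary, max_share=0.3):
    """List regions whose share of low-quality responses exceeds max_share."""
    return sorted(
        region
        for region, stats in summary.items()
        if stats["low_quality_share"] > max_share
    )
```

Regions returned by `high_risk_markets` are candidates for manual review, and the moderated results can be fed back in as new labeled rows, which is the feedback loop the flywheel depends on.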