Researchers Utilize GPT-4 for Autonomous Website Hacking – Success Rate at 53%

Earlier this year, researchers discovered that OpenAI’s GPT-4 could create exploits for vulnerabilities by analysing publicly available information. Now the same team has used autonomous squads of GPT-4-based bots to successfully hack more than half of their test websites; the bots coordinated with one another and spawned additional bots when necessary.

Creating Zero-Day Vulnerability Exploits

The bots developed exploits for zero-day vulnerabilities, i.e. flaws unknown to defenders at the time of the attack. In their previous work, the researchers had used GPT-4 to exploit known vulnerabilities (CVEs) for which patches had not yet been released, finding that the model could create working exploits for 87% of critical, high-risk CVEs. Researchers from the University of Illinois Urbana-Champaign have now published the results of a new study in which GPT-4 bots attempted to hack test sites by exploiting zero-day vulnerabilities.

Hierarchical Planning Method

Rather than using a single bot and overloading it with a multitude of complex tasks, the researchers deployed a group of autonomous, self-propagating agents based on a large language model (LLM). These agents followed a hierarchical planning method that assigned specific tasks to different bots. At the top of the system sat a “planning agent”, which oversaw the entire operation. Much as a manager interacts with employees, the planning agent coordinated with a “management agent”, which in turn delegated tasks to “expert sub-agents”, spreading the workload evenly across the squad.
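The delegation pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the researchers’ code: the class names and the task format are invented, and in the real system each agent would wrap GPT-4 with browsing and exploit tooling rather than a simple lookup.

```python
# Hypothetical sketch of hierarchical planning with expert sub-agents.
# All names (ExpertAgent, ManagerAgent, PlanningAgent) are illustrative;
# the study's agents are LLM-driven rather than rule-based.

class ExpertAgent:
    """Sub-agent specialised in one vulnerability class (e.g. SQLi, XSS)."""
    def __init__(self, specialty: str):
        self.specialty = specialty

    def attempt(self, task: dict) -> bool:
        # A real expert would drive an LLM plus attack tooling here;
        # we simulate success only within the agent's specialty.
        return task["type"] == self.specialty


class ManagerAgent:
    """Routes each task to a matching expert, spawning one if none exists."""
    def __init__(self):
        self.experts: dict[str, ExpertAgent] = {}

    def delegate(self, task: dict) -> bool:
        expert = self.experts.setdefault(task["type"], ExpertAgent(task["type"]))
        return expert.attempt(task)


class PlanningAgent:
    """Top-level agent: decomposes the goal into tasks for the manager."""
    def __init__(self):
        self.manager = ManagerAgent()

    def run(self, tasks: list[dict]) -> list[bool]:
        return [self.manager.delegate(t) for t in tasks]


planner = PlanningAgent()
tasks = [{"type": "sqli", "target": "/login"},
         {"type": "xss", "target": "/search"}]
results = planner.run(tasks)
print(results)  # each task handled by a freshly spawned specialist
```

The key design point mirrored here is that the planner never performs exploits itself: it only decomposes the goal, and the manager spawns new specialist agents on demand, which is what the article means by a “self-propagating” squad.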

Comparative Efficiency of the Approach

The investigators evaluated the approach against 15 real-world vulnerabilities. The hierarchical planning method proved 550% more effective than a single LLM-based bot tackling the same targets: the bot squad successfully exploited 8 of the 15 vulnerabilities (roughly 53%), whereas a solitary bot managed to create exploits for only three.
