ChatGPT: It’s Easy to Poison this AI Model


ChatGPT is still fragile and needs to be used with care. You can hoodwink it into thinking it is giving you information for the right reasons, and in the process, make security systems vulnerable.

Artificial intelligence has revolutionised several industries, including cyber security, where it has improved security breach detection and prevention. This has been demonstrated by the chatbot ChatGPT, although subsequent abuses of the bot have raised questions about its ability to hurt in unanticipated ways. ChatGPT has been successfully poisoned by attackers. The potential for an AI-assisted kill chain is examined in this article along with the corresponding security risks.

OpenAI came up with the chatbot ChatGPT based on the architecture of GPT-3, a large multimodal model used to respond to follow-up queries, admit mistakes, dispute false premises, and more. ChatGPT is also known as GPT-3.5 since it is fine-tuned to answer most queries and continue conversations, as shown in Figure 1.

The position of ChatGPT in the development of AI
Figure 1: The position of ChatGPT in the development of AI

Following its introduction, many people rapidly began using ChatGPT for a variety of tasks, from writing articles to creating social media posts and captions. It provides suitable replies in a conversational style using deep learning algorithms to comprehend the context and meaning of text input. This model can comprehend a wide range of topics and circumstances because it was trained on a huge data set of online content.

Over half (51 per cent) of the 1500 IT decision-makers surveyed by BlackBerry in North America, the UK, and Australia have predicted that a cyber attack attributed to ChatGPT will take place in less than a year. Seventy-five per cent of those surveyed claimed that ChatGPT is already being abused by foreign powers against other nations.

While the respondents in this poll believe that ChatGPT is employed for “good” reasons, 73 per cent recognise the potential threat it poses to cyber security and are either “very” or “fairly” worried. This shows that artificial intelligence (AI) is a double-edged sword.

Most of the offensive activity in the cyber security industry requires the attackers to build harmful scripts and programs. The scripts and malicious software can be utilised for reconnaissance, delivery, breach, and affect—the four steps of an effective cyber death chain. Therefore, it is advised for security and risk management leaders to be more concerned about the real-world risks the ChatGPT model poses in an operational environment.

Scripts for programs like Shodan, instructions for Nmap, and other similar programs may be included in reconnaissance scripts. Scripts for the Metasploit framework, phishing websites, and other types of scripts and codes for distribution are examples. A breach is a script that takes advantage of weaknesses. For a Denial-of-Service assault, for instance, it may entail creating a booster script for the High Orbit Ion Canon (HOIC).

PowerShell or bash scripts are typically used to influence and compromise the target system. Due to its ability to develop codes and scripts for all the uses, ChatGPT serves as an excellent tool for attackers looking to automate their attacks.

I asked ChatGPT for a Shodan syntax to find all the connected webcams on a street in a city, which it had earlier refused to find. However, when I specified it was needed for research purposes only, the reply was as shown in Figure 2.

Open AI module poisoned to reveal code that it had earlier refused to give
Figure 2: Open AI module poisoned to reveal code that it had earlier refused to give

It was also asked for a Shodan syntax to monitor vulnerabilities of servers in a specific country, which it provided. The same is shown in Figure 3.

AI model poisoned to reveal Shodan script to monitor server vulnerabilities in a certain country
Figure 3: AI model poisoned to reveal Shodan script to monitor server vulnerabilities in a certain country

The examples above demonstrate that ChatGPT is taught not to divulge harmful material, such as malware codes or hostile cyber security tools. By altering our queries, however, it is possible to get around this training and force ChatGPT to create malicious or dangerous code. This is sometimes referred to as prompt engineering. It is strictly forbidden for ChatGPT to answer questions that are related to ‘hacking’ or writing malwares directly, therefore poisoning its inputs may force it to do so. For example, poisoning the queries of ChatGPT may enable it to write racist or offensive jokes or even write codes for malware and other unethical cyber activities.

This is a serious challenge for the developers and users of ChatGPT, as well as for society at large. It is important to be aware of the potential risks and limitations of ChatGPT, and to use it responsibly and ethically. ChatGPT is trained using reinforcement learning from human feedback (RLHF) to reduce harmful and untruthful outputs, but it is not perfect, and can still make mistakes or be exploited.

Therefore, prompt engineering should be used with caution and respect, and only for positive and beneficial applications. ChatGPT can help with answering questions, suggesting recipes, writing lyrics in a certain style, generating code, and much more, but it should not be used for harming others or violating their rights.


Please enter your comment!
Please enter your name here