Large Language Models (LLMs) have become essential tools for software developers, helping them solve complex problems and, in some cases, implement entire features. However, while these models can accelerate development and streamline workflows, over-reliance on them carries its own risks.
A well-known flaw of LLMs is their tendency to ‘hallucinate’: the model fabricates information that sounds plausible but is incorrect or entirely fictional. This becomes particularly risky when the model suggests packages. Imagine a developer asking an LLM for help implementing a specific piece of functionality. The model might produce a code snippet that uses a package that does not exist. A developer who uses the suggested code without verifying that the package is real and legitimate could expose themselves to an attack staged by malicious actors.
The Attack
In theory, if an attacker could determine that a certain package name is a hallucination, they could preemptively create and upload a malicious package under that name to popular repositories, tricking developers into installing malware on their systems in the guise of the suggested package. Claiming a name that people already trust, before or after its legitimate owner holds it, and hijacking it for malicious use is a long-established technique. A recent example is its use in Revival Hijacking.
A more realistic scenario would involve a developer of an open-source project using LLM-generated code that references a non-existent package for a minor functionality script. A hacker discovers this project, recognizes that the package does not exist, and quickly creates and uploads a malicious package under that name to popular repositories. They then wait for unsuspecting developers to install it, potentially gaining access to their systems or compromising the security of other projects that depend on the affected code.
In a continuous integration/continuous deployment (CI/CD) pipeline, where dependencies are resolved and updated automatically, a compromised package can slip into the system without anyone noticing. Imagine a scenario where an open-source project containing a hallucinated package makes its way into a CI/CD pipeline, perhaps as a minor utility or dependency. The pipeline, which is set to automatically pull updates and build the code, fetches the newly created malicious package from a public repository and introduces malicious code into the production environment.
Impact and Mitigation
The impact of exploiting hallucinated packages can be significant, especially for open-source projects integrated into critical systems. If malicious code infiltrates widely used software, it could lead to severe security breaches, data theft, and compromised systems on a broad scale.
As mentioned before, open-source projects are prime targets of this attack. Hackers often target open-source software, actively searching for exploitable vulnerabilities. The transparency of open-source code, coupled with its widespread adoption by major organizations, provides attackers with ample opportunity to introduce malicious packages into the supply chain with minimal effort.
The main way to mitigate this risk is to never blindly incorporate LLM-generated code into a project, no matter how minor the functionality may seem. Check whether the packages mentioned in the code are real and legitimate, and conduct due diligence to ensure that they do not contain malicious code.
Strong dependency management is also essential, especially in open-source projects with significant community contributions. Active monitoring and vetting of dependencies can help thwart these types of attacks.
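One concrete form of dependency vetting is to reject unpinned requirements, so that an automated build cannot silently pick up a newly published package version. The following is a minimal sketch in Python, assuming a pip-style requirements file. The function name unpinned_requirements is hypothetical, and the parsing is deliberately simplified: it skips option lines and does not handle URLs or other less common requirement forms.

```python
import re

# A line counts as pinned only if it starts with "name==version"
# (an optional [extras] group after the name is allowed).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines in `text` that are not pinned with '=='."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and option lines such as -r or --hash
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    example = "requests==2.31.0\nflask>=2.0\nnumpy\n"
    for line in unpinned_requirements(example):
        print(f"not pinned: {line}")
```

A check like this could run as an early pipeline step and fail the build when the returned list is not empty. Pinning only fixes the version; pip's hash-checking mode (--require-hashes) goes further by also verifying that the downloaded files match recorded hashes.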