Haize Labs: Leading AI Safety with Public Ratings for Generative Models

In an era where artificial intelligence is becoming increasingly pervasive, the urgency for stringent safety protocols has reached unprecedented levels. Enter Haize Labs, a burgeoning start-up founded a mere five months ago by Leonard Tang, Steve Li, and Richard Liu. Despite their recent graduation, these three visionaries have collectively published 15 papers on machine learning and have already made substantial headway in the realm of AI safety. Their ambitious mission is to revolutionize the field by identifying and exposing thousands of vulnerabilities in some of the most popular generative AI programs.

The vulnerabilities uncovered by Haize Labs in tools such as the video creator Pika, the text-focused ChatGPT, the image generator Dall-E, and a system designed to generate computer code are nothing short of alarming. These AI tools demonstrated the capacity to produce violent or sexualized content, provide instructions for creating chemical and biological weapons, and even facilitate the automation of cyberattacks. Leonard Tang, co-founder of Haize Labs, encapsulates their mission succinctly: “We want to become a ‘Moody’s for AI.’ Our goal is to establish public-safety ratings for popular AI models, ensuring they are safe for general use.”

The critical nature of this mission is underscored by recent industry events that have highlighted the risks of deploying AI without rigorous safety evaluations. For instance, Google faced significant backlash when its experimental “AI Overviews” tool suggested dangerous activities such as eating small rocks or adding glue to pizza. Similarly, Air Canada’s AI-enabled chatbot erred, promising a fake discount to a traveler. Jack Clark, co-founder of AI research and safety company Anthropic, emphasized the necessity of robust safety evaluations, stating, “As AI systems get deployed broadly, we are going to need a greater set of organizations to test out their capabilities and potential misuses or safety issues.”

Despite the efforts of large corporations and industry labs, Tang noted that it remains relatively easy to manipulate AI models into performing unintended actions. “They’re not that safe,” he remarked, stressing the critical gap Haize Labs aims to fill. To address these challenges, Haize Labs employs a method known as “red teaming,” which involves simulating adversarial actions to identify vulnerabilities. Tang elaborated, “Think of us as automating and crystallizing the fuzziness around making sure models adhere to safety standards and AI compliance.”

Haize Labs has taken the proactive step of open-sourcing the vulnerabilities they uncovered on GitHub. This allows developers and researchers to study these weaknesses and work towards mitigating them. The start-up has also promptly flagged these vulnerabilities to the creators of the AI tools tested, ensuring that the issues are addressed swiftly. Graham Neubig, an associate professor of computer science at Carnegie Mellon University, highlighted the value of third-party AI safety tools, noting their impartiality and potential for higher auditing performance due to their specialized focus.

The revelations by Haize Labs are sobering. Some AI tools produced gruesome and graphic content, revealing the dark potential of generative AI when left unchecked. Tang emphasized the importance of automated systems in identifying these issues, as manual moderation can be time-consuming and expose moderators to disturbing content. “Our automated systems can root out vulnerabilities much faster and more efficiently than manual methods,” Tang said. “This not only speeds up the process but also protects human moderators from having to deal with violent and disturbing content.”

The work being undertaken by Haize Labs and similar organizations is pivotal in ensuring that AI tools are safe and reliable. The concepts of “red teaming” and automated vulnerability detection offer promising solutions to the challenges faced by AI developers. By simulating adversarial actions, these methods can identify weaknesses that might otherwise go unnoticed. Additionally, the open-sourcing of vulnerability data fosters a collaborative approach to AI safety, allowing the broader community to contribute to the mitigation of risks.

As Haize Labs continues to pioneer AI safety, several potential developments loom on the horizon. The company’s aspiration to become a “Moody’s for AI” could revolutionize how AI models are evaluated and trusted. Public-safety ratings for AI tools could become standardized, providing users with a clear understanding of the associated risks. The partnership between Haize Labs and Anthropic to stress test an unreleased algorithmic product suggests that the start-up is already making significant strides in the industry. As more companies recognize the importance of AI safety, collaborations like this one are likely to proliferate.

Looking ahead, the integration of automated red teaming and vulnerability detection into the development process of AI tools could significantly reduce the risks associated with generative AI. As the technology continues to advance, the insights and innovations from organizations like Haize Labs will play a pivotal role in shaping a safer future for artificial intelligence. Haize Labs has tested over 50 different AI models across various domains, from image generation to natural language processing. The start-up received initial funding from a grant awarded by their alma mater, providing the financial boost needed to kickstart their ambitious projects. With a dedicated team of 12 researchers and engineers, Haize Labs has already presented its findings at two major AI conferences.

Their automated systems can run tests continuously, significantly speeding up the vulnerability detection process. In an effort to democratize AI safety, the company is also developing a user-friendly platform for non-experts to test AI models for vulnerabilities. Haize Labs is in discussions with several major tech companies for potential partnerships and has established a dedicated ethics committee to ensure the responsible disclosure of vulnerabilities. As Haize Labs expands its operations internationally and develops a certification program for AI safety, the start-up is set to make a lasting impact on the field, ensuring that the technology we increasingly rely on is both safe and trustworthy.

Leave a comment

Your email address will not be published.