Giskard, a French startup, has built an open-source framework for testing AI models. Focused on large language models (LLMs), the framework is designed to alert developers to biases, security vulnerabilities, and the potential to generate harmful or toxic content before a model goes into production.
While much of the AI industry's attention is on ever more capable models, ML testing systems are drawing increasing scrutiny, particularly with regulations such as the EU's AI Act and similar initiatives in other regions on the horizon. These rules will require compliance checks, and companies building AI models will have to demonstrate that they follow them or risk substantial fines.
Giskard stands out as an AI startup that embraces regulation and is building developer tools to handle the practical complexity of model testing.
Alex Combessie, the co-founder and CEO of Giskard, reflected on the motivation behind the venture, stating, “My experience at Dataiku highlighted the challenges in NLP model integration. There were evident shortcomings when applying them practically, making it arduous to compare suppliers’ performance effectively.”
Giskard's testing framework is built around three components. The first is an open-source Python library that works with popular ML tools such as Hugging Face, MLflow, Weights & Biases, PyTorch, TensorFlow, and LangChain, and can be integrated into LLM projects, particularly retrieval-augmented generation (RAG) projects. The library has been gaining traction on GitHub.
Once integrated, Giskard helps generate a test suite that can be run on a model regularly, covering issues such as performance, misinformation, bias, harmful content generation, and more.
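As a rough sketch of that workflow, the snippet below follows the scan quickstart from Giskard's documentation: a chatbot is wrapped in a `giskard.Model`, scanned for vulnerabilities, and the findings are turned into a reusable test suite. The `my_rag_chain` placeholder, the model name, and the description text are illustrative assumptions rather than anything Giskard prescribes.

```python
import giskard
import pandas as pd

# Placeholder for the real retrieval-augmented chain; assumed for illustration.
def my_rag_chain(question: str) -> str:
    return "The IPCC projects continued warming under current emissions policies."

# Batch prediction function that Giskard calls with a pandas DataFrame of inputs.
def answer_climate_questions(df: pd.DataFrame) -> list:
    return [my_rag_chain(q) for q in df["question"]]

# Wrap the chatbot so Giskard can probe it; the name and description help the
# scanner generate use-case-specific test inputs.
giskard_model = giskard.Model(
    model=answer_climate_questions,
    model_type="text_generation",
    name="Climate chatbot",
    description="Answers questions about climate change based on IPCC reports.",
    feature_names=["question"],
)

# Scan for vulnerabilities: performance issues, misinformation, bias,
# harmful content generation, and so on.
scan_report = giskard.scan(giskard_model)
scan_report.to_html("giskard_scan_report.html")

# Turn the findings into a reusable test suite for regular evaluation.
test_suite = scan_report.generate_test_suite("Climate chatbot test suite")
```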
“Beyond just performance metrics, ethical considerations are gaining significance—both in terms of brand image and regulatory compliance,” Combessie emphasized.
These tests can be integrated into a continuous integration and continuous delivery (CI/CD) pipeline so that they run routinely and developers are alerted immediately when something goes wrong.
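A minimal way to wire this in is a gate script that the pipeline runs on every build. The sketch below assumes the wrapped model from the previous snippet is importable from a module named `my_project.quality` (a made-up path) and that the suite result exposes a boolean `passed` flag, as Giskard's documentation describes at the time of writing.

```python
# Hypothetical CI gate script, e.g. invoked from a GitHub Actions or GitLab CI job.
import sys

import giskard

# Assumed project layout: the wrapped model from the previous sketch lives in
# a module the pipeline can import.
from my_project.quality import giskard_model

scan_report = giskard.scan(giskard_model)
suite_result = scan_report.generate_test_suite("Nightly LLM checks").run()

# A non-zero exit code fails the pipeline and alerts developers to the anomaly.
if not suite_result.passed:
    print(suite_result)
    sys.exit(1)
```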
Tests are tailored to each model's use case. For companies working on RAG, Giskard asks for access to the relevant databases and knowledge repositories so that the test suite stays specific. For instance, when validating a climate-change chatbot built on an LLM from OpenAI, Giskard's tests check that the model's answers are accurate and free of misinformation and contradictions.
The second component is a premium AI quality hub for debugging and comparing LLMs, already used by companies such as Banque de France and L'Oréal. On the roadmap, the hub is also meant to hold the documentation that demonstrates a model's regulatory compliance.
The third component, LLMon, is a real-time monitoring tool that evaluates LLM responses for common issues such as toxicity and factual errors before they are dispatched to the user. It currently works with OpenAI's APIs and LLMs, and Giskard is building integrations with other providers such as Hugging Face and Anthropic.
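The general pattern such a monitor implements can be sketched as a guardrail wrapped around the model call. The example below is purely illustrative and does not use LLMon's actual interface; `looks_toxic` and `guarded_answer` are stand-ins for real evaluators and application code.

```python
# Illustrative guardrail pattern only; LLMon's actual interface is not described
# here, so every name below is a stand-in.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def looks_toxic(text: str) -> bool:
    """Stand-in evaluator; a real deployment would call a toxicity classifier."""
    blocklist = {"idiot", "stupid"}
    return any(word in text.lower() for word in blocklist)

def guarded_answer(question: str) -> str:
    """Generate an answer, then screen it before it reaches the user."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    answer = completion.choices[0].message.content or ""

    # Pre-dispatch check: withhold the response if an evaluator flags it.
    if looks_toxic(answer):
        return "Sorry, this response was withheld by a safety check."
    return answer
```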
How AI regulation will ultimately be applied is still an open question, but Giskard is well positioned to flag potential misuse, particularly in use cases where LLMs are enriched with external data (RAG), an area that regulations like the AI Act may well focus on.
Giskard currently has a team of 20 and expects to roughly double its headcount as it pushes to establish itself as the go-to antivirus for LLMs.
Giskard's approach to AI model testing not only anticipates impending regulations but also reflects its focus on compliant, ethical, and reliable AI models.