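On the LLM safety and evals front: the UK AI Safety Institute (AISI), the team that previously released the Inspect framework, has recently open-sourced a new tool, ControlArena.
It is mainly for automated evaluation, in controlled sandbox environments, of how far an LLM agent's autonomy extends and what potentially dangerous capabilities it has.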
Official site:
Inspect (https://inspect.aisi.org.uk/)
Open-source framework for large language model evaluations
Projects:
GitHub - UKGovernmentBEIS/control-arena: ControlArena is a collection of settings, model... (https://github.com/UKGovernmentBEIS/control-arena)
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
GitHub - UKGovernmentBEIS/inspect_ai: Inspect: A framework for large language model... (https://github.com/UKGovernmentBEIS/inspect_ai?tab=readme-ov-file)
Inspect: A framework for large language model evaluations