The topic of safety has always attracted much attention in the artificial intelligence (AI) industry.
Especially after the emergence of large language models (LLMs) such as GPT-4, many industry experts, Elon Musk among them, called for "immediately suspending the training of AI models more powerful than GPT-4", and thousands of people signed an open letter in support.
That open letter came from the Future of Life Institute, which was co-founded by Max Tegmark, the MIT professor, physicist, AI researcher, and author of "Life 3.0", among others. It is one of the earliest organizations to focus on AI safety, and its mission is to "guide transformative technologies to benefit life and avoid extreme large-scale risks."
Public information shows that the advisory board of the Future of Life Institute has a strong lineup, including theoretical physicist Stephen Hawking, entrepreneur Elon Musk, Harvard University genetics professor George Church, MIT physics professor Frank Wilczek, and actors and science communicators such as Alan Alda and Morgan Freeman.
Recently, the Future of Life Institute invited seven AI and governance experts, including Turing Award winner Yoshua Bengio and University of California, Berkeley computer science professor Stuart Russell, to evaluate the safety practices of six AI companies (Anthropic, Google DeepMind, Meta, OpenAI, x.AI, and Zhipu) across six key areas, and released its first AI Safety Index report (FLI AI Safety Index 2024).
The report shows that although Anthropic received the highest safety rating, its grade was only a "C", and that all six companies, Anthropic included, still have considerable room to improve their safety practices.
Report link: https://futureoflife.org/document/fli-ai-safety-index-2024/
Commenting on the report on X, Tegmark pointedly noted that Anthropic came first and Meta last: Anthropic ranks highest on safety, while Meta, which insists on open source, ranks lowest in this regard. But Tegmark also stated that "the purpose of this is not to shame anyone, but to inspire companies to improve."
It is worth mentioning that the Future of Life Institute wrote in the report: "Companies were selected based on their expected ability to build the most powerful models by 2025. In addition, the addition of Zhipu reflects our intention to make the index representative of the world's leading companies. As the competitive landscape evolves, future iterations may focus on different companies."
Assessing AI Safety Across Six Major Dimensions
According to the report, the review experts evaluated each company separately across six dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety Strategy, Governance & Accountability, and Transparency & Communication, and then aggregated the results into an overall safety index score.
Dimension 1: Risk Assessment
In the risk assessment dimension, OpenAI, Google DeepMind, and Anthropic were credited with implementing more rigorous testing to identify potentially dangerous capabilities, such as misuse for cyberattacks or bioweapons production. However, the report also noted that these efforts still have significant limitations and that the risks associated with AGI are not yet fully understood.
OpenAI's assessments of deceptive capabilities and related research drew the review experts' attention, while Anthropic was considered particularly outstanding for its in-depth collaboration with national AI safety institutes. Google DeepMind and Anthropic are the only two companies that maintain dedicated bug bounty programs for model vulnerabilities. Meta, although it assessed dangerous capabilities before model deployment, paid relatively little attention to threat models related to autonomy, scheming, and persuasion, while x.AI's pre-deployment assessments are almost entirely absent and fall significantly below industry standards.
The review experts suggest that the industry should expand the scale and scope of such research and establish clear standards for acceptable risk thresholds, in order to further improve the safety and reliability of AI models.
Dimension 2: Current Harms
In the Current Harms dimension, Anthropic's AI systems received the highest scores on safety and trust benchmarks, followed closely by Google DeepMind, whose SynthID watermarking system is recognized as a best practice for reducing the abuse of AI-generated content.
Other companies scored poorly, exposing the inadequacy of their safety mitigations. Meta, for example, was criticized for publicly releasing the weights of its frontier models, which can be exploited by malicious actors to strip away safety protections.
In addition, adversarial attacks remain a major problem: most models are vulnerable to jailbreaks, with OpenAI's models found to be particularly susceptible and Google DeepMind's showing the strongest defenses. The review experts also pointed out that only Anthropic and Zhipu avoid using user interaction data for model training by default, a practice other companies could learn from.
Dimension 3: Safety Framework
In terms of safety frameworks, all six companies have signed the Frontier AI Safety Commitments and pledged to develop safety frameworks that include thresholds for unacceptable risk, advanced protective measures for high-risk scenarios, and conditions for pausing development when risks become uncontrollable.
However, as of the release of this report, only OpenAI, Anthropic, and Google DeepMind have published such frameworks, so the review experts could only evaluate these three companies. Among them, Anthropic's framework is recognized as the most detailed, and the company has also released more implementation guidance.
The experts unanimously emphasized that safety frameworks must be backed by strong external review and oversight mechanisms in order to genuinely enable accurate risk assessment and management.
Dimension 4: Existential Safety Strategy
In the existential safety strategy dimension, although all the companies have expressed the intention to develop AGI or artificial superintelligence (ASI) and acknowledged that such systems could pose existential risks, only Google DeepMind, OpenAI, and Anthropic have conducted more serious research on control and safety.
The review experts pointed out that no company has proposed a formal strategy for ensuring that advanced AI systems remain controllable and aligned with human values, and that existing technical research on controllability, alignment, and interpretability is still immature and insufficient.
Anthropic received the top mark for its detailed "Core Views on AI Safety" blog post, but experts believe its strategy would not effectively protect against the significant risks of superintelligent AI. OpenAI's "Planning for AGI and beyond" blog post provides only high-level principles, which are considered reasonable but lack concrete plans, and its scalable oversight research is still immature. The research updates shared by Google DeepMind's alignment team are useful but not sufficient to ensure safety, and the blog content does not fully represent the company's overall strategy.
Meta, x.AI, and Zhipu have not yet proposed technical research or plans to address the risks of AGI. The review experts do note that Meta's open-source strategy and x.AI's vision of "democratized access to truth-seeking AI" may mitigate the risks of power concentration and value lock-in to a certain extent.
Dimension 5: Governance and Accountability
In the governance and accountability dimension, the review experts noted that Anthropic's founders have invested considerable effort in building a responsible governance structure, which makes it more likely that the company will prioritize safety. Anthropic's other proactive measures, such as its responsible scaling policy, also received positive reviews.
OpenAI's original non-profit structure received similar praise, but recent changes, including the disbanding of its safety team and the shift toward a for-profit model, have raised concerns that safety is being deprioritized in its governance. Google DeepMind has taken an important step by committing to implement a safety framework and publicly stating its mission, but its position within profit-driven Alphabet is seen as limiting its autonomy to prioritize safety.
Meta, although it has taken action in areas such as CyberSecEval and red-team testing, has a governance structure that fails to align with safety priorities. In addition, its practice of releasing advanced models as open source creates risks of abuse and further weakens its accountability.
x.AI, although officially registered as a public benefit corporation, is significantly less proactive in AI governance than its competitors. Experts note that the company lacks an internal review board for key deployment decisions and does not publicly report any substantive risk assessments.
As a for-profit entity, Zhipu operates in compliance with legal and regulatory requirements, but the transparency of its governance mechanism is still limited.
Dimension 6: Transparency and Communication
In terms of transparency and communication, the review experts took a critical view of the positions OpenAI and Google DeepMind have taken on AI safety regulation. In stark contrast, x.AI was praised for supporting SB1047, demonstrating a proactive stance on regulatory measures aimed at strengthening AI safety.
In contrast with Meta, x.AI and Anthropic were also praised for publicly engaging with the extreme risks posed by advanced AI and for their efforts to educate policymakers and the public about these issues. Beyond its excellence in risk communication, Anthropic continues to support governance initiatives that promote transparency and accountability across the industry.
Meta's rating was significantly affected by its leadership's repeated dismissal and downplaying of extreme AI risks, which the reviewers consider a significant flaw. They stressed that there is an urgent need for greater transparency across the industry; x.AI's lack of information sharing on risk assessment was specifically cited as an example of insufficient transparency.
Anthropic gained further recognition for setting the benchmark for industry best practice by allowing the UK and US AI Safety Institutes to conduct third-party pre-deployment assessments of its models.
How Do the Experts Score?
In terms of index design, each of the six evaluation dimensions comprises multiple key indicators, covering corporate governance policies, external model evaluation practices, and benchmark results on safety, fairness, and robustness. These indicators were selected on the basis of broad recognition in academia and policy circles, ensuring their relevance and comparability for measuring company safety practices.
The main inclusion criteria for these indicators are:
Relevance: The checklist highlights aspects of safe and responsible AI behavior that are widely recognized by the academic and policy communities. Many indicators are drawn directly from related projects at leading research institutions, such as Stanford University's Center for Research on Foundation Models.
Comparability: Indicators were selected to highlight meaningful differences in safety practices that can be identified from the available evidence; safety measures for which no conclusive evidence of difference exists were omitted.
Companies were selected based on their expected ability to build the most powerful models by 2025. The addition of Zhipu also reflects the index's intention to represent the world's leading companies. As the competitive landscape evolves, future iterations may focus on different companies.
Figure | Overview of evaluation indicators.
In addition, when the Future of Life Institute compiled the "AI Safety Index Report", it built a comprehensive and transparent evidence base to ensure that the assessment results are scientific and reliable. The research team created a detailed score sheet for each company based on 42 key indicators, and provided links to all raw data in the appendix for public review and verification. Sources of evidence include:
Public information: drawn mainly from public materials such as research papers, policy documents, news reports, and industry reports, which not only enhances transparency but also makes it easier for stakeholders to trace the sources of information.
Company questionnaire: a questionnaire was distributed to each assessed company to supplement public data with internal information such as safety structures, processes, and strategies.
The evidence was collected between May 14 and November 27, 2024, covering the latest AI benchmark data, with data extraction times recorded in detail to reflect model updates. The Future of Life Institute upholds the principles of transparency and accountability by fully documenting all data, whether from public sources or provided by companies, and making it available for review and research.
In terms of the scoring process, after evidence collection was completed on November 27, 2024, the research team submitted the compiled scoring sheets to a panel of independent AI scientists and governance experts for review. The scoring sheets cover all indicator-related information and are accompanied by scoring guidelines to ensure consistency.
The review experts graded each company against absolute standards rather than simply making cross-company comparisons. They were also required to attach a brief justification for each grade and key suggestions for improvement, reflecting both the evidence base and their professional judgment. The Future of Life Institute additionally assigned certain experts to specific areas, such as "existential safety strategy" and "current harms", to ensure depth of expertise in the scoring. Finally, each domain was graded by at least four experts, and the average grade is summarized and displayed on the scorecard.
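As a rough illustration of this aggregation step only, the sketch below shows how per-dimension letter grades from several reviewers might be converted to grade points, averaged, and mapped back to a letter grade. The grade-point scale, the helper names, and the example grades are illustrative assumptions, not values or methodology taken from the report.

```python
# Illustrative sketch (not the report's actual methodology or data):
# average several expert letter grades for one dimension into a scorecard grade.
from statistics import mean

# Assumed US-style grade-point scale; the report's exact scale may differ.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def to_points(grade: str) -> float:
    """Convert a letter grade such as 'B+' or 'C-' into grade points."""
    points = GRADE_POINTS[grade[0]]
    if grade.endswith("+"):
        points += 0.3
    elif grade.endswith("-"):
        points -= 0.3
    return points

def to_letter(points: float) -> str:
    """Map averaged grade points back to the nearest plain letter grade."""
    return min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - points))

def dimension_grade(expert_grades: list[str]) -> str:
    """Average the grades of the (at least four) experts covering one dimension."""
    assert len(expert_grades) >= 4, "each dimension is graded by at least four experts"
    return to_letter(mean(to_points(g) for g in expert_grades))

# Hypothetical example: four reviewers grading one company on one dimension.
print(dimension_grade(["C+", "B-", "C", "C-"]))  # -> "C"
```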
This scoring process focuses on structured, standardized assessments while retaining flexibility, allowing experts’ professional judgment to be fully integrated with actual data. Not only does it demonstrate the current state of safety practices, it also suggests possible directions for improvement, motivating companies to achieve higher safety standards in the future.