Open source has been a major accelerator of AI's rapid development, and one of the most important forces behind it is Meta.
Meta has made significant contributions to open source AI. From the Llama large language models to the Segment Anything image segmentation model, its releases span multiple modalities and scenarios, and research progress in fields beyond AI, such as medicine, has also benefited from Meta's open source models.
Recently, Meta released a batch of new open source work and upgraded existing projects, including SAM 2.1 and improved sentence representations. The open source community is in for another celebration!
Segment Anything Model 2.1

Since SAM 2 was open sourced, it has been downloaded more than 700,000 times. The online demo has helped users segment hundreds of thousands of objects in image and video data, and the model has had a major impact on interdisciplinary research, including medical imaging and meteorology.
This time, Meta has released updated weights for Meta Segment Anything Model 2.1 (SAM 2.1), which delivers stronger performance.
Open source link: https://github.com/facebookresearch/sam2
Compared with SAM 2, the researchers introduced additional data augmentation techniques to simulate visually similar objects and small objects, trained the model on longer frame sequences, and adjusted the positional encoding of spatial memory and object pointer memory, improving SAM 2's ability to handle occlusions.
The researchers also open sourced the SAM 2 developer suite, making it easier to build downstream applications on top of SAM 2. Users can now fine-tune SAM 2 on their own data with the released training code, and the front-end and back-end code for the web demo are open source as well.
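As a quick orientation, here is a minimal sketch of running image segmentation with the released predictor API; the checkpoint path, config name, and prompt variables are illustrative, so check the repository for the exact filenames and prompt formats.

```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Illustrative paths; the available checkpoints and configs are listed in the repo.
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)              # image: an HxWx3 uint8 array (assumed)
    masks, scores, _ = predictor.predict(
        point_coords=point_coords,          # e.g. [[x, y]] click prompts (assumed)
        point_labels=point_labels,          # 1 = foreground, 0 = background
    )
```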
Spirit LM: Speech + text language model

Large language models are often used to build speech pipelines: automatic speech recognition (ASR) first transcribes speech into text, a large language model then generates text, and text-to-speech (TTS) synthesis finally converts that text back into speech.
However, this pipeline can lose the expressive qualities of speech, leaving the model weak at understanding and generating expressive speech.
To address this limitation, researchers built Spirit LM, Meta's first open source multimodal language model that can freely mix text and speech. It is trained on speech and text datasets with a word-level interleaving method, enabling cross-modal generation.
Paper link: https://arxiv.org/abs/2402.05755
The researchers developed two versions of Spirit LM to demonstrate both the semantic generation capabilities of text models and the expressive capabilities of speech models: Spirit LM Base uses phonetic tokens to model speech, while Spirit LM Expressive uses pitch and style tokens to capture intonation, such as excitement, anger, or surprise, and then generates speech that reflects that tone.
Spirit LM is able to generate more natural-sounding speech and has the ability to learn new tasks across modalities, such as automatic speech recognition, text-to-speech and speech classification.
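To illustrate the word-level interleaving idea, here is a small sketch that alternates text tokens and discrete speech-unit tokens within one training sequence; the stand-in tokenizers below are assumptions for illustration, not Spirit LM's actual speech tokenizers.

```python
# Minimal sketch of word-level text/speech interleaving (illustrative only;
# Spirit LM's real pipeline uses learned speech tokenizers, not these stand-ins).
TEXT, SPEECH = "text", "speech"

def text_tokens(word: str) -> list[str]:
    # Stand-in for a subword text tokenizer.
    return [f"[T:{word}]"]

def speech_tokens(word: str) -> list[str]:
    # Stand-in for discrete speech units covering the spoken word.
    return [f"[S:{word}:{i}]" for i in range(2)]

def interleave(words: list[tuple[str, str]]) -> list[str]:
    """Build one mixed-modal token sequence, switching modality at word boundaries."""
    sequence = []
    for word, modality in words:
        sequence += text_tokens(word) if modality == TEXT else speech_tokens(word)
    return sequence

print(interleave([("the", TEXT), ("cat", SPEECH), ("sat", TEXT)]))
```

Training a single language model on sequences like this is what lets it continue a text prompt in speech, or a speech prompt in text.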
Layer Skip: Accelerating generation times

Large language models are widely used across industries and use cases, but they demand substantial compute and memory, which makes them expensive to run.
To address these challenges, Meta introduced an end-to-end solution called Layer Skip, which accelerates LLM generation on new data without relying on specialized hardware or software: the model drafts tokens by executing only a subset of its layers, then uses the remaining layers to verify and correct them.
Paper link: https://arxiv.org/pdf/2404.16710
Code link: https://github.com/facebookresearch/LayerSkip
The researchers have open sourced the Layer Skip inference code and fine-tuned checkpoints for Llama 3, Llama 2, and Code Llama. These models were trained with the Layer Skip recipe, which significantly improves the accuracy of early-exit predictions, and the Layer Skip inference implementation can speed the models up by up to 1.7x.
A key property of the Layer Skip checkpoints is their robustness to exiting at early layers and skipping intermediate layers, along with consistent activations across layers, which paves the way for new research on optimization and interpretability.
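For intuition, here is a rough sketch of the draft-then-verify pattern behind this kind of early-exit acceleration (often called self-speculative decoding). The `forward_layers` method is a hypothetical helper, not the actual Layer Skip API.

```python
import torch

def generate_self_speculative(model, context, draft_layers=8, draft_len=4, max_new=64):
    """Sketch of self-speculative decoding: draft with early layers, verify with all layers.

    Assumes a hypothetical model.forward_layers(tokens, num_layers) that returns
    next-token logits for every position; this is illustrative, not the repo's code.
    """
    tokens = list(context)
    while len(tokens) - len(context) < max_new:
        # 1) Draft: greedily propose a few tokens using only the first `draft_layers` layers.
        draft = []
        for _ in range(draft_len):
            logits = model.forward_layers(tokens + draft, num_layers=draft_layers)
            draft.append(int(torch.argmax(logits[-1])))

        # 2) Verify: one full-model pass over the drafted tokens.
        full_logits = model.forward_layers(tokens + draft, num_layers=model.num_layers)
        accepted = []
        for i, tok in enumerate(draft):
            predicted = int(torch.argmax(full_logits[len(tokens) + i - 1]))
            accepted.append(predicted)
            if predicted != tok:   # first mismatch: keep the correction, stop accepting
                break
        tokens += accepted
    return tokens
```

Because several drafted tokens can be accepted per full-model pass, the expensive layers run far less often than in standard autoregressive decoding.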
Salsa: Validating the security of post-quantum cryptography standards

In data security, cryptographic research must stay ahead of attack methods.
Meta's open source method Salsa can attack sparse secrets in the NIST-standardized CRYSTALS-Kyber, allowing researchers to benchmark AI-based attacks and compare them against current and future attack methods.
Paper link: https://arxiv.org/pdf/2408.00882v1
Code link: https://github.com/facebookresearch/LWE-benchmarking
Lattice-based cryptography, the industry standard adopted by the National Institute of Standards and Technology (NIST), is built on the hard problem of learning with errors (LWE). The problem assumes that it is very hard to learn a secret vector when given only noisy inner products of that vector with random vectors, and researchers have previously demonstrated machine learning attacks against this approach.
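To make the LWE setting concrete, here is a minimal sketch of how such noisy samples are generated; the parameters are toy values chosen for illustration, not those of Kyber or the Salsa benchmark.

```python
import numpy as np

# Illustrative LWE instance: each sample is b_i = <a_i, s> + e_i (mod q).
# Toy parameters, not those used by CRYSTALS-Kyber or the Salsa benchmark.
q, n, num_samples = 3329, 16, 8
rng = np.random.default_rng(0)

s = rng.integers(0, 2, size=n)                  # sparse/binary secret vector
A = rng.integers(0, q, size=(num_samples, n))   # uniformly random vectors a_i
e = rng.integers(-2, 3, size=num_samples)       # small noise terms e_i
b = (A @ s + e) % q                             # noisy inner products

# An attacker sees only (A, b) and must recover the secret s.
print(A.shape, b)
```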
Meta Lingua: Accelerating research through efficient model training

Meta Lingua is a lightweight, self-contained codebase for training language models at scale.
The project provides a research-friendly environment that makes it easier to turn concepts into practical experiments, and it emphasizes simplicity and reusability to accelerate research. The platform is efficient and customizable, allowing researchers to test ideas quickly with minimal setup and technical overhead.
Code link: https://github.com/facebookresearch/lingua
To achieve this, the researchers made several design choices to keep the code modular and self-contained while remaining efficient, leveraging several PyTorch features to make the code easier to install and maintain without sacrificing flexibility or performance.
Researchers can focus more on the work itself and let the Lingua platform take care of efficient model training and reproducible research.
Meta Open Materials 2024: Advancing Inorganic Materials Discovery

Traditionally, discovering new materials that drive technological advances can take decades, but AI-assisted materials discovery could revolutionize the field and dramatically speed up the process.
Meta recently open sourced the Open Materials 2024 dataset and models, which rank at the top of the Matbench Discovery leaderboard and are expected to drive further breakthroughs in AI-accelerated materials discovery through open, reproducible research.
Code link: https://github.com/FAIR-Chem/fairchem
Model link: https://huggingface.co/fairchem/OMAT24
Data link: https://huggingface.co/datasets/fairchem/OMAT24
The best current materials discovery models are closed models built on foundational research from the open source AI community. Open Materials 2024 provides open source models and data based on 100 million training samples, one of the largest open datasets of its kind, giving the materials discovery and AI research communities a competitive open source option.
Meta Open Materials 2024 is now publicly available and will empower the artificial intelligence and materials science research communities to accelerate the discovery of inorganic materials and close the gap between open and proprietary models in the field.
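For reference, one way to pull the released artifacts locally is via the Hugging Face Hub client; this is a minimal sketch, and the fairchem repository documents the full training and evaluation workflow.

```python
from huggingface_hub import snapshot_download

# Download the OMAT24 model weights and dataset files from the Hugging Face Hub.
# Note: the dataset is very large; allow_patterns can restrict the download to a subset.
model_dir = snapshot_download(repo_id="fairchem/OMAT24")
data_dir = snapshot_download(repo_id="fairchem/OMAT24", repo_type="dataset")
print(model_dir, data_dir)
```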
Mexma: Token-level objectives for improved sentence representation

Currently, pre-trained cross-lingual sentence encoders are usually trained with only sentence-level objectives. This approach can lose information, especially token-level information, which ultimately reduces the quality of the sentence representations.
Mexma is a pre-trained cross-lingual sentence encoder that outperforms previous methods by combining token-level and sentence-level objectives during training.
Paper link: https://arxiv.org/pdf/2409.12737
Previous methods for training cross-lingual sentence encoders update the encoder only through the sentence representation; Mexma improves on this by also using token-level objectives to provide richer updates to the encoder.
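As a rough illustration of how the two objectives can be combined in a training step, here is a generic sketch; the encoder, token head, pooling, and loss weighting are assumptions for illustration, not Mexma's actual architecture or losses.

```python
import torch
import torch.nn.functional as F

def training_step(encoder, token_head, batch, alpha=1.0):
    """Illustrative combined objective: sentence-level alignment + token-level prediction.

    `encoder` maps token ids to contextual embeddings [B, T, D]; `token_head`
    maps embeddings to vocabulary logits. These components are assumed, not Mexma's.
    """
    src = encoder(batch["src_ids"])   # one side of a translation pair
    tgt = encoder(batch["tgt_ids"])   # the other language

    # Sentence-level objective: align pooled sentence embeddings across languages.
    src_sent, tgt_sent = src.mean(dim=1), tgt.mean(dim=1)
    sent_loss = 1 - F.cosine_similarity(src_sent, tgt_sent).mean()

    # Token-level objective: predict masked token ids from contextual embeddings.
    token_logits = token_head(src)    # [B, T, vocab]
    tok_loss = F.cross_entropy(
        token_logits.view(-1, token_logits.size(-1)),
        batch["masked_labels"].view(-1),
        ignore_index=-100,            # positions that were not masked are ignored
    )
    return sent_loss + alpha * tok_loss
```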
The researchers hope the community will benefit from using Mexma as a sentence encoder. It currently supports 80 languages, with sentence representations aligned across all of them. When mining bitext data, Mexma can more accurately identify and compare information across languages, and it also performs well on other downstream tasks such as sentence classification.
Self-Taught Evaluator: Generating reward models

Researchers have released the Self-Taught Evaluator, which can generate synthetic preference data to train reward models without relying on human annotation.
The approach generates contrasting model outputs and trains an LLM-as-a-Judge to produce reasoning traces for evaluation and a final judgment, refined through an iterative self-improvement scheme.
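Schematically, one iteration of such a self-improvement loop might look like the sketch below; every helper here (perturb, generate, evaluate, finetune) is a hypothetical stand-in for illustration, not the released training code.

```python
# Hypothetical sketch of one Self-Taught-Evaluator-style iteration; the model
# objects and helpers are assumptions, not the actual released implementation.

def perturb(prompt: str) -> str:
    # Stand-in: a modified instruction intended to elicit a worse response.
    return prompt + " (respond in a single vague sentence)"

def self_improvement_iteration(judge, policy_model, prompts, samples_per_pair=8):
    training_traces = []
    for prompt in prompts:
        # 1) Build a contrasting pair: a normal response and a deliberately worse one.
        good = policy_model.generate(prompt)
        bad = policy_model.generate(perturb(prompt))

        # 2) Ask the current judge for reasoning traces plus a verdict ("A" = first response).
        traces = [judge.evaluate(prompt, good, bad) for _ in range(samples_per_pair)]

        # 3) Keep only traces whose verdict matches the known preference (good > bad).
        training_traces += [t for t in traces if t.verdict == "A"]

    # 4) Fine-tune the judge on its own correct reasoning traces, then repeat.
    return judge.finetune(training_traces)
```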
Paper link: https://arxiv.org/abs/2408.02666
The researchers released a model trained with direct preference optimization; this generative reward model performs strongly on RewardBench even though no human annotations were used to create its training data.
Its performance surpasses larger models and models trained with human-annotated labels, such as GPT-4, Llama-3.1-405B-Instruct, and Gemini-Pro. On the AlpacaEval leaderboard it also ranks among the best evaluators in terms of agreement with human judgments, while being roughly 7 to 10 times faster than the default GPT-4 evaluator.
Since its release, the artificial intelligence community has embraced this synthetic data approach and used it to train high-performing reward models.