Another breakthrough in AI! "Eye typing" is faster and more labor-saving, research published in Nature sub-journal
Editor
2024-11-25 15:34:01


Image source: Generated by Unbounded AI

People who are unable to speak or type due to diseases such as ALS cannot be ignored. They face huge obstacles in daily communication and urgently need effective assistive tools to break down those barriers.

Although augmentative and alternative communication (AAC) devices and eye-gaze typing technology can provide some support, the frequent key operations they require easily cause eye fatigue and high time costs, seriously hindering patients with motor impairments from holding natural, fluent conversations and expressing themselves fully, which affects their quality of life.

To solve this problem, a research team from Google and its collaborators developed SpeakFaster, a user interface (UI) driven by a large language model (LLM).

According to the report, SpeakFaster uses a fine-tuned LLM and conversational context to expand highly abbreviated English text (the first letters of words only, with extra letters and words added when necessary) into the intended complete phrase with very high accuracy, helping ALS patients who type by eye gaze reduce keystrokes by 57% and increase text entry speed by 29-60% compared with baselines.

The related research paper is titled "Using large language models to accelerate communication for eye gaze typing users with ALS" and has been published in the Nature sub-journal Nature Communications.

These results suggest that, by dramatically increasing text input speed and reducing physical strain, SpeakFaster can help people with severe motor impairments communicate more accurately and efficiently, allowing them to participate more fully in conversations and thereby improving their independence, social participation, self-expression and quality of life.

Enabling ALS patients to communicate better

SpeakFaster provides an artificial intelligence (AI)-based approach that combines LLM with a UI designed for abbreviated text entry.

Specifically, the research team first designed SpeakFaster's UI so that abbreviations can be easily entered and refined, ensuring that users can always convey the message they intend, even when the initial prediction is not what they want.

They previously demonstrated that a fine-tuned LaMDA model (64B parameters) can expand an initials-only abbreviation (such as "ishpitb") into a complete phrase (such as "I saw him play in the bed") with accuracy as high as 77% when provided with conversational context (i.e., the other speaker's turn). Failures to find an exact match tend to occur with longer, more complex phrases.
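The mapping from phrase to initials-only abbreviation is straightforward to illustrate. Below is a minimal sketch; `to_initials` is a hypothetical helper, not part of the published system:

```python
def to_initials(phrase: str) -> str:
    """Collapse a phrase into its initials-only abbreviation,
    i.e., the compact input form that the fine-tuned LLM expands."""
    return "".join(word[0].lower() for word in phrase.split())

print(to_initials("I saw him play in the bed"))  # -> ishpitb
```

The LLM's job is the much harder inverse problem: recovering the full phrase from "ishpitb" plus the conversational context.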

Figure | The main interaction path for abbreviated text input in SpeakFaster UI: initial letter only path.

While promising, a practical solution needs to ensure that the user can still enter any arbitrary phrase if the initial abbreviation expansion (AE) fails, i.e., the user never hits a "dead end". They therefore developed a UI and two underlying fine-tuned LLMs as a complete, practical solution.

Among them, KeywordAE can expand abbreviations that mix initial letters with fully or partially spelled words. The KeywordAE model can also expand abbreviations consisting only of initial letters, thus providing a superset of the functionality of their previous work.

Figure | KeywordAE UI approach.

FillMask provides alternative words starting with a given initial letter in the context of surrounding words. Both models were fine-tuned using approximately 1.8 million unique triples {context, abbreviation, complete phrase} synthesized from four public English conversation datasets.
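The shape of one such fine-tuning triple can be sketched as below. The field names and serialization format are illustrative assumptions; the paper specifies only that each example pairs conversational context and an abbreviation with the full phrase:

```python
# Hypothetical shape of one synthesized {context, abbreviation,
# full phrase} fine-tuning example. Real field names and prompt
# formatting in the study may differ.
example = {
    "context": "Are you coming to the party tonight?",   # prior dialogue turn
    "abbreviation": "y i w b t",                          # initials, possibly with spelled keywords
    "full_phrase": "yes i will be there",                 # training target
}

# One plausible prompt/target serialization for LLM fine-tuning:
prompt = f"{example['context']} => {example['abbreviation']}"
target = example["full_phrase"]
print(prompt)
```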

Figure | FillMask UI path.

To form a pipeline to the fine-tuned LLM, they also designed a UI with three paths, namely Initials-only AE, KeywordAE, and FillMask, to support a complete abbreviated text input experience.

Initials-only AE is the common starting point for all phrase-entry workflows in the SpeakFaster UI. It involves the fewest keystrokes and gaze clicks of the three paths, and for short, predictable phrases this path alone is sufficient. As the user enters an abbreviation, the UI automatically triggers a call to the KeywordAE LLM after each keystroke, sending the abbreviation typed so far and all previous dialogue turns as input to the LLM. Each call returns the top-5 most likely phrase options given the conversational context and abbreviation, which are presented in the UI for the user to browse and select.
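The per-keystroke interaction described above can be sketched as a small handler. This is a simplified illustration, not the actual SpeakFaster code; `call_keyword_ae` is a hypothetical stand-in for the fine-tuned LLM endpoint:

```python
from typing import Callable, List

def on_keystroke(abbreviation: str,
                 dialogue_turns: List[str],
                 call_keyword_ae: Callable[[str, List[str]], List[str]]) -> List[str]:
    """Fired after each keystroke: send the abbreviation so far plus
    all prior dialogue turns to the model, then surface at most the
    top-5 candidate expansions in the UI."""
    candidates = call_keyword_ae(abbreviation, dialogue_turns)
    return candidates[:5]

# Demo with a fake model that ignores the conversational context:
fake_model = lambda abbr, turns: [f"option {i} for {abbr}" for i in range(8)]
options = on_keystroke("ishpitb", ["What was John doing?"], fake_model)
print(len(options))  # -> 5
```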

If the expected phrase is not found via the initial-letter-only path, SpeakFaster UI provides two alternative UI paths to help the user find the expected phrase.

The first alternative UI path is KeywordAE, which allows users to spell multiple words. A call to KeywordAE is automatically triggered after each keypress, and after each call the UI is rendered with the latest top-5 phrase expansions returned by KeywordAE LLM.

The second alternative UI path is FillMask, another way to recover when the exact expected phrase cannot be found. Unlike KeywordAE, FillMask applies only when very few (usually a single) words in the expansion are incorrect.

KeywordAE and FillMask are two alternative interaction modes for recovering from failure to obtain the expected phrase via the initial-letter-only path. In the current study, SpeakFaster UI allows users to use FillMask mode after using KeywordAE mode, which is useful for finding the correct word in difficult-to-predict phrases.
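The FillMask recovery step can be sketched as replacing the one wrong word in an otherwise-correct candidate, constrained to the same initial letter. Again this is an illustrative simplification; `call_fill_mask` stands in for the fine-tuned FillMask LLM:

```python
from typing import Callable, List

def fill_mask(words: List[str], bad_index: int,
              call_fill_mask: Callable[[str, str], str]) -> str:
    """Replace the single incorrect word at `bad_index` with a model
    suggestion that starts with the same initial letter, keeping the
    surrounding words as context."""
    initial = words[bad_index][0]
    masked = list(words)
    masked[bad_index] = "_"
    replacement = call_fill_mask(" ".join(masked), initial)
    result = list(words)
    result[bad_index] = replacement
    return " ".join(result)

# Demo: pretend the model suggests "bed" for the masked slot.
fake_fill_mask = lambda masked_phrase, initial: "bed"
print(fill_mask(["I", "saw", "him", "play", "in", "the", "barn"], 6, fake_fill_mask))
# -> I saw him play in the bed
```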

Figure | Phrase input simulation strategy assisted by KeywordAE and FillMask.

This way, when using SpeakFaster, users first enter the first letter of each word in the phrase they want. The fine-tuned LLM then predicts the entire phrase and displays the most likely candidates based on those initials and the conversational context. If the desired phrase is not among the options, users can refine the predictions by spelling out keywords or selecting alternative words. This approach significantly reduces the number of keystrokes required, resulting in faster communication.

Then, to evaluate the approximate upper bound of user actions saved by the SpeakFaster UI, the research team conducted simulation experiments. Using the Turk Dialogues corpus, they simulated three different user interaction strategies:

Strategy 1: Initials-only AE and, if that fails, iterative spelling with KeywordAE until a matching phrase is found.

Strategy 2: Same as Strategy 1, but FillMask is used whenever only one incorrect word remains in the best-matching phrase candidate.

Strategy 2A: A variation of Strategy 2 that uses FillMask more aggressively, i.e., as soon as two or fewer incorrect words remain in the best candidate.

SpeakFaster achieves significant keystroke savings under all three strategies compared with the Gboard prediction baseline. Under Strategy 2, using the KeywordAE v2 model, SpeakFaster achieved a keystroke-saving rate (KSR) of 0.657, 36% higher than Gboard's KSR of 0.482. This shows that text input efficiency can be substantially improved by leveraging the context-awareness of the LLM and the word-replacement capability of FillMask.
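The arithmetic behind that comparison is easy to check. The sketch below assumes the common definition of KSR as the fraction of characters saved relative to typing the full phrase character by character:

```python
def ksr(keystrokes_used: int, full_length: int) -> float:
    """Keystroke-saving rate: fraction of characters saved versus
    typing every character of the phrase (assumed definition)."""
    return 1.0 - keystrokes_used / full_length

# Reproducing the relative improvement quoted in the text:
speakfaster_ksr = 0.657
gboard_ksr = 0.482
relative_gain = (speakfaster_ksr - gboard_ksr) / gboard_ksr
print(f"{relative_gain:.0%}")  # -> 36%
```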

Simulation results also show that SpeakFaster performs best when provided with the 5 best phrase options, and that conversational context is critical to LLM's predictive power.

Figure | Simulation experiment results show that SpeakFaster UI can significantly save keystrokes.

In addition to simulation experiments, the research team also conducted user studies to test the effectiveness of SpeakFaster.

The research team evaluated the SpeakFaster interface on three metrics: action savings (the number of keystrokes saved relative to typing every character in full), usability (typing speed in words per minute), and learnability (how much practice people need to become comfortable using the system).

On the action-savings metric, SpeakFaster provides significant keystroke savings for both ALS eye-gaze users and non-AAC participants compared with the traditional baseline. For non-AAC users, SpeakFaster achieved 56% keystroke savings in scripted scenarios and 45% in unscripted scenarios. For ALS eye-gaze users, SpeakFaster also saved significant keystrokes during the scripted phase.

Figure | Left: KSR for non-AAC users. Right: KSR in ALS eye movement users. The orange and purple bars show the KSR when using the SpeakFaster system, and the blue and green bars show the KSR when using the baseline smart keyboard.

In terms of usability metrics, overall text input speed is comparable to traditional typing speed for non-AAC users. However, in a laboratory study of an ALS eye movement user, SpeakFaster increased typing speed by 61.3% during the scripted phase and by 46.4% during the non-scripted phase.

Figure | Left: For non-AAC users, there is no significant change in overall text input speed between scripting and non-scripting phases. Right: For ALS eye movement users, SpeakFaster significantly improves the speed of scripted and unscripted phases.

Beyond action savings and typing speed, the learning curve and the cognitive load introduced are also key indicators for evaluating typing systems and UIs. Although ALS eye-gaze users showed a slightly slower initial learning curve with SpeakFaster than non-AAC users, they were able to reach a comfortable typing speed after just 15 practice conversations.

Figure | With 6 practice conversations for non-AAC users and 15 for ALS eye-gaze users, participants learned the SpeakFaster system and reached a comfortable typing speed of 20-30 words per minute (shown on the y-axis).

Although these experiments show that SpeakFaster offers unique advantages in helping people with severe motor impairments communicate efficiently, the current research still has limitations, such as supporting only a single language, limited phrase length, high serving costs, and a small study sample.

AI is improving the lives of people with disabilities

SpeakFaster is not the first AI project dedicated to improving the lives of people with disabilities.

In 2019, BrightSign launched AI-based smart gloves. The gloves have a built-in library of predefined sign-language gestures and, combined with machine-learning algorithms, can convert gestures into speech, allowing people with hearing or speech impairments to communicate directly and independently with others in both directions. They can also be used by people with limited mobility, such as stroke patients, and by elderly people who have lost their hearing.

In 2021, the Chang Lab team at the University of California, San Francisco used a brain-computer interface for the first time to help BRAVO1, a man with aphasia who had been paralyzed for more than 15 years, regain the ability to "speak". The research implanted electrodes in the subject's brain to decode neural signals, allowing information to be exchanged between brain and device and thereby restoring his ability to communicate with the world.

In 2024, OpenAI launched a small-scale preview of its Voice Engine model, which uses text input and a single 15-second audio sample to generate natural speech that closely resembles the original speaker. It has helped restore the voice of a young patient who had lost the ability to speak fluently due to a vascular brain tumor.

Also in 2024, Professor Hao Su's team from North Carolina State University and the University of North Carolina at Chapel Hill proposed a new method that lets robots learn control strategies through reinforcement learning in a computer-simulation environment, which is expected to greatly improve the mobility and quality of life of elderly people and people with mobility impairments.

I believe that AI will further improve the lives of people with disabilities in the near future.
