For contemporary professionals, "working anywhere and on the go" is the norm.
One moment someone is walking down a city street; the next, they have pulled out a laptop, set it on the steps, and started working on the spot. In airport waiting areas, there are always a few people typing away on their notebooks until the very second before boarding.
This is a modern society of information explosion and ubiquitous connectivity. It brings convenience to our lives, but it also brings a sense of helplessness.
Revising key terms before signing a contract, checking for omissions and filling gaps before publishing a manuscript, handling all kinds of emergencies: these have made processing documents on the move a necessity in today's workplace.
Phone screens keep getting bigger, making it easier to read long documents on the go. However, actually "processing" those documents still ties people's hands, and the laptop they carry remains a heavy burden on their shoulders.
The arrival of large models promises to lift that weight off professionals' shoulders, and has also given industry and academia a chance to crack one of their "hard nuts": document AI.
A tough nut to crack
In 1992, Adobe, the company co-founded by Charles Geschke, introduced PDF; together with Word, created by Microsoft's Charles Simonyi about a decade earlier, it opened the era of the digital office.
Mountains of paperwork were condensed onto a single screen, lifting the efficiency of document processing to a new level. But neither Charles could have imagined that, decades later, these digital documents would in turn drag down productivity in the workplace.
Engineers must read hundreds of pages of technical documentation before writing a line of code. Analysts writing industry reports must painstakingly sift through dozens or even hundreds of company financial reports to identify common trends. Digital documents are the reincarnation of the paper documents of the physical world: a pile of files that cannot fit on a 14-inch screen can overwhelm countless professionals just as a stack of paper once did.
Throughout decades of AI development, industry and academia have tried to use AI to help people process documents. From the earliest rule-based heuristics to neural networks with weights learned from training data, the basic idea has been the same: "humans induce the rules -> the rules are translated into machine language (functions and code) -> the computer is taught the rules".
However, as the range of work people do on computers keeps widening, there is no upper bound on document complexity. Limited hardware computing power and still-immature algorithms mean that most "document intelligence" is not so intelligent.
For example, once an article is too long or contains too many graphic elements, the summary produced by document intelligence is often inaccurate, or even unrelated to the document's content.
Or a user wants to find the answer to a specific question in a document. The document intelligence does give an answer, but cannot trace it back to the original text, so the user has no way to verify whether the answer is accurate.
At the same time, as digitization penetrates every industry, document types grow ever more varied. Each document type requires its own processing rules, and each rule requires building and then debugging its own set of algorithms, so the whole effort gradually becomes an arduous, unaffordable, and uneconomical task.
Autonomous driving is famously difficult, and the challenges facing document AI are no smaller.
The first challenge is data. According to IDC, global data volume will grow from 33 ZB in 2018 to 175 ZB in 2025, 80% of which will be unstructured data [1], including images, audio, and sensor data. What these have in common is that they lack a unified format and clear definitions, which makes them hard to characterize.
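The IDC figures imply a compound annual growth rate that can be worked out from the two endpoints alone. Assuming steady compound growth over the seven years from 2018 to 2025:

```latex
\mathrm{CAGR} = \left(\frac{175\ \mathrm{ZB}}{33\ \mathrm{ZB}}\right)^{1/7} - 1 \approx 5.30^{1/7} - 1 \approx 0.269
```

That is roughly 27% growth per year, and 80% of 175 ZB would be about 140 ZB of unstructured data.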
The second is comprehension. The complex semantics of language and the structure of documents demand stronger logical reasoning to understand and interpret: for example, inferring "Xiao Ming is persistent" from "The task was very difficult, but Xiao Ming persisted until he completed it", or recognizing the hierarchy between first-level and second-level headings and the one-to-one correspondence between table headers and data.
The third is domain expertise. In highly specialized vertical industries, interpreting documents such as academic papers, financial reports, and legal files requires professional knowledge accumulated over a long time.
Then came large models: self-supervised learning on data without manual annotation gave computers the ability to learn and improve on their own.
Document AI, one of the hardest nuts for industry and academia to crack, finally has a chance at a breakthrough. Xiaoyi, the system-level AI assistant of the HUAWEI Mate X6, has taken the lead in demonstrating this.
Innovative exploration: breaking through the difficulty of literature reviews
In October this year, "native HarmonyOS" (HarmonyOS NEXT) officially debuted. At Huawei's Mate brand event in November, the software layer took on a new look, and the re-evolved Xiaoyi assistant became the focus of attention.
The HUAWEI Mate X6's large screen and light weight are designed for mobile work. HarmonyOS NEXT deeply integrates AI into the operating system, giving Xiaoyi breakthrough performance when handling complex documents.
Academic papers are classic complex documents: they contain many abstract concepts, obscure and difficult prose, and complex data charts. Reading them with the naked eye and digesting them with the human brain is time-consuming and laborious.
Yet writing a paper inevitably means reading and citing a large body of prior work, and the literature review in particular is a notorious, long-standing pain point in academia.
Use the file manager of HUAWEI Mate
Users can ask Xiaoyi about unfamiliar concepts and get answers. With Xiaoyi's multi-turn Q&A and precise source tracing, the relevant text is highlighted, so users can jump directly to the original passage for deeper understanding, or keep asking follow-up questions based on the answers. This mirrors the habit of citing data sources when writing papers: it serves rigor and accuracy and dispels readers' doubts about whether generated content is correct. At the same time, highlight traceability lets readers quickly find the paragraphs they want to explore further, which improves reading efficiency even more.
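How Xiaoyi links an answer back to the original text is not described here. As a rough illustration of the idea behind highlight traceability, the sketch below scores each paragraph of a document by word overlap (Jaccard similarity) with the answer and returns the character offsets of the best match, so a viewer could highlight it. The function names and the overlap heuristic are assumptions for illustration only; a real system would more likely compare embeddings.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens; a deliberately simple stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def trace_source(answer: str, document: str) -> tuple[int, int, str]:
    """Return (start, end, paragraph) for the paragraph that best supports `answer`.

    Support is scored by Jaccard word overlap, a hypothetical heuristic.
    """
    answer_tokens = tokenize(answer)
    best, best_score = (0, 0, ""), -1.0
    cursor = 0
    for paragraph in document.split("\n\n"):
        start = document.index(paragraph, cursor)  # character offset in the original
        end = start + len(paragraph)
        cursor = end
        para_tokens = tokenize(paragraph)
        union = answer_tokens | para_tokens
        score = len(answer_tokens & para_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = (start, end, paragraph), score
    return best


def highlight(document: str, start: int, end: int) -> str:
    """Wrap the traced span in markers, mimicking an on-screen highlight."""
    return document[:start] + "[[" + document[start:end] + "]]" + document[end:]


doc = (
    "Neural networks are layered function approximators.\n\n"
    "Deep learning trains neural networks with many layers.\n\n"
    "Tables pair headers with data cells."
)
start, end, paragraph = trace_source("Deep learning uses neural networks with many layers", doc)
print(highlight(doc, start, end))
```

Returning offsets rather than just the paragraph text is what makes "jump to the original passage" possible: the viewer can scroll to and highlight exactly that span.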
While reading, if you come across an unfamiliar concept, you don't need a search engine; just ask Xiaoyi directly. From "the relationship between deep learning and neural networks" to "the difference between multimodal LLMs and general models", it can answer quickly.
Thanks to its multimodal content perception, even when a paper contains complex charts and long passages of abstract text, Xiaoyi can turn it into an attractive, easy-to-read illustrated summary that is vivid and easy to understand. For example, you can ask Xiaoyi to interpret a paper and generate a clear illustrated presentation based on the document, which spells out the differences between abstract concepts more clearly and aids understanding. When it comes to complex documents with unstructured content, varied tables, or mixed text and graphics, the improved content-analysis capability of Huawei's layout understanding model gives Xiaoyi a clear advantage.
Tabular data is another common form of expression in academic papers, but tables do not convey trends and differences very intuitively.
Because the document assistant is built at the system level on HarmonyOS, Xiaoyi can also intelligently perceive user intent. When it encounters a table such as "historical parameter scale of LLMs", Xiaoyi can recognize nearly all of its text and extract the data to generate an "LLM parameter scale scatter plot / line chart" that shows the trend. A single sentence is enough to generate a chart from the document's content, sparing us from copying the data into a spreadsheet and plotting it by hand. This is the result of deep integration between the system and AI.
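Xiaoyi's internal pipeline is not described here, but the generic steps behind "a chart in one sentence" are: extract the table's rows from the document text, pull out the numeric columns, and hand the series to a charting routine. The sketch below assumes the table arrives as pipe-delimited text; `text_chart` is a hypothetical plain-text stand-in for a real plotting library, and the model names and numbers are placeholders, not real statistics.

```python
def parse_table(text: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited text table whose first row is the header."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or set(line) <= set("|-: "):  # skip blanks and |---| separators
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_series(records: list[dict[str, str]], x_key: str, y_key: str) -> list[tuple[float, float]]:
    """Pull two numeric columns out of the records and sort the points by x."""
    return sorted((float(r[x_key]), float(r[y_key])) for r in records)


def text_chart(series: list[tuple[float, float]], width: int = 40) -> str:
    """Render a horizontal bar chart as plain text (stand-in for a real plotting call)."""
    peak = max(y for _, y in series)
    lines = []
    for x, y in series:
        bar = "#" * round(y / peak * width)
        lines.append(f"{x:g} | {bar} {y:g}")
    return "\n".join(lines)


# Hypothetical placeholder values, not real model statistics.
table = """
| Model   | Year | Params (B) |
|---------|------|------------|
| model-b | 2020 | 10         |
| model-a | 2019 | 2          |
| model-c | 2022 | 40         |
"""
series = to_series(parse_table(table), "Year", "Params (B)")
print(text_chart(series))
```

Sorting by the x column is what turns an arbitrarily ordered table into a trend line; with a plotting library, `series` would be passed straight to a scatter or line plot.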
Xiaoyi's ability to parse documents correctly and produce accurate summaries also comes from Huawei's layout understanding model. It is this capability that lets Xiaoyi precisely highlight sources and generate charts from a single sentence. In fact, it also understands a document's page numbers: to delete or keep a particular page, users only need to say so, and Xiaoyi will do it for them.
Chinese scholars are also often troubled by documents written entirely in English and full of English technical terms. Even with a translation app open alongside, reading slows down considerably. Xiaoyi can not only translate the full text but also generate Chinese summaries and answer questions in Chinese, helping users grasp the core of a document faster.
For academic papers dozens of pages long, Xiaoyi can greatly shorten reading time, freeing you from information overload so you can more efficiently extract the arguments and data that truly matter to your research.
Financial reports are likewise packed with text, data, and charts. When analysts write industry reports, they need to comb through dozens or hundreds of financial reports and extract key data for linear regression analysis.
With Xiaoyi, if you want to analyze the tabular data in a financial report further, its multimodal content perception and interpretation can produce analytical charts, and you can also generate custom charts with a single sentence, such as "draw first-quarter and second-quarter revenue as a pie chart".
Xiaoyi can easily play the role of "research assistant" and "assistant analyst". Most importantly, as mentioned above, it relies on an industry-leading layout understanding model.
The layout analysis model is the foundation of complex document processing. It divides a document into regions, locates key elements such as titles, body text, images, and tables, and then interprets them. It is a prerequisite for downstream operations such as table extraction, and its precision determines the accuracy of those operations.
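To make the "prerequisite" point concrete, here is a minimal sketch of what happens after a layout model has labeled regions: the regions are put into reading order and routed by label, so only regions labeled `table` reach a table extractor. The `Region` type, its label set, and the simple top-to-bottom ordering (which only suits single-column pages) are assumptions for illustration, not Huawei's model.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str  # assumed label set: "title", "text", "figure", "table"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left
    content: str


def route_regions(regions: list[Region]) -> dict[str, list[str]]:
    """Sort regions into single-column reading order and group their content by label."""
    routed: dict[str, list[str]] = {}
    for region in sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0])):
        routed.setdefault(region.label, []).append(region.content)
    return routed


# Regions as a hypothetical detector might emit them: out of order, already labeled.
page = [
    Region("text", (50, 300, 550, 400), "Second paragraph"),
    Region("title", (50, 40, 550, 80), "1 Introduction"),
    Region("table", (50, 420, 550, 600), "| Year | Params |"),
    Region("text", (50, 100, 550, 280), "First paragraph"),
]
routed = route_regions(page)
tables_for_extraction = routed.get("table", [])  # only these reach the table extractor
print(routed)
```

If the detector had mislabeled the table as `text`, `tables_for_extraction` would be empty and no chart or table operation could follow, which is the sense in which layout precision bounds the accuracy of everything downstream.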
The layout understanding model behind Xiaoyi strengthens document parsing and achieves end-to-end optimization by incorporating techniques such as long sequences (long context), RAG, and grounding.
Here, long sequence refers to the context within the document, that is, a "recap of what came before" and "background knowledge". The longer the context the model can use, the better it understands the content, which directly affects functions such as summary generation and translation.
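One way to see why context length matters: when a document is longer than the context a model accepts, it must be processed in pieces, and a term defined in one piece is invisible while a later piece is summarized or translated. The sketch below shows the generic sliding-window workaround, assuming the text is already split into tokens; `max_len` and `overlap` are illustrative parameters, not Xiaoyi's actual limits.

```python
def split_into_windows(tokens: list[str], max_len: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of at most `max_len`, consecutive windows sharing `overlap` tokens."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return windows


doc_tokens = "the term defined here is reused much later in the text".split()
for window in split_into_windows(doc_tokens, max_len=4, overlap=1):
    print(window)
```

With a small window, "defined" and "reused" land in different pieces; with a window longer than the document, everything fits in one piece and no cross-reference is lost.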
RAG stands for retrieval-augmented generation. When a user asks Xiaoyi a question, RAG retrieves relevant information from various data sources and supplies it to the model, which then integrates that information to answer the question.
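The retrieve-then-generate loop described above can be sketched in a few lines. This is a toy illustration, not Xiaoyi's implementation: bag-of-words cosine similarity stands in for a learned embedding model, the "data source" is a short list of text chunks with made-up figures, and the final model call is omitted, so the sketch stops at building the augmented prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Augment the question with numbered sources; the model call itself is omitted."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieve(question, chunks, k), 1))
    return (
        "Answer using only the numbered sources and cite them.\n"
        f"{sources}\nQuestion: {question}"
    )


# Toy figures for illustration only.
chunks = [
    "Revenue in Q1 was 120 million.",
    "Revenue in Q2 was 150 million.",
    "The company was founded in 2001.",
]
print(build_prompt("What was revenue in Q2?", chunks))
```

Numbering the retrieved sources in the prompt is also what lets an answer cite, and a viewer highlight, the passages it was built from.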
Grounding ties generated content to specific concepts and events. By establishing connections with the real world, it helps ensure the reliability of what Xiaoyi generates, and helps it understand and describe multimedia data such as images and videos more accurately.
This powerful model gives Xiaoyi industry-leading document-processing capability, which is only one part of its overall capabilities.
Online document assistant
What the workplace demands is not only competence; initiative and responsiveness are also seen as marks of responsibility.
Likewise, users expect a smart document assistant to have not only the "hard power" to process documents well, but also the "soft power" of being easy to open and available on call. User experience matters too.
And this is where Xiaoyi’s profound “internal strength” lies.
Deep integration with HarmonyOS NEXT elevates Xiaoyi to a system-level AI assistant, letting it allocate system resources such as computing power, threads, and memory more flexibly.
It is as if a waiter had been promoted to a housekeeper in charge of staff scheduling, purchasing, and other affairs, truly able to take over the trivial matters of users' lives and work.
As a result, Xiaoyi's document intelligence is "available on call", making interaction with users more natural and condensing how it is invoked into a few simple, everyday actions:
Drag - For example, a paper your advisor emailed as a required-reading attachment, or an article someone recommended during an academic discussion on social media: any of these can be dragged straight onto the Xiaoyi navigation bar at the bottom of the screen. Xiaoyi recognizes and interprets it and generates a summary, and users can ask questions and follow-ups about the summary.
Circle - When reading literature and encountering a hard-to-understand theory or technical term, you can circle the content with your knuckle. Xiaoyi identifies the circled content and quickly suggests high-frequency functions such as "Ask Xiaoyi" and "Image Search". Tapping "Ask Xiaoyi" lets users ask Xiaoyi about the theory.
Call - When reading materials and papers somewhere that must stay quiet, such as a library, the "Xiaoyi Whisper" feature lets you use the document AI without disturbing the people around you: just raise the phone and state your request about 5 centimeters from the microphone. Even if you whisper, Xiaoyi can hear you.
More natural interaction plus more powerful processing makes Xiaoyi's document capabilities genuinely practical, helping scholars and other professionals free themselves from repetitive, arduous work and put more energy into tasks that create greater value and reflect their own unique worth, enhancing their sense of gain and accomplishment.
Amplified across everyday life, this emotional value will also give people a tangible sense of the vision that "AI changes life".
2024 is regarded as the first year of large models, and AI assistants, as the most direct medium through which people use large models, carry high expectations. People hope AI assistants can genuinely take over trivial chores and work on their behalf.
Processing complex documents has plagued the AI industry for many years. It also epitomizes the heavy, inefficient work of modern society, and has become a mountain that AI devices and intelligent agents cannot avoid.
The breakthrough of Xiaoyi's document assistant offers the industry a brand-new template, and also signals that an invisible AI super-entrance, the document, is opening its door to countless industry players.