From writing to drawing, from music to video, the tentacles of AI have in recent years extended into every corner of the content creation field at an alarming rate. The podcast industry is no exception: AI is gradually getting involved in everything from planning to editing, and is even trying to challenge the status of human hosts.
In September this year, Google's NotebookLM launched a feature called "Audio Overview" [1]. It converts user-uploaded content into an audio conversation discussed by two AI hosts; the delivery is smooth and the voices are lifelike, as if two real people were talking. When image-generation tools like Midjourney and Stable Diffusion came out, people began debating whether painters would be replaced. The question posed by NotebookLM is no different: will AI podcasts replace human hosts?
AI enters the podcast industry

Wherever there are application scenarios, matching AI products will emerge. The combination of AI and podcasting follows this rule.
In Europe and the United States, where podcasting originated, the industry has long been a red ocean of fierce competition. In the Chinese-speaking world, podcasts have shed their niche label over the past two years and are gradually going mainstream.
According to statistics, Chinese podcasts had 117 million listeners in 2023, a figure expected to rise to 134 million in 2024, meaning that 12 out of every 100 Internet users will be podcast listeners. By 2027, the number may climb to 179 million. [2]
The expanding audience has been accompanied by a growing number of programs. Take the podcast platform Little Universe as an example: it added 20,000 new podcasts and 200,000 episodes in 2022, and 30,000 new podcasts and 300,000 episodes in 2023, an average annual increase of 50%. [3] Program topics range from the basics of daily life to the philosophy of living, from entertainment gossip to serious current affairs; they can fairly be called all-encompassing.
At the beginning of 2024, Douban officially launched a podcast feature: users can mark and rate podcasts just as they do on Douban Movies and Douban Books, a sign of podcasts' growing influence in the Chinese-speaking world. At the end of October, Little Universe held an offline event in Shanghai that was planned for about 7,000 attendees but ended up drawing nearly 20,000. The appeal of podcasts in big cities is evident. [4]
As more and more people grow accustomed to obtaining information by "listening", domestic social media platforms such as WeChat, Weibo and Douban have also increased their attention to, and investment in, audio content. According to one survey, aside from dedicated podcast platforms, the three content platforms most commonly used by Chinese podcast listeners are Bilibili, Xiaohongshu and WeChat official accounts. [5]
Although anyone with a phone that can record can make a podcast, producing a program of acceptable quality is not easy: it typically involves planning, recording, editing, publishing and other stages. Data show that in 2024, Chinese podcast creators spent an average of 12.9 hours of net working time per episode, of which editing took about 4.5 hours, more than a third of the total. [6]
The emergence of AI tools has brought new possibilities to creators, especially given that surveys show more than 80% of Chinese podcast creators work independently or with friends, and nearly 70% have to juggle multiple roles such as host, editor and operator. [7]
According to a JustPod survey, nearly half of creators said they have used AI tools to assist their work, and about 40% have not tried them yet but are willing to. Many creators noted that AI tools have brought significant efficiency gains to their early-stage planning and research. [8]
AI has also begun to take on the more complex and time-consuming work of editing. The tool Descript, for example, can automatically transcribe audio into text and remove common filler words, and users can edit the corresponding audio clips simply by deleting passages of text. Another tool, Auphonic, can automatically adjust volume, reduce noise, and remove redundant pauses and repeated words to improve the quality of the finished audio.
AI has also changed the distribution side of podcasting. Podcast content is harder to search than text or video. This year, the Little Universe platform launched an AI search service called "Ask Little Universe": after a user asks a question, the service provides an intelligent summary along with summaries and notes from relevant podcast episodes, and can even pinpoint a specific time segment and generate a direct listening link.
Picture: the results of entering "Beijing" in "Ask Little Universe", showing an introduction to Beijing and related programs.
Image source: screenshot of the "Ask Little Universe" website
The AI host arrives

If AI editing tools are like capable assistants that help creators produce programs more efficiently, the emergence of AIGC (AI-generated content) hints at the potential to replace creators outright. Given only a topic, these cutting-edge AI podcast generation tools can complete the entire process of creating a podcast episode on their own.
These tools can not only accurately extract key information, but also cleverly break down and rearrange the source material to simulate a real conversation, turning the content into smooth, natural spoken dialogue and giving the program vivid emotion and interaction.
Google's NotebookLM has this capability: given simple source material, it can automatically generate an audio program. Its AI hosts not only speak clearly, with natural, fluid voices and realistic intonation; their filler words and pauses are remarkably accurate, sometimes surpassing the vocal expressiveness of real people. Beyond factual statements and opinions, the AI hosts can crack jokes and improvise clever metaphors, behaving very much like real people.
For example, given a piece of text about traveling in Beijing, NotebookLM can generate an audio conversation in which two voices discuss their impressions of the trip, sounding as if they belong to real people who have actually been to Beijing.
Image source: screenshot of NotebookLM
Following NotebookLM's stunning debut, many technology companies rushed to launch similar tools. The Coze platform, for example, announced an AI-generated podcast feature with support for replacing the generated voices with human ones. It is easy to imagine such tools becoming more numerous and more capable. With the podcast market expanding daily and new entrants pouring in, will this push content creators to use AI to stand out from the fierce competition? Can AI replace human hosts?
The primary challenge facing AI is uniqueness: how to create content that is both differentiated and personalized. AI podcast generation tools rely on preset topics to generate audio that simulates human conversation, and for different listener groups they often draw on the same underlying data. With identical tools and data sources, producing a distinctive program becomes an urgent problem. Some creators can use AI to realize genuinely original ideas, but the threshold for this kind of innovation currently seems low, and the results are easily copied, because the core is driven by technology rather than personal creativity.
Given that the initial threshold for podcast production is low and the quality of programs on the market today varies widely, AI podcast generation tools do have a chance of replacing low-quality programs. In this respect, AI's impact on podcasting resembles its impact on other fields: it tends to hit jobs at the lower end of the industry first, while those in the middle or at the top are less affected.
Aristotle described three modes of persuasion, which map neatly onto three kinds of attractive content. The first is moral persuasion (ethos): we accept someone's point of view because we like or trust them. The second is the rational appeal (logos), which demands that information be logical and useful. The last is the emotional appeal (pathos), which aims to touch people's hearts. In podcasting, these three strategies correspond to different types of programs, and each is affected by AI in a different way.
Moral persuasion shows up in podcasts as the celebrity effect: well-known hosts naturally attract large audiences to their programs. The rational appeal corresponds to knowledge-driven programs, ranging from esoteric academic theory to practical travel tips. The emotional appeal points to programs that are engaging and emotionally stirring: as long as they touch the audience's heartstrings, whether with laughter or tears, sympathy or anger, they succeed.
Of the three, listeners of knowledge-based podcasts are the ones most focused on obtaining information of practical value. If the density or depth of information a human host provides falls short of what AI offers, the host may lose out in the competition.
Hosts who are loved for their personal charm, by contrast, are hard to displace. Likewise, podcasts that deeply move their listeners are difficult for AI to replace at this stage, because the deep emotional connection a real person provides is something AI still struggles to simulate and replicate convincingly. Take a travel podcast: listeners may readily accept an AI host sharing practical tips, such as which attractions are worth visiting or how to plan an efficient itinerary, but when it comes to personal travel experiences, such as stories of chance encounters, they find an AI narrator hard to accept.
In reality, though, podcast programs are usually a blend of all three types in varying proportions and cannot be reduced to a single category, so AI's impact is correspondingly multidimensional and complex. Moreover, while podcasts' role as a source of information cannot be ignored, most listeners do not expect instant, practical information from them; the emotional comfort and companionship podcasts provide are just as irreplaceable.
The human host: the sound of a heartbeat

In terms of the efficiency of information acquisition, vision undoubtedly has the advantage. Text can be skimmed and videos fast-forwarded; audio can be sped up, but usually no more than about 1.2x, beyond which the listening experience deteriorates sharply. Given that podcast episodes often run one or two hours, even at 1.2x speed they cannot be consumed quickly.
This raises a question: in an era of fragmented attention, if audiences just want quick access to knowledge or information, why would they listen to podcasts that easily run for dozens of minutes?
For many listeners, the appeal of podcasts is not limited to obtaining information; the sense of realness and companionship that voices bring is equally important. The former points to podcasts' practicality, the latter to their emotionality.
Although many listeners care about a podcast's practicality, this does not extend to immediately actionable information. Suppose someone plans to cook a Western meal for friends tonight and wants recipes and cooking techniques: they are unlikely to put on a podcast about Western cuisine, and will turn instead to a search engine or social media.
Research shows that, in listeners' minds, the best podcast programs sit somewhere between pure chat and "listening to a lecture". [9] Listeners, in other words, are "picky": they want useful information, but its density can be neither too high nor too low. This is closely tied to how podcasts are listened to, since they often serve as a kind of background sound.
Statistics show that only 3% of Chinese podcast listeners listen intently without doing anything else. [10] The vast majority listen while commuting, doing housework, exercising and so on; in these scenarios, taking in audio requires almost no additional attention. Economists often describe the media industry as an "attention economy": content creators compete to capture and maximize audience attention through images, text and other forms of content. But creators must recognize that, compared with visual information, auditory information commands a lower level of attention.
Therefore, even if an AI host outperforms a real person at delivering information, it will still struggle to displace human hosts, because people do not listen to podcasts simply to obtain information efficiently. Listeners who demand maximum information efficiency rarely choose podcasts as their source in the first place.
Some astute product developers seem to have grasped this tension. Some podcasts offer high-quality, information-dense content, but listeners may be pressed for time. Several large-model applications have therefore launched features that let AI "listen" to podcasts for you: given a link, the AI summarizes the episode. In this scenario, the voice is stripped away entirely; only the information itself matters.
Data show that multi-host chat and conversation podcasts are the types listeners tune into most. [11] Scholars have noted that the word "authenticity" comes up repeatedly when listeners describe the impression podcast voices leave on them; authenticity is the core factor shaping how listeners evaluate and engage with podcasts. Some listeners will even accept rough recording environments and a host's plosives popping the microphone, because such "flaws" attest to the authenticity of a human voice. [12]
This is where AI will find it hardest to replace humans. Deep down, people care intensely about whether the outside world is "sincere" toward them. When we seek companionship, we want not just the act of accompaniment but the sincerity of someone willing to accompany us. It is easy to see why people keep wondering whether their pets really love them, or whether their therapists can truly empathize. Money can buy a pet or a counseling session, but a pet's deep affection for its owner, or a counselor's genuine care for a client, cannot be guaranteed by money. Audiences can believe a human host is sincerely sharing what is on their mind, but can they trust the "sincerity" of an AI?
In a podcast, clearly establishing the host's identity matters a great deal. Listeners naturally want to know whose voice they are hearing, and to some extent the same words carry very different weight depending on who says them: a heavyset host joking about eating too much is self-deprecating humor, but the same remark from someone else could be a personal attack. Almost every show opens with the host introducing themselves, or details their background on the show's information page. But how does an AI host introduce itself? How do we learn an AI host's "background"?
Statistics show that after subscribing to a new podcast, the vast majority of listeners go back to previous episodes and aim to listen to them in full, whether in one sitting or over several sessions. [13] To some extent, this shows that once listeners form an emotional connection with a host, their recognition of that host leads them to seek out more episodes. So even though AI voices now sound remarkably human, at this stage it is hard for listeners to form the same emotional bond with an AI host that they form with a real one.
Recall the Little Universe offline event mentioned earlier: listeners who have followed a human host for a long time naturally look forward to meeting the host in person and getting to know them more fully. But what about an AI host? How would we "meet" one? What would an offline event even mean? How could an AI host move seamlessly between online and offline, and give listeners that wonderful sense of familiarity, the way real people do?
Much of this depends on how far society accepts the "humanity" of AI. At present, most people find it difficult to form a genuine emotional connection with a machine or a computer program. So even if AI can make people laugh or stir their sympathy, it can hardly achieve the deep emotional resonance that exists between people.
Of course, AI technology keeps improving, and human perceptions of AI will change. Perhaps one day people will treat AI as they treat real people, and it will then be natural for AI hosts to replace human ones. But by that point, AI may well have replaced humans in far more consequential areas of social life, with podcasting just one of the comparatively minor ones.