Can AI generate podcasts?
Editor
2024-12-09 15:02


Image source: Generated by Unbounded AI

Some time ago, Google's AI tool NotebookLM launched a dialogue generation feature. Users upload files such as e-books, web pages or reports, and NotebookLM generates an English conversation between two people based on the content, with voices that sound very close to real people. More impressive still, the dialogue NotebookLM generates is meaningful and seems to genuinely understand the file's content.

AI companies of all sizes soon followed with similar tools. Coze, the AI development platform owned by ByteDance, launched an AI podcast generation feature that lets users swap in different voices and timbres. The startup PodLM also launched an AI podcast generator; besides customizable AI voices, it supports publishing generated podcasts to podcast platforms with one click.

For a while, social media was flooded with posts lamenting that "AI is subverting podcasts" and even that "the podcast industry is going to die."

· In October 2024, NotebookLM added a "customize" option for its audio dialogues

Unfortunately, none of the tools above can actually generate podcasts. What they generate are voice conversations, book readings or material summaries, none of which are podcasts.

At the core of a podcast is human dialogue. The host talks about their real experiences, opinions and emotions, and listeners know there is a living person behind the voice. This trust goes beyond technology and tools; it is a connection between people.

In the short term, AI will not be able to provide human connection. AI-generated dialogue is not yet sophisticated enough to convincingly simulate human emotion, and listeners know that an AI tool is behind the voice. Moreover, AI only turns the documents it receives into a spoken dialogue: the content is second-hand and the voices are synthetic. Without first-hand experience, insight and emotion, AI tools are suited only to functional scenarios such as skimming a book quickly.

However, AI is a long-term force reshaping the content industry. It can assist content creators and greatly improve their efficiency. In post-production, content distribution and other areas, AI is bound to change the podcast industry.

In the short term, AI cannot generate podcasts

Before discussing "Can AI generate podcasts?", we first need to answer another question: what is a podcast?

A podcast is fundamentally a form of media. Hosts record their own conversations, upload the recordings to a podcast platform, and distribute them via RSS feeds.

The core of a podcast is human dialogue, which includes emotional expression, impromptu interaction, first-hand experience and insight, and the conversational atmosphere that results. Even if AI can easily take over the other steps of podcast production, such as drafting outlines, generating cover art, one-click uploading and speech-to-text conversion, it still struggles to simulate real human conversation.

In one sense, this is a purely technical issue: AI voices are simply not good enough yet. If AI speech became indistinguishable from human speech, listeners would not be able to tell whether a human or an AI was behind the voice. But it is not only a technical issue, because it touches on why people listen to podcasts in the first place. The main reason people listen to podcasts is not to obtain information efficiently.

For podcasts, efficiency is not everything: human character matters more than information efficiency.

In the Internet industry, a common view is that information efficiency comes first. Especially after ByteDance's rise, Zhang Yiming's belief in information efficiency became something of a doctrine. Zhang Yiming once said in an interview that the efficiency of information matters more than how it is presented, and that the most important thing is to improve distribution efficiency and meet users' information needs. Products such as Toutiao and Douyin rely on recommendation algorithms to distribute content, greatly improving the efficiency with which information is distributed and received, and this drove their commercial success.

Podcasts, however, clearly run against this principle. They are not the most efficient medium for information; WeChat official-account articles and short videos are far more efficient. What a 2,000-word official-account article or a 5-minute short video can explain may take a podcast about an hour.

What sets podcasts apart from these media is the human voice, human emotion and human character: hosts and guests tell their stories unhurriedly, and their laughter, silences and subtle shifts in tone express emotion naturally. The worldwide growth of podcasts in recent years has proved once again how much people matter. Audiences want not only text and short videos with high information efficiency, but also podcasts that are low in information efficiency yet full of "human touch".

Of course, podcasts still need to improve information efficiency. For example, in a podcast episode, it is always a good thing if the host and guests can talk about the topic more concisely and clearly.

Podcasts should improve information efficiency only while preserving their core advantage of human character. Otherwise, podcasts without human character, such as emotionless script-reading shows, are essentially competing with WeChat Reading and Ximalaya audiobooks and are not really podcasts.

So the real question behind "Can AI generate podcasts?" is: can AI simulate human character?

In theory, as long as AI companies develop deeper emotional modeling, more delicate speech synthesis and more natural dialogue systems, listeners will eventually be unable to tell whether a human or an AI is behind the voice. At that point, AI could certainly simulate human traits. That is why we said earlier that "Can AI generate podcasts?" is, in this sense, a purely technical question.

However, AI companies cannot solve this technical problem in the short term. AI can generate smooth conversations, but for now it cannot simulate human character.

The conversations NotebookLM generates sound very close to real people and can create a chatty atmosphere and a sense of companionship. However, the dialogue is too polished and sounds like a broadcaster, which clashes with the casual feel of talk podcasts.

In addition, AI's biggest problem is that it has no way to produce first-hand knowledge. Whether it is ChatGPT, NotebookLM, Doubao or Kimi, these tools generate second-hand information from existing data on the Internet by learning the correlations within that data. In essence, AI output is a reorganization and re-expression of existing knowledge, with no original first-hand knowledge in it.

How is first-hand knowledge produced? People need fieldwork, conversations with others, independent thinking and real-life experience. They also need a measure of self-awareness: recognizing that they are subjects in their own right, daring to draw conclusions, and distilling new experiences, insights and emotions from a mass of similar information.

AI cannot currently do this. AI tools can only generate similar, second-hand information based on their input. The band Wutiaoren has a very distinctive style. Someone fed Wutiaoren's lyrics into an AI tool to generate a song automatically. Ren Ke, Wutiaoren's lead singer, said the AI-generated song was like "chicken essence and MSG sprinkled on a rubber band. There is nothing edible for humans at all. It is all plastic."

· Wutiaoren, bilibili

Podcasts resemble music in that human qualities are at their core. Just as AI cannot yet generate music with a style as distinctive as Wutiaoren's, it cannot yet generate podcasts.

AI-assisted podcast production

If you ask the large language model Claude, "Can AI generate podcasts?"

Claude's answer is:

I suggest using AI as an auxiliary tool: to generate content frameworks and first drafts, to assist with post-production, and to supplement human hosts rather than replace them entirely.

Ask Claude a follow-up question: "Why do you suggest using AI as an auxiliary tool for podcasting rather than the main tool?"

Claude gave specific suggestions on the division of labor:

The ideal collaboration model:

AI is responsible for content planning and outline generation, real-time data lookup and supplementation, text post-editing and processing, and transcription and summarization.

Humans are responsible for producing the core content, emotional interaction and resonance, improvisation and adjustment, and overall quality control of the program.

AI is a long-term force reshaping the content industry. It will further lower the barrier to creation, letting individuals draw on more intellectual resources and thereby produce far more content.

Podcasting is part of the content industry, so AI will certainly change it. However, as Claude said, AI will serve as an auxiliary tool: rather than directly replacing humans in producing content, it will mainly change the podcast production process.

For text work, writing program outlines and show notes is something large language models are good at; ChatGPT, Claude and Doubao can all assist humans here. In post-production, AI noise reduction and AI editing can greatly improve efficiency. The AI noise reduction in Vocut and Phonic works better than the built-in noise reduction in Adobe Audition (AU). Both Vocut and Cutting support speech-to-text: users edit the transcript directly, and the AI tool makes the corresponding cuts to the audio. Some podcast hosting platforms also support AI-generated cover images and AI-generated program chapters.

· Vocut's AI editing feature

AI will amplify this lowering of the barrier to creation. As long as they can use AI tools, anyone who expresses themselves well can produce podcasts steadily and consistently, sharing their experiences, opinions and emotions with the public without complicated preparation or post-editing.

Beyond the content supply side, an even more promising area for AI is information distribution, where AI-driven recommendation algorithms are used to distribute podcast content.

The Internet as a whole has gone through a shift in distribution methods, from subscription systems to recommendation systems. In the Web 1.0 era, users manually subscribed to podcasts and received updates by email; in the Web 2.0 era, Facebook, Toutiao and Douyin recommended content to users automatically. Blogs, which appeared around the same time as podcasts, have likewise moved from subscription to recommendation. Today, a large share of the traffic on Twitter and Weibo comes from algorithmic recommendations.

Podcasting originally worked on a subscription model. In the basic podcast format, listeners must manually import an RSS link into a general-purpose podcast client to subscribe to and listen to a show. This is clearly too cumbersome. The podcast platforms that have emerged in recent years have adopted the follow-based subscription common on mainstream Internet platforms: YouTube, Spotify and Xiaoyuzhou do not require users to import RSS links; users simply tap "follow" to subscribe and listen.
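To make the RSS workflow concrete, here is a minimal sketch of what a general-purpose podcast client does with a feed: parse the XML and read each episode's audio URL from its `<enclosure>` element. The feed is a made-up inline sample and `list_episodes` is a hypothetical helper; a real client would download the feed over HTTP and handle many more tags.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 podcast feed, inlined so the example runs offline.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Talk Show</title>
    <item>
      <title>Episode 2: Can AI make podcasts?</title>
      <enclosure url="https://example.com/ep2.mp3" length="123456" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 1: Hello</title>
      <enclosure url="https://example.com/ep1.mp3" length="654321" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def list_episodes(feed_xml: str) -> list[tuple[str, str]]:
    """Return (episode title, audio URL) pairs from an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    episodes = []
    # Each <item> is one episode; the audio file's address is in <enclosure url="...">.
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        enclosure = item.find("enclosure")
        # Skip items that carry no audio attachment.
        if enclosure is not None and enclosure.get("url"):
            episodes.append((title, enclosure.get("url")))
    return episodes


if __name__ == "__main__":
    for title, url in list_episodes(SAMPLE_FEED):
        print(f"{title} -> {url}")
```

This manual step (finding the feed and pointing a client at it) is exactly what "follow" buttons on modern platforms hide from listeners.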

So will podcasts shift further from subscription to recommendation? In social media, official accounts and short video, Internet giants have shown that recommendation algorithms can fundamentally improve the efficiency of information distribution. Platforms with high distribution efficiency will inevitably put pressure on platforms with low distribution efficiency.

A subscription system is efficient only when content is scarce, that is, when there is too little content online. Now that podcast content is plentiful, listeners have the opportunity to receive more content that interests them, not just what they subscribe to. Because a subscription system alone is too inefficient, podcast platforms will inevitably need methods beyond subscription to help distribute content.
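The contrast between the two distribution models can be illustrated with a toy sketch: a subscription feed surfaces only episodes from shows the listener follows, while a recommendation feed ranks every available episode by predicted interest. The catalog, the tag-overlap score and the function names are all hypothetical simplifications; real platform recommenders are far more complex and are not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    show: str
    title: str
    tags: frozenset[str]


def subscription_feed(episodes: list[Episode], followed_shows: set[str]) -> list[Episode]:
    """Subscription model: only episodes from shows the listener follows."""
    return [ep for ep in episodes if ep.show in followed_shows]


def recommendation_feed(episodes: list[Episode], interests: set[str], k: int = 3) -> list[Episode]:
    """Toy recommendation model: rank all episodes by tag overlap with the listener's interests."""
    def score(ep: Episode) -> int:
        return len(ep.tags & interests)

    # sorted() is stable even with reverse=True, so ties keep catalog order.
    ranked = sorted(episodes, key=score, reverse=True)
    return [ep for ep in ranked if score(ep) > 0][:k]


catalog = [
    Episode("Tech Talk", "AI and podcasts", frozenset({"ai", "media"})),
    Episode("Music Hour", "Indie bands", frozenset({"music"})),
    Episode("Startup Stories", "Building an AI startup", frozenset({"ai", "business"})),
]

print([ep.title for ep in subscription_feed(catalog, {"Music Hour"})])
# ['Indie bands']
print([ep.title for ep in recommendation_feed(catalog, {"ai"})])
# ['AI and podcasts', 'Building an AI startup']
```

A listener who follows only "Music Hour" never sees the two AI episodes under the subscription model, even though both match their interests; that gap is the inefficiency the paragraph above describes.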

The real question is, can podcast platforms also use recommendation algorithms to improve the efficiency of information distribution?

For now, the answer seems to be yes. YouTube, Spotify and Xiaoyuzhou all use recommendation algorithms to help distribute podcasts. YouTube, the world's largest podcast platform, relies mainly on recommendation algorithms to distribute content, and podcasts are just one of the many content formats on the platform. Spotify also uses recommendation algorithms to distribute music. Music and podcasts are alike in that both have human qualities at their core, so if music can be distributed by algorithms, podcasts probably can be too.

· Xiaoyuzhou's personalized algorithmic recommendations

AI can do even more: it is also expected to make it more efficient for listeners to take in information from podcasts.

Both Spotify and Apple Podcasts generate verbatim transcripts and automatically divide episodes into chapters. These AI tools turn linear audio content into non-linear text. It is like turning a river that can only be followed from source to mouth into a map that can be viewed in any order. Clearly, these features make it easier for listeners to take in information, improving how efficiently podcasts deliver it.

· Spotify launched its podcast transcription feature in 2023
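The "river into a map" idea can be illustrated with a minimal sketch: once an episode has a timestamped transcript, a listener can jump straight to the moments that mention a topic instead of listening from the beginning. The transcript segments and the `find_keyword` helper are hypothetical, and this is not how Spotify or Apple Podcasts implement their features.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the beginning of the episode
    text: str


def format_time(seconds: float) -> str:
    """Format a time offset in seconds as mm:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def find_keyword(segments: list[Segment], keyword: str) -> list[str]:
    """Return mm:ss start times of segments that mention `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [format_time(s.start) for s in segments if needle in s.text.lower()]


transcript = [
    Segment(0.0, "Welcome to the show."),
    Segment(95.5, "Today we talk about NotebookLM."),
    Segment(1830.0, "Back to NotebookLM and AI voices."),
]

print(find_keyword(transcript, "notebooklm"))  # ['01:35', '30:30']
```

A linear recording offers only play and seek; a transcript index like this one lets listeners enter the episode at any point, which is the gain in efficiency the article describes.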

The remaining questions are these: will recommendation algorithms homogenize podcasts and thereby harm their diversity? Will verbatim transcripts and automatic chapters make podcasts more information-efficient but less human? Or, to sum it up in one question: how will AI change the content ecosystem?

This is a question that only practice can answer.
