The conflict between centralized data architecture and Web3's decentralization ethos

Author: Michael O’Rourke | Source: Cointelegraph | Translation: Shan Oppa, Golden Finance

To realize the full potential of open data, including low-cost large language model (LLM) training, convenient research data sharing, and unstoppable DApp hosting, we must move open data from centralized infrastructure to a decentralized architecture.

At present, open data is a main driving force of the global emerging technology economy, with a market valued at more than US$350 billion. However, many open data sources still rely on centralized infrastructure, which runs contrary to Web3's principles of autonomy and censorship resistance.

To unleash the full potential of open data, it is necessary to move toward decentralized infrastructure. Once the open data ecosystem shifts to a decentralized, open architecture, several vulnerabilities in user-facing applications can be resolved.

Decentralized infrastructure serves a wide range of use cases, including:

• Hosting decentralized applications (DApps)

• Running trading bots

• Sharing research data

• LLM training and inference

A closer look at these use cases shows that decentralized architectures use open data more efficiently and practically than centralized infrastructure.

Lower cost for LLM training and inference

The release of the open-source AI model DeepSeek triggered a roughly $1 trillion selloff in the US technology market, demonstrating the power of open-source development. It is also a signal to focus on a new global economy with open data at its core.

Currently, closed, centralized AI models are expensive to train, which also limits how well LLMs can be trained to produce high-accuracy results. For example, DeepSeek R1's final training run cost only about $5.5 million, compared with more than $100 million for OpenAI's GPT-4. Yet the emerging AI industry still relies on centralized infrastructure platforms such as LLM API providers, which contradicts the spirit of open-source innovation.

In fact, hosting open-source LLMs such as Llama 2 and DeepSeek R1 is both simple and inexpensive. Unlike stateful blockchains that require continuous synchronization, LLMs are stateless and only need to be updated periodically.

Although LLMs are relatively simple to run, inference on an open-source model is still computationally expensive because node operators need GPU compute. Notably, however, these models do not require real-time synchronization, which keeps long-term operating costs down.
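As a rough sketch of how simple self-hosting an open-weight model can be, the snippet below loads an open-source model with the Hugging Face transformers library and runs one inference. The model ID and generation settings are illustrative assumptions, and a GPU is assumed to be available.

```python
# Minimal sketch: serving an open-source LLM locally for stateless inference.
# Assumes the `transformers` and `torch` packages are installed and a GPU is available;
# the model ID below is an example -- any open-weight model can be substituted.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # example open-weight model
    device_map="auto",  # place the model on GPU if one is present
)

prompt = "Explain why stateless model hosting is cheaper than running a full blockchain node."
output = generator(prompt, max_new_tokens=128, do_sample=False)

print(output[0]["generated_text"])
```

Because the weights only change when a new checkpoint is released, the operator's ongoing work is limited to serving requests and occasionally swapping in an updated model.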

The rise of general-purpose foundation models such as GPT-4 has made it possible to build new products based on contextual reasoning. However, centralized companies such as OpenAI do not allow arbitrary networks to access their trained models for inference.

In contrast, decentralized node operators can act as AI endpoints that provide customers with deterministic data, supporting the development of open-source LLMs. Decentralized networks lower the barrier to entry by authorizing operators to launch gateways on top of the network.

These decentralized infrastructure protocols process millions of requests on their permissionless networks by open-sourcing their core gateway and service infrastructure. As a result, any entrepreneur or operator can deploy their own gateway and tap into an emerging market.
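From an application's point of view, consuming such a gateway might look like a plain HTTP call to an OpenAI-compatible inference endpoint, as in the hypothetical sketch below. The gateway URL, model name and token are purely illustrative placeholders, not any specific network's actual API.

```python
# Hypothetical sketch: calling an open-source LLM through a decentralized gateway.
# The endpoint URL, model name and token below are illustrative placeholders;
# real decentralized inference networks publish their own (often OpenAI-compatible) APIs.
import requests

GATEWAY_URL = "https://gateway.example-decentralized-network.xyz/v1/chat/completions"  # hypothetical
API_TOKEN = "YOUR_GATEWAY_TOKEN"  # hypothetical credential issued by a gateway operator

payload = {
    "model": "llama-2-13b-chat",  # example open-source model served by node operators
    "messages": [{"role": "user", "content": "Summarize today's on-chain activity."}],
    "max_tokens": 256,
}

resp = requests.post(
    GATEWAY_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```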

For example, a team can train LLMs using decentralized computing resources on the permissionless Akash network, which offers customized compute at costs up to 85% lower than centralized cloud providers.

At present, AI companies spend about $1 million a day on infrastructure to run LLM inference services. At that rate ($1 million × 365 days), the serviceable addressable market (SAM) for AI infrastructure is roughly $365 million a year.

Market conditions point to enormous growth potential for decentralized infrastructure, and decentralizing AI computing resources will open up more room for innovation across the industry.

Accessible research data sharing

In scientific and academic research, data sharing combined with machine learning and large language models (LLMs) has the potential to accelerate research and improve human life. Yet access to data is limited by the high-cost academic journal system: journals selectively publish only studies approved by their review boards, and most sit behind expensive subscriptions that make broad access difficult.

With the rise of blockchain-based zero-knowledge (ZK) machine learning models, data can now be shared and computed on in a trustless environment that protects privacy without leaking sensitive information. Researchers and scientists can therefore share and access research data without deanonymizing potentially restricted personally identifiable information.
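A full zero-knowledge ML pipeline is beyond the scope of a short example, but the underlying sharing pattern can be illustrated with a simplified, non-ZK sketch: a researcher publishes a cryptographic commitment to a dataset plus a derived aggregate statistic, so later disclosures can be checked against the commitment while the raw, potentially identifying records are never shared. The dataset and field names below are invented for illustration.

```python
# Simplified (non-ZK) sketch of the sharing pattern: publish a commitment to the raw
# dataset and a derived statistic, never the identifying records themselves.
# In a real ZK setup, the statistic would ship with a proof of correct computation.
import hashlib
import json
from statistics import mean

# Hypothetical patient-level records that must stay private.
raw_records = [
    {"patient_id": "p-001", "age": 64, "biomarker": 3.2},
    {"patient_id": "p-002", "age": 71, "biomarker": 2.7},
    {"patient_id": "p-003", "age": 58, "biomarker": 4.1},
]

# Commitment: a hash of the canonical serialization of the raw data.
serialized = json.dumps(raw_records, sort_keys=True).encode("utf-8")
commitment = hashlib.sha256(serialized).hexdigest()

# Derived, non-identifying result that can be shared openly.
shared_result = {
    "n": len(raw_records),
    "mean_biomarker": round(mean(r["biomarker"] for r in raw_records), 3),
}

print("published commitment:", commitment)
print("published result:", shared_result)
```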

To share open research data sustainably, researchers need decentralized infrastructure that rewards them for providing data access while cutting out intermediaries. An incentivized open data network ensures scientific data remains accessible outside expensive journals and private companies.

Unstoppable DApp hosting

Centralized data hosting platforms such as Amazon Web Services (AWS), Google Cloud and Microsoft Azure are very popular among application developers. Although these platforms are easy to access, they carry single-point-of-failure risk, which undermines reliability and can lead to rare but real service disruptions.

In the history of the technology industry, it is not uncommon for Infrastructure-as-a-Service (IaaS) platforms to fail to deliver uninterrupted service. For example:

• In 2022, MetaMask temporarily denied access to users in certain regions because Infura complied with U.S. sanctions. Although MetaMask itself is decentralized, its default connections and endpoints rely on the centralized Infura service to access Ethereum.

• Infura customers also experienced outages in 2020.

• During peak traffic on Solana and Polygon, centralized remote procedure call (RPC) services became overloaded, causing network congestion.

In a booming open-source ecosystem, a single company struggles to meet the variety of developer needs. The market now has thousands of Layer 1 blockchains, rollup solutions, indexing services, storage protocols and other middleware protocols covering different niche use cases.

Most centralized platforms, such as RPC providers, keep rebuilding the same infrastructure, which creates friction, slows growth and hurts scalability, because protocols end up re-laying foundations instead of developing new features.

In contrast, the success of decentralized social networking applications such as Bluesky and the AT Protocol shows that user demand for decentralized protocols is growing. By moving away from centralized RPCs toward open data access, these protocols remind us of the importance of building and adopting decentralized infrastructure.

For example, a decentralized finance (DeFi) protocol can obtain on-chain price data from Chainlink instead of relying on a centralized API for price feeds and real-time market data.
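As a minimal off-chain sketch of this, the snippet below reads Chainlink's ETH/USD aggregator with web3.py. The RPC endpoint is a placeholder to be replaced with your own provider, and the feed address shown is the widely published Ethereum mainnet ETH/USD aggregator (verify it against Chainlink's documentation before relying on it).

```python
# Minimal sketch: reading a Chainlink price feed (ETH/USD) directly from Ethereum.
# Replace the RPC URL with your own endpoint; the aggregator address is the commonly
# published mainnet ETH/USD feed -- confirm it in Chainlink's docs before use.
from web3 import Web3

RPC_URL = "https://YOUR_ETHEREUM_RPC_ENDPOINT"  # placeholder
ETH_USD_FEED = "0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419"  # Chainlink ETH/USD aggregator (mainnet)

AGGREGATOR_V3_ABI = [
    {"name": "decimals", "inputs": [], "outputs": [{"name": "", "type": "uint8"}],
     "stateMutability": "view", "type": "function"},
    {"name": "latestRoundData", "inputs": [],
     "outputs": [{"name": "roundId", "type": "uint80"},
                 {"name": "answer", "type": "int256"},
                 {"name": "startedAt", "type": "uint256"},
                 {"name": "updatedAt", "type": "uint256"},
                 {"name": "answeredInRound", "type": "uint80"}],
     "stateMutability": "view", "type": "function"},
]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
feed = w3.eth.contract(address=Web3.to_checksum_address(ETH_USD_FEED), abi=AGGREGATOR_V3_ABI)

decimals = feed.functions.decimals().call()
_, answer, _, updated_at, _ = feed.functions.latestRoundData().call()

print(f"ETH/USD: {answer / 10 ** decimals} (updated at unix time {updated_at})")
```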

The current Web3 market serves approximately 100 billion serviceable RPC requests, at a cost of between $3 and $6 per million requests. On that basis, the total addressable market (TAM) of Web3 RPC is approximately $100 million to $200 million per year. With the steady growth of new data availability layers, RPC requests could exceed 1 trillion per day.
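A quick back-of-the-envelope check of these figures, assuming the 100 billion-request figure refers to daily volume (the reading consistent with the stated annual TAM and the trillion-per-day projection above):

```python
# Back-of-the-envelope check of the Web3 RPC market estimate.
# Assumption: ~100 billion serviceable requests per day, which is what makes the
# stated $100M-$200M annual TAM consistent with $3-$6 per million requests.
daily_requests = 100e9
price_per_million = (3, 6)  # USD

daily_revenue = tuple(p * daily_requests / 1e6 for p in price_per_million)
annual_tam = tuple(round(d * 365 / 1e6, 1) for d in daily_revenue)  # in USD millions

print(f"daily revenue: ${daily_revenue[0]:,.0f} - ${daily_revenue[1]:,.0f}")
print(f"annual TAM:    ${annual_tam[0]}M - ${annual_tam[1]}M")  # ~ $109.5M - $219.0M
```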

To adapt to the development of open data transmission and enter the open-source data market, a shift to decentralized infrastructure is imperative.

Open data requires decentralized infrastructure

In the long run, we will see general-purpose blockchain clients offload storage and networking to dedicated middleware protocols.

For example, Solana was among the first to push decentralized storage, initially archiving its data on chains such as Arweave. Solana and Phantom were, again, the main tools for handling trading traffic around the TRUMP presidential memecoin, a notable moment in financial and cultural history.

In the future, more and more data will flow through infrastructure protocols, making middleware platforms a critical dependency of the protocol layer. As protocols become more modular and scalable, this creates room for open-source, decentralized middleware to be integrated at the protocol layer.

Relying on centralized companies as intermediaries for light-client header data will become untenable. Decentralized infrastructure is trustless, distributed, cost-effective and censorship-resistant. It will therefore become the default choice for application developers and enterprises, fostering a mutually beneficial model of growth.
