In 2008, Satoshi Nakamoto envisioned the first distributed public ledger (database). This system was built upon cryptographically linked blocks of data. Each block secured the one before it, ultimately creating a network of immutable and tamper-proof databases.
The initial use-case for this distributed ledger technology was directed toward creating digital currencies. However, the influx of developer activity on the blockchain has resulted in a plethora of novel use-cases, reaching far beyond the scope of what Satoshi Nakamoto originally envisioned.
In this paper, we’ll be highlighting the conceptual and architectural differences between blockchains and traditional database systems to find an answer to the question: “Can current blockchain systems efficiently compete with traditional databases?”
There’s no denying that digital instruments, along with the invention of the internet, have contributed to revolutionary changes in data-driven fields such as banking, real estate, and healthcare. Many of these fields have become dependent on the utilization of databases. Data dependency emphasizes how reliable, secure, and accessible data must be to keep up with technological advances. As a result, data has become one of the most valuable commodities for businesses and individuals.
In traditional systems, data is generally structured within a centralized database built on a client-server relationship. The client can access the server to request or modify data, as long as the client is granted permission by the central authority governing the database server. Centralization is a critical component of traditional databases as it facilitates data management and provision, consequently making the process of accessing data relatively faster than in its counterpart (decentralized databases). With that said, there are drawbacks to a centralized architecture.
Due to how reliant the world has become on the transfer of data, databases have become an obvious target for cyber attacks. The more accessible a server is, the more vulnerable it becomes. The wide variety of attack vectors made available through a client-server architecture makes it difficult to create and maintain defensive mechanisms. As long as the control of a server is maintained through a centralized custodian, attack routes can come from within an organization, or from an outside entity.
Whatever the case, most breaches break the confidentiality of data, jeopardizing intellectual property and leading clients to lose faith in a company. What’s more, some attack vectors don’t even need to breach the data directly. By flooding a database with fake requests, an attacker can keep the server too busy dealing with these counterfeit requests to process legitimate ones, essentially cutting off access to the data (denial-of-service attacks, or DoS).
Blockchain technology addresses the primary pain points of traditional databases through its use of an immutable, tamper-proof distributed ledger technology (DLT). Before we can understand the key differences and benefits of DLT versus traditional databases, it’s essential to understand that there are different types of blockchains (summarized in Figure A).
Public Blockchains (Bitcoin, Ethereum, Litecoin)
Also known as permissionless blockchains, public blockchains are decentralized networks where any pseudonymous individual can participate in the administration of the blockchain network by becoming a node operator. Public blockchains are ideal for those who seek a completely decentralized system.
Public blockchains were the original frameworks; they’re tamper-proof and immutable, thus protecting against data corruption. Furthermore, public blockchains are decentralized with no single point of failure, reducing the number of routes an attacker can take relative to traditional databases. Finally, public blockchain systems are publicly verifiable and transparent, enabling use-cases such as decentralized identity.
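To make the tamper-proof property more concrete, here is a minimal sketch of a hash-linked chain in Python. It is only an illustration under simplifying assumptions: the block layout and the helper names (`block_hash`, `build_chain`, `is_valid`) are hypothetical, and real blockchains add consensus, signatures, and networking on top of this idea.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Build a list of blocks, each one linked to its predecessor by hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first (genesis) block
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if recomputed != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
valid_before = is_valid(chain)

chain[1]["data"] = "bob pays carol 200"  # tamper with an earlier block
valid_after = is_valid(chain)
```

Because every block's hash covers the previous block's hash, changing one record invalidates that block, and repairing its hash would break the link from the next block, so the edit cannot be hidden without rewriting everything after it.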
While complete decentralization can be viewed in a positive light, there are some cons to these permissionless blockchains. Public blockchains are unregulated, making them difficult to integrate into internal structures guided by strict criteria. It’s also worth mentioning that speed is one of the most significant drawbacks of public blockchains due to how long it takes to process single blocks of data. The disadvantage of having an immutable architecture is that if fundamental issues were to be found, the options for resolving the issue are extremely limited and could result in having to completely re-create/fork the blockchain. These issues can be summarized by saying that public blockchains, at least in their current state, have scalability issues compared to their competitors.
For an in-depth read on Public Blockchains, I recommend reading this paper.
Private & Consortium Blockchains (Quorum, Ripple, Hyperledger)
Private blockchains are permissioned networks that are governed by a central authority. The concept of a private blockchain implies that a person or organization must be invited to join the network. To put it another way, it’s a closed blockchain that can only be accessed by those who have been granted permission.
Administrators can also perform database administration tasks, such as optimizing speed and reducing the blockchain’s size to increase performance. Consortium blockchains, also called federated blockchains, are another form of permissioned blockchain and share very similar features with private ones. The main difference is that consortium networks are slightly more decentralized. Unlike private blockchains, where one central authority is in control, consortium blockchains are governed by multiple organizations working together. Permissioned blockchains are ideal for companies that want the benefits of DLT with the security of a closed-off ecosystem.
Private blockchains are naturally smaller than public blockchains, enabling a much faster system, as the amount of data being processed is many times smaller than on their public counterparts. Additionally, private blockchains are considered safer, as the network actors are limited to those the administrator allows to join. Generally speaking, private blockchains are more scalable within their purpose than public blockchains.
Private blockchains are the centralized counterpart to public blockchains. While central control has its benefits, it also has drawbacks. Private blockchains require trust, as the credibility of a network is tied to the credibility of authorized nodes. Fewer nodes introduce increased security risks, as it becomes easier for a bad actor to gain access.
For further reading on how private blockchains function, I recommend going through MultiChain’s whitepaper, found here.
As the name suggests, hybrid blockchains were designed with elements from both private and public blockchains. The hybrid blockchain architecture can be identified by the fact that it is not accessible to all users while remaining highly customizable and providing classic capabilities like security, integrity, and transparency. Hybrid blockchains are ideal for organizations that want the security and transparency that come with public blockchains but want to maintain control through the centralization elements of private blockchains.
Hybrid blockchains share many of the same advantages as private blockchains, such as faster transactions, cheaper network fees, and increased security. In addition, while a private organization can own a hybrid blockchain, it cannot change transactions, creating a trustless environment where transactions are easily verified.
Hybrid blockchains are only partially decentralized, and they lack some of the transparency that public blockchains have.
A majority of current permissioned blockchain systems are presented as stand-alone end-to-end transaction processing systems. An organization that wants to tap into the benefits of blockchain technology would have to integrate an entirely new database management system, resulting in high costs and tremendous effort.
To solve this, systems have been created that successfully incorporate elements of both databases and blockchain technology. Two strategies are used to build such hybrid systems:
- The first strategy is to build a blockchain system from the ground up before adding database functions to it later.
- The second strategy begins with a database system and later adds a blockchain structure.

In this next section, we’ll briefly discuss both strategies, along with current integrations and solutions.
Database Systems Optimized With Blockchain Features
As we know, most organizations currently exist outside of the blockchain, powered by their centralized systems. The easiest way for these organizations to take advantage of the benefits that come with distributed ledger technology is to integrate some features into their existing systems. These hybrid systems are built on traditional database systems and use transaction-based replication: each node within the network governs its own database and executes transactions, and a consensus mechanism keeps the nodes in agreement. A few examples of current integrations and solutions are:
BigchainDB is a blockchain-enabled database that aims to combine the best of both DLT and traditional database systems. It is built on top of MongoDB, from which it derives features like permissions, scalability, and a complete query language that can query all block data (transactions, metadata, etc.).
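The replication idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Node` class, the `replicate` function, and the simple majority vote are hypothetical stand-ins for illustration, not the consensus protocol of any of the systems listed here.

```python
class Node:
    """A hypothetical replica that holds its own copy of the database."""

    def __init__(self, name):
        self.name = name
        self.balances = {"alice": 10, "bob": 0}

    def validate(self, tx):
        """Vote on a transfer: the sender must have enough funds."""
        sender, _receiver, amount = tx
        return amount > 0 and self.balances.get(sender, 0) >= amount

    def apply(self, tx):
        """Execute the transfer against this node's local database."""
        sender, receiver, amount = tx
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


def replicate(nodes, tx):
    """Apply tx on every node only if a strict majority votes to accept it."""
    votes = sum(1 for node in nodes if node.validate(tx))
    if votes * 2 <= len(nodes):
        return False
    for node in nodes:
        node.apply(tx)
    return True


nodes = [Node("a"), Node("b"), Node("c")]
accepted = replicate(nodes, ("alice", "bob", 4))    # enough funds, accepted
rejected = replicate(nodes, ("alice", "bob", 100))  # overdraft, rejected
```

Every node ends up with the same state because the same accepted transactions are executed everywhere, which is the essence of transaction-based replication. Production systems replace the majority vote with a fault-tolerant consensus protocol.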
For further reading on BigchainDB, refer to their whitepaper here.
ProvenDB is a similar blockchain-enabled database that utilizes MongoDB. For further reading on ProvenDB, refer to their lite paper here.
Cassandra, which was first designed in 2008 using Java, is a distributed, open-source NoSQL database that achieves linear scalability through the use of a distributed architecture. Organizations can manage enormous amounts of data across hybrid and multi-cloud settings thanks to Cassandra’s ability to handle petabytes of data and thousands of concurrent operations per second.
For further reading on Apache Cassandra, refer to their docs here.
ChainifyDB differs from the examples above in that it doesn’t have its own database; instead, it adds a blockchain layer on top of existing databases, which then plug into the ChainifyDB network, where a record of all of the data is stored and managed by database nodes. When a record is added to one database, all other database nodes are notified about the new addition. The records kept within ChainifyDB are decentralized, immutable, and transparent since they are only written to the databases after consensus is reached.
Blockchain Systems Optimized With Database Features
Blockchains were not originally designed to compete with traditional database systems on accessibility and query performance. However, they are widely employed for managing essential data due to their built-in security measures. With that said, the querying capabilities of current blockchain technology are extremely limited compared to those of traditional databases. The last category of hybrid system we’ll mention is blockchain-native systems that use a separate database layer to leverage traditional infrastructure for performance and capability. A few examples of current integrations and solutions are:
CovenantSQL is a blockchain-based Byzantine Fault Tolerant relational database built on top of SQLite, and serves as an infrastructure that developers can use to build decentralized applications. CovenantSQL inherits qualities such as decentralization, transparency, and immutability from its native blockchain architecture while simultaneously having full SQL support for optimized performance and scalability.
For further reading on CovenantSQL, including all of its unique features, refer to their documentation.
Postchain is a blockchain framework designed by ChromaWay in Sweden with permissioned blockchains in mind (such as Hyperledger). Postchain’s architecture uses a relational database as a blockchain data store while simultaneously defining blockchain logic using SQL stored procedures. Postchain has a network of nodes that maintain datasets by utilizing a proof-of-authority consensus mechanism. These sets of data are then stored in an SQL database. It’s also important to note that the logic behind Postchain’s transactions can be easily defined in SQL code.
For further reading on Postchain, including an in-depth look at their architecture, refer to their docs here.
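As a rough illustration of the general idea of keeping blockchain data in a relational store, the sketch below uses Python’s built-in `sqlite3` module as a stand-in. The table layout and the `append_block` helper are assumptions made for this example; they are not the actual schema or API of Postchain or CovenantSQL.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blocks (
        height    INTEGER PRIMARY KEY,
        prev_hash TEXT NOT NULL,
        hash      TEXT NOT NULL
    );
    CREATE TABLE transactions (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        height   INTEGER NOT NULL REFERENCES blocks(height),
        sender   TEXT NOT NULL,
        receiver TEXT NOT NULL,
        amount   INTEGER NOT NULL
    );
    """
)


def append_block(conn, txs):
    """Store a block and its transactions, linking it to the latest block."""
    row = conn.execute(
        "SELECT height, hash FROM blocks ORDER BY height DESC LIMIT 1"
    ).fetchone()
    height, prev_hash = (row[0] + 1, row[1]) if row else (0, "0" * 64)
    digest = hashlib.sha256(f"{height}|{prev_hash}|{txs}".encode("utf-8")).hexdigest()
    with conn:  # commit the block and its transactions atomically
        conn.execute("INSERT INTO blocks VALUES (?, ?, ?)", (height, prev_hash, digest))
        conn.executemany(
            "INSERT INTO transactions (height, sender, receiver, amount) "
            "VALUES (?, ?, ?, ?)",
            [(height, s, r, a) for s, r, a in txs],
        )
    return height


append_block(conn, [("alice", "bob", 5)])
append_block(conn, [("bob", "carol", 2), ("alice", "carol", 1)])

# Ordinary SQL can now answer questions about the chain's contents.
received_by_carol = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE receiver = ?", ("carol",)
).fetchone()[0]
```

The point of the design is that block data stays hash-linked, while lookups and aggregates are delegated to the database engine instead of being done by scanning blocks.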
FalconDB is a blockchain-based collaborative database aiming to address the primary pain points of shared databases. Conventional shared databases generally utilize a central server for cooperative storage, which could be easily compromised and become malicious. Furthermore, the efficiency of traditional systems also comes into question, as their performance depends on how much data is within the database and on the server’s capabilities. A more capable server generally requires more hardware, making it very costly for individual users to participate. FalconDB was created to solve these issues by leveraging its architecture to minimize hardware requirements and ensure security guarantees without sacrificing query performance.
For an in-depth read on FalconDB, refer to this report written by researchers from UCB and UoU.
Querying Data in Blockchains
Vernor Vinge, an emeritus professor of mathematics at San Diego State University, stated in a talk about “A Singularity Sensation” that humankind is currently experiencing a problem he calls “data glut”: we’ve produced more data than we are capable of organizing without aid. Processing methods such as indexing, which make it easier to derive conclusions from data, are therefore essential for using databases efficiently.
Imagine that you were given a 1500-page book and were told to find an answer to a specific question. If this massive book didn’t have an index, you would have to scan through each page individually until you found the piece of data you needed. However, if the same book had an index, all you would have to do is look up the topic in the index, massively reducing the amount of time spent searching. Databases work similarly: querying specific data points without an index is highly inefficient, because every record may have to be scanned.
Blockchains are decentralized public ledgers that effectively serve as massive databases. Transactions are recorded and stored in blocks, which are secured by asymmetric cryptography and distributed consensus algorithms (PoW, PoS, etc.). As mass adoption takes place and developer activity increases, efficient data processing methods will be crucial for the reliability and scalability of blockchain networks. With that being said, querying blockchain data comes with its own set of issues.
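The book analogy translates directly into code. The following sketch, with made-up page contents and hypothetical helper names, contrasts scanning every page with looking a word up in an index that was built once in advance.

```python
def linear_scan(pages, term):
    """Read every page in order and collect the ones that mention the term."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if term in text.split()
    ]


def build_index(pages):
    """Map each word to the page numbers where it appears (built once)."""
    index = {}
    for number, text in enumerate(pages, start=1):
        for word in set(text.split()):
            index.setdefault(word, []).append(number)
    return index


pages = [
    "blocks store hashes",
    "nodes reach consensus",
    "indexes speed up queries",
    "consensus secures blocks",
]
index = build_index(pages)

scan_result = linear_scan(pages, "consensus")  # touches all four pages
index_result = index.get("consensus", [])      # a single dictionary lookup
```

Both approaches return the same pages, but the scan does work proportional to the size of the book on every query, while the index pays that cost once and answers later queries with a single lookup.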
Challenges of Querying Blockchain Data
A blockchain is only a database in the most generic sense of the word, in that it is a collection of stored data. Unlike traditional databases, where data is (generally) stored in one place, blockchain data is stored sequentially, making it inherently challenging to query.
Querying blockchain data can also be expensive and time-consuming: each block stores only the “hash” of the previous block alongside its own new data points, so there is no built-in way to jump directly to a given record. Finding an individual data point can therefore mean walking through many blocks, which takes time and resources and consequently reduces performance.
Data stored in traditional relational databases can be queried using Structured Query Language (SQL), which enables scalability, as SQL is compatible with most modern enterprise systems.
Unfortunately, there is currently no standard query language designed for blockchain data structures. The lack of a query language, and the distribution of data amongst blocks, can lead to issues with how data is interpreted and relayed within a node (data entanglement). With that said, there are a few methods that are currently being utilized, such as:
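The cost of this sequential layout can be shown with a toy example. In the sketch below, the chain format and the `make_chain` and `find_transaction` helpers are hypothetical; the point is only that, without a separate index, a lookup has to follow the previous-hash links block by block.

```python
import hashlib


def make_chain(tx_batches):
    """Return (blocks_by_hash, tip_hash) for a toy chain of transaction batches."""
    blocks = {}
    prev_hash = None
    for txs in tx_batches:
        digest = hashlib.sha256(f"{prev_hash}|{txs}".encode("utf-8")).hexdigest()
        blocks[digest] = {"prev_hash": prev_hash, "txs": txs}
        prev_hash = digest
    return blocks, prev_hash


def find_transaction(blocks, tip_hash, tx_id):
    """Walk back from the newest block via prev_hash until tx_id is found."""
    visited = 0
    current = tip_hash
    while current is not None:
        block = blocks[current]
        visited += 1
        if tx_id in block["txs"]:
            return current, visited
        current = block["prev_hash"]
    return None, visited


blocks, tip = make_chain([["tx1", "tx2"], ["tx3"], ["tx4", "tx5"], ["tx6"]])

# "tx1" lives in the oldest block, so every block has to be visited.
found_in, blocks_visited = find_transaction(blocks, tip, "tx1")
```

In this toy chain the search for the oldest transaction visits all four blocks. On a real chain with hundreds of thousands of blocks, the same walk is what makes unindexed queries slow.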
Centralized API Services
Numerous businesses offer central databases and APIs for accessing blockchain data. These solutions work better for niche use-cases where centralization is preferred, as they’re generally secure and significantly faster than decentralized alternatives. The drawback is the potential for service outages or restricted access to the data when the provider’s database runs into problems.
The Graph was founded in 2017 by three blockchain developers who experienced the challenges of querying blockchain data firsthand while building dApps. The Graph is an open-source protocol that introduces an efficient and decentralized way to index and query blockchain data, and currently supports 19+ ecosystems. It achieves this by utilizing subgraphs for data storage and the GraphQL API for data requests.
For further reading on The Graph, refer to their documentation here.
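For a sense of what a GraphQL data request looks like, the sketch below builds the JSON body that would be sent in an HTTP POST to a subgraph’s query endpoint. The entity name `transfers` and its fields are hypothetical, since every subgraph defines its own schema, and no network request is made here.

```python
import json


def build_graphql_request(entity, fields, first):
    """Build the JSON body of a GraphQL POST request for a subgraph entity."""
    query = "{ %s(first: %d) { %s } }" % (entity, first, " ".join(fields))
    return json.dumps({"query": query})


# Ask for five records of a hypothetical "transfers" entity.
body = build_graphql_request("transfers", ["id", "from", "to", "value"], first=5)
```

The client names only the fields it needs, and the indexed subgraph answers the query without the client having to scan the chain itself.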
In many application sectors, blockchains are the new standard technology for storing data in an unchangeable, tamper-resistant manner with high availability. However, a blockchain is more of an abstract idea than a particular application of technology. As a result, there are various ways to construct a blockchain, each with unique benefits and drawbacks. The potential for applying blockchain technology to different real-world sectors is vast. Yet, to compete with traditional database structures, numerous research efforts on security, data history, and querying capabilities are still needed to fully realize blockchain technology’s potential.