Quorum Store: How Consensus Horizontally Scales on the Aptos Blockchain
By Brian Cho and Alexander Spiegelman
- Aptos Labs has launched Quorum Store, the first deployed implementation of Narwhal.
- Quorum Store significantly improves consensus throughput by removing the leader bottleneck.
- Quorum Store decouples data dissemination from metadata ordering, allowing validators to disseminate data asynchronously in parallel.
- Stress tests of Quorum Store show 12x and 3x TPS improvements in consensus-only tests and end-to-end processing, respectively.
Aptos Labs is thrilled to announce Quorum Store, the first deployed implementation of Narwhal. Quorum Store significantly improves consensus throughput by removing the leader bottleneck. Instead of relying on a single leader to broadcast big chunks of data as part of the consensus protocol, Quorum Store decouples data dissemination from metadata ordering, allowing validators to disseminate data asynchronously in parallel.
The Bottleneck in Blockchain Consensus
Most currently deployed blockchains implement a leader-based consensus. In these protocols, a leader drives progress and is responsible for disseminating user transactions as part of consensus proposals. As a result, leaders’ resources are constantly over-utilized while the rest of the validators are mostly idle. This means that the overall system throughput is dictated by the leaders’ ability to broadcast data. Note that rotating leaders does not solve this problem as it only shifts the bottleneck from one leader to the next.
Narwhal: Decoupling Data Dissemination from Consensus
As first observed in Narwhal, data dissemination can and should be decoupled from the consensus logic. The key idea behind Narwhal is that validators can continuously stream batches of transactions to one another in parallel. Then, when leaders propose blocks in consensus, they specify the set of (previously shared) batches to include in the block. This reduces the amount of data required to be shared on each block proposal as the leader does not need to list and send the transactions in each block, but can instead simply specify the batch identifiers (metadata).
Say one validator can broadcast T transactions per second. Then, the throughput of any leader-based consensus is bounded by T. However, with Quorum Store, n validators can broadcast nT transactions per second, increasing the bound on the consensus throughput accordingly.
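To make the arithmetic concrete, here is a toy calculation (the numbers are illustrative, not measured figures):

```python
# Illustrative throughput bounds: in a leader-based protocol, one
# validator's broadcast capacity T caps consensus throughput; with
# Quorum Store, all n validators broadcast batches in parallel.
def leader_bound(t_per_validator: int) -> int:
    # Throughput is limited by what a single leader can broadcast.
    return t_per_validator

def quorum_store_bound(n_validators: int, t_per_validator: int) -> int:
    # All n validators disseminate data concurrently.
    return n_validators * t_per_validator

# e.g. 100 validators, each able to broadcast 1,000 txns/sec
assert leader_bound(1_000) == 1_000
assert quorum_store_bound(100, 1_000) == 100_000
```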
As before, the consensus logic is responsible for providing a total ordering of transactions. However, with Quorum Store, leaders include only metadata in their proposals. This metadata is ordered in consensus and mapped to the corresponding transaction batches before execution. As a result, consensus latency and throughput under high load improve dramatically. Moreover, the message complexity of the consensus protocol becomes much less important for overall performance. Thus, in the trade-off between low latency and low communication cost, we use Jolteon, which improves HotStuff latency by 33%.
Ensuring Liveness with Proof of Availability
The subtle part of decoupling data dissemination from metadata ordering is ensuring that all validators have the transactions they need to execute. For example, a malicious validator might send its data to some validators but not others; if the metadata corresponding to that data is then ordered by consensus, all honest validators must still be able to retrieve the data.
To this end, Quorum Store provides a proof of availability, which guarantees that the data behind the proof will be available in the system until the specified expiration time therein.
A high-level overview of the validator's pipelined architecture follows, outlining its four components:
- Mempool is responsible for holding the set of potential user transactions
- Quorum Store’s core functionality is to pull batches of transactions from Mempool, broadcast them, and form their proofs of availability
- Consensus pulls proofs from Quorum Store, orders them, and pushes them to Execution
- Execution uses Quorum Store to map proofs to transaction batches and executes them
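The four stages above can be sketched as a toy pipeline. All class and method names here are illustrative stand-ins, not the Aptos codebase's API; broadcasting, signing, and proof formation are elided to single lines:

```python
# Hypothetical sketch of the Mempool -> Quorum Store -> Consensus ->
# Execution pipeline described above. Names are illustrative only.
from dataclasses import dataclass

@dataclass
class Batch:
    batch_id: str
    txns: list

class Mempool:
    def __init__(self):
        self.pending = []          # potential user transactions
    def pull(self, n):
        pulled, self.pending = self.pending[:n], self.pending[n:]
        return pulled

class QuorumStore:
    def __init__(self, mempool):
        self.mempool = mempool
        self.batches = {}          # batch_id -> Batch (local persistence)
        self.proofs = []           # proofs of availability, ready for consensus
    def make_batch(self, n):
        batch = Batch(batch_id=f"batch-{len(self.batches)}",
                      txns=self.mempool.pull(n))
        self.batches[batch.batch_id] = batch   # broadcast + persist (elided)
        self.proofs.append(batch.batch_id)     # proof formation (elided)
    def resolve(self, proof):
        return self.batches[proof]             # map proof -> transaction batch

class Consensus:
    def order(self, store):
        # Pull proofs (metadata only) and totally order them.
        ordered, store.proofs = store.proofs, []
        return ordered

class Execution:
    def run(self, store, ordered_proofs):
        # Map ordered proofs back to transactions and "execute" them.
        return [txn for p in ordered_proofs for txn in store.resolve(p).txns]
```

For example, pushing three transactions through this sketch orders and executes the first batch of two while the third stays in the mempool.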
We tested the implementation in a decentralized network environment across 100 validators in over 30 countries, stressing the network to confirm low latency, high throughput, and healthy gas market fees. We achieved 12x and 3x TPS improvements in our consensus-only test (no execution) and end-to-end processing, respectively. Check out our reproducible performance benchmark tests for details on how to verify these end-to-end results.
Interestingly, even though Quorum Store adds a round trip, the end-to-end latency improves under high load. This is because removing the leader bottleneck reduces consensus latency dramatically in this case.
Scaling Out Consensus
One of the great benefits of Quorum Store is that it allows for horizontal scalability. Data dissemination is parallelizable by nature, meaning that it scales linearly with more hardware. To scale out consensus, all we need is to add more machines to run Quorum Store. Currently, each validator is running one machine. Instead, each validator can run several machines, one to run consensus to order the proofs and the rest to run Quorum Store in parallel. Check out Figure 7 in Narwhal to see an experiment with linear scalability in the number of machines that reaches 600k TPS.
Protocol and Implementation
We now highlight the protocol and some of the practical considerations behind it. For better readability, some optimizations and implementation details are omitted. In a nutshell, to form proofs of availability, all validators repeat the following in parallel:
- Pull transactions from mempool.
- Arrange transactions into batches, based on their gas price, and then pick each batch’s expiration time.
- Broadcast batches of transactions to all other validators.
- Persist received batches, sign their digests, and send back the signatures.
- Collect a quorum of signatures to form a proof of availability.
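The signing and aggregation steps above can be sketched as follows. Plain SHA-256 hashing stands in for real stake-weighted cryptographic signatures, and the function names are illustrative:

```python
# Sketch of batch digest signing and proof-of-availability formation.
# hashlib stands in for real signatures; names are illustrative.
import hashlib

def digest(batch_txns: list) -> str:
    # Digest of a batch: hash of its serialized transactions.
    h = hashlib.sha256()
    for txn in batch_txns:
        h.update(txn)
    return h.hexdigest()

def sign(validator_id: str, batch_digest: str) -> tuple:
    # Stand-in for a validator's cryptographic signature over the digest.
    sig = hashlib.sha256((validator_id + batch_digest).encode()).hexdigest()
    return (validator_id, sig)

def form_proof(batch_digest: str, signatures: list, quorum: int):
    # A proof of availability is the digest plus a quorum of signatures;
    # with fewer than `quorum` signatures, no proof can be formed.
    if len(signatures) >= quorum:
        return {"digest": batch_digest, "signatures": signatures[:quorum]}
    return None
```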
When a validator signs a batch, it makes a promise to persist the batch until the specified expiration time and to respond to requests for missing batches from other validators.
An availability proof for a batch is a quorum of 2f+1 stake-weighted signatures. A valid proof guarantees that honest validators holding at least f+1 stake provided signatures, and those honest validators will persist the batch until its expiration. Therefore, a proof guarantees that a non-expired batch remains available in the system: any validator missing the data can fetch it from honest validators.
Fetching the data
After being ordered by consensus, the proofs are pushed to execution. At this point, Execution asks Quorum Store to map the proofs to their corresponding transactions. In the common case, Quorum Store will store the transactions locally since an honest batch creator broadcasts it to all. However, in the worst case, Quorum Store will need to fetch the transactions from a remote validator. The proof of availability guarantees that enough remote validators store the data, but if done naively, it can increase latency. To this end, we implement an optimization to pre-fetch remote data as soon as the Consensus block, which contains the proofs, is first seen. This way, by the time the block is ordered, Quorum Store already locally stores the transactions.
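The pre-fetch optimization can be sketched like this. The class, its methods, and the thread-pool details are hypothetical stand-ins for the real implementation:

```python
# Hypothetical sketch of pre-fetching: start fetching missing batches
# when a block's proofs are first seen, so the data is already local
# by the time the block is ordered. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor

class BatchStore:
    def __init__(self, local, fetch_remote):
        self.local = dict(local)       # batch_id -> txns already persisted
        self.fetch_remote = fetch_remote  # stand-in for fetching from peers
        self.pool = ThreadPoolExecutor(max_workers=4)
        self.inflight = {}             # batch_id -> in-progress fetch

    def on_block_seen(self, proofs):
        # Kick off background fetches for any batch we don't hold yet.
        for batch_id in proofs:
            if batch_id not in self.local and batch_id not in self.inflight:
                self.inflight[batch_id] = self.pool.submit(
                    self.fetch_remote, batch_id)

    def on_block_ordered(self, proofs):
        # By ordering time, the fetches have (usually) completed;
        # .result() blocks only if a fetch is still in flight.
        resolved = []
        for batch_id in proofs:
            if batch_id not in self.local:
                self.local[batch_id] = self.inflight.pop(batch_id).result()
            resolved.append(self.local[batch_id])
        return resolved
```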
Quorum Store is perfectly load balanced. Since all validators broadcast batches and aggregate proofs in parallel, resource utilization is equal across validators, resulting in data dissemination at network speed. Note that since all validators have to get all batches in order to execute, the above “naive” fetching mechanism is optimal. Other cryptographic tools such as erasure codes used to disseminate and fetch/reconstruct the data would only add complexity and yield no benefits in terms of load balancing.
Thanks to Quorum Store, consensus is now the most performant component in our pipelined architecture. To prevent it from overwhelming the other components in the system, each validator adjusts its local batch creation rate based on backlog, similar to classic congestion control systems like TCP. In addition, consensus applies fairness across batches created by different validators to ensure each validator gets a fair share of block space.
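A TCP-style controller of this kind might look like the following sketch. The parameters and the additive-increase/multiplicative-decrease policy are placeholders, not the actual tuning used in the Aptos implementation:

```python
# Illustrative AIMD-style rate controller: shrink the batch creation
# rate when the backlog grows past a target, grow it additively when
# the backlog is low. Parameters are placeholders, not Aptos's tuning.
def adjust_rate(rate: float, backlog: int, target: int,
                incr: float = 10.0, decay: float = 0.5,
                floor: float = 1.0) -> float:
    if backlog > target:
        return max(floor, rate * decay)   # multiplicative decrease
    return rate + incr                    # additive increase
```

For example, a validator producing 100 batches/sec that sees a backlog of 500 against a target of 100 would halve its rate to 50, then creep back up by 10 once the backlog drains.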
Reading batches from persistent storage, e.g., a hard disk, may be expensive. We implement a concurrent memory cache that allows Execution to read data from Quorum Store in parallel. The cache contains a subset of the batches Quorum Store persists in storage.
To prevent DDoS attacks, a quota is enforced on both the in-memory cache and the storage. Before signing and persisting a batch from validator v, validators check v’s memory and storage balances. If both balances are fine, the batch is persisted to storage and stored in the cache. Otherwise, it is either persisted or ignored, depending on the storage balance.
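The quota decision reduces to a three-way admission check, sketched below with illustrative names and limits:

```python
# Sketch of the per-sender quota check: cache and persist when both
# quotas have room; persist only when just the storage quota has room;
# otherwise ignore the batch. Limits and names are illustrative.
def admit_batch(mem_used: int, mem_quota: int,
                disk_used: int, disk_quota: int,
                batch_size: int) -> str:
    fits_disk = disk_used + batch_size <= disk_quota
    fits_mem = mem_used + batch_size <= mem_quota
    if fits_disk and fits_mem:
        return "cache_and_persist"
    if fits_disk:
        return "persist_only"
    return "ignore"

assert admit_batch(0, 100, 0, 1000, 50) == "cache_and_persist"
```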
To make sure honest validators do not exceed their quotas, we garbage collect expired batches. To guarantee liveness, we require that for every ordered batch b, an honest validator can either (1) get b’s transactions and execute them or (2) state sync to a state that includes them. To this end, expired batches are cleaned up based on the timestamps in committed blocks. A validator commits a block when it gets a quorum certificate indicating that a majority of validators executed it; these quorum certificates are used in state sync to convince validators that the state is valid. Therefore, if a validator cannot get a batch because all honest validators have garbage collected it, it is guaranteed that enough honest validators executed the batch and hold a quorum certificate for the corresponding state sync.
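The garbage-collection rule itself is simple to state: a batch is safe to drop once its expiration is at or before the latest committed block timestamp, because any validator still missing it can state sync instead. A minimal sketch, with illustrative names:

```python
# Sketch of expiration-based garbage collection: drop every batch
# whose expiration time is at or before the latest committed block
# timestamp. Names and the timestamp units are illustrative.
def garbage_collect(batches: dict, committed_timestamp: int) -> dict:
    # batches: batch_id -> expiration_time; keep only live batches.
    return {bid: exp for bid, exp in batches.items()
            if exp > committed_timestamp}
```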
Quorum Store: How Consensus Horizontally Scales on the Aptos Blockchain was originally published in Aptos on Medium, where people are continuing the conversation by highlighting and responding to this story.