Title:
- A Semantic-Integrated LSM-Tree-Based Key–Value Storage Engine for Blockchain Systems
- ChainKV: A Semantics-Aware Key-Value Store for Ethereum System
Authors:
- Qian Wei, Zehao Chen, Xiaowei Chen, Yuhao Zhang, Xiaojun Cai, Zhiping Jia, Zhaoyan Shen, Yi Wang, Zili Shao, and Bingzhe Li
- Zehao Chen, Bingzhe Li, Xiaojun Cai, Zhiping Jia, Lei Ju, Zili Shao, and Zhaoyan Shen
Journal/Conference:
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024
- SIGMOD '23
Source: https://dl.acm.org/doi/abs/10.1109/TCAD.2023.3348771, https://dl.acm.org/doi/10.1145/3626713
Abstract:
- Blockchain systems play an important role in distributed ledgers, database systems, etc. As more and more blocks are mined, the storage burden of blockchain systems increases significantly. Current blockchain systems uniformly transform all their data into key–value (KV) items and store them in the underlying log-structured merge tree (LSM-tree) storage engine, ignoring the software semantics. Consequently, this not only aggravates the write amplification of the storage engine but also increases the number of redundant data query steps, creating a performance bottleneck for blockchain systems. In this article, we propose a semantic-integrated LSM-tree-based KV storage engine for blockchain systems, called Block-LSM, which significantly improves the data synchronization and data query efficiency of blockchain systems. Specifically, we first design a shared prefix scheme that transforms blockchain data into ordered KV pairs, alleviating key range overlaps between different levels of the underlying LSM-tree-based storage engine. Moreover, we propose maintaining several semantic-oriented memory buffers to isolate different kinds of blockchain data, and implement a memory buffer space management strategy to further improve memory efficiency. To save space, Block-LSM further aggregates multiple blocks into a group and assigns the same prefix to all KV items from the same block group. We also reduce step redundancy in transaction queries by modifying the body data storage format. Finally, we implement Block-LSM in a real blockchain environment and conduct a series of comparative experiments against Ethereum, a representative blockchain system. The evaluation results show that Block-LSM reduces storage write amplification by up to 7.56x and increases throughput by 8.64x compared with the original Ethereum design. For data lookups (i.e., transaction and account lookups), Block-LSM improves throughput by 50% compared with the original Ethereum design.
- The log-structured merge tree (LSM-tree)-based key-value (KV) store has been widely adopted as the storage engine for blockchain systems such as Ethereum, in which blockchain data are uniformly transformed into randomly distributed KV items for persistence. However, blockchain semantics are ignored during this process, so blockchain storage suffers from heavy read/write amplification. Moreover, as the Ethereum network scales up, the tremendous volume of data further exacerbates its storage burden. To date, most studies have focused on sharding, data archiving, decentralized distributed storage, etc., to mitigate the burden on the storage layer. However, the incompatibility between Ethereum semantics and the characteristics of the storage engine has been ignored.
In this paper, we present ChainKV, a new semantics-aware storage paradigm that improves storage management performance for the Ethereum system. First, based on Ethereum blockchain semantics, ChainKV stores different types of data separately in multiple storage zones of the KV store to mitigate the read/write amplification problem. Second, following the verification process of the authenticated data structure (ADS), a new ADS data transformer is proposed to exploit data locality when persisting ADS data. Moreover, a new space gaming caching policy is adopted to coordinate cache space management between the two independent storage zones. Finally, we propose an optional lightweight node crash recovery mechanism to eliminate functional redundancy between the Ethereum protocol and the storage engine. The experimental results indicate that ChainKV outperforms prior Ethereum systems by up to 1.99× and 4.20× for synchronization and query operations, respectively.
Presentation material:
2024-07-31 A Semantic-Integrated LSM-Tree-Based Key–Value Storage Engine for Blockchain Systems.pdf
Introduction



ChainKV
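The ChainKV abstract says different types of Ethereum data are stored separately in multiple storage zones of the KV store. A minimal sketch of that idea, assuming just two zones (`state` for ADS/state data and `block` for block data) and modeling each zone as an independent in-memory map. `ZonedKV` and its methods are hypothetical names, not ChainKV's interface.

```python
from collections import defaultdict


class ZonedKV:
    """Toy KV store with one independent keyspace per data type (hypothetical API)."""

    # Assumed two-zone split; ChainKV's actual zone layout may differ.
    ZONES = ("state", "block")

    def __init__(self) -> None:
        self.zones = {name: {} for name in self.ZONES}
        self.touched = defaultdict(int)  # keys examined per zone, for illustration

    def put(self, zone: str, key: bytes, value: bytes) -> None:
        self.zones[zone][key] = value

    def get(self, zone: str, key: bytes):
        self.touched[zone] += 1
        return self.zones[zone].get(key)

    def scan(self, zone: str, start: bytes, end: bytes):
        """Range scan over [start, end) that never visits another zone's keys."""
        store = self.zones[zone]
        hits = []
        for key in sorted(store):
            self.touched[zone] += 1
            if start <= key < end:
                hits.append((key, store[key]))
        return hits


db = ZonedKV()
db.put("block", b"b0001", b"h1")
db.put("block", b"b0002", b"h2")
db.put("state", b"acct-alice", b"10")
print(db.scan("block", b"b0000", b"b9999"))  # [(b'b0001', b'h1'), (b'b0002', b'h2')]
```

The dict model only shows the isolation. In an LSM-tree the presumed benefit is that each zone is flushed and compacted on its own, so sequentially generated block data is not rewritten together with randomly keyed state data.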

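For the ADS data transformer, the abstract only says it follows the ADS verification process to exploit data locality. The locality problem can be illustrated as follows: when trie nodes are keyed by their hash, the nodes read while verifying one root-to-leaf path land at effectively random positions in a sorted store. The sketch below re-keys each node by its hex-nibble path followed by its hash, which places a verification path's nodes next to each other. This path-first encoding is an assumption for illustration, not necessarily ChainKV's transformer; SHA-256 stands in for Ethereum's Keccak-256, and all function names are hypothetical.

```python
import hashlib


def node_hash(data: bytes) -> bytes:
    # Stand-in: Ethereum uses Keccak-256, which is not hashlib's sha3_256.
    return hashlib.sha256(data).digest()


def hash_key(path: str, data: bytes) -> bytes:
    """Hash-only key: placement in a sorted store is effectively random."""
    return node_hash(data)


def path_key(path: str, data: bytes) -> bytes:
    """Hypothetical transformed key: nibble path, a 0x00 separator, then the hash.

    The separator sorts below every hex nibble, so a parent precedes its children
    and each subtree occupies one contiguous key range (depth-first order).
    """
    return path.encode() + b"\x00" + node_hash(data)


def layout(nodes: dict, key_fn) -> list:
    """Paths in the order a sorted KV store would persist them under key_fn."""
    return [path for _, path in sorted((key_fn(p, d), p) for p, d in nodes.items())]


def proof_positions(order: list, leaf: str) -> list:
    """Storage positions of every node on the root-to-leaf verification path."""
    return [order.index(leaf[:i]) for i in range(len(leaf) + 1)]


# Toy trie: each node is identified by its nibble path ("" is the root).
nodes = {p: f"node:{p}".encode() for p in ["", "a", "a1", "a1f", "a2", "b", "b2"]}
print(proof_positions(layout(nodes, path_key), "a1f"))  # [0, 1, 2, 3]
print(proof_positions(layout(nodes, hash_key), "a1f"))  # positions depend on the hashes
```

Under the path-first key, verifying leaf `a1f` reads four adjacent entries, which in an LSM-tree would tend to fall in the same data blocks.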

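The abstract does not spell out how the space gaming caching policy divides cache space between the two storage zones. As a stand-in, the sketch below uses an ARC-style heuristic that is explicitly not ChainKV's policy: each zone gets an LRU cache plus a ghost list of recently evicted keys, and a miss that hits a zone's ghost list moves one slot of the shared capacity toward that zone. `TwoZoneCache`, `load`, and the zone names are hypothetical.

```python
from collections import OrderedDict


class TwoZoneCache:
    """Two LRU caches sharing one capacity; ghost hits shift the split.

    ARC-style stand-in for illustration only, not ChainKV's space gaming policy.
    """

    def __init__(self, capacity: int, zones=("state", "block")) -> None:
        self.capacity = capacity
        self.zones = zones
        half = capacity // 2
        self.target = {zones[0]: half, zones[1]: capacity - half}  # slots per zone
        self.lru = {z: OrderedDict() for z in zones}    # key -> value, oldest first
        self.ghost = {z: OrderedDict() for z in zones}  # recently evicted keys only

    def _other(self, zone: str) -> str:
        return self.zones[1] if zone == self.zones[0] else self.zones[0]

    def get(self, zone: str, key, load):
        cache = self.lru[zone]
        if key in cache:
            cache.move_to_end(key)  # refresh recency on a hit
            return cache[key]
        if key in self.ghost[zone]:
            # This zone recently lost a key it needed: grow its share by one slot.
            del self.ghost[zone][key]
            other = self._other(zone)
            if self.target[other] > 1:
                self.target[zone] += 1
                self.target[other] -= 1
        value = load(key)  # fetch from the backing store on a miss
        cache[key] = value
        self._evict()
        return value

    def _evict(self) -> None:
        while sum(len(c) for c in self.lru.values()) > self.capacity:
            # Evict from the zone furthest above its target share.
            zone = max(self.zones, key=lambda z: len(self.lru[z]) - self.target[z])
            key, _ = self.lru[zone].popitem(last=False)
            self.ghost[zone][key] = None
            if len(self.ghost[zone]) > self.capacity:
                self.ghost[zone].popitem(last=False)


def load(key: str) -> str:
    return "v-" + key  # stand-in for a read from the KV store


cache = TwoZoneCache(capacity=4)
for k in ["s0", "s1", "s2", "s3"]:
    cache.get("state", k, load)
for k in ["b0", "b1"]:
    cache.get("block", k, load)  # evicts s0 and s1 into the state ghost list
cache.get("state", "s0", load)   # ghost hit: the state zone's share grows
print(cache.target)              # {'state': 3, 'block': 1}
```

The point of the sketch is only the coordination pattern: neither zone has a fixed quota, and evidence of under-provisioning in one zone is paid for by the other.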

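The optional crash recovery mechanism targets functional redundancy between the Ethereum protocol and the storage engine. One possible reading, stated here as an assumption: because a node can replay blocks it has already received (or re-fetch them from peers), the storage engine's write-ahead log is redundant for state writes, so recovery can re-apply every block above the last fully persisted height. `NoWalStore`, `crash`, and `recover` are hypothetical names; the dicts stand in for an LSM memtable and on-disk SSTables.

```python
class NoWalStore:
    """Toy store whose memtable has no write-ahead log (hypothetical design)."""

    def __init__(self, flush_every: int = 3) -> None:
        self.disk = {}              # survives a crash (stands in for SSTables)
        self.memtable = {}          # lost on a crash; deliberately not logged
        self.flush_every = flush_every
        self.unflushed_blocks = 0
        self.persisted_height = -1  # assumed to be written atomically with each flush
        self.applied_height = -1

    def apply_block(self, height: int, writes: dict) -> None:
        if height != self.applied_height + 1:
            raise ValueError(f"expected block {self.applied_height + 1}, got {height}")
        self.memtable.update(writes)
        self.applied_height = height
        self.unflushed_blocks += 1
        if self.unflushed_blocks >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        self.disk.update(self.memtable)
        self.memtable.clear()
        self.persisted_height = self.applied_height
        self.unflushed_blocks = 0

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        return self.disk.get(key)

    def crash(self) -> None:
        """Simulate a crash: everything not yet flushed is gone."""
        self.memtable.clear()
        self.unflushed_blocks = 0
        self.applied_height = self.persisted_height

    def recover(self, chain: list) -> None:
        """Replay blocks above the persisted height instead of reading a WAL."""
        for height in range(self.persisted_height + 1, len(chain)):
            self.apply_block(height, chain[height])


chain = [{f"k{h}": h} for h in range(5)]  # hypothetical: block h writes key k{h}
store = NoWalStore(flush_every=3)
for height, writes in enumerate(chain):
    store.apply_block(height, writes)  # blocks 0-2 get flushed, 3-4 stay in memory
store.crash()
print(store.get("k4"))  # None: block 4 was only in the memtable
store.recover(chain)
print(store.get("k4"))  # 4
```

This only works if the replayed blocks themselves are durable or re-fetchable; the sketch takes that for granted and treats `chain` as always available.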


Block-LSM
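Block-LSM's shared prefix scheme and block grouping can be illustrated with a key encoder that prepends a big-endian block-group number to every item, so keys written later sort after all earlier groups. The group size of 4, the one-byte type tag `b"t"`, and the function names are hypothetical choices for this sketch; SHA-256 digests stand in for Ethereum's hash-based keys.

```python
import hashlib
import struct

GROUP_SIZE = 4  # hypothetical number of blocks that share one prefix


def group_prefix(block_number: int) -> bytes:
    # Big-endian packing makes byte order match numeric order.
    return struct.pack(">Q", block_number // GROUP_SIZE)


def prefixed_key(block_number: int, kind: bytes, raw_key: bytes) -> bytes:
    """Shared group prefix + type tag + the original (hash) key."""
    return group_prefix(block_number) + kind + raw_key


def hashed(block: int, i: int) -> bytes:
    """Stand-in for the hash-derived key of the i-th item in a block."""
    return hashlib.sha256(f"tx-{block}-{i}".encode()).digest()


def with_prefix(block: int, i: int) -> bytes:
    return prefixed_key(block, b"t", hashed(block, i))


def batch(blocks, encode) -> list:
    """Keys for two items per block, as one memtable flush would see them."""
    return [encode(b, i) for b in blocks for i in range(2)]


def overlaps(a: list, b: list) -> bool:
    """True if the key ranges [min, max] of two flushed batches intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


old, new = batch(range(0, 4), with_prefix), batch(range(4, 8), with_prefix)
print(overlaps(old, new))  # False: every group-0 key sorts before every group-1 key
print(overlaps(batch(range(0, 4), hashed), batch(range(4, 8), hashed)))  # hash-only keys, for comparison
```

Because each flush covers a narrow key range that does not overlap earlier flushes, compaction has fewer overlapping SSTables to merge, which matches the write-amplification argument in the abstract.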

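The semantic-oriented memory buffers can be sketched as one memtable per data type under a shared byte budget. The space management rule used here, flushing the largest buffer whenever the budget is exceeded, is an assumed placeholder rather than Block-LSM's actual strategy; `SemanticBuffers`, the budget, and the type names are hypothetical.

```python
class SemanticBuffers:
    """One in-memory buffer per data type, sharing a byte budget (hypothetical)."""

    def __init__(self, budget_bytes: int) -> None:
        self.budget = budget_bytes
        self.buffers = {}  # kind -> {key: value}
        self.sizes = {}    # kind -> approximate bytes held
        self.flushed = []  # (kind, sorted items); each run holds a single type

    def put(self, kind: str, key: bytes, value: bytes) -> None:
        buf = self.buffers.setdefault(kind, {})
        old = buf.get(key)
        if old is None:
            delta = len(key) + len(value)
        else:
            delta = len(value) - len(old)  # overwrite: only the value size changes
        buf[key] = value
        self.sizes[kind] = self.sizes.get(kind, 0) + delta
        # Assumed policy: while over budget, flush the largest buffer.
        while sum(self.sizes.values()) > self.budget:
            self._flush(max(self.sizes, key=self.sizes.get))

    def _flush(self, kind: str) -> None:
        run = sorted(self.buffers[kind].items())  # sorted run, like an SSTable
        self.flushed.append((kind, run))
        self.buffers[kind] = {}
        self.sizes[kind] = 0


bufs = SemanticBuffers(budget_bytes=20)
bufs.put("tx", b"k1", b"aaaa")         # tx buffer: 6 bytes
bufs.put("state", b"s1", b"bbbbbbbb")  # state buffer: 10 bytes
bufs.put("tx", b"k2", b"cccc")         # total 22 > 20, so the tx buffer is flushed
print(bufs.flushed)  # [('tx', [(b'k1', b'aaaa'), (b'k2', b'cccc')])]
```

Each flushed run contains only one kind of data, which is the isolation property the abstract describes; how Block-LSM actually rebalances space among buffers is not covered by this sketch.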

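On the transaction-query side, the abstract says Block-LSM reduces redundant steps by changing how block bodies are stored. In the original geth-style layout a lookup roughly goes transaction hash, then block number, then canonical block hash, then the whole body, followed by a scan of the decoded body. The sketch below contrasts that with a hypothetical per-transaction layout in which the lookup entry records (block number, index) and each transaction is its own KV item. Tuple keys stand in for encoded byte keys, the read counter is only a rough cost proxy, and all class names are hypothetical.

```python
class CountingKV:
    """Dict-backed KV store that counts point reads (a rough cost proxy)."""

    def __init__(self) -> None:
        self.kv = {}
        self.reads = 0

    def _get(self, key):
        self.reads += 1
        return self.kv[key]


class WholeBodyStore(CountingKV):
    """Roughly the original layout: one value holds a block's entire body."""

    def add_block(self, number: int, block_hash: str, txs: list) -> None:
        self.kv[("canonical", number)] = block_hash
        self.kv[("body", number, block_hash)] = list(txs)
        for tx in txs:
            self.kv[("lookup", tx["hash"])] = number

    def get_tx(self, tx_hash: str) -> dict:
        number = self._get(("lookup", tx_hash))
        block_hash = self._get(("canonical", number))
        body = self._get(("body", number, block_hash))
        return next(tx for tx in body if tx["hash"] == tx_hash)  # scan the body


class PerTxStore(CountingKV):
    """Hypothetical layout: each transaction stored as its own KV item."""

    def add_block(self, number: int, block_hash: str, txs: list) -> None:
        # block_hash is unused here; headers and canonical entries are omitted.
        for index, tx in enumerate(txs):
            self.kv[("tx", number, index)] = tx
            self.kv[("lookup", tx["hash"])] = (number, index)

    def get_tx(self, tx_hash: str) -> dict:
        number, index = self._get(("lookup", tx_hash))
        return self._get(("tx", number, index))


txs = [{"hash": f"0x{i:02x}", "data": f"payload-{i}"} for i in range(3)]
for store in (WholeBodyStore(), PerTxStore()):
    store.add_block(7, "0xblock7", txs)
    print(type(store).__name__, store.get_tx("0x02")["data"], store.reads)
# WholeBodyStore payload-2 3
# PerTxStore payload-2 2
```

Besides the saved read, the per-transaction layout avoids decoding and scanning the whole body; a real engine would still keep headers and canonical-hash entries, which are left out here.
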
Conclusion

