pymerkle
Merkle-tree in Python
Storage agnostic implementation capable of generating inclusion and consistency proofs.
Installation
pip install pymerkle
This will also install cachetools as a dependency.
Basic API
Let MerkleTree
be any class implementing the BaseMerkleTree
interface; e.g.,
from pymerkle import InmemoryTree as MerkleTree
tree = MerkleTree(algorithm='sha256')
Append data into the tree and retrieve the corresponding hash value:
index = tree.append_entry(b'foo') # leaf index
value = tree.get_leaf(index) # leaf hash
Current tree size:
size = tree.get_size() # number of leaves
Current and intermediate state:
state = tree.get_state() # current root-hash
state = tree.get_state(5) # root-hash of size 5 subtree
Inclusion proof
Prove inclusion of the 3-rd leaf hash in the subtree of size 5:
proof = tree.prove_inclusion(3, 5)
Verify the proof against the base hash and the subtree root:
from pymerkle import verify_inclusion
base = tree.get_leaf(3)
root = tree.get_state(5)
verify_inclusion(base, root, proof)
Consistency proof
Prove consistency between the states with size 3 and 5:
proof = tree.prove_consistency(3, 5)
Verify the proof against the respective root hashes:
from pymerkle import verify_consistency
state1 = tree.get_state(3)
state2 = tree.get_state(5)
verify_consistency(state1, state2, proof)
Security
This library requires security review.
Resistance against second-preimage attack
This consists in the following standard technique:
Upon computing the hash of a leaf node, prepend
0x00
to the payloadUpon computing the hash of an interior node, prepend
0x01
to the payload
Resistance against CVE-2012-2459 DOS
Contrary to the bitcoin specification, lonely leaves are not duplicated while the tree is growing. Instead, a bifurcation node is created at the rightmost branch (see next section). As a consequence, the present implementation should be invulnerable to the CVE-2012-2459 DOS attack (see also here for insight).
Topology
Interior nodes are not assumed to be stored anywhere and no concrete links are created between them. The tree structure is determined by the recursive function which computes intermediate states on the fly and is essentially the same as RFC 9162 (Section 2). It turns out to be that of a binary Sakura tree (Section 5.4).
Storage
This library is unopinionated on how leaves are appended to the tree, i.e., how
data is stored in concrete. Cryptographic functionality is encapsulated in the
BaseMerkleTree
abstract class, which admits pluggable storage backends
through subclassing. It is the the developer’s choice to decide how to
store data by implementing the interior storage interface of this class.
Any contiguously indexed dataset should do the job. Conversely, given any such
dataset, we should be able to trivially implement a Merkle-tree that is
operable with it.
Optimizations
The performance of a Merkle-tree depends on how efficiently it computes the root-hash for arbitrary leaf ranges. The recursive version of this function is slow (e.g., RFC 9162, Section 2).
This operation can be optimized using iterations on ranges whose size is a power of two. This has the effect of making proof generation five times faster, while peak memory usage remains reasonably low and sublinear with respect to size. Further boost is given by caching. Practically, a pretty big tree with sufficiently long uptime will respond instantly with negligible penalty in memory usage.