ToolszkEVMConceptsSparse merkle trees

Simple Smt

Understanding the finer details of how the Storage state machine operates requires a good grasp of the way the zkProver's sparse Merkle trees (SMTs) are constru

Understanding the finer details of how the Storage state machine operates requires a good grasp of the way the zkProver's sparse Merkle trees (SMTs) are constructed. This document explains how these SMTs are built.

Consider key-value pair based binary SMTs. The focus here is on explaining how to construct an SMT that represents a given set of key-value pairs. And, for the sake of simplicity, we assume 8-bit key-lengths.

A NULL or empty SMT has a zero root. That is, it has no key and no value recorded in it. Similarly, a zero node or NULL node refers to a node that carries no value.

A binary SMT with one key-value pair

A binary SMT with a single key-value pair (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}), is built as follows.

Suppose that Ka=11010110K_{\mathbf{a}} = 11010110. In order to build a binary SMT with this single key-value (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}),

  1. One computes the hash H(Va)\mathbf{H}( \text{V}_{\mathbf{a}}) of the value Va\text{V}_{\mathbf{a}},

  2. Sets the leaf La:=H(Va)\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}}),

  3. Sets the sibling leaf as a NULL leaf, simply represented as "0\mathbf{0}",

  4. Computes the root as roota0=H(La0)\mathbf{root}_`{a0}` = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{0} ), with the leaf La\mathbf{L}_{\mathbf{a}} on the left because the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0. That is, between the two edges leading up to the root, the leaf La\mathbf{L}_{\mathbf{a}} is on the left edge, while the NULL leaf "0\mathbf{0}" is on the right.

See the below figure for the SMT representing the single key-value pair (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}), where Ka=11010110K_{\mathbf{a}} = 11010110.

A Single key-value pair SMT

Note that the last nodes in binary SMT branches are generally either leaves or zero-nodes.

In the case where the least-significant bit, lsb of KaK_{\mathbf{a}} is 11, the SMT with a single key-value pair (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) would be a mirror image of what is seen in Figure 3. And its root, root0a=H(0La)roota0\mathbf{root}_{0a} = \mathbf{H}( \mathbf{0}\| \mathbf{L}_{\mathbf{a}} ) \neq \mathbf{root}_`{a0}` because H\mathbf{H} is a collision-resistant hash function.

This example also explains why we need a zero node. Since all trees used in our design are binary SMTs, a zero node is used as a default sibling for computing the parent node. This helps to differentiate between roots (also between parent nodes) because a root node actually identifies an SMT. Therefore, it is crucial to distinguish between roota0=H(La0)\mathbf{root}_`{a0}` = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{0} ) and root0a=H(0La)\mathbf{root}_{0a} = \mathbf{H}( \mathbf{0}\| \mathbf{L}_{\mathbf{a}}) because they represent two distinct trees.

Binary SMTs with two key-value pairs

Consider now SMTs with two key-value pairs, (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}}).

There are three distinct cases of how corresponding SMTs can be built, each determined by the keys, KaK_{\mathbf{a}} and KbK_{\mathbf{b}}.

Case 1

The keys are such that the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0 and the lsb(Kb)=1\text{lsb}(K_{\mathbf{b}}) = 1.

Suppose that the keys are given as Ka=11010110K_{\mathbf{a}} = 11010110 and Kb=11010101K_{\mathbf{b}} = 11010101.

To build a binary SMT with this two key-values, (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}}),

  1. One computes the hashes, H(Va)\mathbf{H}(\text{V}_{\mathbf{a}}) and H(Vb)\mathbf{H}( \text{V}_{\mathbf{b}}) of the values, Va\text{V}_{\mathbf{a}} and Vb\text{V}_{\mathbf{b}} , respectively,

  2. Sets the leaves, La:=H(Va)\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}}) and Lb:=H(Vb)\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}}),

  3. Checks if the lsb(Ka)\text{lsb}(K_{\mathbf{a}}) differs from the lsb(Kb)\text{lsb}(K_{\mathbf{b}}),

  4. Since the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0 and the lsb(Kb)=1\text{lsb}(K_{\mathbf{b}}) = 1, it means the two leaves can be siblings,

  5. One can then compute the root as, rootab=H(LaLb)\mathbf{root}_{\mathbf{ab}} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}).

Note that the leaf La\mathbf{L}_{\mathbf{a}} is on the left because the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0, but Lb\mathbf{L}_{\mathbf{b}} is on the right because the lsb(Kb)=1\text{lsb}(K_{\mathbf{b}}) = 1. That is, between the two edges leading up to the rootab\mathbf{root}_{\mathbf{ab}}, the leaf La\mathbf{L}_{\mathbf{a}} must be on the edge from the left, while Lb\mathbf{L}_{\mathbf{b}} is on the edge from the right.

Note that in this particular example, any second key KbK_{\mathbf{b}} with its LSB being 11 would result in the corresponding values being siblings. The algorithm unfolds as we build Merkle trees with more key-value pairs.

See the below figure for the SMT representing the two key-value pairs (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}}), where Ka=11010110K_{\mathbf{a}} = 11010110 and Kb=11010101K_{\mathbf{b}} = 11010101.

Two key-value pairs SMT - Case 1

Case 2

Both keys end with the same key-bit. That is, the lsb(Ka)=lsb(Kb)\text{lsb}(K_{\mathbf{a}}) = \text{lsb}(K_{\mathbf{b}}), but their second least-significant bits differ.

Suppose that the two keys are given as Ka=11010100K_{\mathbf{a}} = 11010100 and Kb=11010110K_{\mathbf{b}} = 11010110.

To build a binary SMT with this two key-values, (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}});

  1. One computes the hashes, H(Va)\mathbf{H}(\text{V}_{\mathbf{a}}) and H(Vb)\mathbf{H}( \text{V}_{\mathbf{b}}) of the values, Va\text{V}_{\mathbf{a}} and Vb\text{V}_{\mathbf{b}} , respectively.

  2. Sets the leaves, La:=H(Va)\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}}) and Lb:=H(Vb)\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}}).

  3. Checks if the lsb(Ka)\text{lsb}(K_{\mathbf{a}}) differs from the lsb(Kb)\text{lsb}(K_{\mathbf{b}}). Since the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0 and the lsb(Kb)=0\text{lsb}(K_{\mathbf{b}}) = 0, it means the two leaves cannot be siblings at this position because it would otherwise mean they share the same tree-address 0, which is not allowed.

  4. One continues to check if the second least-significant bits of KaK_{\mathbf{a}} and KbK_{\mathbf{b}} differ. Since the second lsb(Ka)=0\text{second lsb}(K_{\mathbf{a}}) = 0 and the second lsb(Kb)=1\text{second lsb}(K_{\mathbf{b}}) = 1, it means the two leaves La\mathbf{L}_{\mathbf{a}} and Lb\mathbf{L}_{\mathbf{b}} can be siblings at their respective tree-addresses, 00 and 10.

  5. Next is to compute the hash H(LaLb)\mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}) and set it as the branch Bab:=H(LaLb)\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}) at the tree-address 0. Note that the leaf La\mathbf{L}_{\mathbf{a}} is on the left because the second lsb(Ka)=0\text{second lsb}(K_{\mathbf{a}}) = 0, while Lb\mathbf{L}_{\mathbf{b}} is on the right because the second lsb(Kb)=1\text{second lsb}(K_{\mathbf{b}}) = 1.

  6. The branch Bab:=H(LaLb)\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}) needs a sibling. Since all the values, Va\text{V}_{\mathbf{a}} and Vb\text{V}_{\mathbf{b}}, are already represented in the tree at La\mathbf{L}_{\mathbf{a}} and Lb\mathbf{L}_{\mathbf{b}}, respectively, one therefore sets a NULL leaf "0\mathbf{0}" as the sibling leaf to Bab\mathbf{B}_{\mathbf{ab}}.

  7. As a result, it is possible to compute the root as, rootab0=H(Bab0)\mathbf{root}_{\mathbf{ab0}} = \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0}). Note that, the branch Bab\mathbf{B}_{\mathbf{ab}} is on the left because the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0, and 0\mathbf{0} must therefore be on the right. That is, between the two edges leading up to the rootab0\mathbf{root}_{\mathbf{ab0}}, the branch Bab\mathbf{B}_{\mathbf{ab}} must be on the edge from the left, while 0\mathbf{0} is on the edge from the right.

See the below figure depicting the SMT representing the two key-value pairs (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}}), where Ka=11010100K_{\mathbf{a}} = 11010100 and Kb=11010110K_{\mathbf{b}} = 11010110.

Two key-value pairs SMT - Case 2

Case 3

The first two least-significant bits of both keys are the same, but their third least-significant bits differ.

Suppose that the two keys are given as Ka=11011000K_{\mathbf{a}} = 11011000 and Kb=10010100K_{\mathbf{b}} = 10010100. The process for building a binary SMT with these two key-values, (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}}) is the same as in Case 2;

  1. One computes the hashes, H(Va)\mathbf{H}(\text{V}_{\mathbf{a}}) and H(Vb)\mathbf{H}( \text{V}_{\mathbf{b}}) of the values, Va\text{V}_{\mathbf{a}} and Vb\text{V}_{\mathbf{b}} , respectively.

  2. Sets the leaves, La:=H(Va)\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}}) and Lb:=H(Vb)\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}}).

  3. Checks if the lsb(Ka)\text{lsb}(K_{\mathbf{a}}) differs from the lsb(Kb)\text{lsb}(K_{\mathbf{b}}). Since the lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0 and the lsb(Kb)=0\text{lsb}(K_{\mathbf{b}}) = 0, it means the two leaves cannot be siblings at this position as it would otherwise mean they share the same tree-address 0, which is not allowed.

  4. Next verifier continues to check if the second least-significant bits of KaK_{\mathbf{a}} and KbK_{\mathbf{b}} differ. Since the second lsb(Ka)=0\text{second lsb}(K_{\mathbf{a}}) = 0 and the second lsb(Kb)=0\text{second lsb}(K_{\mathbf{b}}) = 0, it means the two leaves cannot be siblings at this position, because it would otherwise mean they share the same tree-address 00, which is not allowed.

  5. Once again he checks if the third least-significant bits of KaK_{\mathbf{a}} and KbK_{\mathbf{b}} differ. Since the third lsb(Ka)=0\text{third lsb}(K_{\mathbf{a}}) = 0 and the third lsb(Kb)=1\text{third lsb}(K_{\mathbf{b}}) = 1, it means the two leaves La\mathbf{L}_{\mathbf{a}} and Lb\mathbf{L}_{\mathbf{b}} can be siblings at their respective tree-addresses, 000 and 100.

  6. One then computes the hash H(LaLb)\mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}), and sets it as the branch Bab:=H(LaLb)\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}) at the tree-address 00. The leaf La\mathbf{L}_{\mathbf{a}} is on the left because the third lsb(Ka)=0\text{lsb}(K_{\mathbf{a}}) = 0, while Lb\mathbf{L}_{\mathbf{b}} is on the right because the third lsb(Kb)=1\text{lsb}(K_{\mathbf{b}}) = 1.

  7. The branch Bab:=H(LaLb)\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}}) needs a sibling. Since all the values, Va\text{V}_{\mathbf{a}} and Vb\text{V}_{\mathbf{b}} , are already represented in the tree at La\mathbf{L}_{\mathbf{a}} and Lb\mathbf{L}_{\mathbf{b}}, respectively, one therefore sets a NULL leaf "0\mathbf{0}" as the sibling leaf to Bab\mathbf{B}_{\mathbf{ab}}.

  8. One can now compute the hash H(Bab0)\mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0}), and set it as the branch Bab0:=H(Bab0)\mathbf{B}_{\mathbf{ab0}} := \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0}) at the tree-address 0. The hash is computed with the branch Bab\mathbf{B}_{\mathbf{ab}} on the left because the second lsb of both keys, KaK_{\mathbf{a}} and KbK_{\mathbf{b}}, equals 00. Therefore the NULL leaf "0\mathbf{0}" must be on the right as an argument to the hash.

  9. The branch Bab0:=H(Bab0)\mathbf{B}_{\mathbf{ab0}} := \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0}) also needs a sibling. For the same reason given above, one sets a NULL leaf "0\mathbf{0}" as the sibling leaf to Bab0\mathbf{B}_{\mathbf{ab0}}.

  10. Now, one is able to compute the root as rootab00=H(Bab00)\mathbf{root}_{\mathbf{ab00}} = \mathbf{H}(\mathbf{B}_{\mathbf{ab0}} \| \mathbf{0}). Note that the hash is computed with the branch Bab0\mathbf{B}_{\mathbf{ab0}} on the left because the lsb of both keys, KaK_{\mathbf{a}} and KbK_{\mathbf{b}}, equals 00. That is, between the two edges leading up to the rootab00\mathbf{root}_{\mathbf{ab00}}, the branch Bab0\mathbf{B}_{\mathbf{ab0}} must be on the edge from the left, while "0\mathbf{0}" is on the edge from the right.

See the below figure depicting the SMT representing the two key-value pairs (Ka,Va)(K_{\mathbf{a}}, \text{V}_{\mathbf{a}}) and (Kb,Vb)(K_{\mathbf{b}}, \text{V}_{\mathbf{b}}), where Ka=11011000K_{\mathbf{a}} = 11011000 and Kb=10010100K_{\mathbf{b}} = 10010100.

Two key-value pairs SMT - Case 3

There are several other SMTs of two key-value pairs (Kx,Vx)(K_{\mathbf{x}}, \text{V}_{\mathbf{x}}) and (Kz,Vz)(K_{\mathbf{z}}, \text{V}_{\mathbf{z}}) that can be constructed depending on how long the strings of the common least-significant bits between KxK_{\mathbf{x}} and KzK_{\mathbf{z}} are.

In general, when building an SMT, leaves of key-value pairs with the same least-significant key-bits share the same navigational path only until any of the corresponding key-bits differ. These common strings of key-bits dictate where the leaf storing the corresponding value is located in the tree.

Edit on GitHub

Last updated on