Padding-kk state machine
All Keccak-related state machines are accessed through the Padding-KK state machine. It is therefore responsible for handling queries from the Main state machin
All Keccak-related state machines are accessed through the Padding-KK state machine. It is therefore responsible for handling queries from the Main state machine. Common queries are requests for digests of messages, together with validation of these digests.
This document describes the internal mechanism of the Padding-KK SM. It explains how the state machine validates hash values, input string lengths, and input string readings to ensure that padding requirements are followed.
First, keep in mind that the Padding-KK SM's operations are byte-based, whereas the Keccak-F SM's actions are bit-based. This discrepancy in string formats is handled by the Padding-KK-Bit SM.
Padding and the Keccak-F SM
Although the hashing of messages is carried out by the Keccak-F SM, the padding happens in the Padding-KK SM. Messages are presented to the Padding-KK SM as byte-strings in hexadecimal form. But the Padding-KK-Bit SM ensures that these are presented as bit-strings to the Keccak-F SM.
Even though Keccak-F SM receives strings of any length as inputs, each input-message to the Keccak-F SM is first split into blocks of 1088 bits (i.e., 136 bytes), called the bitrate (or rate). If the tail-end of the splits is shorter than 136 bytes, or if the original message is shorter than 136 bytes, a specific string is appended to it in order to form a full 136-byte string. The appended bits (or bytes) are referred to as the padding.
Keccak's first padding-rule is to append the string of the appropriate length. That is, the padding always consists of two 1's and a string of 0's between them.
The second padding rule: If the input-message is exactly 136 bytes long, or a multiple of 136 bytes, then a block of 136 bytes consisting of just the padding , must be appended.
It is crucial to emphasise that the Polygon zkEVM follows the Keccak construction used in the Ethereum. So then the Padding-KK SM does not append any other 'fixed' bits to the padding, such as appending "" as it is done in the FIPS's SHA-3. Therefore, as far as the Keccak-F hash function is concerned, the Polygon zkEVM does not follow the FIPS.202 Standard.
Input strings and padding
Consider the strings that need to be hashed.
Note that the bits (or bytes) of these strings cannot be simply combined and presented as one stream of bits (or bytes) as though they belong to one long string. But each string need to be treated as an individual string, and thus must first be separately padded in line with the above-mentioned padding rules. Only thereafter, depending on the SM involved, can the bytes or the bits be fed into the relevant SM.
The idea here is to map each string to one or several blocks of 136 bytes (1088 bits), and include the proper padding at the tail-end part.

Observe that, as shown in the first string in the figure above; Whenever the length of a certain string is a multiple of the block length of 136 bytes, a new block containing only the padding must be appended. Once this is done, the new resulting string can be provided to the Keccak-F SM for hashing.
Dealing with Main SM queries
The Padding-KK SM is in charge of validating that the padding rule is correctly performed, as well as validating each of the prescribed operations;
- Validate the lengths of given strings . (i.e., length check),
- Validate the hashes of certain strings . (i.e., digest check),
- Validate reads from 1 to 32 bytes of string . (i.e., read check).
Next is to prepare the machinery that enables the Padding-KK SM achieve these three checks.
Setting up columns for verification purposes
We now design a series of columns that enable us to completely verify correctness of every state transitions.
-
: This register stores every byte of the padded input (as commented before), one byte per row.
-
: This register stores an increasing sequence of integers starting from 0, changing its value at the beginning of a new string. Of course, an address completely determines the corresponding string of a certain byte.
-
: This register represents the connection between two blocks. If is 1, the actual block is in the same string with the previous block. Otherwise, the last block is in the previous string.
-
: This register flags the last row of every block. If is 1, the next block starts in the next row of the table.
-
: For each string, this register is a decreasing sequence of signed integers. Starting at the length of the current string and it keeps decreasing until the last block of the string is reached. Observe that this value can be negative since a padding may be present in the string. ( is short for 'remaining'.)
-
: For each string, this register stores original length (i.e., the length of the string before padding was appended). Therefore, this register remains constant for all rows of each string. Observe that, at the first row of each string, coincides with .
-
: A computed register which is 1 whenever is zero, and 0 otherwise.
-
: The register is 1 just after the byte , corresponding to the appearance of padding bits. Observe that it can happen that is constantly 0 among a full string. This is because we can have the situation where the padding only consists of the byte .
-
: This register is actually computed from the registers , and as follows,
This means should be 1 whenever two things hold true. First, if , and secondly, if the next block to be processed is contained in the next string.
Observe that and cannot simultaneously be "1" (i.e., not in the same row). Moreover, it is very important to include in this computation because alone cannot help detect that padding has occurred. For instance, if , then would remain constant, continuing to record 0's as if no padding took place.
Example: Padding check using columns
Let us illustrate this design with a table, using the columns defined above. Suppose the following strings have been padded and are ready for hashing:
where "|" indicates the end of a 136-byte block. In the below table, these 136-byte splits between blocks are indicated with a horizontal line.
The above table illustrates how the columns can be used to record the executional trace of the Padding-KK State Machine. As it is our general approach, a strategy towards achieving verifiable computations, the next task is to describe the state transitions of the Padding-KK SM in terms of polynomials.
Description of state transitions in PIL
By capturing the relationships between and among the columns (registers defined above) in terms of equations, amounts to translating the execution of the padding into a verification code written in PIL.
- In order to guarantee that the value recorded in the register decreases until is 1 (that is, the end of the string), use the relation,
- How can we validate the fact that was constructed as expected?
First observe that changes to 1 immediately after a 1 was recorded in the register. (i.e., immediately when the padding starts, and when .)
Secondly, notice that after this point, remains 1 until (and including when) equals 1. After which, becomes 0.
Hence, these behaviour can be captured as,
- Verifying correctness of the register requires two constraints;
→ Checking that is constant in each block
→ Checking two specific situations,
-
When is 1 and is 0 : in this case, the next value of should be 0, because of the block change but within the same string.
-
When both and are 1 : in this case, the next value of should be 0, due to the string change.
These two scenarios are verified with the following constraint,
-
The register is constant within each string. It must therefore satisfy this relationship,
-
Checking that and coincide at the first state of each string, use the constraint,
where is a committed column and it is such that .
In fact, is a shifted version of , which is used to ensure that, when starting a string (and therefore, ), then .
-
Let us now specify the relations that satisfy the register. As commented on before, is constant within each string. Hence, if and only if .
The constraint for the register is therefore,
Note that going from one string to the next, the values of the register form an increasing sequence (increasing by steps of 1 from one string to the next).
However, since the polynomials utilized in this scheme are all cyclic (due to evaluations on roots of unity), there is a need to ensure that the register resets to whenever the reading returns to the first row.
For this purpose, a register called is added. And is 1, if is 1 and the string, that the last block belongs to, is not the last one.
Similarly, another register called is added, and it is defined such that
See below table for a comparison between the and non- registers.
-
In order to grapple with the increasing but cyclic sequence of , the following constraint is used,
-
Now, checking whether is 1 if and only if is 0, is done reversely by first committing the column , the inverse of , and computing as:
And then, as usual, check the relation, .
-
Next is the register which stores the input byte if and only if the current row does not corresponding to the padding. is computed from the , and registers. This ensures loading the padding bytes at their correct positions.
In fact, this register is used in the Plookup of the next state machine.
Observe that can be computed as
Let us carefully analyze this equation:
- If , and are all , then equals . As mentioned above, at the non-padding rows, we just store .
- If is 1, is 0 and is 0, then equals , which is the first byte of the padding.
- If is 0, is 1 and is 0, then equals , which are the intermediate bytes of the padding.
- If is 0, is 1 and is 1, then equals , which is the last byte of the padding.
- Lastly, we should consider the special case where the padding is only the byte . In this case, is 0 at the last row meanwhile and are both 1. Therefore, equals , as we wanted.
See below table for all the above cases when computing .
Hash output check
We have thus far only dealt with correct inputs of the padding. Now, we introduce several columns to deal with the hash itself.
Since KECCAK-256's output is 256 bits long, we use eight (8) registers each of 32 bits to store the result of the hash function. Denote these registers by .
As columns, these registers remain constant within a single string, because they represent the hash of a given string. The following constraints are therefore added,
A combination of other KECCAK-related state machines is used to verify correctness of the output hash. The reason for this is that, for compatibility with the KECCAK-256 hash function, we first need to describe all inputs in terms of bits.
Length and read check operations
In this section we give a description of the design that allows us to verify the lengths of input strings and the read operations.
Length check
The length check is almost trivial because the register has already been introduced.
Suppose one is given a length and an address . And the aim is to check if the string corresponding to the address has length .
The strategy is to use Plookup to verify that the given table contains a row with and an address in the last row of the string (i.e., when ). That is to say, the table of columns; , and ; is as displayed in the table provided below.
The PIL code for this Plookup looks like this:
where flags whenever the operation is checked and the length is stored in .
Read check
Checking reads requires more columns to be introduced. Recall that, given three parameters;
- the address of a string,
- the position of the starting byte of the string, and
- the total number of bytes that we want to read.
The intention is to read the bytes of the string from the first to thirty second.
It takes 8 registers, each of 32-bits, to store the output of the read operation. Let us denote these registers by:
Further 8 factor polynomials are needed so as to correctly place the input bytes with respect to their right power of 2.
Before looking into the roles of the registers and factor polynomials, three more columns are necessary for describing the verification of the read operations. These registers are; , and ; and are defined as follows.
-
: A register that specifies the number bytes to be read. It is a number register, containing numbers between 1 and 32, and it remains constant in each of the readings we want to perform.
-
: A decreasing sequence of values, starting from the length value of the read minus 1 (i.e., ) and ends at 0, for each read. This is important for positioning each byte at the correct power of 2.
-
: This is a computed column, using the same inverse technique as before. It records instances when the register is 0. First, the register is committed, allowing to be expressed as
which satisfies the constraint, .
Example: Read check using columns
Suppose we want to read the first 10 bytes of the address 6, and once finished, read the next two bytes of the same address thereafter. Here is the correct way of constructing the polynomials; , and . Consider the string, .
Note that the horizontal line is used to separate every 8 bytes, which are the exact number of bytes stored in a register. We will later on see why this is important.
Several observations
Firstly, note that is 1 if and only if is 0.
Secondly, if we want to express the two read values as an array, the output should be
where the first element of the array has 10 bytes, whilst the second has 2 bytes. This coincides with each of the defined values.
Thirdly, observe the relationships that these registers; , and ; have to satisfy.
The register decreases in every read, so we can express this as
The register remains constant during each read operation, hence the following constraint applies,
The first of every read is . Hence, we need to specify the following relationship
In addition, we need to ensure that the address resets immediately after a string ends. Otherwise, we read from a different string. For this reason, we need to introduce the following constraint
which, when combined with the previous constraints enforces the desired results.
Registers and polynomial factors
Let's look closely at the registers and polynomial factors .
Consider the previous example, where the first element of the 'output' array is
Obviously, this element does not fit in a 32-bit register. So it needs to be split over three registers; , and ; respectively as,
Moreover, each byte has a corresponding weight , for some , such that,
Henceforth, we should reflect this in our state machine using the columns and for our previous example.
Observe that registers are actually cumulative up to the register's maximal capacity of 4 bytes is reached, in which case the next reads are stored in the next register.
The registers are responsible for correct positioning of each byte with respect to powers , for , providing the decomposition stated above.
Lastly, observe that the rows corresponding to , store the complete result of the read in the registers . These are the rows where validity of the read operation is checked via a Plookup.
Constraints on registers
We look closely at constraints pertaining to and .
The idea here is to compute a column defined by,
and then verify that the next state coincides with whenever is not equal to 1. This confirms that the registers are not only sequentially read, but are also correctly computed from the previous states. Hence, the following constraint needs to be added;
So far so good.
Now observe that the tuple is not totally arbitrary. In fact, it needs to be checked whether and . This is done via a Plookup. That is, checking if the row corresponding to is contained among the previously computed table of constants having all possible combinations of the previous columns.
Read operation and Main state machine
There are a few subtle details concerning the connection between the Read operation and the Main SM to be taken into account.
For instance, in order to avoid problems, reading from the last (probably smaller) block is not allowed. A constant column is therefore created specifically for checking if a read is done at the last block. So, is 1 if and only if the current state is not at the last block. Hence, the Plookup in the Main SM's PIL code (main.pil code; lines 663 to 676):
Please note that these quoted code lines may vary across different versions. They should only be taken as examples for a particular code version.
Note that is marking the position in the string of the byte where the reading starts. The Plookup shown above, maps the committed polynomial in the Main SM to the linear combination of the committed polynomials; , and ; of the Padding-KK SM as,
Let us make use of the following example to clearly see why this linear combination works.
Example: Main SM and Padding-KK connection
Suppose we want to read bytes 2, 3 and 4 of the string . The table of the reading, reflecting only the relevant columns, is as follows.
Observe that, among the rows with (the ones where the reading happens), only the row in contains the complete read-result, , in the column. That is, although , meaning the reading starts with byte 2 which is in , the reading is only complete in . And that's the exact row where the linear combination works out to be
Last updated on
Keccak framework
The zkEVM, as an L2 zk-Rollup for Ethereum, employs the Keccak hash function to achieve seamless compatibility with the Ethereum blockchain at Layer 1. However,
Padding-kk-bit state machine
The 136-byte output of the Padding-KK SM must first be translated to bits before it can be used as the Keccak-F SM's input. This is where the Padding-KK-Bit sta