ethereum, currency, trading

Practical ZK-SNARKs for Ethereum

2018 was a difficult year for cryptocurrencies. The stratospheric heights of 2017 crashed in January and never recovered during the year. That was a mad year, full of ICOs and febrile enthusiasm about unlimited promises on the blockchain front.

But if 2017 was the year of mirages and expectations, 2018 was a year or reckoning, or realising that the work was still to be done. It was also a year where a lot was achieved, now away from the radar of the press and the noise of euphoric investors. On the Ethereum front, we’ve had several innovations but a major topic was scaling with the use of Zero-Knowledge technology. These currently come in two flavours:

Vitalik Buterin himself has been an enthusiastic proponent of using this technology to raise the throughput of Ethereum’s network and ZCash has gone live with a full implementation of ZK-SNARKs again in 2018.

Although the cryptographic community has been talking and researching this concept for a few years now, in the blockchain area we are just starting, and it is still difficult to find documentation on how to get into the technology and use it. There are a few good references I particularly like:

I have previously written in this blog about STARKs, mainly as a personal study guide to Vitalik’s posts. This post is my first on ZK-SNARKs, and comes from a different standpoint: it relates my experience of going to the trenches and actually implement a working prototype of the technology.

Where to start?

The above references do a good job at walking the reader through the several different concepts involved in the theory of ZK-SNARKs: what is Zero-Knowledge (ZK), what statements can be proven in ZK, what are NP languages and decision problems, elliptic curve cryptography, pairings on elliptic curves and the whole flow from statement to ZK-SNARK.

After reading all of this, I was confident I could understand the maths involved (albeit not to keep it all inside my head at the same time), but one basic question remained unsolved. Where do I start? How do I actually prove what I want to prove?

It was almost like someone had assembled a whole car by hand in front of you, told you you could take it to anywhere you wanted, but didn’t give you the keys. The ignition was lacking.

The problem is that you can’t just do a SNARK for any program that you want, you can only use them with circuits in a very specific format.

The progress from a function to a ZK-SNARK. Courtesy Vitalik Buterin.

And these references don’t show you how to get there. They will either show you a very simple computation that trivially translates to an Algebraic Circuit or they will pass over the matter entirely. When the time came to try to prove a simple statement, like this

“I, the prover, know a pre-image s such that Hash(s) = H, where H is publicly known”,

I just did not know how to start writing the necessary corresponding circuit.

It was not my fault either. The authors of Zerocash (Eli Ben-Sasson again, and Alessandro Chiesa among others) specify in a deep and technical paper how they constructed the circuits for the ZK-SNARKs used in that crypto-currency. This is built of several different components, one of which is the circuit for computing a single hash (technically, the hash compression function of SHA256) which takes almost 28000 algebraic gates! This is not something you want to do by hand.

When I got to this stage, one thing became clear: we need a tool to produce the circuits. The same team that wrote the Zerocash paper also produced a C++ library to create circuits, called Libsnark. The barrier to entry, though, is too high. In order to turn out a ZK-SNARK prototype fast, I needed something else. And that something turns out to exist already: ZoKrates. Again, I’m indebted to a very helpful tutorial that set me on the tracks of how to use it.

ZoKrates

ZoKrates is a tool that bridges the gap between Libsnark and human comprehension. It makes the building of ZK-SNARKs accessible by providing an easy language to write functions and then translating them to circuits. The documentation is not great, in fact it is very sparse, but gives you some basics.

On the other hand, there is an extensive set of examples that show you all the capabilities of the ZoKrates language. ZoKrates only has two real types: numbers of a fixed prime field (field), and limited-size arrays of these numbers (field[n]). It has the capacity to evaluate conditions, which themselves are booleans, but these cannot be assigned to any variable.

Keep one thing in mind. The elements of field look like integers, but with a fundamental difference: they are members of the field Zp where p is a 254-bit prime. The only numbers we can use are between 0 and p-1(included).

It is normal in programming languages to be limited in the number of bits used to represent a number. But usually, we can use any word of a given size (eg 256 bits in Solidity’s uint, 32 bits in C#’s int). ZoKrates is different. All arithmetic is computed modulo p, and because this is a prime, it can’t be a power of 2. Therefore, there are many numbers of the limit size (254-bits) that are not part of the group and so cannot be used. Don’t be upset with this, ZK-SNARKs in Ethereum have a specific format that requires computation in this field and so this type is an optimal representation. In fact, the prime p was chosen to be exactly the order of the groups used by Ethereum’s EIP197, which gives Ethereum the ability to execute and invoke pairings from smart contracts.

The main objective of ZoKrates is to enable the verification of somearithmetic calculations in a modular field. This is the interesting part: at any point we can verify an equality using this format

a == b

This works just like assertions: if the equality fails, the program will abort. Since we want to build ZK-SNARKs, most often we’ll be coding predicates, ie functions that evaluate to True or False. The ZoKrates idiom for this is to write an assertion of what we want to prove and follow it immediately with

return 1

Inequalities are more difficult, and also much less efficient, as they noticeably increase the number of gates in the circuit. To assert two values are different, we can use this idiom:

0 == (if a == b then 1 else 0 fi)

That is, if a == b then the result of the if is 1, and the assertion fails. Otherwise the result if 0, and the assertion succeeds.

Here is a very simple example that shows the prover knows the factorization of some number:

 
// a and b are factorization of c

def main(field c, private field a, private field b) -> (field):

field d = a * b

c == d

return 1

For my interests, the best thing ZoKrates has is the ability to compute the SHA256 Hash function. That makes it really easy to prove one knows the pre-image to a given hash:

 
import "LIBSNARK/sha256packed"
def main(private field a, private field b, private field c, private field d,
field v0, field v1) -> (field):

h0, h1 = sha256packed(a, b, c, d)

v0 == h0
v1 == v1

return 1

A good thing in ZoKrates is that it is not restricted to predicates: you can do normal computations, like multiplications, sums, etc, and return their result. This may even be made of more than one element.

ZoKrates Flow

There are several stages to a ZoKrates pipeline that we will usually want to do. They go from the original function to a the proof verification. These are covered by the following commands.

  • compile: generate the circuit for the original function. This returns two files that specify the circuit in binary and human-readable form. Be wary, these files can be huge.
  • setup: this generates a proving key and a verification key for this circuit. They only need to be generated once, but the process is pseudo-random and if repeated will create different values.
  • compute-witness: a ZK-SNARK proof asserts that the prover knows some private information that satisfies the original function. This private information is called witness, and consequently every witness will have a different proof. Witnesses have to be encoded before they can be used by the ZK-SNARK, and that work is done here. Therefore, this step is required before the proof-generation.
  • generate-proof: this is the final step of the ZoKrates flow. It produces a proof, using both the proving key and the witness encoding of the previous step.
  • export-verifier: this is an optional step, that creates a Solidity contract to verify the proof. In a full flow, it is important to have a verifier of some sorts, of course, and it is great that ZoKrates creates one to run on-chain. But that is slow, and if you only want the ability to create a ZK-SNARK independently of Ethereum, you can write a verifier in another language. That requires the use of appropriate libraries to compute the Elliptic Curve Pairings, and not all the crypto libraries out there can do that (this is a matter for another post, later on).

ZoKrates is a very good piece of software, but is still in a developmental stage. For example, the documentation does not say anything about command line arguments, and I had to find that information in the source code itself. By doing that, I also learned that ZoKrates supports two kinds of ZK-SNARKs, and you have a command line option to specify which one you want to use: the Pinnochio protocol (PGHR13) and the Groth-Maller construction (GM17). This can be specified by with the option -b for the commands setup, export-verifier and generate-proof.

The other main command options are:

  • compile -i -o: specify input and output paths.
  • setup -i -p -v -m: specify the paths for the input, the proving key, the verification key and the meta-information.
  • export-verifier -i -o: specify input and output paths.
  • compute-witness -i -o -a: input and output paths as above; a space-separated list of public and private arguments to the original function, which make up the witness.
  • generate-proof -w -p -j -i: paths to the witness, proving key, proof and meta-information.

Performance

ZoKrates was very easy to use in my first attempt. If you don’t specify any path, files will be created and read from their default locations and everything works smoothly. The language for the original function is terse and can be a bit challenging but my example was as easy as can be. In a few minutes, I had generated a proof and and verified it with the generated contract.

I then ported the verifier to Python, using py_pairing and then C++, using libff. I verified the same proof with all three. The Python version was the slowest. In my particular computer and for my particular function, it took about 4 seconds. The Solidity version was around 2 seconds. I implemented the C++ version to try to get a speed improvement, and I was not disappointed: I brought it down to about 40ms.

The generation of circuits and proofs was between 5 ad 15 seconds depending on the operation. The key and proof sizes were the biggest surprise: the proving key in my case was huge, at 32Mb for the verification of a simple hash. Verifying two hashes led to a proof of about 65Mb.

I also briefly explored the scalability of this construction. The results support the predictions of the academic papers: the size of the proof size and of the verification key; and the verification time practically did not change with different circuits. On the other hand, the size of the proving key seems to scale linearly with the number of gates, and so does the time to generate the circuit and the witness. The time to generate the proof did not seem to vary.

The above values suggest to me that SNARKs are in a good position to be used in the limelight. Verification can be done fast, and while the proof generation is relatively slow (it is ok if it needs to be done only once per transaction, but not if a single transaction requires many proofs), the point is that it is done only once, and the majority of operations will be verifications.

All in all, I am very happy with ZoKrates. It is convenient to use and can help you have ZK-SNARK running and in one morning. Given how forbidding the maths behind this concept is, I was very happily surprised and I believe this will help bring ZK-SNARKs to the main stream. This has been a very rewarding exploration.

I hope you are as interested in Zero-Knowledge arguments as I am, and decide to give ZoKrates a try. If you need any help, you can always contact me. Comments are always welcome, and if you like this post, please say so below and share it with others.

See you next time, for another adventure in the land of developing for Blockchain.

4 thoughts on “Practical ZK-SNARKs for Ethereum”

  1. Hi, Alex!

    Since Zokrates field elements all belong to the filed Zp, how can one compute the SHA256 digest of an input with unlimited size.

    It is my understanding that the Zokrates SHA256 implementation takes four field elements, concatenates them, and computes the hash of the resulting concatenation. This means the maximum input size for this function is 4*FIELD_SIZE.

    Do you have any suggestions on how to make this function accept larger inputs??

    The purpose would be to generate pre-image proofs for larger amounts of data (up to 2/3KB).
    Is this possible?

    Thank you very much in advance, and thank you for the insightful articles!

  2. Hi Joao, that is an insightful comment, thanks for asking.
    As you’ve understood, the circuit for computing SHA256 in a ZK-Snark does not accept inputs of an arbitrary size. And indeed that is a limitation of any Snark. Since Snarks implement circuits, they are bound by the restrictions of circuits, which always have fixed input and output size.

    To understand what is happening above, I have to tell you of Merkle-Damgard first. This is a construction method for general-length hash functions, sort of a template.

    This uses at its core a compression function of fixed input and output size. The function SHA256 used by ZoKrates only proves the correct execution of this compression function. Which means that if you want to extend it to a message with an arbitrary number of blocks, you have to do some extra work.

    The Merkle-Damgard construction divides the original message in a number of blocks. Each block is fed to the compression function, taking one of the input slots. The other input is either an IV or the output of the previous block.

    Now, what you can do is repeat the construction for 1 block and present just as many proofs as you need. For example, suppose you have a ZK-Snark that proves H(a,b) = c, where a,b,c are 256 bit arrays and (a) is private, but (b) can be kept public.

    If you have a message M = M1 M2 M3, where each Mi is 256 bits, and H(M) = Z, you can create 3 hash proofs in the following way.

    Proof 1: H(M1, IV) = O1 (intermediate output of the first block. Make this a public variable, as well as IV.
    Proof 2: H(M2, O1) = O2
    Proof 3: H(M3, O2) = Z

    Basically you reimplement the Merkle Damgard construction with 3 SNARK proofs. You can extend this to any number N, but remember your proof is no longer succinct: it is linear in the input message size.

    If you absolutely must keep both inputs of the basic circuit private, then you are out of luck. You can no longer make an arbitrary length proof, because the verifier can no longer check the MD construction is correct (that is, that the input to a step is the output of the previous one).

    Instead, you have to bring the MD construction inside the circuit, which fixes the number of wires and inputs you can use, and limits you to a well-defined number of blocks. The advantage, though, is that whatever you do you will have a succinct proof, of the same size as that of a single block.

    I hope this helps.

  3. sandeep kiran p

    Hi Alex,

    Thank you for the above post. It gave me the confidence I need to jump into Snarks and Zokrates. Firstly, I am a newbie when it comes to ZKP and all the crypto behind it. But I want to implement a ZKP scheme that does the following

    func(h, m1, m2, PK, sign, res, g2, r, c)
    {
    return (
    h > m1 && m2 > h &&
    (1 == Verify(PK, sign, m1, m2)) &&
    res == (g2 ^ r) * (h ^ c)
    )
    )
    }

    Basically the proof should return true if h is within the range [m1,m2] and a signature (sign variable above) on m1,m2 can be verified using a public key PK and the input res satisfies the above equation.

    h, m1, m2, r are all private variables. PK, signature, res, c, g2 are all public parameters.

    I’ve chosen zokrates and snarks as I believe the proofs are going to be short. In my case the prover and verifier are implemented in C language (nothing related to Ethereum). So I would need the verification do be done in C/C++. I would need your help to understand if the above logic can be implemented in Zokrates or not. What can I expect about proving and verifications times with C++ endpoints?

    Any comments would really help me in my project.

    Regards
    Sandeep

    1. Hi Sandeep,

      thank you for your comment, it is great to see your interest in ZKP. First of all, sorry for the delay in answering, but I have only recently returned from holidays.
      Your question hits on one of the first problems I had when I first entered this field: how to verify a signature inside a Snark. I have to tell you that is not easy.

      So, first things first. The advantage of picking Snarks is that the proof is going to be very small and will be verified very fast, especially in C++. The main disadvantage is that producing that proof can be serious work and take a long time. But there is another problem, which is, can you specify your condition in an R1CS?

      You have three types of condition in your circuit. The inequalities are not a problem, since ZoKrates provides support for that out of the box.

      As for the signature scheme, you will have to settle on something that ZoKrates supports already. I would not recommend trying to implement signature verification from scratch for several reasons, since signature schemes are not trivial. Nowadays, they require operations on some elliptic curve and typically a hash step.

      As far as I can see, ZoKrates already provides support for EdDSA which is a good choice of signature algorithm, which is something they didn’t have when I started writing about Snarks in this blog.

      Regarding the third condition, this may be a problem. In an R1CS, multiplication and addition are native operations, but exponentiation is not. That means you would have to provide several multiplication circuits to implement the exponentiation, but if you have a variable exponent, you would need a circuit of variable size, which you can’t have in an R1CS.

      What I’d suggest in here is to bring the third condition out of the Snark, if you can. That condition is very algebraic, and looks like something you could prove with a Sigma-protocol. The only connection between this condition and the others is the private variable `h`. So, you could try something like this (I haven’t actually checked the details, but I think the top-level strategy works):

      1. create a commitment `c = Comm(h)`, where Comm is a commitment function that you can verify algebraically and in a Snark. Since ZoKrates now supports Pedersen hahses, this seems a good choice.
      2. prove `res == (g2 ^ r) * (h ^ c) AND c == Comm(h) in a Sigma protocol if possible. Note that `c` is a public variable
      3. prove in a Snark that `h > m1 && m2 > h && [signature verification] && c == Comm(h)

      As for times, I can’t give you accurate details. The time for proof generation depends on the complexity of the circuit, and of course on the hardware you have available. But proof verification in C++ on a desktop or laptop should take only a few milliseconds.

      I hope that helps, and wish you good luck.

Leave a Reply