An Explanation Between Cartoons and Greek Symbols
If you spend a bit more time reading about cryptocurrencies, blockchains, or many other related technologies, you likely run into the term zero knowledge proofs (ZKP). To me, the term sounds like a paradox. How can you prove anything with zero knowledge? How do you event know what to prove, if you have zero knowledge?
So I tried to build myself some understanding on this. In this article, I try to share this understanding of what zero knowledge means in ZKP, and what is the proof really about. And how do the two relate to each other.
I originally posted this articke on Medium, where it can still be found.
I start with the simple example of the ZKP and Ali Baba cave, which seems to be often used as a simplified ZKP example. I then proceed with an example of what ZKP might look like as a cryptographic approach, and how Monero uses Pedersen Commitments to prove input and output amounts match without revealing the actual amounts. I look at it as a zero knowledge proof of the matching sums of input and output amounts, without revealing the amounts.
My goal in writing this article was to build a more intuitive understanding that fits between the Ali Baba cave, and the cryptographic theories. So no formulas with Greek symbols, but not just a Alice and Bob in a cave either.
The Basic Example: Ali Baba Cave
The usual example given for Zero Knowledge Proofs is the Ali Baba cave. This is a circular cave containing a door that unlocks only with a specific passphrase. Alice claims to know the passphrase, and wants to prove this to Bob, without revealing anything about the passphrase. The following image shows the layout of the imaginary cave:
The overall story is that Alice enters the cave, and walks up to Point 3 or Point 4. From there, she can return to the entrance via two paths, Path A and Path B. Once she is at one of the points, Bob comes to the entrance (Point 2) and calls for her to arrive there via one of the paths, which he chooses on random. Depending on which side of the door Alice is, she always has a 50% chance of being able to return to Bob without passing through the door. Otherwise, she needs to know the passphrase. Bob does not know if she used the door or not, he only sees her arrive through the correct path he called out.
The points of interest, as I marked then in the figure, are:
- Point 1: Alice and Bob meet outside the cave. Alice enters the cave and walks to Point 3 or Point 4.
- Point 2: A fork of two paths leading deeper into the cave. Once Alice reaches Point 3 or 4, Bob comes here, picks Path A or Path B,and calls for Alice to arrive via the chosen path.
- Point 3: If Alice waits here, and Bob calls for her to come out through Path A, she can walk there without knowing the passphrase to the door. If Bob calls for Path B, she has to open the door to arrive through Path B.
- Point 4: Same as Point 3, but here she has to know the passphrase if Bob calls for her to arrive via Path A.
The following animation illustrates the different scenarios that can play out:
I initially found the Ali Baba cave to be a quite confusing example of Zero Knowledge Proof. Because we obviously know a lot, including:
- There is a door in the cave
- The door is locked, and there is a magic passphrase to open it
- Alice is in the cave, (at the door)
- Bob is outside
- The cave is circular, and there are two paths in/out of it, with no shortcuts between them
- Alice claims to know the passphrase
- After the “proof” is presented, we somehow trust Alice knows the passphrase
So we know a lot about this problem, and the term zero knowledge seemed confusing to me. I eventually figured the following about this problem:
- The protected secret is the passphrase
- The Zero Knowledge here refers to not knowing anything that helps reveal the passphrase (or more generally, the protected secret)
- The proof is in the action of Alice arriving through the correct path every time. Repeated so many times that, statistically, starting on the called side (thus no passphrase required) every time is incredibly unlikely
- In the end Bob gains trust that Alice knows the secret, but gains zero knowledge of the secret itself (the passphrase). He may gain other knowledge, such as Alice arriving through correct path.
If Alice does not know the passphrase for the door, she has a 50% chance to pick the right path on the first run, 25% to get it right twice in a row, and so on, until the joint probability of always getting it right gets statistically incredibly small.
There are various other properties that are often mentioned as required for a good ZKP. Stack Exchange provides nice and concise definitions for two main ones, soundness and completeness:
- Soundness: the proof system is truthful, so whatever it proves, it is true. If the proof shows Alice knows the passphrase, she should really know it.
- Completeness: the proof system is comprehensive, so it can prove all true statements. If Alice knows the passphrase, the system should always show it as true. You cannot know, and fail to prove it.
I recommend the Stack Exchange post for more details, if interested.
Cryptographic Example: Matching Sums in Monero
The Ali Baba cave is an interesting ZKP example, but for me it was a bit hard to figure out how this relates to ZKP in blockchains, cryptocurrencies, and similar constructs. For this, I had a look at how Monero uses the Pedersen commitment to hide the amounts in transactions, while at the same time verifying that the output sums match input sums.
Before going into Monero and its use of the Pedersen Commitment, I will present some base concepts: Elliptic Curves (EC), and their use for basic commitments. Followed finally by Pedersen Commitments on Monero.
Elliptic curves (EC) are a data structure that is commonly used in public-key cryptography. Their mathematical properties make them suitable for creating secure keys and for other cryptographic operations. They are also used in building the Pedersen Commitment in Monero, so we need some basic understanding about them for this article.
EC use in cryptography is referred to as Elliptic Curve Cryptography (ECC). In relation to cryptocurrencies, two commonly used (and standardized) curves are curve 25519 and Secp256k1. Secp256k1 is used by Bitcoin, and 25519 by Monero and many other cryptocurrencies (and other applications). The curves are commonly visualized like this:
Generally, the actual elliptic curve is described as a more abstract structure, but this visualization serves as a way to get a bit more concrete idea of what is being discussed. The elliptic curve’s curve points are used as a basis for many operations in ECC. The red dots in the following figures illustrate such points:
The red lines in these figures also illustrates an important concept related to ECC operations such as adding curve points together. This is the concept of finding lines that connect two dots on the curve, intersecting the curve at a third point. So, a line and three points it passes through.
The code I used to plot these base curves (and some of the following examples) is available on my Github.
So what is a commitment? We could look at the dictionary for a general definition, but here it refers to defining (binding) a value in a way that it cannot be changed, while hiding it. Since the value cannot be changed, we can say we are committed to it. At a later time, we can reveal the value by providing a secret to open its container. This leads to the two defining factors commonly used for (cryptographic) commitments:
- hiding: until the committed value is revealed, it cannot be discovered
- binding: once a commitment is made, its value cannot be changed
There can be different levels of hiding and binding, typically the tradeoff is between being perfectly hiding/binding and computationally hiding/binding. But that is more details than needed for this article at this point.
A typical example of a simple commitment is putting a piece of paper (e.g., a voting ballot), with a number, in a locked box, and providing the key later:
In the above example, the big chef puts a voting ballot (with number 5 written on it) into the locked chest. The chest is then given to the little chef, who has no key. The big chef has now committed to a vote. It is inside a box, with no visibility, thus it is hidden. It can no longer be changed as the box is locked and given to someone (little chef) who has no key. Thus, the commitment has become binding. But little chef does not yet know what the vote is. At a later time, the key is given to little chef, and the vote (commitment) is revealed.
Basic Commitment with ECC
Like the above example with the chefs, Elliptic Curves can be used to build cryptographic commitment schemes. I will try to illustrate this here, starting from a simple example using a single curve point, and leading to the Pedersen Commitment. I borrow some basic ideas from the example in the Grim documentation. Thanks for the ideas!
As for notation, I will use a to represent the committed value. Mainly because if you read about Monero, this seems to be the general notation used. I guess referring to the committed amount of Monero.
Using a Single Curve Point
A curve point named H is defined and made public. We call this the public base point. This base point is actually so public that it is generally defined in the curve’s standard specification, such as for curve 25519 and for secp256k1. With a as the committed value (amount), the commitment c based on H becomes c = a * H. This c defines another point on the same Elliptic Curve, based on the base point H. The following figure illustrates this process:
Here, the blue point is the base point H on the elliptic curve (not the real one for 25519, I just picked something for illustration). The commitment value would be the red point, which would equal to a*H, in this case with value a of 5, c=5*H.
Disclaimer: The below figure illustrates the simplified (fake) notation of Elliptic Curve math I use in this article, with values increasingly moving on the curve in the direction of the dotted red line. For example, 5*H in this figure is 5 line segments from H into the direction of the arrow. I use such increases for more intuitive illustration, real Elliptic Curve math is more complex and behaves different. I will describe the actual math briefly at the end of this article for anyone interested, with some link to find out more. My simplification here is just for illustrative purposes.
Curves vs Chefs
Lets consider the above simple curve commitment as the chefs example. The big chef calculates c=5*H, and publishes c. The little chef does not see the value a (5) being committed, he only sees curve point c. To reveal the commitment, the big chef then reveals a=5, and the little chef can verify that a*H=c. Due to properties of Elliptic Curves and related mathematics, finding any other value x where x*H=c ,other than a, is considered very difficult, and computationally unfeasible (outside Quantum computing). Thus, this commitment is considered computationally binding.
Once a is revealed, the verification of c is trivial. Because H is a public value, the little chef knows it. Because c was published earlier , he can check a*H=c when a is revealed. If it matches, the commitment and the given a is verified.
If we compare this to the chefs example with the chest earlier, the following concepts should match:
- chest = published (commitment) curve point c
- ballot = committed value a
- key = curve point equation a*H. Or just a, since H is known.
The Problem Here
The problem with the above example is that this process is good at binding the commitment value a, but very poor at hiding it. Lets assume we have 10 choices of values we could put on the chefs ballot, from 1 to 10. It would be trivial to brute force all these point values. Just calculate a*H for all a=1–10, and compare the result to the commitment c.
To address this, something called the blinding factor is used.
The Blinding Factor
The blinding factor is commonly referred to as r. This r is just a (secure) random number. Generally in range 1-2²⁵⁶, so very large. A slightly naive approach is to add this to a as in (r+a)*H. The following figure illustrates this with r=11 and a=5, resulting in published c=(11+5)*H=16*H:
Here, r is shown as the green dot. With this, there is no way to tell what a is, without knowing what r is. However, this has an added problem where any combination of a+r could be revealed as the commitment. Thus this approach would lose the binding property, as the commitment could be changed after publishing it. For example, for the above example we could publish any of (a=4, r=12), (a=5, r=11), (a=6, r=10), or many others as the commitment where the sum of r and a is 16. The commitment needs to be binding.
And solving this problem finally moves on to the Pedersen Commitment. For this, another base point G on the same curve can be used. This leads to the final commitment form of r*G+a*H=c. This is finally what is called the Pedersen Commitment (with EC). Using my made-up EC notation, we could visualize this like so:
Now we are back to computationally binding, and at the same time hiding commitment. As I mentioned earlier, due to properties of elliptic curves, and the associated mathematics, it is considered extremely difficult to forge a commitment with different values of r and a, so that it would still match the published c=r*G+a*H. And since both r and a now use a different base point, they cannot be changed as when just summing (a+r)*H above. That’s what they tell me, and it’s enough for my goal in this article.
Homomorphic sums: Hiding the Values in Monero
Monero applies the Pedersen Commitment to hide the actual amounts in transaction inputs and outputs, while at the same time verifying that the sum of amounts in inputs and outputs match. Sounds like sorcery, so lets see how this works.
The homomorphic property of the Pedersen Commitment means that we can actually perform mathematical operations on the encrypted values, and it works just as if the values were not encrypted. In this case, this is based on the homomorphic nature of the Elliptic Curve math. We can simply add curve points together, and compare the results, even without revealing the actual transaction Monero amounts. This is what the magic in the matching Monero sums without knowing the values is based on. But I need concrete examples, so lets make one.
First, a brief description of how cryptocurrency transactions generally works. At a high level, a typical transaction looks something like this:
Each transaction takes a set of inputs that are spent (TxINs). Spending here means the cryptocurrency in those inputs are transferred to some output addresses on the blockchain, as transaction outputs (TxOUTs). An associated transaction fee is taken by the network, and treated as one of the TxOUTs. In Bitcoin, the transaction inputs and outputs, and their values are visible for anyone to see. In that case, the transaction looks pretty much just like the figure above, for anyone observing the blockchain.
Monero is a cryptocurrency especially focusing on privacy, and does many things to hide various parts of these transactions from the public view. One of those is to hide the amounts being transferred. Monero uses the Pedersen Commitment to hide the transaction amounts (values). In this case, the process now looks something like this:
In this case, the input and output values are hidden from the public view. Only the fee is made visible. The only way to see the hidden values is if you have access to the required keys to decrypt the transaction. Thus the values are only visible to the sender and receiver. No other blockchain network nodes, or any other outsider, can see the concrete values that are in the transaction.
However, to maintain a valid ledger of transactions, the system must be able to validate that the total sum of the transaction outputs match its inputs. That is, no bigger amount of Monero is spent than is put in. Otherwise people could just spend some small amount of coins (Monero) in the transaction inputs, and create transaction outputs for much larger sums. This would effectively create Monero out of thin air, and break the entire system. Kind of like the central banks, and the associated monetary system in the “real world”, except we are still waiting for that pyramid to fall ;).
Matching input and output sums
As described earlier, EC Pedersen Commitments used by Monero use two public curve points, G and H. Each input and output Tx (TxIN and TxOUT) has its own Pedersen Commitment, so each one has its own a and r defined, like this:
In the above, a always matches the amount of Monero used in the TxIN or TxOUT. As I discussed earlier, the r is basically a large random number. Here I just picked some (small) numbers to illustrate the calculations.
The r in the above figure now becomes the G multiplier, and a the H multiplier. Generally, the numbers (especially r) would be bigger, but these work for an example.
For the TxIN inputs in the above, the commitments would now be (using r and a values from image above):
- 10$: c = 14*G + 10*H
- 30$: c = 85*G + 30*H
- 10$: c = 45*G + 10*H
Using the homomorphic addition property, the total commitment of all the TxIN’s is then:
- Cin: 14*G + 10*H + 85*G + 30*H + 45*G + 10*H = (14*G + 85*G + 43*G) + (10*H + 30*H + 10*H)
For the TxOUTs, the commitments would be:
- 8$: c = 33*G + 8*H
- 40$: c = 28*G + 40*H
- (fee) 2$: x*G + 2*H
The blinding factor for the fee is still undefined (x) for now. To calculate x, let’s sum the other parts first.
Using homomorphic addition, the total commitment of all the TxOUT’s is:
- 33*G + 8*H + 28*G + 40*H + x*G + 2*H = (33*G+28*G+x*G) + (8*H + 40*H + 2*H)
Visualizing the Amount on the Elliptic Curve
It is perhaps simpler to look at the a and r calculations here separately (for points H and G respectively). First the a and H:
- TXin: 10*H+30*H+10*H=50*H
- TXout: 8*H + 40*H + 2*H=50*H
Both of these end up at the final curve point of 50*H. The final curve point is the part that is published in the transaction (as part of the commitment, combined with the blinding factors*G). The homomorphic nature of it allows comparing the final points to see that the sums match in input and output. This gives the binding property, as finding other matching a and r values would be computationally unfeasible, while allowing to match the amounts.
Next, we need to add all the blinding factors to gain the hiding property.
Visualizing the Blinding Factor on the Elliptic Curve
As for amount H above, for the blinding factor G we can do the same:
- TxIN: 14*G+85*G+45*G=144*G
- TxOUT: 33*G+28*G+x*G=61*G+x*G
As noted above, the x in these equations is the r for fee. Using my simplified EC visualization approach, it looks like this without the fee:
Now we need to make also the TxIN and TxOUT curve points match here. TxIN final curve point is 144*G, so we need to make TxOUT final curve point match that.
Matching the TxIN curve to the TxOUT Curve Using Fee Blinding Factor
As you may have guessed, we can match the TxIN and TxOUT curve points for G like this:
- r for fee (x above) = (sum of TxIN multipliers for G) – (sum of TxOUT multipliers for G, without fee) = (14+85+45)–(33+28)=83
The commitment for the fee then becomes:
- 2$: c = 83*G + 2*H
Filling in the last missing spot (x) in the transaction, we get:
And when we plug this into the total output G calculation and figure above, we get the following:
- TxIN: 14*G+85*G+45*G=144*G
- TxOUT: 33*G+28*G+x*G=61*G+(144–61)*G=61*G+83*G=144*G
And we can now see that both input and output end up at the same curve point for G just like for H:
The total summed commitments are now:
- TxIN: 144*G + 50*H
- TxOUT: 144*G + 50*H
We can now subtract the summed input and output commitments, and check if they match the commitment to zero.
Commitment to Zero
To prove that the total transaction inputs match the outputs, we can compare the sum of their commitments to a commitment to zero. Besides my bank account balance, commitment to zero here simply means:
- z = 0 * H + 0 * G
And if we calculate the difference of the input and output commitments from the above, we get:
- TxIN commitment (c_in): 144*G + 50*H
- TxOut commitment (c_out): 144*G + 50*H
- c_in-c_out: (144*G+50*H) – (144*G+50*H) = 0*G + 0*H
- This is the same as commitment to zero above (z = 0*G + 0*H)
And so we have successfully proven that our total sum of inputs matches the total sum of outputs. Without revealing anything about the actual transaction amounts. This is true, since only the single, final curve points are included in the transaction data. There is no r or a in any form included in the public commitment. Only thing included is the final curve point that is the sum of 144*G+50*H.
This should reveal no information about the amounts, and, I believe, provides a practical example of zero knowledge proof with more cryptographical constructs. It proves the input and output amounts match, but tells you nothing (you gain zero knowledge) about the actual amounts.
The trialing code I used to make the above calculations and the verify that the input and outputs commitments and their differences comes to match the commitment to zero is available on my Github.
I am still no in-depth expert on cryptography, but for anyone interested to read more on homomorphic encryption and its relation to ZKP, a good start seems to be this post on Stack Exchange (as usual..). However, a brief look into actual EC math is in order after all the simplified examples I did above.
EC Addition and Multiplication, Achtually
All the above examples of elliptic curve addition, or multiplication, I showed the points simply move along the curve by x line segments, where x was the base point multiplier (x*G or x*H). For example, for 5*G I simply moved the point 5 line segments forward on the curve starting from G. This is not how real EC math works.
First, a simple addition of two points. Adding two points on an Elliptic Curve gives you a third point. The following figure illustrates this in four steps:
In the above figure the four steps are the following:
- Step 1: Define the points you want to add (P1+P2).
- Step 2: Find a line connecting these two points.
- Step 3: The line should always intersect the curve at a third point. Find that third point (P3).
- Step 4: Reflect P3 across the x-axis to find P4. P4 is now the result of adding P1+P2. So P1+P2=P4 in the above figure.
Why is the Elliptic Curve math done like that? I have no idea, none of the materials I read explained that part. But for me it is enough to have a good enough idea of what is happening and discussed for this topic.
Elliptic Curve Cryptography takes a bit specialized approach to this. Instead of adding two different points together, a public base point is defined, and added to itself N times. Using G to describe a base point, the following figure again illustrates this process in another 4 steps:
Again, G here is just a random point I picked for illustration, not any official point. In the above figure, the four steps are now:
- Step 1: Find the defined public base point G. Since we are adding G to itself, it is used as both P1 and P2 from the previous example on adding P1+P2. Instead of P1+P2, we now do G+G.
- Step 2: Find the line that passes through P1 and P2, or G and G in this case. For a single point, you could draw any line that passes through it (i.e., infinite possible lines). Someone smart has then decided that the tangent line is to be used. It is the line that just touches the curve at point G.
- Step 3: Find the point where the tangent line crosses the curve, this matches P3 from the previous example.
- Step 4: Reflect P3 across the x-axis to get P4, similar to the previous example. This results in G+G, or 2*G. Continue adding G to this to get 3*G, or add 2*G to itself to get 4*G. Repeat until you get the point you desired. For example, in the Monero example above I used values such as 14*G, 85*G, and so on. Just iterate this process to get to the x in x*G.
Additionally, there are other related points, such as use of prime numbers and modulos, for cases such as where the results would not fit in the N bits used. This gets called EC over finite fields, and other such fancy terms. However, the level I presented in this article is enough for me to have a basic understanding to read the blockchain articles, understand the topics, and possibly experiment a bit with the topic.
While all this sounds a bit complicated, luckily other people have already implemented all this in different libraries. For prototyping the topics in this article, I used an Elliptic Curve library for Golang, where the operations and calculations are implemented and verified for me. But I do find it very useful to understand what and why am I doing in those operations.
As I noted before, you can find some example code where I use a library for these calculations on my Github.
Computational vs Information Theoretical
Earlier I discussed how a commitment should be both hiding and binding. Until the committing party reveals the committed value, it should remain hidden. And it should not be possible to change the committed value after committing to it. In other words, the commitment should be binding. I believe I already mentioned the terms information theoretically hiding, and computationally binding.
The Pedersen Commitment is considered to be information theoretically hiding, meaning that no matter how much computing power you have, it is not possible to 100% determine the original committed value, if it is not revealed. As far as I understand, this is because there are always multiple possible values that could fulfill the commitment.
Because of this same reason, the Pedersen Commitment is considered to be computationally binding. Because there are multiple possible values that could fulfill the EC equations, with sufficient computing power you could find a different (and thus a “false”) commitment that would satisfy the verifier. In practice, for the Pedersen Commitment, finding such a point is known as the EC discrete logarithm problem, which is considered too hard to solve without Quantum computing.
This appears to be a common tradeoff between the hiding and binding properties. Perfect (information theoretical) hiding results in computational binding, and perfect binding in computationally hiding.
Back to the Roots: What is Zero Knowledge Proof?
After the lengthy detours, back to the original question. What is Zero Knowledge Proof? Looking back at the Ali Baba example, and the use of Pedersen commitment in Monero, I will try to summarize my take.
- We know a lot about the problem, and are happy share a lot of that knowledge.
- We have a specific piece of information we want to keep secret.
- We wish to share zero knowledge of that secret, but prove we know it, or that something related to it is true.
- The proof comes from applying a specific process according to the domain in question. This results in (statistically) sufficiently strong proof that we hold the proof as verified.
- There are likely some properties that need to be assumed true, such as having true sources of randomness, or lack of Quantum computing.
In the Ali Baba cave, the process is that of Alice always arriving via the correct path to Bob, repeated so many times that the combined probability of Alice always picking the right path by chance is incredibly small. Assumptions including no cheating (no path to bypass door in cave), and no broken randomness (you cannot predict which path Bob calls).
In the Monero proof of matching input and output amounts via Pedersen Commitments, the process is in the Homomorphic calculations holding true. While this case is only computationally binding, the computing power and the associated discrete logarithm problem is considered so hard that the chance of “faking” the proof is also considered incredibly small. Assumptions here include no solution to the discrete logarithm problem (i.e., no Quantum computing), and again true sources of randomness (cannot guess r).
There are plenty of examples on Ali Baba cave, and resources such as Zero to Monero that are filled with Greek symbols and formulas. I wrote this piece in order to find myself the middle ground on what ZKP means, and how the conceptual examples like Ali Baba translate to the cryptographic world, and what they mean at a higher level. I hope this will be useful to someone else. At least I feel I got a better understanding to my questions.
This article also allowed me to get familiar with the Pedersen Commitment, which I had often seen mentioned, but wondered about the real meaning. In the end, I find ZKP, as most topics, is not too complicated once you get past the terminology. Just a lot of new concepts for me, which takes time.
In the case of Monero and the way it verifies and anonymizes its transactions, there are also various other features I have not touched here. These include range proofs (a.k.a. bulletproofs in current implementation) to ensure the Pedersen Commitments are always using positive numbers, ring signatures to hide the sender addresses, stealth addresses for hiding recipients, key images to identify transactions, and (Diffie-Helman) key-exchanges to communicate the r and a from sender to receiver in the Pedersen Commitment. But these are out of the scope of this article.
Just to repeat the general disclaimer, I am not a cryptographer. So do your own research if you really need to understand the details. Hope this was useful for someone anyway. Cheers.
Just to remind once more, the codes I used to trial this article are available on my Monero Scraper Github project. At the time of writing, under src/golang/pedersen and src/python/experiments/ellipticcurves.