Tag Archives: cryptocurrency

Mapping Ring Signatures and Stealth Addresses in Monero

With an example from a recent XHV hack (of June 2021)

Introduction

I originally published this article on Medium, where it is still available.

Since Monero has been forked and used as a basis for multiple other blockchains, the same concepts of Ring Signatures and Stealth Addresses also apply more broadly to those forked blockchains as well. One of these forked projects is the Haven Protocol (XHV) private stablecoin project. The example I will use is from the June 2021 XHV hack, highlighting an interplay of the Stealth Addresses and Ring Signatures, and how they were relevant in how the XHV developers tried to address the hack.

Since the exploits were not fully rolled back on the XHV blockchain, the data should still be there to check in practice for anyone interested. As long as the blockchain exists, anyway. In any case, lets start with a brief look at the relevant Monero privacy features to set the stage.

Monero Privacy Features

Monero uses multiple privacy enhancing mechanisms. These include Pedersen Commitments, Ring Signatures, Ring Confidential Transactions (RingCT), and Stealth Addresses. These can be divided into techniques for hiding the amount of Monero transferred in a transaction, hiding the receiver of a transaction, and hiding the sender. That’s a lot of hiding there, so lets see about each one.

Transaction Amounts

One core aspect of transaction anonymity is the transaction amount, i.e., how much funds is the transaction spending / sending. This is relevant to hide the amount itself, but also for the sake of correlating the inputs of one transaction with outputs of another transaction. For example, if you could look at a specific sum that is sent in one transaction, and find the same sum spent later in another transaction, you could deduce that the recipient of the first transaction is likely the spender of the second one, or that it matches the price of some other asset that you think was acquired.

Monero originally did not hide the amounts, and the transaction input and output sums were visible on the blockchain. You can still see this if you look at the earlier Monero transactions on the blockchain, for example this one from year 2014 in block 100000.

These days Monero uses a cryptographic scheme called Pedersen Commitment to hide the amounts. I described the Pedersen Commitment in detail in my article on Zero Knowledge Proofs. Related to this, later in this article I also discuss Ring Signatures, which also used to need to take the amounts into account. However, since the introduction of the RingCT protocol, the Ring Signatures now also hide the amounts along with the Pedersen Commitment.

So, in general, with regards to transaction amounts, the current Monero-based blockchains hide them well, and this is very transparent to the user. That is, the user does not need to really concern about it, as it happens always and automatically when using the protocol.

Transaction Receiver

The approach to hiding the recipient of a transaction in Monero is called Stealth Addresses. There appears to be some ambiquity with regards to what a Stealth Address means in the Monero community, but here I refer to its use as the transaction receiver addresses visible on the blockchain.

Stealth addresses are different from the wallet addresses. The Monero wallet address, as I refer to it here, is the one you get if you start the official Monero Command Line (CLI) wallet, and type the “address” command. Or the same in GUI or whatever wallet you use. This wallet address actually never appears on a Monero-based blockchain, only the Stealth Addresses do.

Given a target (receiver) wallet address, the sender wallet generates a one-time Stealth Address. This generated Stealth Address is used in the blockchain as the transaction receiver address. Elliptic Curve mathematics are used to create a Stealth Address from public keys encoded in the given receiver wallet address, along with using a large randomized factor. This makes each Stealth Address unique. Even if the same sender sends Monero to the same receiver, the Stealth Address is always different and unique for each transaction. This is the mechanism to hide the transaction recipient.

For the details of building Stealth Addresses, I recommend this excellent Stack Exchange post, and similarly excellent article by Luigi1111. When looking at the blockchain, without matching private keys, a Stealth Address cannot be matched to any wallet address. Only the receiver has access to the private keys matching the public keys used to generate the Stealth Address, and thus only their wallet can identify transactions intended for them.

Monkey See, Monkey Do (a Doodle)

To make this a bit more visual, I made the following most beautiful doodle about the mail system as a metaphora for how a Stealth Address works.

My most impressive doodle on using mail as a metaphora for a Stealth Address. Pics from Pixabay: monkey, envelope 1, envelope 2, envelope 3, heads 1, heads 2, heads 3, heads 4, trashcan, piggymailman.

Consider the process in the figure in 7 steps listed in it:

  1. George (the Monkey on left) wants to send 10 Monero to Bob (balding guy on right). So he starts to build a Monero transaction, and putting the 10 Monero in it.
  2. George has Bob’s wallet address. Maybe he found it from Bob’s Github project, asking for donations. George wants to send a few bucks. Using the public keys encoded in Bob’s public wallet address, George’s wallet generates a unique Stealth Address to send funds to Bob.
  3. George puts this newly generated Stealth Address on the 10 Monero transaction as the receiver address.
  4. George gives the transaction to the postman for delivery. The postman here represents the Monero peer-to-peer blockchain network. Consisting of nodes running the Monero Daemon software.
  5. The postman, or in this case the Monero network, delivers a copy of the transaction to every single user of the blockchain. Or their wallets.
  6. Everyone tries to read and open the transaction by attempting to fit their private keys to the Stealth Address. Or their wallet automatically does this.
  7. Everyone else discards the transaction, since their keys did not match, except Bob, whose private keys match the public keys George used from his wallet address. So Bob gets the 10 Monero and adds them to his wallet (piggybank) for later spending.

For the purposes of this article, the key takeaway is that for each transaction a unique receiver address is generated, and this is the one you see on the blockchain as the transaction receiver address. And this is (what I call) the Stealth Address. This is an important element for the next step of hiding the transaction sender using Ring Signatures, and how we can later use this to look for links between the Ring Signatures and Stealth Addresses. The XHV hack I discuss later will illustrate this link more concretely.

Transaction Sender

As I noted already, the approach to hiding the sender of a transaction in Monero is called Ring Signatures. A Ring Signature is so named due to the ring-like structure in how the algorithm is applied. The general idea of this algorithm is to create an anonymity set of possible signers, and hide the true sender/signer in that anonymity set. A signer in this case refers to a transaction output (Stealth Address) that could be spent in the transaction the Ring Signature is attached to. The following illustrates a Ring Signature (yes, its just a doodle based on my understanding):

My Ring Signature doodle. Green is the real tx, hiding in yellow decoys.

In the above figure, I colored T9 (as Transaction Output 9 in the ring) in green to illustrate it being the real input in this case. This is just for illustration, as the real input / signer is randomly placed on the ring. Because if it was always at the same position, it would be obvious which one it is. The Ring Signature algorithm processes these in a ring-like sequence, hence the name.

Monero currently enforces a ring size of 11, meaning the ring contains the real transaction input and 10 fake decoy transaction outputs (sometimes called mixins). Ring size used to be open to user change, but the forced count of 11 is currently used to avoid distinct ring sizes giving away additional metadata about specific users and thus weakening the overall anonymity set.

The signatures in a ring are generated using a set of Stealth Addresses from real transactions on the blockchain. Anyone can prove that one of the signatures is real, but not which one. This gives every ring participant plausible deniability. The real signer is of course the one who sends the real funds, or spends the output they received at that Stealth Address. But you cannot tell which of the 11 it is. Further, a key image is included in each transaction spend to make sure each output is only spent once.

For details on Ring Signatures implementation, I recommend the excellent Stack Exchange answer for Monero and RingCT specifics, or the Wikipedia article with example code for the original Ring Signature scheme.

For the purposes of this article, it is enough to understand that Stealth Addresses define the transaction recipient, and when a transaction output is spent, the Stealth Address of that spent output appears in the transactions Ring Signature. But the Stealth Address may also appear as fake decoys in other transactions.

Visualize it: Stealth Address and a Ring Signature from XHV Explorer

To illustrate, here is a screenshot of the Haven protocol (XHV) explorer showing one of the hacked coinbase outputs (normally amounts are hidden but coinbase / miner transactions show them):

Screenshot of a coinbase output on XHV chain. This link should take you to it, if the explorer exists as used to.

Note that the above screenshot shows how each transaction output has a visible Stealth Address for the receiver. And it is always unique even if going to the same receiver wallet. All the outputs in the above screenshot are actually paid to the same (hacker) wallet, but have different Stealth Address each. In the above screenshot I highlighted one row with the 101k xUSD output. Let’s map it to a Ring Signature.

Here is a screenshot of the explorer, showing how the above highlighted Stealth Address is used in a Ring Signature:

Screenshot of explorer with a ring signature using the previous screenshots Stealth Address. Link.

In this screenshot I marked it as the real output. Generally we could not tell if it was a real input or a decoy input. In this case, I believe it is the real spend, as I will explain soon, due to some telling ways the hackers spent it. However, in general this should illustrate how the Stealth Address appears in Ring Signatures as the real or decoy one. From the public data on the blockchain we cannot really tell which one it is.

Finally after all that lengthy explaining, lets look at the promised XHV hack, and how the Stealth Addresses and Ring Signatures play out in that.

Example by Looking at the XHV Hack

First, some brief background to understand the XHV exploit descriptions, and their relation to Stealth Addresses and Ring Signatures a bit better.

What is XHV

XHV, or Haven Protocol, is called an “algorithmic stablecoin”. The basic XHV token is, surprisingly, called XHV. It can be traded on exchanges, and sent between wallet, similar to XMR for Monero. To create the algorithmic stablecoin functionality, XHV implements also other token types on the blockchain, such as xUSD to track US dollar value, xBTC to track Bitcoin value, and xJPY to track the Japanese Yen value. Special transactions can be created on the XHV blockchain to change one of these token types to another, and these tokens can be sent between wallets just like the base XHV itself.

The tracked prices for USD, BTC, etc., are determined by Chainlink oracles, such as the one for XHV-USD. Without going into too much detail, these concepts are only relevant for this article at a higher level with the multiple asset types, as they come up discussing the following examples with the XHV exploits.

The Exploits

At the time of the example used here, there were multiple different exploits applied in the course of a few days on the XHV blockhain. I will use the first exploit on that list as an example. This exploit abused some missing checks on how the miner (coinbase) transaction is created. The output of the other exploits could be analyzed similarly if interested, as they are all listed on the XHV teams Medium post.

The first exploited miner transaction on that list is in the XHV block 882877. Specifically, the attackers used an exploit to take advantage of missing checks for validating mining rewards / fees in the miner reward (a.k.a. coinbase) transaction for the additional asset tokens. In the block 882877 mining transaction (also the screenshots above in theRing Signatures section), you can see 101k xUSD and 6.73 xBTC being created out of thin air. The miner transaction is exploited again in block 883040 for the same amounts of 6.73 xBTC and 101k xUSD.

So how does this relate to Stealth Addresses and Ring Signatures? After the XHV developers noticed / were notified of the abnormally large miner rewards, they attempted to stop the hackers from using those hacked 101k xUSD and 6.73 xBTC outputs. This is where the Stealth Addresses and Ring Signatures of those transaction outputs come to play.

Attempting to Block Spending of the Exploited Funds

As I described earlier, Monero does not include the actual receiver address on the blockchain. Instead, it creates the one-time Stealth Addresses for each transaction, based on the receivers cryptographic public keys encoded in their wallet address. And while you cannot map a single Stealth Address to any specific recipient/wallet, you can map it to Ring Signatures of specific transactions, where the output identified by the Stealth Address is possibly used as input / spent (or as a decoy). The screenshots earlier illustrated this.

In this case, the XHV developers would have taken the Stealth Addresses used as target addresses for payment of the exploited transactions, and scanned the blockchain to find any Ring Signatures using these Stealth Addresses as inputs, and thus any potential spends of those funds. With no matching Ring Signatures / transaction spends found, one could assume that the funds were not yet spent / moved from the initial target Stealth Address. Reading their report, this would have been the case they assumed.

To block the attacker from using their exploited funds, they provided the biggest XHV mining pools with a modified software that discarded any new transactions using those Stealth Addresses as part of their Ring Signatures. These biggest mining pools would give them control of well over 51% of the network hash power. This 51%+ hash power gives anyone power to discard any blocks mined by someone else, in this case those that would have contained those Stealth Addresses from the exploited outputs as inputs.

However, reading their report and the Medium post, the developers seem to have made some mistake in their scanning, and the idea that the funds were not spent was not the real case. We can verify this simply by using the Monero Daemon RPC API that is shared by the XHV Daemon. We can query a network node with this API for blocks and transactions, and look for any with a specific Stealth Address as an input in the Ring Signatures. Let’s see how it looks.

Mapping Stealth Addresses to Ring Signatures

To build this example, I queried the XHV network for transactions that have the Stealth Addresses of the exploited miner transactions in their Ring Signatures. Once I found any, I repeated this scan one more time, to find transactions using those potential spends as inputs in their ring signatures. The following figure shows the results of this process, and how the exploited miner transactions spread into the blockchain in the blocks coming after it:

Hacker TX fan-out into the blockchain and its ring signatures.

The above figure illustrates how the two exploited transaction outputs fan out from the original exploited miner transaction from block 882887. The two exploited miner transactions in the above are the ones on left side of the figure, labeled 882877 xUSD 100k and 882877 6.73 xBTC. If you look at the miner transaction on the block explorer, the xUSD transaction is paid to Stealth Address 3b8e80a279000d3d32826010b665e2c78aaf25d1744c49cac8a5da154d293875, and the xBTC to Stealth Address 7dec1f53dd30416458926bd35e9c1fff98d0ac86fc185a8fedfad546ff7d4546.

Scanning for these Stealth Addresses as part of a Ring Signature, the xUSD output is used as input in Ring Signatures of three transactions, in blocks 882939, 883064, 883410. The xBTC output is similarly used as input in two Ring Signatures, in blocks 882939, and 883041.

I highlighted 882939 in the above figure in green, since it seems quite clear to me that this is the actual transaction to spend the exploited funds from block 882877. I cannot prove it 100% because the Ring Signature obfuscates the sender in 10 other decoys for both the xUSD and xBTC transaction. However, this is a good illustration of where these privacy mechanisms don’t necessarily alone hide you, if your actions betray you through other heuristics.

I say this because block 882877 contains transactions using both the xUSD and xBTC exploited outputs as ring members. Both outputs also appear in any Ring Signature for the first time in this same block. With the way Monero chooses its decoys, having two transactions in the same block using decoys from the exact same previous transaction is extremely unlikely. Even more, Monero has an unlock time of 60 blocks for miner (coinbase) transactions such as the exploit in block 882877, and the difference between 882939 and 882877 is 62 blocks. So 882939 is one of the first blocks where this hacked output could ever be spent. What a coincidence to have all this happen together. This illustrates how the technology can work perfectly fine, but your actions can still betray you. Probably makes no difference to the hacker, but a good example.

Why Only 2–3 Rings Use the Exploited Outputs

The above figure shows how there are only 3 and 2 blocks where the xUSD and xBTC outputs appear as ring inputs but for the other transactions in the figure there are many more. This is the effect of the XHV developers efforts to stop their use, which was just a bit late as they already were used as real inputs/decoys in those 3 and 2 blocks. But if not, the stopping likely would have worked fine as the cutoff shows.

Fanning Out Into the Blockchain

In the figure above, I expanded the two transaction outputs from the xUSD payment in the block 882939 as examples of what a more realistic fan-out from an output would look like into multiple Ring Signatures in the following blocks. The first (marked output1 in the figure) is paid to stealth address 71d224f0043fdf8c5b6e8bc292f1c1e07bf820d707c2d9b04b6335d49bf37e4b. The second fanned-out output (output2) is paid to stealth address afa169a74b1581cafedf215219ca7b7708802ec13945540d3fb9ba30e188d842.

As the figure shows, the output 1 of block 882939 can be found as (input) member of Ring Signatures in transactions in blocks 883000, 883105, 883155, 883247, 883302, 883987, 883993, 884002, and 884003. Similarly, output 2 can be found in 882970, 883903, 883925, 883986, 883991, 883996, 884440 and 885705. If you want to check for yourself, just open these links and search for the stealth address I copy-pasted above.

Relevance to User Privacy

Beyond all this technological aspect, reading about hacks and all can be interesting. Or maybe everyone already fell asleep. But is any of this relevant to a normal user?

If you only care about having some basic privacy, or if you don’t care about privacy and anonymity at all, you are likely fine just using the Monero based protocols as is. If you do care more deeply about it, there are a few points we can gather from this that are good to keep in mind:

  • When you receive any transaction, consider re-sending it a short time after to yourself, or in general churning it a few times. This will obfuscate what stealth address the funds are in, beyond just inclusion in other Ring Signatures. But beware of obvious patterns such as the one above for the miner exploit.
  • Consider other metadata that could be used to identify and link your transaction. In the above XHV hack example, the metadata was the use of multiple outputs from the same miner transaction at the same time, right after the original transaction unlocked.

However, to make this more complicated, churning and other techniques are not so simple to do perfectly either. Some related discussion in this Stack Exchange post. To make it perfect, your churning should look like real decoy patterns look. But it can get a little complicated to do such perfect optimizations, so mostly some basic consideration for above points is likely good enough. It is especially tricky as I am now aware of any specific support for having granular control over which transaction outputs are spent, or even visibility to the specific inputs and outputs in a Monero wallet.

In general, timing related analysis seems to have been one of the most common ways to try and analyze Monero transactions. To look more deeply into the decoy selection algorithm, I wrote a small study on that here but it got a bit lenghty. It seems to happen to me a lot. So to avoid even more overly long writing, and too much digression, I will save my details on decoy selection and timing analysis for another day.

Other Metadata

Besides time-based analysis, various other issues related to transaction analysis, Ring Signatures, decoys, and related topics have been identified and addressed over Monero lifetime. A very nice and extensive discussion on this topic is available in the Breaking Monero YouTube series by the Monero team. I won’t go into any of those details here, but if interested in the topic, it is definitely a great series to watch.

Future Views on Ring Signatures

While the current Monero ring size is 11, recent Monero Labs Research topics include Triptych and related ideas that aim to enable much larger ring sizes in practice. This should make any analysis of the transaction sender, and the relations between them even more difficult. It should also provide broader decoy patterns, making it harder to fail to match them for your own transactions.

Obligatory Promotion of Software Testing

Having worked over many years in software testing (mainly the engineering part of it), I will take the chance for a few words about these cryptocurrency hacks and what I think about the topic, even if it is not really about Ring Signatures and Stealth Addresses. Sorry about that.

Regarding the XHV hack, the reports they released seem to point to quite lacking software development practices (search for Lessons Learned). There appears to have been little to no tests, lack of code reviews and audits, and getting any (helpful) community involvement in development seems to have had little to no support. Pretty dire reading for an open-source financial project running openly decentralized over the internet, and at times valued in hundreds of millions of dollars or more.

Reading those lessons learned, at least the XHV team seem to recognize this now and work to improve. I guess it remains to be seen how well one can change fundamental processes and peoples working practices in one go.

Of course, the XHV team are also not alone in this as, for example, about one month later (July 2021), another cryptocurrency project called Thorchain had similarly multiple hacks amounting to millions of dollars in damages. While not as bad, some of the reports I read pointed at similar limits in development processes, testing, etc. In general such hacks seem to occur almost monthly in cryptocurrency projects, especially for the newer projects with a larger new codebase (and thus untested attack surface). Maybe it goes with the territory, but maybe it wouldn’t hurt to put some thought into testing and security.

Of course, everyone makes mistakes, and I guess it is most important to learn from those mistakes. But that should not excuse ignoring even basic levels of quality assurance.

Conclusions

I hope this article provides some added understanding on what are Ring Signatures and Stealth Addresses, and how they work. I find it quite impressive how Monero applies much of the latest cryptographic research into practice, combines different cryptographic techniques to create a system of private and decentralized transactions, and has not had any big hacks like the one I partially described here.

The Monero technology can be quite complex, but in the end I find a good explanation helps make even complex topics understandable. Much like Bitcoin is similarly clever, but not too hard to understand once you find the right resources to explain it. Luigi1111’s Stealth Address article is great, too bad he never seems to have gotten around to his Ring Signatures article hinted there.

For anyone interested on how using the Monero API to analyze the blockchain might work, at the time I wrote my earlier article on Merkle Trees, I built a small tool to crawl the Monero blockchain using the Monero Daemon / node API. This let me locally analyze the data, which I also applied for this article. Maybe one of these days I will also finish the additions I made for this article and update the Github, even though it should already work as an example of the general process.

Personally, I feel like I got maybe halfway there in trying to get a good understanding of how to implement the Stealth Addresses and Ring Signatures in practice. Maybe for understanding their relations on the blockchain and tobasic Monero tracing, that could be enough already.

That’s all for today. Cheers.

Merkle Trees: Concepts and Use Cases

The data structure within Bitcoin, Amazon Dynamo DB, ZFS, …

This article explores what are Merkle trees, and how they are used in practice in different systems including Bitcoin, Amazon’s Dynamo DB, and the ZFS filesystem. The basic concept is quite simple, but some of the clever applications are not so obvious.

First, lets start with the concept of Merkle trees. As I said, it is not too complicated in its basic form.

What is a Merkle Tree

A Merkle tree is fundamentally just a hierarchical set of hash values, building from a set of actual data (Merkle leaf) to intermediate hashes (Merkle braches) and up to the Merkle root that summarizes all the data in one hash value.

Example: A Basic Merkle Tree

The following figure illustrates a very small Merkle tree:

A very small Merkle tree. Image by author.

In this figure, the bottom nodes (Data1Data4) are the actual data processed by the application. Each of these is summarized by their respective hash value (Hash1Hash4), as a Merkle leaf. From these, the Merkle tree builds a hierarchy, combining hashes together until only one is left. The nodes combining other hash nodes are called Merkle branches (here Hash12 and Hash34). When there is only one left (here Hash1234), this is called the Merkle root. There can be multiple levels of branches and hashing as following examples will demonstrate.

Handling An Unbalanced Merkle Tree

The above example illustrates the very basic case of a Merkle tree. It is a convenient example, as at every level there are just the right number of nodes to form exact pairs. What happens if you have an uneven (odd) number of leaf (data) nodes? For example, what happens to the above example if you have 5 Data nodes? You can hash Data1+Data2 together to form a Merkle branch, and same for Data3+Data4. But Data 5 is left without a pair to hash into a new branch.

Different approaches can be taken to address this. For example, in this situation Bitcoin simply copies the un-pairable (odd) hash, and uses the duplicate as a pair (the odd hash is paired with itself). The following figure illustrates this:

Handling odd (uneven) numbers of hashes Bitcoin style. Image by author.

In this example, Hash 5 has no immediate pair, and is duplicated to pair it with itself. Same happens for Hash55. The above is just one option, there are different ways to handle this pairing issue. Of the ones I have seen, I like the Monero (or Cryptonote?) one most. I will present it shortly. First, something slightly different.

A Very Simple Merkle Tree in Python

Theories and explanations are great. But for the technical person, some code helps formalize it. The following code from my Github repo illustrates a naive but simple example of implementing a Merkle tree in Python (or just skip it for more conceptual explanation after):

https://medium.com/media/c99d8470b4f41b5c0603a7f311adf474A very simple and naive Merkle tree implementation in Python. Code by author.

The last four lines in the above code run the example to create a Merkle tree with data nodes Data1Data5. The calculation of the tree with this algorithm looks like this (cropping the hashes to the first 5 characters):

Merkle tree calculation example with actual values. Image and animation by author.

I used the keccak hash function in this example, as this is what the Monero cryptocurrency/blockchain uses, and I was looking into Monero code lately. The 5th (odd) data element here (using my Python code above) is handled slightly different from the Bitcoin example, as this simply re-uses the hash if one is left over (un-pairable, cf54b above). Practically this re-use should have the same effect as the duplication in the Bitcoin algorithm.

Optimizing Merkle Calculation: Monero Style

As I was looking into the Monero blockchain to understand it better, I found it has a different but clever approach to hashing the Merkle tree. The code for it is available in the tree-hash.c file in the Monero Github repository. And a Python version I made to implement the same in my Github.

The Monero approach could be described as converting the hash tree to a perfect binary tree. It hashes enough leaf nodes in the first iteration, so that the following iterations will always have some variant of 2ˣ (a power of 2) nodes. The following figure illustrates this:

Monero approach to Merkle tree building, with 5 nodes (transactions in Monero). Image by author.

This works as follows: First, calculate x so that is greater than number of transactions (data nodes). In this example, it is 2³=8, because 8>5 (there are 5 transactions/Data nodes). The value of x before that, 2²=4, would not fit (4 > 5 is not true). From this, the number of transactions is subtracted. In this case 8–5=3. This 3 is the index of transaction to start iteration 1 from. With 0-based indexing, the start index is Data4 in this example.

Above explanation is maybe a bit abstract. Below is a table with concrete examples of transaction counts and how these are converted into a “perfect binary tree” shape (size is always 2ˣ, eliminating any “odd” counts or leftover un-pairable hashes from following iterations) for iteration 2 and following iterations:

Example calculations of Merkle size with the Monero (Cryptonote) algorithm. Image/Table by author.

The columns:

  • Transactions: Number of transactions in the block being calculated
  • Closest 2^x: First 2ˣ that is bigger than transactions
  • Iteration 1 start index: start hashing pairs from this index in transaction list
  • Iteration 1 values: Number of transactions in transaction list starting from iteration 1 start index to the end of transaction list
  • Iteration 1 pairs: iteration 1 values divided by 2 (because hashed in pairs)
  • Iteration 2 formula: How the above values lead to number of items to hash in the following iteration
  • Iteration 2 size: The number of transactions to hash in iteration 2. As you can see, it is always 2ˣ, and leads into a “perfect binary tree”.

Too many numbers and formulas I guess. Did you fall asleep yet? Anyway, if you pick the row from above table with 23 transactions, you can follow this nice 8-bit animation I made on how the Merkle root calculation progresses for that row:

Calculating Merkle root in Monero for block 1407480. Image/animation by author.

In this case (animation above), the start index is calculated as 2⁶=32–23=9. This is transaction number 10 (zero index adds 1). The following iteration then has 16 nodes, which is 2⁵, next one 8 or 2⁴, and so on. The animation above actually represents how the hash array is manipulated in memory by the Monero code.

Example Use Cases

Concepts are nice, while concrete, real-world, use cases make the point. Here I will discuss the use of Merkle trees in blockchains/cryptocurrencies, and Amazon’s AWS DynamoDB distributed database. I will also briefly touch on the ZFS filesystem, and the Git version control system, as these are sometimes also mentioned as examples of Merkle tree use.

Cryptocurrencies and blockchains: Bitcoin, Monero, et al.

As already briefly discussed above, Bitcoin and similar cryptocurrencies make use of Merkle trees to summarize and validate the transactions in a block, and embed the Merkle root into their block header as a summary of it all. The following figure illustrates this:

Blockchain and Merkle trees. Image by author.

Each block has an ID value, which is the hash of its header fields. A part of this is the Merkle root. Another part is the previous block ID (Parent in above figure). By linking with the previous block ID, and including that as part of the next block ID, the blocks form a blockchain. By embedding the Merkle root in it, they make an immutable record of transactions in the block.

For example, the Bitcoin block header contains:

  • Difficulty target value (called bits in Bitcoin)
  • Merkle root of all transactions in the block
  • Nonce; a value changed in mining to find accepted blocks
  • Previous block hash (block ID); linking this block to the previous block in chain (this previous hash is named parent in figure above)
  • Timestamp; Time of mining (creating the hash for) the block
  • Block version; identifies support features and formats (also used to identify hash function in some blockchains such as Monero)

These header fields are all hashed together to form the block ID. Making the block ID a hash of all header data makes it practically impossible to modify a block header. Including the Merkle root in the block ID further makes it practically impossible to modify the transactions in a block.

But let’s look in a bit more detail into how these types of blockchains actually use the Merkle tree in different ways. It’s all about data integrity, but in different and sometimes ingenious ways.

Blockchain Use Case 1: Transaction Data (Block) Immutability

As I noted above, the base use case of the Merkle tree in Bitcoin style blockchains is to build the Merkle root into the block header, and using it to verify that no transaction has been changed.

As an example, let’s change Data4 to Data6 in the earlier example:

Changing one node value invalidates (changes the root of) the Merkle tree. Image by author.

Comparing this to the earlier example, with Data4, the Merkle root was aa8d3. With Data6, it is now f8932. This way, any change in any transaction data changes the Merkle root, and it no longer matches the one stored in the blockchain for that block (in block header). And through the block header, this issue would propagate to the whole blockchain following this block.

However, if you think about it for a while, you may ask a question:

Do you really need Merkle trees to verify the block data?

No. Instead, one could just concatenate all the transaction data together and build a single root hash. Consider the above example with Data1Data5, but just hashing them all together:

Concatenate+hash all data at once. Image by author.

Now, changing just one data value will have the same effect of invalidating the summary hash:

Hash change detection in concat+hash all at once. Image by author.

This single hash could also be used in place of the Merkle root in the block header and block hash. Same effect so far.

So what is the real benefit of using a Merkle tree here? To summarize, I would say it gives you more granularity in the hash verification, and enables other clever tricks to process the blockchain more efficiently. While also providing the transaction integrity assurance / verification.

The original Bitcoin whitepaper provides two additional examples for use of the Merkle tree: blockchain pruning and simplified payment verification (SPV). Beyond these, the one I really like is the Stratum mining pool protocol. Let’s look at these next.

Blockchain pruning

Over time, a blockchain will grow larger and larger as it accumulates more and more blocks. For example, today (Feb/2021) the size of the Bitcoin blockchain is about 380GB. Blockchain pruning is an approach to reduce this used space with the help of Merkle trees. By removing used transactions from local storage where no longer needed.

We want to have all data around for full scale verification and history, but not all nodes in the peer to peer network need all the data. The blockchain pruning approach proposed in the original Bitcoin whitepaper suggests addressing this by using the Merkle tree to prune (remove) spent (used) transactions from the blocks.

The following figure illustrates this:

Three out of five transactions used (spent) in the block (and Merkle tree). Image by author.

In this example, we have used (spent) transactions TX1, TX2, and TX4. We still have unused transactions TX3 and TX5 left in this block. Assume we are not interested in a full node with full archival, just keeping a list of spendable transactions. Satoshi’s proposal was to prune the block from the used transaction data, and simply leave the Merkle tree branches needed to verify the unspent transaction data:

The Merkle tree pruned of the used transactions. Image by author.

This is perhaps more clear if we also spend TX3:

One more transaction used, leaving just one unused in the Merkle tree (and the block). Image by author.

The block could now be pruned to only contain the TX5 data, and the hash of the other Merkle tree branch (96b8d) in the block:

Pruned Merkle tree with only one transaction left. Image by author.

We would just have saved the space require to store 4 our of 5 transactions, and 6 out of 9 Merkle tree nodes. About 80% space savings on that example. The longer the blockchain gets, and the more it is used, the more spent transactions it would have. And thus more space could be saved. It’s a very clever idea, like most ideas in the original Bitcoin whitepaper.

But much as with block verification, we can always ask the question:

Do you really need Merkle trees to prune the blockchain?

The Bitcoin StackExchange is full of insightful discussion on the actual ways pruning is applied in the actual implementations. While the Merkle tree pruning is a clever approach, it is not applied in this way in (at least) the Bitcoin core software.

Instead, the unspent transactions are stored in a separate database for fast lookup. This database is initialized on first startup by scanning the blockchain and updating the database as new and spent transactions are found. The database is continously updated as new blocks are broadcasted for the Bitcoin network. A pruned node could then just rely on this database.

Running a pruned node in general is sufficient for the basic operations, but does not fully support all features of a blockchain, so some nodes will still need to hold the full data. But that goes beyond the scope of this article.

I think the basic takeaway here is that Merkle trees are cool but sometimes the basic and even simpler approach is fine, and even better. Of course, the hard part is identifying when this is true and when the cooler (Merkle) approach is really best. And don’t take this in any way as a suggestion to not use Merkle trees. Just to think about the overall picture. Also with Bitcoin, and related blockchains, I believe the Merkle tree enables many other things and thus makes a lot of sense. As following sections will show.

Simplified Payment Verification (SPV)

The simplified payment verification (SPV) algorithm was also suggested in the original Bitcoin whitepaper. In SPV, a light-weight blockchain client only stores the block headers, but also wishes to verify payments it receives as valid transactions in the blockchain. Lacking full transaction details, the SPV client uses Merkle trees to effectively verify the transaction details in collaboration with full nodes.

I will re-use the example from blockchain pruning above. The SPV client wants to verify TX5 in the given block. The following figure illustrates this:

SPV client node verifying a single transaction is part of a given Merkle tree. Image by author.

Here, the SPV client node asks a full node for the Merkle branches required to build the Merkle root with the TX5 data. By rebuilding the Merkle root (aa8d3) from the transaction of interest (TX5), and the Merkle branches (96b8d) provided by the full node, the SPV client can have confidence in having received a valid transaction. By checking this re-built Merkle root against the one stored in the block header, it can verify that the whole tree (and thus TX5) is in fact valid, and is a part of the blockchain.

I find SPV to be an interesting example of how Merkle trees can be used, together with (block) data filtering (Bitcoin uses Bloom filters but that is again another topic), to synchronize and verify existence and correctness of selected data in a distributed system.

Pool Mining: The Stratum Protocol

The traditional way to create cryptocurrencies has been mining via proof of work (PoW) hashing. Mining pools are a way for smaller miners to join together and get some rewards for mining, based on how much compute (hash) power they contribute to the pool. This leads to the requirement for a central entity, the mining pool, to be able to coordinate the miners. It needs a way to effectively distribute and track the overall mining process across all clients. With some clever tricks, the Stratum protocol uses Merkle trees to enable this.

Especially Stratum version 1 makes use of Merkle trees to distribute the work efficiently. The pool server provides the miner nodes with the block header elements needed to build the block. This includes the partial Merkle tree, with the branches calculated for all other transactions except the Coinbase transaction. The coinbase transaction is a special transaction that pays out the mining reward. This is illustrated by the following figure:

Merkle templating with the Stratum mining pool protocol (v1). Image by author.

This figure contains three elements: the Merkle tree template, the Coinbase template, and the search for the nonce(s). The pool provides the Merkle tree template to the miner, containing the pre-calculated Merkle branches for the transactions in the block (here 96b8d, thus highly reducing bandwidth and calculation needs for miners). The miner is then expected to fill the Merkle tree template with a suitable coinbase transaction, built from the server provided coinbase template. The winner is one who fills the template with values providing a hash that matches the blockchain network difficulty level. Meaning just a hash with suitable small value.

The coinbase template is provided by the pool, and is filled by the miner using their choice of nonce and extranonce fields. When combined with the Merkle tree template, each of these provides a different block hash, and the only way to find a winning block is to try values until one produces a hash matching the network difficult target. If the miner finds a nonce that matches the network hash difficulty for this Merkle template, it submits it to the mining pool. The coinbase template contains the mining pool address, ensuring that the pool receives the block reward, and can distribute it to all the miners.

In a broader context, the Merkle tree here is used to distribute a partial solution (the pre-calculated Merkle tree branches and coinbase template), while allowing different distributed nodes to work independently to try to find the solution to the missing part (nonces for the coinbase transaction to build an acceptable PoW hash). By embedding the pool address as part of the template, it ensures that all distributed nodes contribute to the common goal and can share the rewards.

AWS Dynamo DB

Dynamo DB is a distributed database provided as part of the Amazon Web Services (AWS) platform. It was originally designed to handle much of Amazon’s retail global infrastructure needs. As such, a lot of consideration has been put into making it scalable and optimizing the efficiency in all parts of its architecture. The architecture of Dynamo DB is described in the AWS Dynamo DB paper from 2007, including its use of Merkle trees to efficiently synchronize diverged nodes (section 4.7 in the paper).

Dynamo DB is what is called a key-value store. The actual data (values) are stored in data nodes, and identified and indexed by their keys. Dynamo DB hosts this data in what it calls virtual nodes. Each virtual node hosts a key-range. To handle high-availability and scalability requirements, Dynamo DB distributes the data across these key-ranges, and hosts each key-range on multiple virtual nodes (across multiple physical nodes, or partitions).

The following figure tries to illustrate this, with 3 (virtual) Dynamo DB nodes, each holding 2 key ranges, each key range duplicated across two nodes. I am sure this is not exact to all the details, and the system had evolved since the paper was published, but I believe it depics the general concept well enough:

DynamoDB virtual nodes and distributed key ranges. Image by author.

In this type of a distributed system, there will eventually always come a time when some of these virtual nodes are out of sync with other virtual nodes holding the same key-range. Dynamo DB uses Merkle trees to perform effective comparison and synchronization of the key-ranges in these nodes. In the above figure I made a small 3-node Merkle tree inside each key-range to illustrate this The KR3 key-range tree is shown partially in red to illustrate how the nodes would have diverged and need resolving to find correct values.

A Merkle tree is built for each key-range, where the leaf nodes of the tree are the key-range data values. The Merkle root summarizes the data in each node. By comparing the Merkle roots of each virtual node hosting the same key-range, divergence in nodes is immediately visible. This full comparison requires only the communication and comparison of a single value, the Merkle root. If there is a difference, one can keep comparing the branches of the tree to efficiently find the exact divergent spots, and the data values to synchronize.

The following figure illustrates this process:

DynamoDB using Merkle trees to find data nodes out of sync. Image by author.

As you can see, I just copied the blockchain example into the above figure. Because the fundamel structure, and the concepts it relies on, are exactly the same! The difference is only in the (clever) application of the same fundamentals in a different context.

Yes, some details would likely differ. For example , DynamoDB likely uses a different hash function than Keccak. But, as illustrated here, the fundamental approach and concepts are the same, and as such applicable to more than just the common blockchain example of Merkle trees.

ZFS File System

ZFS is a filesystem supporting data spread over multiple volumes, a form of distributed filesystem. Multiple ZFS volumes can be grouped into storage pools, which can also host redundant copies of the data. ZFS is advertised as using Merkle trees to ensure data integrity.

ZFS uses Merkle trees to checksum data, to identify issues where some part of the data written to, or read from, disk is corrupted (or misread etc.). As in most filesystems, the data is stored as blocks of specific sizes. These blocks form a natural leaf node for building a Merkle tree.

The main rationale I found online for ZFS to use Merkle trees always discusses how ZFS stores the Merkle tree as a checksum outside the block itself, providing higher resiliency for identifying errors. This leaves it quite unclear to me why you need a Merkle tree for this, and not just store a general checksum outside the block. Sometimes I feel people should ask more questions.

On the other hand, I could see the benefit of verifying data across multiple storage pools and their replicas by using hierarchical Merkle trees. This case would make it very similar to how DynamoDB operates and uses Merkle trees to sync nodes (vs ZFS volumes). However, I could not find any description for using Merkle trees for this. If someone knows better, I would be happy to hear.

In any case, this seems like a very similar case to Dynamo DB, as being used to verify integrity of a distributed data storage system more efficiently. With the added benefit of separating the consistency properties (checksum hashes) from the data itself, for increased resiliency.

Git Version Control System

Another example that I found referenced as an example of Merkle trees is the Git version control system. Git does use a lot of hashing, and its basic functionality is heavily based on hashing file contents and various metadata. Git forms a directed acyclic graph (DAG) of commits identified by these hashes. Which seems to be considered by some to be a Merkle tree. I disagree, so I will use this as an example of some nuances in the concept.

A DAG is a graph where connections go one way (directed), and there are no cycles (acyclic). Unfortunately, I do not understand how the Git DAG would be a Merkle tree. The following figure represents a Git DAG with a few of the most common Git graph actions – branches and merges:

A made up DAG representing Git commits. Arrows point to parent commits. Image by author.

Here you have nodes that are identified by their hashes, and a tree-type structure where each node points to a parent, and has one or more children. However, there are multiple issues I see in calling this a Merkle tree:

  • Nodes in a Merkle tree do not have multiple parents. The DAG does. The figure above has two such nodes: 8e290 and 184da.
  • Merkle tree only hashes actual data in its leaf nodes, the Merkle branches after the leaf nodes are hashes of hashes. The Git DAG does not have branches based on hashes of hashes, just nodes with hashes of raw data.
  • The Merkle tree starts with the leaf nodes, and proceeds up the tree, constantly halving the number of branches in each iteration. The Git DAG contracts and expands all the time with branching and merging. Because it is a graph and not a tree.

Each node in the Git DAG hashes its parent hash ID as part of its own hash identifier, and it is ordered in time (directed). As such it actually very closely resembles a blockchain. Unlike cryptocurrencies, it allows forking (branching), and integrates those as a central part of its structure. However, I see no problem to relax this as a requirement for a general blockchain definition. And I do not see a blockchain requiring a Merkle tree, just as I do not see Git DAG as one. It is just useful for cryptocurrencies to have one, but you could have a different form of one, with other mechanisms, such as Git.

In summary, not everything needs to be a Merkle tree, even if it is cool. Although you should consider Merkle trees where it makes sense, because they are cool. However, I do not see the Merkle tree in the Git DAG (or elsewhere in Git). If you know better, happy to hear and learn :).

Conclusions

I find Merkle trees are a simple but clever data structure. And I like my algorithms and systems best when they are simple but clever. The examples I covered in this article were

  • blockchain integrity verification,
  • blockchain pruning
  • simplified payment verification,
  • mining pool protocol for work distribution,
  • DynamoDB data synchronization
  • ZFS checksumming (and possibly node synchronization)

What is common in all of these? All of them use the Merkle tree as a tool to take a potentially large set of data elements, and hierarchically verify their integrity in different steps. This allows for efficient data communication, and efficiently processing the verification of the overall data, going into more details only as needed. This, I belive sums up what I see as the best use case for Merkle trees: Enabling efficient verification and synchronization of data over distributed nodes.

Of course, then there is the stratum mining protocol, that takes it a bit further and uses Merkle trees as a means to distribute work, while controlling specific properties of the output, in a decentralized and untrusted setup. Cool.

Well, in most day-to-day software engineering tasks there is not often much chance to use, and build on, these features. But it certainly is interesting to learn them, and thus better understand what some of the latest technologies built on them are based on. And to make those informated decisions when the time is right. Right?

Zero Knowledge Proofs: Example with Pedersen Commitments in Monero

An Explanation Between Cartoons and Greek Symbols

If you spend a bit more time reading about cryptocurrencies, blockchains, or many other related technologies, you likely run into the term zero knowledge proofs (ZKP). To me, the term sounds like a paradox. How can you prove anything with zero knowledge? How do you event know what to prove, if you have zero knowledge?

So I tried to build myself some understanding on this. In this article, I try to share this understanding of what zero knowledge means in ZKP, and what is the proof really about. And how do the two relate to each other.

I originally posted this articke on Medium, where it can still be found.

I start with the simple example of the ZKP and Ali Baba cave, which seems to be often used as a simplified ZKP example. I then proceed with an example of what ZKP might look like as a cryptographic approach, and how Monero uses Pedersen Commitments to prove input and output amounts match without revealing the actual amounts. I look at it as a zero knowledge proof of the matching sums of input and output amounts, without revealing the amounts.

My goal in writing this article was to build a more intuitive understanding that fits between the Ali Baba cave, and the cryptographic theories. So no formulas with Greek symbols, but not just a Alice and Bob in a cave either.

The Basic Example: Ali Baba Cave

The usual example given for Zero Knowledge Proofs is the Ali Baba cave. This is a circular cave containing a door that unlocks only with a specific passphrase. Alice claims to know the passphrase, and wants to prove this to Bob, without revealing anything about the passphrase. The following image shows the layout of the imaginary cave:

Layout of the Ali Baba cave in this story. Bob image by David Rock Design on Pixabay, Alice image by Graphic Mama Team on Pixabay. Rest of the image by author.

The overall story is that Alice enters the cave, and walks up to Point 3 or Point 4. From there, she can return to the entrance via two paths, Path A and Path B. Once she is at one of the points, Bob comes to the entrance (Point 2) and calls for her to arrive there via one of the paths, which he chooses on random. Depending on which side of the door Alice is, she always has a 50% chance of being able to return to Bob without passing through the door. Otherwise, she needs to know the passphrase. Bob does not know if she used the door or not, he only sees her arrive through the correct path he called out.

The points of interest, as I marked then in the figure, are:

  • Point 1: Alice and Bob meet outside the cave. Alice enters the cave and walks to Point 3 or Point 4.
  • Point 2: A fork of two paths leading deeper into the cave. Once Alice reaches Point 3 or 4, Bob comes here, picks Path A or Path B,and calls for Alice to arrive via the chosen path.
  • Point 3: If Alice waits here, and Bob calls for her to come out through Path A, she can walk there without knowing the passphrase to the door. If Bob calls for Path B, she has to open the door to arrive through Path B.
  • Point 4: Same as Point 3, but here she has to know the passphrase if Bob calls for her to arrive via Path A.

The following animation illustrates the different scenarios that can play out:

A silly animation on Bob and Alice in Ali Baba cave. Bob image by David Rock Design on Pixabay, Alice image by Graphic Mama Team on Pixabay. Rest of the images and animation by author.

I initially found the Ali Baba cave to be a quite confusing example of Zero Knowledge Proof. Because we obviously know a lot, including:

  • There is a door in the cave
  • The door is locked, and there is a magic passphrase to open it
  • Alice is in the cave, (at the door)
  • Bob is outside
  • The cave is circular, and there are two paths in/out of it, with no shortcuts between them
  • Alice claims to know the passphrase
  • After the “proof” is presented, we somehow trust Alice knows the passphrase

So we know a lot about this problem, and the term zero knowledge seemed confusing to me. I eventually figured the following about this problem:

  • The protected secret is the passphrase
  • The Zero Knowledge here refers to not knowing anything that helps reveal the passphrase (or more generally, the protected secret)
  • The proof is in the action of Alice arriving through the correct path every time. Repeated so many times that, statistically, starting on the called side (thus no passphrase required) every time is incredibly unlikely
  • In the end Bob gains trust that Alice knows the secret, but gains zero knowledge of the secret itself (the passphrase). He may gain other knowledge, such as Alice arriving through correct path.

If Alice does not know the passphrase for the door, she has a 50% chance to pick the right path on the first run, 25% to get it right twice in a row, and so on, until the joint probability of always getting it right gets statistically incredibly small.

There are various other properties that are often mentioned as required for a good ZKP. Stack Exchange provides nice and concise definitions for two main ones, soundness and completeness:

  • Soundness: the proof system is truthful, so whatever it proves, it is true. If the proof shows Alice knows the passphrase, she should really know it.
  • Completeness: the proof system is comprehensive, so it can prove all true statements. If Alice knows the passphrase, the system should always show it as true. You cannot know, and fail to prove it.

I recommend the Stack Exchange post for more details, if interested.

Cryptographic Example: Matching Sums in Monero

The Ali Baba cave is an interesting ZKP example, but for me it was a bit hard to figure out how this relates to ZKP in blockchains, cryptocurrencies, and similar constructs. For this, I had a look at how Monero uses the Pedersen commitment to hide the amounts in transactions, while at the same time verifying that the output sums match input sums.

Before going into Monero and its use of the Pedersen Commitment, I will present some base concepts: Elliptic Curves (EC), and their use for basic commitments. Followed finally by Pedersen Commitments on Monero.

Elliptic Curves

Elliptic curves (EC) are a data structure that is commonly used in public-key cryptography. Their mathematical properties make them suitable for creating secure keys and for other cryptographic operations. They are also used in building the Pedersen Commitment in Monero, so we need some basic understanding about them for this article.

EC use in cryptography is referred to as Elliptic Curve Cryptography (ECC). In relation to cryptocurrencies, two commonly used (and standardized) curves are curve 25519 and Secp256k1. Secp256k1 is used by Bitcoin, and 25519 by Monero and many other cryptocurrencies (and other applications). The curves are commonly visualized like this:

Elliptic curves Secp256k1 and 25519 visualized. Image by author.

Generally, the actual elliptic curve is described as a more abstract structure, but this visualization serves as a way to get a bit more concrete idea of what is being discussed. The elliptic curve’s curve points are used as a basis for many operations in ECC. The red dots in the following figures illustrate such points:

Line and curve points over Secp256k1 and 25519. Image by author.

The red lines in these figures also illustrates an important concept related to ECC operations such as adding curve points together. This is the concept of finding lines that connect two dots on the curve, intersecting the curve at a third point. So, a line and three points it passes through.

The code I used to plot these base curves (and some of the following examples) is available on my Github.

Basic Commitment

So what is a commitment? We could look at the dictionary for a general definition, but here it refers to defining (binding) a value in a way that it cannot be changed, while hiding it. Since the value cannot be changed, we can say we are committed to it. At a later time, we can reveal the value by providing a secret to open its container. This leads to the two defining factors commonly used for (cryptographic) commitments:

  • hiding: until the committed value is revealed, it cannot be discovered
  • binding: once a commitment is made, its value cannot be changed

There can be different levels of hiding and binding, typically the tradeoff is between being perfectly hiding/binding and computationally hiding/binding. But that is more details than needed for this article at this point.

A typical example of a simple commitment is putting a piece of paper (e.g., a voting ballot), with a number, in a locked box, and providing the key later:

A basic commitment scheme example: Big chef, Little chef, and a ballot commitment scheme. Chefs, chest, and key from Pixabay. Thanks for the art! Combined story image by author.

In the above example, the big chef puts a voting ballot (with number 5 written on it) into the locked chest. The chest is then given to the little chef, who has no key. The big chef has now committed to a vote. It is inside a box, with no visibility, thus it is hidden. It can no longer be changed as the box is locked and given to someone (little chef) who has no key. Thus, the commitment has become binding. But little chef does not yet know what the vote is. At a later time, the key is given to little chef, and the vote (commitment) is revealed.

Basic Commitment with ECC

Like the above example with the chefs, Elliptic Curves can be used to build cryptographic commitment schemes. I will try to illustrate this here, starting from a simple example using a single curve point, and leading to the Pedersen Commitment. I borrow some basic ideas from the example in the Grim documentation. Thanks for the ideas!

As for notation, I will use a to represent the committed value. Mainly because if you read about Monero, this seems to be the general notation used. I guess referring to the committed amount of Monero.

Using a Single Curve Point

A curve point named H is defined and made public. We call this the public base point. This base point is actually so public that it is generally defined in the curve’s standard specification, such as for curve 25519 and for secp256k1. With a as the committed value (amount), the commitment c based on H becomes c = a * H. This c defines another point on the same Elliptic Curve, based on the base point H. The following figure illustrates this process:

Basic commitment with Elliptic Curves, and a simplified visualization. Image by author.

Here, the blue point is the base point H on the elliptic curve (not the real one for 25519, I just picked something for illustration). The commitment value would be the red point, which would equal to a*H, in this case with value a of 5, c=5*H.

Disclaimer: The below figure illustrates the simplified (fake) notation of Elliptic Curve math I use in this article, with values increasingly moving on the curve in the direction of the dotted red line. For example, 5*H in this figure is 5 line segments from H into the direction of the arrow. I use such increases for more intuitive illustration, real Elliptic Curve math is more complex and behaves different. I will describe the actual math briefly at the end of this article for anyone interested, with some link to find out more. My simplification here is just for illustrative purposes.

Illustration of the simplified visualization I use, points moving in direction of red line. Image by author.

Curves vs Chefs

Lets consider the above simple curve commitment as the chefs example. The big chef calculates c=5*H, and publishes c. The little chef does not see the value a (5) being committed, he only sees curve point c. To reveal the commitment, the big chef then reveals a=5, and the little chef can verify that a*H=c. Due to properties of Elliptic Curves and related mathematics, finding any other value x where x*H=c ,other than a, is considered very difficult, and computationally unfeasible (outside Quantum computing). Thus, this commitment is considered computationally binding.

Once a is revealed, the verification of c is trivial. Because H is a public value, the little chef knows it. Because c was published earlier , he can check a*H=c when a is revealed. If it matches, the commitment and the given a is verified.

If we compare this to the chefs example with the chest earlier, the following concepts should match:

  • chest = published (commitment) curve point c
  • ballot = committed value a
  • key = curve point equation a*H. Or just a, since H is known.

The Problem Here

The problem with the above example is that this process is good at binding the commitment value a, but very poor at hiding it. Lets assume we have 10 choices of values we could put on the chefs ballot, from 1 to 10. It would be trivial to brute force all these point values. Just calculate a*H for all a=1–10, and compare the result to the commitment c.

To address this, something called the blinding factor is used.

The Blinding Factor

The blinding factor is commonly referred to as r. This r is just a (secure) random number. Generally in range 1-2²⁵⁶, so very large. A slightly naive approach is to add this to a as in (r+a)*H. The following figure illustrates this with r=11 and a=5, resulting in published c=(11+5)*H=16*H:

Commitment to 5, with a blinding factor 11. Image by author.

Here, r is shown as the green dot. With this, there is no way to tell what a is, without knowing what r is. However, this has an added problem where any combination of a+r could be revealed as the commitment. Thus this approach would lose the binding property, as the commitment could be changed after publishing it. For example, for the above example we could publish any of (a=4, r=12), (a=5, r=11), (a=6, r=10), or many others as the commitment where the sum of r and a is 16. The commitment needs to be binding.

Pedersen Commitment

And solving this problem finally moves on to the Pedersen Commitment. For this, another base point G on the same curve can be used. This leads to the final commitment form of r*G+a*H=c. This is finally what is called the Pedersen Commitment (with EC). Using my made-up EC notation, we could visualize this like so:

Pedersen commitment on Elliptic Curve 25519, to value 5, and blinding factor 11*G. Image by author.

Now we are back to computationally binding, and at the same time hiding commitment. As I mentioned earlier, due to properties of elliptic curves, and the associated mathematics, it is considered extremely difficult to forge a commitment with different values of r and a, so that it would still match the published c=r*G+a*H. And since both r and a now use a different base point, they cannot be changed as when just summing (a+r)*H above. That’s what they tell me, and it’s enough for my goal in this article.

Homomorphic sums: Hiding the Values in Monero

Monero applies the Pedersen Commitment to hide the actual amounts in transaction inputs and outputs, while at the same time verifying that the sum of amounts in inputs and outputs match. Sounds like sorcery, so lets see how this works.

The homomorphic property of the Pedersen Commitment means that we can actually perform mathematical operations on the encrypted values, and it works just as if the values were not encrypted. In this case, this is based on the homomorphic nature of the Elliptic Curve math. We can simply add curve points together, and compare the results, even without revealing the actual transaction Monero amounts. This is what the magic in the matching Monero sums without knowing the values is based on. But I need concrete examples, so lets make one.

First, a brief description of how cryptocurrency transactions generally works. At a high level, a typical transaction looks something like this:

A cryptocurrency transaction at a high level. Image by author.

Each transaction takes a set of inputs that are spent (TxINs). Spending here means the cryptocurrency in those inputs are transferred to some output addresses on the blockchain, as transaction outputs (TxOUTs). An associated transaction fee is taken by the network, and treated as one of the TxOUTs. In Bitcoin, the transaction inputs and outputs, and their values are visible for anyone to see. In that case, the transaction looks pretty much just like the figure above, for anyone observing the blockchain.

Monero is a cryptocurrency especially focusing on privacy, and does many things to hide various parts of these transactions from the public view. One of those is to hide the amounts being transferred. Monero uses the Pedersen Commitment to hide the transaction amounts (values). In this case, the process now looks something like this:

A (Monero) transaction with amounts hidden. Image by author.

In this case, the input and output values are hidden from the public view. Only the fee is made visible. The only way to see the hidden values is if you have access to the required keys to decrypt the transaction. Thus the values are only visible to the sender and receiver. No other blockchain network nodes, or any other outsider, can see the concrete values that are in the transaction.

However, to maintain a valid ledger of transactions, the system must be able to validate that the total sum of the transaction outputs match its inputs. That is, no bigger amount of Monero is spent than is put in. Otherwise people could just spend some small amount of coins (Monero) in the transaction inputs, and create transaction outputs for much larger sums. This would effectively create Monero out of thin air, and break the entire system. Kind of like the central banks, and the associated monetary system in the “real world”, except we are still waiting for that pyramid to fall ;).

Matching input and output sums

As described earlier, EC Pedersen Commitments used by Monero use two public curve points, G and H. Each input and output Tx (TxIN and TxOUT) has its own Pedersen Commitment, so each one has its own a and r defined, like this:

Example transaction with all commitment values except fee blinding factor. Image by author.

In the above, a always matches the amount of Monero used in the TxIN or TxOUT. As I discussed earlier, the r is basically a large random number. Here I just picked some (small) numbers to illustrate the calculations.

The r in the above figure now becomes the G multiplier, and a the H multiplier. Generally, the numbers (especially r) would be bigger, but these work for an example.

TxIn Commitment

For the TxIN inputs in the above, the commitments would now be (using r and a values from image above):

  • 10$: c = 14*G + 10*H
  • 30$: c = 85*G + 30*H
  • 10$: c = 45*G + 10*H

Using the homomorphic addition property, the total commitment of all the TxIN’s is then:

  • Cin: 14*G + 10*H + 85*G + 30*H + 45*G + 10*H = (14*G + 85*G + 43*G) + (10*H + 30*H + 10*H)

TxOut Commitment

For the TxOUTs, the commitments would be:

  • 8$: c = 33*G + 8*H
  • 40$: c = 28*G + 40*H
  • (fee) 2$: x*G + 2*H

The blinding factor for the fee is still undefined (x) for now. To calculate x, let’s sum the other parts first.

Using homomorphic addition, the total commitment of all the TxOUT’s is:

  • 33*G + 8*H + 28*G + 40*H + x*G + 2*H = (33*G+28*G+x*G) + (8*H + 40*H + 2*H)

Visualizing the Amount on the Elliptic Curve

It is perhaps simpler to look at the a and r calculations here separately (for points H and G respectively). First the a and H:

  • TXin: 10*H+30*H+10*H=50*H
  • TXout: 8*H + 40*H + 2*H=50*H
Matching the sum of input vs output amounts, via their commitments. Image by author.

Both of these end up at the final curve point of 50*H. The final curve point is the part that is published in the transaction (as part of the commitment, combined with the blinding factors*G). The homomorphic nature of it allows comparing the final points to see that the sums match in input and output. This gives the binding property, as finding other matching a and r values would be computationally unfeasible, while allowing to match the amounts.

Next, we need to add all the blinding factors to gain the hiding property.

Visualizing the Blinding Factor on the Elliptic Curve

As for amount H above, for the blinding factor G we can do the same:

  • TxIN: 14*G+85*G+45*G=144*G
  • TxOUT: 33*G+28*G+x*G=61*G+x*G

As noted above, the x in these equations is the r for fee. Using my simplified EC visualization approach, it looks like this without the fee:

Initial setup of the random blinding factors for the Pedersen Commitment. Image by author.

Now we need to make also the TxIN and TxOUT curve points match here. TxIN final curve point is 144*G, so we need to make TxOUT final curve point match that.

Matching the TxIN curve to the TxOUT Curve Using Fee Blinding Factor

As you may have guessed, we can match the TxIN and TxOUT curve points for G like this:

  • r for fee (x above) = (sum of TxIN multipliers for G) – (sum of TxOUT multipliers for G, without fee) = (14+85+45)–(33+28)=83

The commitment for the fee then becomes:

  • 2$: c = 83*G + 2*H

Filling in the last missing spot (x) in the transaction, we get:

Using the fee blinding factor to balance the curve point sums. Image by author.

And when we plug this into the total output G calculation and figure above, we get the following:

  • TxIN: 14*G+85*G+45*G=144*G
  • TxOUT: 33*G+28*G+x*G=61*G+(144–61)*G=61*G+83*G=144*G

And we can now see that both input and output end up at the same curve point for G just like for H:

Using the fee to match the sums (and EC end points) of Pedersen Commitment blinding factors. Image by author.

The total summed commitments are now:

  • TxIN: 144*G + 50*H
  • TxOUT: 144*G + 50*H

We can now subtract the summed input and output commitments, and check if they match the commitment to zero.

Commitment to Zero

To prove that the total transaction inputs match the outputs, we can compare the sum of their commitments to a commitment to zero. Besides my bank account balance, commitment to zero here simply means:

  • z = 0 * H + 0 * G

And if we calculate the difference of the input and output commitments from the above, we get:

  • TxIN commitment (c_in): 144*G + 50*H
  • TxOut commitment (c_out): 144*G + 50*H
  • c_in-c_out: (144*G+50*H) – (144*G+50*H) = 0*G + 0*H
  • This is the same as commitment to zero above (z = 0*G + 0*H)

And so we have successfully proven that our total sum of inputs matches the total sum of outputs. Without revealing anything about the actual transaction amounts. This is true, since only the single, final curve points are included in the transaction data. There is no r or a in any form included in the public commitment. Only thing included is the final curve point that is the sum of 144*G+50*H.

This should reveal no information about the amounts, and, I believe, provides a practical example of zero knowledge proof with more cryptographical constructs. It proves the input and output amounts match, but tells you nothing (you gain zero knowledge) about the actual amounts.

The trialing code I used to make the above calculations and the verify that the input and outputs commitments and their differences comes to match the commitment to zero is available on my Github.

I am still no in-depth expert on cryptography, but for anyone interested to read more on homomorphic encryption and its relation to ZKP, a good start seems to be this post on Stack Exchange (as usual..). However, a brief look into actual EC math is in order after all the simplified examples I did above.

EC Addition and Multiplication, Achtually

All the above examples of elliptic curve addition, or multiplication, I showed the points simply move along the curve by x line segments, where x was the base point multiplier (x*G or x*H). For example, for 5*G I simply moved the point 5 line segments forward on the curve starting from G. This is not how real EC math works.

First, a simple addition of two points. Adding two points on an Elliptic Curve gives you a third point. The following figure illustrates this in four steps:

Adding two points to get a third (or fourth..) point on an Elliptic Curve. Image by author.

In the above figure the four steps are the following:

  • Step 1: Define the points you want to add (P1+P2).
  • Step 2: Find a line connecting these two points.
  • Step 3: The line should always intersect the curve at a third point. Find that third point (P3).
  • Step 4: Reflect P3 across the x-axis to find P4. P4 is now the result of adding P1+P2. So P1+P2=P4 in the above figure.

Why is the Elliptic Curve math done like that? I have no idea, none of the materials I read explained that part. But for me it is enough to have a good enough idea of what is happening and discussed for this topic.

Elliptic Curve Cryptography takes a bit specialized approach to this. Instead of adding two different points together, a public base point is defined, and added to itself N times. Using G to describe a base point, the following figure again illustrates this process in another 4 steps:

Adding a base point to itself on an Elliptic Curve. Image by author.

Again, G here is just a random point I picked for illustration, not any official point. In the above figure, the four steps are now:

  • Step 1: Find the defined public base point G. Since we are adding G to itself, it is used as both P1 and P2 from the previous example on adding P1+P2. Instead of P1+P2, we now do G+G.
  • Step 2: Find the line that passes through P1 and P2, or G and G in this case. For a single point, you could draw any line that passes through it (i.e., infinite possible lines). Someone smart has then decided that the tangent line is to be used. It is the line that just touches the curve at point G.
  • Step 3: Find the point where the tangent line crosses the curve, this matches P3 from the previous example.
  • Step 4: Reflect P3 across the x-axis to get P4, similar to the previous example. This results in G+G, or 2*G. Continue adding G to this to get 3*G, or add 2*G to itself to get 4*G. Repeat until you get the point you desired. For example, in the Monero example above I used values such as 14*G, 85*G, and so on. Just iterate this process to get to the x in x*G.

Additionally, there are other related points, such as use of prime numbers and modulos, for cases such as where the results would not fit in the N bits used. This gets called EC over finite fields, and other such fancy terms. However, the level I presented in this article is enough for me to have a basic understanding to read the blockchain articles, understand the topics, and possibly experiment a bit with the topic.

While all this sounds a bit complicated, luckily other people have already implemented all this in different libraries. For prototyping the topics in this article, I used an Elliptic Curve library for Golang, where the operations and calculations are implemented and verified for me. But I do find it very useful to understand what and why am I doing in those operations.

As I noted before, you can find some example code where I use a library for these calculations on my Github.

Computational vs Information Theoretical

Earlier I discussed how a commitment should be both hiding and binding. Until the committing party reveals the committed value, it should remain hidden. And it should not be possible to change the committed value after committing to it. In other words, the commitment should be binding. I believe I already mentioned the terms information theoretically hiding, and computationally binding.

The Pedersen Commitment is considered to be information theoretically hiding, meaning that no matter how much computing power you have, it is not possible to 100% determine the original committed value, if it is not revealed. As far as I understand, this is because there are always multiple possible values that could fulfill the commitment.

Because of this same reason, the Pedersen Commitment is considered to be computationally binding. Because there are multiple possible values that could fulfill the EC equations, with sufficient computing power you could find a different (and thus a “false”) commitment that would satisfy the verifier. In practice, for the Pedersen Commitment, finding such a point is known as the EC discrete logarithm problem, which is considered too hard to solve without Quantum computing.

This appears to be a common tradeoff between the hiding and binding properties. Perfect (information theoretical) hiding results in computational binding, and perfect binding in computationally hiding.

Back to the Roots: What is Zero Knowledge Proof?

After the lengthy detours, back to the original question. What is Zero Knowledge Proof? Looking back at the Ali Baba example, and the use of Pedersen commitment in Monero, I will try to summarize my take.

  • We know a lot about the problem, and are happy share a lot of that knowledge.
  • We have a specific piece of information we want to keep secret.
  • We wish to share zero knowledge of that secret, but prove we know it, or that something related to it is true.
  • The proof comes from applying a specific process according to the domain in question. This results in (statistically) sufficiently strong proof that we hold the proof as verified.
  • There are likely some properties that need to be assumed true, such as having true sources of randomness, or lack of Quantum computing.

In the Ali Baba cave, the process is that of Alice always arriving via the correct path to Bob, repeated so many times that the combined probability of Alice always picking the right path by chance is incredibly small. Assumptions including no cheating (no path to bypass door in cave), and no broken randomness (you cannot predict which path Bob calls).

In the Monero proof of matching input and output amounts via Pedersen Commitments, the process is in the Homomorphic calculations holding true. While this case is only computationally binding, the computing power and the associated discrete logarithm problem is considered so hard that the chance of “faking” the proof is also considered incredibly small. Assumptions here include no solution to the discrete logarithm problem (i.e., no Quantum computing), and again true sources of randomness (cannot guess r).

Conclusions

There are plenty of examples on Ali Baba cave, and resources such as Zero to Monero that are filled with Greek symbols and formulas. I wrote this piece in order to find myself the middle ground on what ZKP means, and how the conceptual examples like Ali Baba translate to the cryptographic world, and what they mean at a higher level. I hope this will be useful to someone else. At least I feel I got a better understanding to my questions.

This article also allowed me to get familiar with the Pedersen Commitment, which I had often seen mentioned, but wondered about the real meaning. In the end, I find ZKP, as most topics, is not too complicated once you get past the terminology. Just a lot of new concepts for me, which takes time.

In the case of Monero and the way it verifies and anonymizes its transactions, there are also various other features I have not touched here. These include range proofs (a.k.a. bulletproofs in current implementation) to ensure the Pedersen Commitments are always using positive numbers, ring signatures to hide the sender addresses, stealth addresses for hiding recipients, key images to identify transactions, and (Diffie-Helman) key-exchanges to communicate the r and a from sender to receiver in the Pedersen Commitment. But these are out of the scope of this article.

Just to repeat the general disclaimer, I am not a cryptographer. So do your own research if you really need to understand the details. Hope this was useful for someone anyway. Cheers.

Just to remind once more, the codes I used to trial this article are available on my Monero Scraper Github project. At the time of writing, under src/golang/pedersen and src/python/experiments/ellipticcurves.