
Mapping Ring Signatures and Stealth Addresses in Monero

With an example from a recent XHV hack (of June 2021)


I originally published this article on Medium, where it is still available.

Since Monero has been forked and used as a basis for multiple other blockchains, the same concepts of Ring Signatures and Stealth Addresses also apply more broadly to those forked blockchains. One of these forked projects is the Haven Protocol (XHV) private stablecoin project. The example I will use is from the June 2021 XHV hack, highlighting the interplay of Stealth Addresses and Ring Signatures, and how they were relevant to how the XHV developers tried to address the hack.

Since the exploits were not fully rolled back on the XHV blockchain, the data should still be there to check in practice for anyone interested. As long as the blockchain exists, anyway. In any case, let's start with a brief look at the relevant Monero privacy features to set the stage.

Monero Privacy Features

Monero uses multiple privacy enhancing mechanisms. These include Pedersen Commitments, Ring Signatures, Ring Confidential Transactions (RingCT), and Stealth Addresses. These can be divided into techniques for hiding the amount of Monero transferred in a transaction, hiding the receiver of a transaction, and hiding the sender. That's a lot of hiding there, so let's look at each one.

Transaction Amounts

One core aspect of transaction anonymity is the transaction amount, i.e., how much the transaction is spending / sending. Hiding this matters for the amount itself, but also for preventing correlation of the inputs of one transaction with the outputs of another. For example, if you could see a specific sum sent in one transaction, and find the same sum spent later in another transaction, you could deduce that the recipient of the first transaction is likely the spender of the second one, or that it matches the price of some other asset that you think was acquired.

Monero originally did not hide the amounts, and the transaction input and output sums were visible on the blockchain. You can still see this if you look at the earlier Monero transactions on the blockchain, for example this one from year 2014 in block 100000.

These days Monero uses a cryptographic scheme called Pedersen Commitment to hide the amounts. I described the Pedersen Commitment in detail in my article on Zero Knowledge Proofs. Related to this, later in this article I also discuss Ring Signatures, which also used to need to take the amounts into account. However, since the introduction of the RingCT protocol, the Ring Signatures now also hide the amounts along with the Pedersen Commitment.

So, in general, with regards to transaction amounts, the current Monero-based blockchains hide them well, and this is fully transparent to the user. That is, the user does not really need to concern themselves with it, as it happens automatically whenever the protocol is used.
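To give a rough feel for how a Pedersen Commitment hides an amount while still summing correctly, here is a toy sketch over integers modulo a prime. Real Monero uses elliptic curve points (C = x·G + a·H); this integer version only illustrates the hiding and homomorphic properties, and the generators here are arbitrary picks for the sketch:

```python
# Toy Pedersen commitment over the multiplicative group mod a prime.
# Real Monero uses elliptic-curve points; this only shows the structure.
import secrets

P = 2**127 - 1  # a Mersenne prime
G = 3           # toy "generators"; in a real scheme nobody may know log_G(H)
H = 5

def commit(amount, blinding=None):
    """Commit to an amount: C = G^blinding * H^amount (mod P)."""
    if blinding is None:
        blinding = secrets.randbelow(P - 1)
    return (pow(G, blinding, P) * pow(H, amount, P)) % P, blinding

# Hiding: the commitment reveals nothing about 40 without the blinding factor.
c1, r1 = commit(40)
c2, r2 = commit(2)

# Homomorphic: the product of commitments commits to the sum of the amounts,
# which lets verifiers check that inputs and outputs balance without seeing them.
c_sum, _ = commit(42, blinding=(r1 + r2) % (P - 1))
assert (c1 * c2) % P == c_sum
```

The homomorphic check in the last lines is the property that lets RingCT verify that a transaction's inputs and outputs balance without revealing any amounts.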

Transaction Receiver

The approach to hiding the recipient of a transaction in Monero is called Stealth Addresses. There appears to be some ambiguity with regards to what a Stealth Address means in the Monero community, but here I refer to its use as the transaction receiver address visible on the blockchain.

Stealth Addresses are different from wallet addresses. The Monero wallet address, as I refer to it here, is the one you get if you start the official Monero Command Line (CLI) wallet and type the "address" command. The GUI or any other wallet shows the same address. This wallet address actually never appears on a Monero-based blockchain; only the Stealth Addresses do.

Given a target (receiver) wallet address, the sender's wallet generates a one-time Stealth Address, which is used on the blockchain as the transaction receiver address. Elliptic Curve mathematics is used to create the Stealth Address from the public keys encoded in the receiver's wallet address, along with a large random factor. This makes each Stealth Address unique. Even if the same sender sends Monero to the same receiver, the Stealth Address is different and unique for each transaction. This is the mechanism that hides the transaction recipient.

For the details of building Stealth Addresses, I recommend this excellent Stack Exchange post, and the similarly excellent article by Luigi1111. When looking at the blockchain, without the matching private keys, a Stealth Address cannot be mapped to any wallet address. Only the receiver has the private keys matching the public keys used to generate the Stealth Address, and thus only their wallet can identify transactions intended for them.
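As a toy illustration of the structure (not real Monero code), the following sketch mimics the one-time address construction, using exponentiation modulo a prime as a stand-in for elliptic curve scalar multiplication. The real scheme is P = Hs(r·A)·G + B with transaction public key R = r·G, where (a, A) and (b, B) are the receiver's view and spend keypairs:

```python
# Toy stealth-address construction. Exponentiation mod a prime stands in
# for EC scalar multiplication; only the structure mirrors Monero's.
import hashlib, secrets

P = 2**127 - 1
G = 3

def Hs(x):  # hash-to-scalar
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % (P - 1)

# Receiver (Bob): view keypair (a, A) and spend keypair (b, B)
a = secrets.randbelow(P - 1); A = pow(G, a, P)
b = secrets.randbelow(P - 1); B = pow(G, b, P)

# Sender (George): picks a random r, publishes R, derives the one-time address
r = secrets.randbelow(P - 1)
R = pow(G, r, P)                                   # tx public key, put on chain
stealth = (pow(G, Hs(pow(A, r, P)), P) * B) % P    # one-time output address

# Bob scans the chain: recomputes the same address from R and his view key a,
# since A^r and R^a are the same shared secret (G^(a*r)).
assert stealth == (pow(G, Hs(pow(R, a, P)), P) * B) % P

# Only Bob can derive the matching one-time private key (needs spend key b):
x = (Hs(pow(R, a, P)) + b) % (P - 1)
assert pow(G, x, P) == stealth
```

The scan step is what the wallet does for every transaction on the chain: anyone without the view key a sees only an unlinkable random-looking address.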

Monkey See, Monkey Do (a Doodle)

To make this a bit more visual, I made the following most beautiful doodle, using the mail system as a metaphor for how a Stealth Address works.

My most impressive doodle on using mail as a metaphor for a Stealth Address. Pics from Pixabay: monkey, envelope 1, envelope 2, envelope 3, heads 1, heads 2, heads 3, heads 4, trashcan, piggymailman.

Consider the process in the figure in 7 steps listed in it:

  1. George (the monkey on the left) wants to send 10 Monero to Bob (the balding guy on the right). So he starts to build a Monero transaction and puts the 10 Monero in it.
  2. George has Bob’s wallet address. Maybe he found it from Bob’s Github project, asking for donations. George wants to send a few bucks. Using the public keys encoded in Bob’s public wallet address, George’s wallet generates a unique Stealth Address to send funds to Bob.
  3. George puts this newly generated Stealth Address on the 10 Monero transaction as the receiver address.
  4. George gives the transaction to the postman for delivery. The postman here represents the Monero peer-to-peer blockchain network, consisting of nodes running the Monero Daemon software.
  5. The postman, or in this case the Monero network, delivers a copy of the transaction to every single user of the blockchain. Or their wallets.
  6. Everyone tries to read and open the transaction by attempting to fit their private keys to the Stealth Address. Or their wallet automatically does this.
  7. Everyone else discards the transaction, since their keys did not match, except Bob, whose private keys match the public keys George used from his wallet address. So Bob gets the 10 Monero and adds them to his wallet (piggybank) for later spending.

For the purposes of this article, the key takeaway is that a unique receiver address is generated for each transaction, and this is the one you see on the blockchain as the transaction receiver address. This is (what I call) the Stealth Address. It is an important element for the next step of hiding the transaction sender using Ring Signatures, and for how we can later look for links between Ring Signatures and Stealth Addresses. The XHV hack I discuss later will illustrate this link more concretely.

Transaction Sender

As I noted already, the approach to hiding the sender of a transaction in Monero is called Ring Signatures. A Ring Signature is so named due to the ring-like structure in how the algorithm is applied. The general idea of this algorithm is to create an anonymity set of possible signers, and hide the true sender/signer in that anonymity set. A signer in this case refers to a transaction output (Stealth Address) that could be spent in the transaction the Ring Signature is attached to. The following illustrates a Ring Signature (yes, it's just a doodle based on my understanding):

My Ring Signature doodle. Green is the real tx, hiding in yellow decoys.

In the above figure, I colored T9 (Transaction Output 9 in the ring) green to illustrate it being the real input in this case. This is just for illustration, as the real input / signer is randomly placed on the ring. If it were always at the same position, it would be obvious which one it is. The Ring Signature algorithm processes these in a ring-like sequence, hence the name.

Monero currently enforces a ring size of 11, meaning the ring contains the real transaction input and 10 fake decoy transaction outputs (sometimes called mixins). The ring size used to be open to user change, but a fixed count of 11 is currently enforced, to avoid distinct ring sizes giving away additional metadata about specific users and thus weakening the overall anonymity set.
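To make the ring construction a bit more concrete, here is a hedged sketch of assembling such a ring. Note that real Monero wallets select decoys with an age-based (gamma) distribution over the chain's outputs rather than uniformly, and the output identifiers here are made-up placeholders; this only illustrates the structure and the random placement of the real output:

```python
# Sketch of assembling a ring of size 11: one real spent output plus 10
# decoys sampled from the chain, shuffled so position reveals nothing.
# Uniform sampling here; real wallets use a gamma distribution over age.
import secrets

RING_SIZE = 11

def build_ring(real_output, chain_outputs):
    """Pick 10 distinct decoys (excluding the real one) and shuffle the ring."""
    pool = [o for o in chain_outputs if o != real_output]
    decoys = [pool.pop(secrets.randbelow(len(pool))) for _ in range(RING_SIZE - 1)]
    ring = decoys + [real_output]
    # Fisher–Yates shuffle with a cryptographic RNG, so the real output's
    # position in the ring is uniformly random
    for i in range(len(ring) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        ring[i], ring[j] = ring[j], ring[i]
    return ring

outputs = [f"stealth_{i:03d}" for i in range(100)]  # stand-in chain outputs
ring = build_ring("stealth_042", outputs)
assert len(ring) == RING_SIZE and "stealth_042" in ring
```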

The signatures in a ring are generated using a set of Stealth Addresses from real transactions on the blockchain. Anyone can prove that one of the signatures is real, but not which one. This gives every ring participant plausible deniability. The real signer is of course the one who sends the real funds, or spends the output they received at that Stealth Address. But you cannot tell which of the 11 it is. Further, a key image is included in each transaction spend to make sure each output is only spent once.

For details on Ring Signatures implementation, I recommend the excellent Stack Exchange answer for Monero and RingCT specifics, or the Wikipedia article with example code for the original Ring Signature scheme.

For the purposes of this article, it is enough to understand that Stealth Addresses define the transaction recipient, and when a transaction output is spent, the Stealth Address of that spent output appears in the transaction's Ring Signature. But the same Stealth Address may also appear as a fake decoy in other transactions.

Visualize it: Stealth Address and a Ring Signature from XHV Explorer

To illustrate, here is a screenshot of the Haven protocol (XHV) explorer showing one of the hacked coinbase outputs (normally amounts are hidden but coinbase / miner transactions show them):

Screenshot of a coinbase output on the XHV chain. This link should take you to it, if the explorer still exists as it used to.

Note how the above screenshot shows a visible Stealth Address for the receiver of each transaction output. And it is always unique, even when going to the same receiver wallet. All the outputs in the above screenshot are actually paid to the same (hacker) wallet, but each has a different Stealth Address. In the screenshot I highlighted the row with the 101k xUSD output. Let's map it to a Ring Signature.

Here is a screenshot of the explorer, showing how the above highlighted Stealth Address is used in a Ring Signature:

Screenshot of the explorer, with a ring signature using the previous screenshot's Stealth Address. Link.

In this screenshot I marked it as the real output. Generally we could not tell whether it is a real input or a decoy. In this case, I believe it is the real spend, as I will explain soon, due to some telling ways the hackers spent it. In general, however, this should illustrate how a Stealth Address appears in Ring Signatures, as either the real input or a decoy. From the public data on the blockchain we cannot really tell which one it is.

Finally, after all that lengthy explaining, let's look at the promised XHV hack, and how the Stealth Addresses and Ring Signatures play out in it.

Example by Looking at the XHV Hack

First, some brief background to understand the XHV exploit descriptions, and their relation to Stealth Addresses and Ring Signatures a bit better.

What is XHV

XHV, or Haven Protocol, is called an "algorithmic stablecoin". The basic XHV token is, surprisingly, called XHV. It can be traded on exchanges and sent between wallets, similar to XMR for Monero. To create the algorithmic stablecoin functionality, XHV also implements other token types on the blockchain, such as xUSD to track the US dollar value, xBTC to track the Bitcoin value, and xJPY to track the Japanese Yen value. Special transactions can be created on the XHV blockchain to convert one of these token types to another, and these tokens can be sent between wallets just like the base XHV itself.

The tracked prices for USD, BTC, etc. are determined by Chainlink oracles, such as the one for XHV-USD. Without going into too much detail, these concepts are relevant to this article only at a higher level, as the multiple asset types come up in discussing the following examples of the XHV exploits.

The Exploits

At the time of the example used here, there were multiple different exploits applied over the course of a few days on the XHV blockchain. I will use the first exploit on that list as an example. This exploit abused some missing checks on how the miner (coinbase) transaction is created. The outputs of the other exploits could be analyzed similarly if interested, as they are all listed in the XHV team's Medium post.

The first exploited miner transaction on that list is in XHV block 882877. Specifically, the attackers took advantage of missing checks for validating mining rewards / fees in the miner reward (a.k.a. coinbase) transaction for the additional asset tokens. In the block 882877 mining transaction (also in the screenshots above in the Ring Signatures section), you can see 101k xUSD and 6.73 xBTC being created out of thin air. The miner transaction was exploited again in block 883040 for the same amounts of 6.73 xBTC and 101k xUSD.

So how does this relate to Stealth Addresses and Ring Signatures? After the XHV developers noticed / were notified of the abnormally large miner rewards, they attempted to stop the hackers from using those hacked 101k xUSD and 6.73 xBTC outputs. This is where the Stealth Addresses and Ring Signatures of those transaction outputs come into play.

Attempting to Block Spending of the Exploited Funds

As I described earlier, Monero does not include the actual receiver address on the blockchain. Instead, it creates a one-time Stealth Address for each transaction, based on the receiver's cryptographic public keys encoded in their wallet address. And while you cannot map a single Stealth Address to any specific recipient / wallet, you can map it to the Ring Signatures of specific transactions, where the output identified by the Stealth Address is possibly spent as an input (or used as a decoy). The screenshots earlier illustrated this.

In this case, the XHV developers would have taken the Stealth Addresses used as target addresses of the exploited transactions, and scanned the blockchain to find any Ring Signatures using those Stealth Addresses as inputs, and thus any potential spends of those funds. With no matching Ring Signatures / transaction spends found, one could assume that the funds were not yet spent / moved from the initial target Stealth Address. Reading their report, this appears to be what they assumed.

To block the attacker from using the exploited funds, they provided the biggest XHV mining pools with modified software that discarded any new transactions using those Stealth Addresses as part of their Ring Signatures. These pools together would give them control of well over 51% of the network hash power. This 51%+ hash power gives anyone the power to discard any blocks mined by someone else; in this case, those that would have contained the Stealth Addresses of the exploited outputs as inputs.

However, reading their report and the Medium post, the developers seem to have made some mistake in their scanning, and the funds had in fact already been spent. We can verify this simply by using the Monero Daemon RPC API that the XHV Daemon shares. We can query a network node with this API for blocks and transactions, and look for any with a specific Stealth Address as an input in their Ring Signatures. Let's see how it looks.

Mapping Stealth Addresses to Ring Signatures

To build this example, I queried the XHV network for transactions that have the Stealth Addresses of the exploited miner transactions in their Ring Signatures. Once I found some, I repeated the scan one more time, to find transactions using those potential spends as inputs in their Ring Signatures. The following figure shows the results of this process, and how the exploited miner transactions spread into the blockchain in the blocks following them:

Hacker TX fan-out into the blockchain and its ring signatures.

The above figure illustrates how the two exploited transaction outputs fan out from the original exploited miner transaction in block 882877. The two exploited miner transaction outputs are the ones on the left side of the figure, labeled 882877 xUSD 100k and 882877 6.73 xBTC. If you look at the miner transaction on the block explorer, the xUSD transaction is paid to Stealth Address 3b8e80a279000d3d32826010b665e2c78aaf25d1744c49cac8a5da154d293875, and the xBTC to Stealth Address 7dec1f53dd30416458926bd35e9c1fff98d0ac86fc185a8fedfad546ff7d4546.

Scanning for these Stealth Addresses as part of a Ring Signature shows the xUSD output used as an input in the Ring Signatures of three transactions, in blocks 882939, 883064, and 883410. The xBTC output is similarly used as an input in two Ring Signatures, in blocks 882939 and 883041.
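The matching logic behind such a scan can be sketched as follows. This assumes the ring members have already been resolved to Stealth Addresses via the daemon RPC (the RPC plumbing is omitted), and the transaction data below is purely illustrative, not real chain data:

```python
# Sketch of the ring-scan logic: given decoded transactions whose ring
# members are already resolved to stealth addresses, find every
# transaction whose ring references a target output. Illustrative data.
def txs_referencing(target_address, transactions):
    """Return hashes of transactions whose ring members include the target."""
    hits = []
    for tx in transactions:
        for ring in tx["rings"]:          # one ring per transaction input
            if target_address in ring:    # real spend or decoy, cannot tell
                hits.append(tx["hash"])
                break
    return hits

chain = [
    {"hash": "tx_a", "rings": [["out1", "out2", "exploited"]]},
    {"hash": "tx_b", "rings": [["out3", "out4", "out5"]]},
    {"hash": "tx_c", "rings": [["exploited", "out6", "out7"]]},
]
spends = txs_referencing("exploited", chain)
assert spends == ["tx_a", "tx_c"]  # possible spends of the exploited output
```

Repeating the same scan on the outputs of each hit gives the fan-out shown in the figure.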

I highlighted 882939 in the above figure in green, since it seems quite clear to me that this is the actual transaction spending the exploited funds from block 882877. I cannot prove it 100%, because the Ring Signature obfuscates the sender among 10 other decoys for both the xUSD and xBTC transactions. However, this is a good illustration of how these privacy mechanisms alone don't necessarily hide you, if your actions betray you through other heuristics.

I say this because block 882939 contains transactions using both the xUSD and xBTC exploited outputs as ring members. Both outputs also appear in any Ring Signature for the first time in this same block. With the way Monero chooses its decoys, having two transactions in the same block use decoys from the exact same previous transaction is extremely unlikely. Even more, Monero has an unlock time of 60 blocks for miner (coinbase) transactions such as the exploit in block 882877, and the difference between 882939 and 882877 is 62 blocks. So 882939 is one of the first blocks where the hacked outputs could ever be spent. What a coincidence to have all this happen together. This illustrates how the technology can work perfectly fine, but your actions can still betray you. Probably makes no difference to the hacker, but a good example.

Why Only 2–3 Rings Use the Exploited Outputs

The above figure shows how there are only 3 and 2 blocks where the xUSD and xBTC outputs appear in Ring Signatures, while the other transaction outputs in the figure appear in many more. This is the effect of the XHV developers' efforts to stop their use, which came just a bit too late, as the outputs had already been used as real inputs / decoys in those 3 and 2 blocks. Otherwise, the blocking would likely have worked fine, as the cutoff shows.

Fanning Out Into the Blockchain

In the figure above, I expanded the two transaction outputs of the xUSD payment in block 882939, as examples of what a more realistic fan-out from an output into multiple Ring Signatures looks like in the following blocks. The first (marked output1 in the figure) is paid to Stealth Address 71d224f0043fdf8c5b6e8bc292f1c1e07bf820d707c2d9b04b6335d49bf37e4b. The second fanned-out output (output2) is paid to Stealth Address afa169a74b1581cafedf215219ca7b7708802ec13945540d3fb9ba30e188d842.

As the figure shows, output 1 of block 882939 can be found as an (input) member of Ring Signatures in transactions in blocks 883000, 883105, 883155, 883247, 883302, 883987, 883993, 884002, and 884003. Similarly, output 2 can be found in 882970, 883903, 883925, 883986, 883991, 883996, 884440, and 885705. If you want to check for yourself, just open these links and search for the Stealth Addresses I copy-pasted above.

Relevance to User Privacy

Beyond the technological aspects, reading about hacks and such can be interesting. Or maybe everyone already fell asleep. But is any of this relevant to a normal user?

If you only care about having some basic privacy, or if you don’t care about privacy and anonymity at all, you are likely fine just using the Monero based protocols as is. If you do care more deeply about it, there are a few points we can gather from this that are good to keep in mind:

  • When you receive any transaction, consider re-sending the funds to yourself a short time later, or in general churning them a few times. This will obfuscate which stealth address the funds are in, beyond just inclusion in other Ring Signatures. But beware of obvious patterns, such as the one above for the miner exploit.
  • Consider other metadata that could be used to identify and link your transaction. In the above XHV hack example, the metadata was the use of multiple outputs from the same miner transaction at the same time, right after the original transaction unlocked.

However, to make this more complicated, churning and other techniques are not so simple to do perfectly either. There is some related discussion in this Stack Exchange post. Done perfectly, your churning should match what real decoy patterns look like. But such optimizations can get a little complicated, so some basic consideration of the above points is likely good enough. It is especially tricky as I am not aware of any specific support for granular control over which transaction outputs are spent, or even visibility into the specific inputs and outputs in a Monero wallet.

In general, timing-related analysis seems to have been one of the most common ways to try and analyze Monero transactions. To look more deeply into the decoy selection algorithm, I wrote a small study on it here, but it got a bit lengthy. That seems to happen to me a lot. So to avoid even more overly long writing, and too much digression, I will save my details on decoy selection and timing analysis for another day.

Other Metadata

Besides time-based analysis, various other issues related to transaction analysis, Ring Signatures, decoys, and related topics have been identified and addressed over Monero's lifetime. A very nice and extensive discussion of this topic is available in the Breaking Monero YouTube series by the Monero team. I won't go into those details here, but if you are interested in the topic, it is definitely a great series to watch.

Future Views on Ring Signatures

While the current Monero ring size is 11, recent Monero Research Lab topics include Triptych and related ideas that aim to enable much larger ring sizes in practice. This should make any analysis of the transaction sender, and the relations between transactions, even more difficult. It should also provide broader decoy patterns, making it harder to fail to match them with your own transactions.

Obligatory Promotion of Software Testing

Having worked many years in software testing (mainly the engineering part of it), I will take the chance to say a few words about these cryptocurrency hacks and what I think about the topic, even if it is not really about Ring Signatures and Stealth Addresses. Sorry about that.

Regarding the XHV hack, the reports they released seem to point to quite lacking software development practices (search for Lessons Learned). There appear to have been few to no tests, a lack of code reviews and audits, and little support for getting any (helpful) community involvement in development. Pretty dire reading for an open-source financial project running openly decentralized over the internet, and at times valued at hundreds of millions of dollars or more.

Reading those lessons learned, at least the XHV team seems to recognize this now and is working to improve. I guess it remains to be seen how well one can change fundamental processes and people's working practices in one go.

Of course, the XHV team is not alone in this. For example, about one month later (July 2021), another cryptocurrency project called Thorchain similarly had multiple hacks, amounting to millions of dollars in damages. While not as bad, some of the reports I read pointed at similar limits in development processes, testing, etc. In general, such hacks seem to occur almost monthly in cryptocurrency projects, especially in newer projects with a large new codebase (and thus untested attack surface). Maybe it goes with the territory, but maybe it wouldn't hurt to put some thought into testing and security.

Of course, everyone makes mistakes, and I guess it is most important to learn from those mistakes. But that should not excuse ignoring even basic levels of quality assurance.


I hope this article provides some added understanding of what Ring Signatures and Stealth Addresses are, and how they work. I find it quite impressive how Monero applies much of the latest cryptographic research in practice, combines different cryptographic techniques to create a system of private and decentralized transactions, and has not had any big hacks like the one I partially described here.

The Monero technology can be quite complex, but in the end I find that a good explanation helps make even complex topics understandable. Bitcoin is similarly clever, but not too hard to understand once you find the right resources to explain it. Luigi1111's Stealth Address article is great; too bad he never seems to have gotten around to the Ring Signatures article hinted at there.

For anyone interested in how using the Monero API to analyze the blockchain might work: at the time I wrote my earlier article on Merkle Trees, I built a small tool to crawl the Monero blockchain using the Monero Daemon / node API. This let me analyze the data locally, which I also did for this article. Maybe one of these days I will finish the additions I made for this article and update the Github repo, even though it should already work as an example of the general process.

Personally, I feel like I got maybe halfway there in trying to gain a good understanding of how to implement Stealth Addresses and Ring Signatures in practice. But for understanding their relations on the blockchain, and for basic Monero tracing, that could be enough already.

That’s all for today. Cheers.

Merkle Trees: Concepts and Use Cases

The data structure within Bitcoin, Amazon Dynamo DB, ZFS, …

This article explores what Merkle trees are, and how they are used in practice in different systems, including Bitcoin, Amazon's DynamoDB, and the ZFS filesystem. The basic concept is quite simple, but some of the clever applications are not so obvious.

First, let's start with the concept of Merkle trees. As I said, it is not too complicated in its basic form.

What is a Merkle Tree

A Merkle tree is fundamentally just a hierarchical set of hash values, building from a set of actual data (Merkle leaves) through intermediate hashes (Merkle branches) up to the Merkle root that summarizes all the data in one hash value.

Example: A Basic Merkle Tree

The following figure illustrates a very small Merkle tree:

A very small Merkle tree. Image by author.

In this figure, the bottom nodes (Data1–Data4) are the actual data processed by the application. Each of these is summarized by its respective hash value (Hash1–Hash4), as a Merkle leaf. From these, the Merkle tree builds a hierarchy, combining hashes together until only one is left. The nodes combining other hash nodes are called Merkle branches (here Hash12 and Hash34). When there is only one hash left (here Hash1234), it is called the Merkle root. There can be multiple levels of branches and hashing, as the following examples will demonstrate.

Handling An Unbalanced Merkle Tree

The above example illustrates the very basic case of a Merkle tree. It is a convenient example, as at every level there is just the right number of nodes to form exact pairs. What happens if you have an uneven (odd) number of leaf (data) nodes? For example, what happens to the above example if you have 5 Data nodes? You can hash Data1+Data2 together to form a Merkle branch, and the same for Data3+Data4. But Data5 is left without a pair to hash into a new branch.

Different approaches can be taken to address this. For example, in this situation Bitcoin simply copies the un-pairable (odd) hash, and uses the duplicate as a pair (the odd hash is paired with itself). The following figure illustrates this:

Handling odd (uneven) numbers of hashes Bitcoin style. Image by author.

In this example, Hash5 has no immediate pair, and is duplicated to pair it with itself. The same happens for Hash55. The above is just one option; there are different ways to handle this pairing issue. Of the ones I have seen, I like the Monero (or Cryptonote?) one most. I will present it shortly. First, something slightly different.

A Very Simple Merkle Tree in Python

Theories and explanations are great. But for the technical person, some code helps formalize it. The following code from my Github repo illustrates a naive but simple example of implementing a Merkle tree in Python (or just skip it for the more conceptual explanation after):

A very simple and naive Merkle tree implementation in Python. Code by author.
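A minimal sketch of that naive approach (not the exact repo code; hashlib's sha3_256 stands in for keccak, so the actual digests differ), with a leftover un-pairable hash simply carried up to the next level:

```python
# Naive Merkle tree: hash leaves in pairs level by level; a leftover
# (un-pairable) hash is carried up to the next level unchanged.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()  # stand-in for keccak

def merkle_root(hashes):
    level = list(hashes)
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])   # carry the leftover hash up as-is
        level = nxt
    return level[0]

data = ["Data1", "Data2", "Data3", "Data4", "Data5"]
leaves = [h(d.encode()) for d in data]
root = merkle_root(leaves)
print(root.hex()[:5])  # cropped to the first 5 characters
```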

The last four lines in the above code run the example to create a Merkle tree with data nodes Data1–Data5. The calculation of the tree with this algorithm looks like this (cropping the hashes to the first 5 characters):

Merkle tree calculation example with actual values. Image and animation by author.

I used the keccak hash function in this example, as this is what the Monero cryptocurrency / blockchain uses, and I was looking into the Monero code lately. The 5th (odd) data element here (using my Python code above) is handled slightly differently from the Bitcoin example, as this simply re-uses the hash if one is left over (un-pairable, cf54b above). Practically this re-use should have the same effect as the duplication in the Bitcoin algorithm.
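To make the difference between the two odd-node rules concrete, here is a hedged sketch of both side by side (plain SHA-256 for brevity; Bitcoin itself hashes twice, and the Monero-style variants use keccak):

```python
# Two ways to handle an un-pairable (odd) hash when building a Merkle
# level: Bitcoin-style duplication vs. simply carrying the hash up.
import hashlib

def sha(d: bytes) -> bytes:
    return hashlib.sha256(d).digest()

def next_level(level, odd_rule):
    if len(level) % 2 == 1 and odd_rule == "duplicate":
        level = level + [level[-1]]          # pair the odd hash with itself
    nxt = [sha(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2 == 1:                  # odd_rule == "carry"
        nxt.append(level[-1])                # re-use the leftover hash as-is
    return nxt

def root(leaves, odd_rule):
    level = list(leaves)
    while len(level) > 1:
        level = next_level(level, odd_rule)
    return level[0]

leaves = [sha(f"Data{i}".encode()) for i in range(1, 6)]  # Data1..Data5
r_dup = root(leaves, "duplicate")
r_carry = root(leaves, "carry")
assert r_dup != r_carry  # same tree idea, but the roots differ
```

So "same effect" above means both rules resolve the pairing problem the same way structurally; the resulting root values still differ between the two schemes.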

Optimizing Merkle Calculation: Monero Style

As I was looking into the Monero blockchain to understand it better, I found it has a different but clever approach to hashing the Merkle tree. The code for it is available in the tree-hash.c file in the Monero Github repository, and a Python version I made of the same is in my Github.

The Monero approach could be described as converting the hash tree into a perfect binary tree. It hashes just enough leaf nodes in the first iteration that the following iterations always have exactly 2ˣ (a power of 2) nodes. The following figure illustrates this:

Monero approach to Merkle tree building, with 5 nodes (transactions in Monero). Image by author.

This works as follows: First, find the closest 2ˣ that is greater than the number of transactions (data nodes). In this example it is 2³=8, because 8>5 (there are 5 transactions / Data nodes). The previous power, 2²=4, would not fit (4 > 5 is not true). From this 2ˣ, the number of transactions is subtracted; in this case 8 − 5 = 3. This 3 is the index of the transaction to start iteration 1 from. With 0-based indexing, start index 3 points to Data4 in this example.

The above explanation is maybe a bit abstract. Below is a table with concrete examples of transaction counts and how these are converted into a "perfect binary tree" shape (size is always 2ˣ, eliminating any "odd" counts or leftover un-pairable hashes from the following iterations) for iteration 2 and the following iterations:

Example calculations of Merkle size with the Monero (Cryptonote) algorithm. Image/Table by author.

The columns:

  • Transactions: Number of transactions in the block being calculated
  • Closest 2^x: First 2ˣ that is bigger than transactions
  • Iteration 1 start index: start hashing pairs from this index in transaction list
  • Iteration 1 values: number of transactions from the iteration 1 start index to the end of the transaction list
  • Iteration 1 pairs: iteration 1 values divided by 2 (because they are hashed in pairs)
  • Iteration 2 formula: how the above values lead to the number of items to hash in the following iteration
  • Iteration 2 size: the number of items to hash in iteration 2. As you can see, it is always 2ˣ, which leads to a "perfect binary tree".

Too many numbers and formulas, I guess. Did you fall asleep yet? Anyway, if you pick the row from the above table with 23 transactions, you can follow this nice 8-bit animation I made on how the Merkle root calculation progresses for that row:

Calculating Merkle root in Monero for block 1407480. Image/animation by author.

In this case (animation above), the start index is calculated as 2⁵−23 = 32−23 = 9. This is transaction number 10 (the zero-based index adds 1). The following iteration then has 16 nodes, which is 2⁴, the next one 8 or 2³, and so on. The animation above actually represents how the hash array is manipulated in memory by the Monero code.
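To make the whole process concrete, here is a simplified Python sketch of the full root calculation. It follows the same pairing logic described above, but uses SHA3 as a stand-in hash (real Monero uses its own Keccak variant), so the actual hash values will differ from Monero's:

```python
import hashlib

def keccak(data: bytes) -> bytes:
    # Stand-in hash: real Monero uses its own Keccak variant, not SHA3-256.
    return hashlib.sha3_256(data).digest()

def tree_hash(hashes):
    """Monero/Cryptonote-style Merkle root over a list of leaf hashes."""
    n = len(hashes)
    if n == 1:
        return hashes[0]
    if n == 2:
        return keccak(hashes[0] + hashes[1])
    # Largest power of two <= n; iteration 1 shrinks the list to this size.
    cnt = 1
    while cnt * 2 <= n:
        cnt *= 2
    start = 2 * cnt - n                # e.g. n=5 -> start=3, n=23 -> start=9
    level = hashes[:start]             # untouched leading hashes
    for i in range(start, n, 2):       # pair up the rest
        level.append(keccak(hashes[i] + hashes[i + 1]))
    # From here on, a perfect binary tree: always a power of two nodes.
    while len(level) > 1:
        level = [keccak(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

With 5 leaves, the first loop pairs only the last two, leaving 4 nodes, and the rest proceeds as a perfect binary tree, exactly as in the figure.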

Example Use Cases

Concepts are nice, but concrete, real-world use cases make the point. Here I will discuss the use of Merkle trees in blockchains/cryptocurrencies, and in Amazon’s AWS DynamoDB distributed database. I will also briefly touch on the ZFS filesystem, and the Git version control system, as these are sometimes also mentioned as examples of Merkle tree use.

Cryptocurrencies and blockchains: Bitcoin, Monero, et al.

As already briefly discussed above, Bitcoin and similar cryptocurrencies make use of Merkle trees to summarize and validate the transactions in a block, and embed the Merkle root into their block header as a summary of it all. The following figure illustrates this:

Blockchain and Merkle trees. Image by author.

Each block has an ID value, which is the hash of its header fields. A part of this is the Merkle root. Another part is the previous block ID (Parent in above figure). By linking with the previous block ID, and including that as part of the next block ID, the blocks form a blockchain. By embedding the Merkle root in it, they make an immutable record of transactions in the block.

For example, the Bitcoin block header contains:

  • Difficulty target value (called bits in Bitcoin)
  • Merkle root of all transactions in the block
  • Nonce; a value changed in mining to find accepted blocks
  • Previous block hash (block ID); linking this block to the previous block in chain (this previous hash is named parent in figure above)
  • Timestamp; Time of mining (creating the hash for) the block
  • Block version; identifies support features and formats (also used to identify hash function in some blockchains such as Monero)

These header fields are all hashed together to form the block ID. Making the block ID a hash of all header data makes it practically impossible to modify a block header. Including the Merkle root in the block ID further makes it practically impossible to modify the transactions in a block.

But let’s look in a bit more detail into how these types of blockchains actually use the Merkle tree in different ways. It’s all about data integrity, but in different and sometimes ingenious ways.

Blockchain Use Case 1: Transaction Data (Block) Immutability

As I noted above, the base use case of the Merkle tree in Bitcoin-style blockchains is to build the Merkle root into the block header, and use it to verify that no transaction has been changed.

As an example, let’s change Data4 to Data6 in the earlier example:

Changing one node value invalidates (changes the root of) the Merkle tree. Image by author.

Comparing this to the earlier example, with Data4, the Merkle root was aa8d3. With Data6, it is now f8932. This way, any change in any transaction data changes the Merkle root, and it no longer matches the one stored in the blockchain for that block (in block header). And through the block header, this issue would propagate to the whole blockchain following this block.

However, if you think about it for a while, you may ask a question:

Do you really need Merkle trees to verify the block data?

No. Instead, one could just concatenate all the transaction data together and build a single root hash. Consider the above example with Data1–Data5, but just hashing them all together:

Concatenate+hash all data at once. Image by author.

Now, changing just one data value will have the same effect of invalidating the summary hash:

Hash change detection in concat+hash all at once. Image by author.

This single hash could also be used in place of the Merkle root in the block header and block hash. Same effect so far.
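As a sketch, the concat+hash alternative really is just a few lines of Python (using SHA3 as a stand-in hash):

```python
import hashlib

def concat_hash(items) -> str:
    """Summarize all transaction data with one hash over the concatenation."""
    hsh = hashlib.sha3_256()
    for item in items:
        hsh.update(item)   # feeding chunks one by one == hashing the concat
    return hsh.hexdigest()
```

One caveat worth noting: in a real implementation you would want to length-prefix or separate the items, since plain concatenation makes e.g. `[b"AB", b"C"]` and `[b"A", b"BC"]` hash the same.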

So what is the real benefit of using a Merkle tree here? To summarize, I would say it gives you more granularity in the hash verification, and enables other clever tricks to process the blockchain more efficiently, while still providing the transaction integrity assurance / verification.

The original Bitcoin whitepaper provides two additional examples for use of the Merkle tree: blockchain pruning and simplified payment verification (SPV). Beyond these, the one I really like is the Stratum mining pool protocol. Let’s look at these next.

Blockchain pruning

Over time, a blockchain will grow larger and larger as it accumulates more and more blocks. For example, today (Feb/2021) the size of the Bitcoin blockchain is about 380GB. Blockchain pruning is an approach to reduce this used space with the help of Merkle trees, by removing spent transactions from local storage where they are no longer needed.

We want to have all data around for full scale verification and history, but not all nodes in the peer to peer network need all the data. The blockchain pruning approach proposed in the original Bitcoin whitepaper suggests addressing this by using the Merkle tree to prune (remove) spent (used) transactions from the blocks.

The following figure illustrates this:

Three out of five transactions used (spent) in the block (and Merkle tree). Image by author.

In this example, we have used (spent) transactions TX1, TX2, and TX4. We still have unused transactions TX3 and TX5 left in this block. Assume we are not interested in a full node with full archival, just keeping a list of spendable transactions. Satoshi’s proposal was to prune the block from the used transaction data, and simply leave the Merkle tree branches needed to verify the unspent transaction data:

The Merkle tree pruned of the used transactions. Image by author.

This is perhaps more clear if we also spend TX3:

One more transaction used, leaving just one unused in the Merkle tree (and the block). Image by author.

The block could now be pruned to only contain the TX5 data, and the hash of the other Merkle tree branch (96b8d) in the block:

Pruned Merkle tree with only one transaction left. Image by author.

We would just have saved the space required to store 4 out of 5 transactions, and 6 out of 9 Merkle tree nodes. About 80% space savings in that example. The longer the blockchain gets, and the more it is used, the more spent transactions it will have, and thus more space could be saved. It’s a very clever idea, like most ideas in the original Bitcoin whitepaper.

But much as with block verification, we can always ask the question:

Do you really need Merkle trees to prune the blockchain?

The Bitcoin StackExchange is full of insightful discussion on how pruning is actually applied in real implementations. While Merkle tree pruning is a clever approach, it is not applied in this way in (at least) the Bitcoin Core software.

Instead, the unspent transactions are stored in a separate database for fast lookup. This database is initialized on first startup by scanning the blockchain, and updating the database as new and spent transactions are found. The database is continuously updated as new blocks are broadcast to the Bitcoin network. A pruned node can then just rely on this database.

Running a pruned node in general is sufficient for the basic operations, but does not fully support all features of a blockchain, so some nodes will still need to hold the full data. But that goes beyond the scope of this article.

I think the basic takeaway here is that Merkle trees are cool, but sometimes the basic, even simpler, approach is fine, or even better. Of course, the hard part is identifying when this is true and when the cooler (Merkle) approach really is best. Don’t take this in any way as a suggestion not to use Merkle trees, just to think about the overall picture. With Bitcoin and related blockchains, I believe the Merkle tree enables many other things and thus makes a lot of sense, as the following sections will show.

Simplified Payment Verification (SPV)

The simplified payment verification (SPV) algorithm was also suggested in the original Bitcoin whitepaper. In SPV, a light-weight blockchain client only stores the block headers, but also wishes to verify payments it receives as valid transactions in the blockchain. Lacking full transaction details, the SPV client uses Merkle trees to effectively verify the transaction details in collaboration with full nodes.

I will re-use the example from blockchain pruning above. The SPV client wants to verify TX5 in the given block. The following figure illustrates this:

SPV client node verifying a single transaction is part of a given Merkle tree. Image by author.

Here, the SPV client node asks a full node for the Merkle branches required to build the Merkle root with the TX5 data. By rebuilding the Merkle root (aa8d3) from the transaction of interest (TX5), and the Merkle branches (96b8d) provided by the full node, the SPV client can have confidence in having received a valid transaction. By checking this re-built Merkle root against the one stored in the block header, it can verify that the whole tree (and thus TX5) is in fact valid, and is a part of the blockchain.
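The SPV check itself is simple to sketch in code. Given the transaction hash, the sibling branch hashes (with their left/right positions), and the Merkle root from the block header, the client just rebuilds the root. The function below is my own illustration of the principle, not actual Bitcoin client code:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in hash; the exact hash function is not the point here.
    return hashlib.sha3_256(data).digest()

def verify_inclusion(tx_hash: bytes, proof, merkle_root: bytes) -> bool:
    """Rebuild the Merkle root from a leaf hash and its sibling branches.

    proof is a list of (sibling_hash, sibling_is_left) pairs, ordered
    from the leaf level up towards the root.
    """
    node = tx_hash
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == merkle_root
```

The key efficiency point: the client only needs about log₂(n) sibling hashes from the full node, instead of all n transactions in the block.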

I find SPV to be an interesting example of how Merkle trees can be used, together with (block) data filtering (Bitcoin uses Bloom filters but that is again another topic), to synchronize and verify existence and correctness of selected data in a distributed system.

Pool Mining: The Stratum Protocol

The traditional way to create cryptocurrencies has been mining via proof of work (PoW) hashing. Mining pools are a way for smaller miners to join together and get some rewards for mining, based on how much compute (hash) power they contribute to the pool. This leads to the requirement for a central entity, the mining pool, to be able to coordinate the miners. It needs a way to effectively distribute and track the overall mining process across all clients. With some clever tricks, the Stratum protocol uses Merkle trees to enable this.

Stratum version 1 in particular makes use of Merkle trees to distribute the work efficiently. The pool server provides the miner nodes with the block header elements needed to build the block. This includes the partial Merkle tree, with the branches calculated for all transactions except the coinbase transaction. The coinbase transaction is a special transaction that pays out the mining reward. This is illustrated by the following figure:

Merkle templating with the Stratum mining pool protocol (v1). Image by author.

This figure contains three elements: the Merkle tree template, the coinbase template, and the search for the nonce(s). The pool provides the Merkle tree template to the miner, containing the pre-calculated Merkle branches for the transactions in the block (here 96b8d, thus greatly reducing bandwidth and calculation needs for miners). The miner is then expected to fill the Merkle tree template with a suitable coinbase transaction, built from the server-provided coinbase template. The winner is the one who fills the template with values producing a hash that matches the blockchain network difficulty level, meaning simply a hash with a suitably small value.

The coinbase template is provided by the pool, and is filled by the miner using their choice of nonce and extranonce fields. When combined with the Merkle tree template, each of these produces a different block hash, and the only way to find a winning block is to try values until one produces a hash matching the network difficulty target. If the miner finds a nonce that matches the network hash difficulty for this Merkle template, it submits it to the mining pool. The coinbase template contains the mining pool address, ensuring that the pool receives the block reward, and can distribute it to all the miners.

In a broader context, the Merkle tree here is used to distribute a partial solution (the pre-calculated Merkle tree branches and coinbase template), while allowing different distributed nodes to work independently to try to find the solution to the missing part (nonces for the coinbase transaction to build an acceptable PoW hash). By embedding the pool address as part of the template, it ensures that all distributed nodes contribute to the common goal and can share the rewards.
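The following toy Python sketch illustrates the idea. All names and the header construction here are my own simplifications, not actual Stratum protocol messages: the miner varies an extranonce in the coinbase, rebuilds the Merkle root from a pre-computed branch, and keeps hashing until the result is below the target.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def mine(coinbase_template: bytes, merkle_branch: bytes, target: int,
         max_tries: int = 200000):
    """Search for an extranonce giving a header hash below the target.

    The 'header' here is just the Merkle root re-hashed, to keep the
    sketch short; a real miner hashes the full header with its nonce.
    """
    for extranonce in range(max_tries):
        coinbase = coinbase_template + extranonce.to_bytes(4, "big")
        # Fill the Merkle template: coinbase hash + pre-computed branch.
        merkle_root = h(h(coinbase) + merkle_branch)
        header_hash = h(merkle_root)
        if int.from_bytes(header_hash, "big") < target:
            return extranonce, header_hash
    return None
```

With an easy target this finishes almost immediately; the real network target makes the same search astronomically harder, which is the whole point of proof of work.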

AWS Dynamo DB

Dynamo DB is a distributed database provided as part of the Amazon Web Services (AWS) platform. It was originally designed to handle much of Amazon’s retail global infrastructure needs. As such, a lot of consideration has been put into making it scalable and optimizing the efficiency in all parts of its architecture. The architecture of Dynamo DB is described in the AWS Dynamo DB paper from 2007, including its use of Merkle trees to efficiently synchronize diverged nodes (section 4.7 in the paper).

Dynamo DB is what is called a key-value store. The actual data (values) are stored in data nodes, and identified and indexed by their keys. Dynamo DB hosts this data in what it calls virtual nodes. Each virtual node hosts a key-range. To handle high-availability and scalability requirements, Dynamo DB distributes the data across these key-ranges, and hosts each key-range on multiple virtual nodes (across multiple physical nodes, or partitions).

The following figure tries to illustrate this, with 3 (virtual) Dynamo DB nodes, each holding 2 key ranges, each key range duplicated across two nodes. I am sure this is not exact in all the details, and the system has evolved since the paper was published, but I believe it depicts the general concept well enough:

DynamoDB virtual nodes and distributed key ranges. Image by author.

In this type of a distributed system, there will eventually always come a time when some of these virtual nodes are out of sync with other virtual nodes holding the same key-range. Dynamo DB uses Merkle trees to perform effective comparison and synchronization of the key-ranges in these nodes. In the above figure I made a small 3-node Merkle tree inside each key-range to illustrate this. The KR3 key-range tree is shown partially in red to illustrate how the nodes have diverged and need resolving to find the correct values.

A Merkle tree is built for each key-range, where the leaf nodes of the tree are the key-range data values. The Merkle root summarizes the data in each node. By comparing the Merkle roots of each virtual node hosting the same key-range, divergence in nodes is immediately visible. This full comparison requires only the communication and comparison of a single value, the Merkle root. If there is a difference, one can keep comparing the branches of the tree to efficiently find the exact divergent spots, and the data values to synchronize.

The following figure illustrates this process:

DynamoDB using Merkle trees to find data nodes out of sync. Image by author.

As you can see, I just copied the blockchain example into the above figure, because the fundamental structure, and the concepts it relies on, are exactly the same! The difference is only in the (clever) application of the same fundamentals in a different context.

Yes, some details would likely differ. For example, DynamoDB likely uses a different hash function than Keccak. But, as illustrated here, the fundamental approach and concepts are the same, and as such applicable to more than just the common blockchain example of Merkle trees.
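The divergence search described above can be sketched as follows. This is my own illustration of the principle, not DynamoDB's actual implementation: build a Merkle tree per key-range replica, compare roots, and descend only into differing branches to find the out-of-sync values.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def build_tree(values):
    """Merkle tree over a key-range; levels from leaves up to the root."""
    level = [h(v) for v in values]
    while len(level) & (len(level) - 1):
        level.append(h(b""))              # pad leaves to a power of two
    tree = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree

def diverged_leaves(tree_a, tree_b):
    """Compare two replicas' trees, descending only into differing branches."""
    def walk(depth, idx):
        if tree_a[depth][idx] == tree_b[depth][idx]:
            return []                     # identical subtree: nothing to sync
        if depth == 0:
            return [idx]                  # a diverged data value
        return walk(depth - 1, 2 * idx) + walk(depth - 1, 2 * idx + 1)
    return walk(len(tree_a) - 1, 0)
```

If the roots match, the comparison stops after exchanging a single hash; only on a mismatch does the walk go deeper, and only along the differing branches.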

ZFS File System

ZFS is a filesystem supporting data spread over multiple volumes, a form of distributed filesystem. Multiple ZFS volumes can be grouped into storage pools, which can also host redundant copies of the data. ZFS is advertised as using Merkle trees to ensure data integrity.

ZFS uses Merkle trees to checksum data, to identify issues where some part of the data written to, or read from, disk is corrupted (or misread etc.). As in most filesystems, the data is stored as blocks of specific sizes. These blocks form a natural leaf node for building a Merkle tree.

The main rationale I found online for ZFS to use Merkle trees always discusses how ZFS stores the Merkle tree as a checksum outside the block itself, providing higher resiliency for identifying errors. This leaves it quite unclear to me why you need a Merkle tree for this, and not just store a general checksum outside the block. Sometimes I feel people should ask more questions.

On the other hand, I could see the benefit of verifying data across multiple storage pools and their replicas by using hierarchical Merkle trees. This case would make it very similar to how DynamoDB operates and uses Merkle trees to sync nodes (vs ZFS volumes). However, I could not find any description for using Merkle trees for this. If someone knows better, I would be happy to hear.

In any case, this seems like a very similar case to Dynamo DB, as being used to verify integrity of a distributed data storage system more efficiently. With the added benefit of separating the consistency properties (checksum hashes) from the data itself, for increased resiliency.

Git Version Control System

Another example that I found referenced as an example of Merkle trees is the Git version control system. Git does use a lot of hashing, and its basic functionality is heavily based on hashing file contents and various metadata. Git forms a directed acyclic graph (DAG) of commits identified by these hashes, which some seem to consider a Merkle tree. I disagree, so I will use this as an example of some nuances in the concept.

A DAG is a graph where connections go one way (directed), and there are no cycles (acyclic). Unfortunately, I do not understand how the Git DAG would be a Merkle tree. The following figure represents a Git DAG with a few of the most common Git graph actions – branches and merges:

A made up DAG representing Git commits. Arrows point to parent commits. Image by author.

Here you have nodes that are identified by their hashes, and a tree-type structure where each node points to a parent, and has one or more children. However, there are multiple issues I see in calling this a Merkle tree:

  • Nodes in a Merkle tree do not have multiple parents; nodes in the DAG do. The figure above has two such nodes: 8e290 and 184da.
  • A Merkle tree only hashes actual data in its leaf nodes; the Merkle branches above the leaf nodes are hashes of hashes. The Git DAG has no branches based on hashes of hashes, just nodes with hashes of raw data.
  • A Merkle tree starts with the leaf nodes and proceeds up the tree, constantly halving the number of branches in each iteration. The Git DAG contracts and expands all the time with branching and merging, because it is a graph and not a tree.

Each node in the Git DAG hashes its parent hash ID as part of its own hash identifier, and it is ordered in time (directed). As such, it actually very closely resembles a blockchain. Unlike cryptocurrencies, it allows forking (branching), and integrates those as a central part of its structure. However, I see no problem relaxing this requirement for a general blockchain definition. And I do not see a blockchain requiring a Merkle tree, just as I do not see the Git DAG as one. It is just useful for cryptocurrencies to have one, but you could have a different form of hash-linked structure, with other mechanisms, such as Git does.

In summary, not everything needs to be a Merkle tree, even if it is cool. Although you should consider Merkle trees where it makes sense, because they are cool. However, I do not see the Merkle tree in the Git DAG (or elsewhere in Git). If you know better, happy to hear and learn :).


I find Merkle trees to be a simple but clever data structure. And I like my algorithms and systems best when they are simple but clever. The examples I covered in this article were:

  • blockchain integrity verification,
  • blockchain pruning,
  • simplified payment verification,
  • mining pool protocol for work distribution,
  • DynamoDB data synchronization,
  • ZFS checksumming (and possibly node synchronization).

What is common in all of these? All of them use the Merkle tree as a tool to take a potentially large set of data elements, and hierarchically verify their integrity in different steps. This allows for efficient data communication, and efficient verification of the overall data, going into more detail only as needed. This, I believe, sums up what I see as the best use case for Merkle trees: enabling efficient verification and synchronization of data over distributed nodes.

Of course, then there is the Stratum mining protocol, which takes it a bit further and uses Merkle trees as a means to distribute work, while controlling specific properties of the output, in a decentralized and untrusted setup. Cool.

Well, in most day-to-day software engineering tasks there is not often much chance to use, and build on, these features. But it certainly is interesting to learn them, and thus better understand what some of the latest technologies built on them are based on. And to make those informed decisions when the time is right. Right?

Zero Knowledge Proofs: Example with Pedersen Commitments in Monero

An Explanation Between Cartoons and Greek Symbols

If you spend a bit more time reading about cryptocurrencies, blockchains, or many other related technologies, you will likely run into the term zero knowledge proofs (ZKP). To me, the term sounds like a paradox. How can you prove anything with zero knowledge? How do you even know what to prove, if you have zero knowledge?

So I tried to build myself some understanding of this. In this article, I try to share this understanding of what zero knowledge means in ZKP, what the proof is really about, and how the two relate to each other.

I originally posted this article on Medium, where it can still be found.

I start with the simple example of the ZKP and Ali Baba cave, which seems to be often used as a simplified ZKP example. I then proceed with an example of what ZKP might look like as a cryptographic approach, and how Monero uses Pedersen Commitments to prove input and output amounts match without revealing the actual amounts. I look at it as a zero knowledge proof of the matching sums of input and output amounts, without revealing the amounts.

My goal in writing this article was to build a more intuitive understanding that fits between the Ali Baba cave and the cryptographic theories. So no formulas with Greek symbols, but not just Alice and Bob in a cave either.

The Basic Example: Ali Baba Cave

The usual example given for Zero Knowledge Proofs is the Ali Baba cave. This is a circular cave containing a door that unlocks only with a specific passphrase. Alice claims to know the passphrase, and wants to prove this to Bob, without revealing anything about the passphrase. The following image shows the layout of the imaginary cave:

Layout of the Ali Baba cave in this story. Bob image by David Rock Design on Pixabay, Alice image by Graphic Mama Team on Pixabay. Rest of the image by author.

The overall story is that Alice enters the cave, and walks up to Point 3 or Point 4. From there, she can return to the entrance via two paths, Path A and Path B. Once she is at one of the points, Bob comes to the entrance (Point 2) and calls for her to arrive there via one of the paths, which he chooses at random. Depending on which side of the door Alice is, she always has a 50% chance of being able to return to Bob without passing through the door. Otherwise, she needs to know the passphrase. Bob does not know if she used the door or not; he only sees her arrive through the correct path he called out.

The points of interest, as I marked them in the figure, are:

  • Point 1: Alice and Bob meet outside the cave. Alice enters the cave and walks to Point 3 or Point 4.
  • Point 2: A fork of two paths leading deeper into the cave. Once Alice reaches Point 3 or 4, Bob comes here, picks Path A or Path B, and calls for Alice to arrive via the chosen path.
  • Point 3: If Alice waits here, and Bob calls for her to come out through Path A, she can walk there without knowing the passphrase to the door. If Bob calls for Path B, she has to open the door to arrive through Path B.
  • Point 4: Same as Point 3, but here she has to know the passphrase if Bob calls for her to arrive via Path A.

The following animation illustrates the different scenarios that can play out:

A silly animation on Bob and Alice in Ali Baba cave. Bob image by David Rock Design on Pixabay, Alice image by Graphic Mama Team on Pixabay. Rest of the images and animation by author.

I initially found the Ali Baba cave to be quite a confusing example of Zero Knowledge Proof, because we obviously know a lot, including:

  • There is a door in the cave
  • The door is locked, and there is a magic passphrase to open it
  • Alice is in the cave (at the door)
  • Bob is outside
  • The cave is circular, and there are two paths in/out of it, with no shortcuts between them
  • Alice claims to know the passphrase
  • After the “proof” is presented, we somehow trust Alice knows the passphrase

So we know a lot about this problem, and the term zero knowledge seemed confusing to me. I eventually figured the following about this problem:

  • The protected secret is the passphrase
  • The Zero Knowledge here refers to not knowing anything that helps reveal the passphrase (or more generally, the protected secret)
  • The proof is in the action of Alice arriving through the correct path every time. Repeated so many times that, statistically, starting on the called side (thus no passphrase required) every time is incredibly unlikely
  • In the end, Bob gains trust that Alice knows the secret, but gains zero knowledge of the secret itself (the passphrase). He may gain other knowledge, such as Alice arriving through the correct path.

If Alice does not know the passphrase for the door, she has a 50% chance to pick the right path on the first run, 25% to get it right twice in a row, and so on, until the joint probability of always getting it right gets statistically incredibly small.
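This compounding probability is trivial to express in code:

```python
def cheat_success_probability(rounds: int) -> float:
    # Without the passphrase, Alice must start on the called side
    # every round: a 1/2 chance each time, compounding.
    return 0.5 ** rounds
```

After 20 rounds the chance of cheating successfully is already below one in a million, and each extra round halves it again.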

There are various other properties that are often mentioned as required for a good ZKP. Stack Exchange provides nice and concise definitions for two main ones, soundness and completeness:

  • Soundness: the proof system is truthful, so whatever it proves, it is true. If the proof shows Alice knows the passphrase, she should really know it.
  • Completeness: the proof system is comprehensive, so it can prove all true statements. If Alice knows the passphrase, the system should always show it as true. You cannot know the passphrase and yet fail to prove it.

I recommend the Stack Exchange post for more details, if interested.

Cryptographic Example: Matching Sums in Monero

The Ali Baba cave is an interesting ZKP example, but for me it was a bit hard to figure out how this relates to ZKP in blockchains, cryptocurrencies, and similar constructs. For this, I had a look at how Monero uses the Pedersen commitment to hide the amounts in transactions, while at the same time verifying that the output sums match input sums.

Before going into Monero and its use of the Pedersen Commitment, I will present some base concepts: Elliptic Curves (EC), and their use for basic commitments, followed finally by Pedersen Commitments in Monero.

Elliptic Curves

Elliptic curves (EC) are a mathematical structure that is commonly used in public-key cryptography. Their mathematical properties make them suitable for creating secure keys and for other cryptographic operations. They are also used in building the Pedersen Commitment in Monero, so we need some basic understanding of them for this article.

EC use in cryptography is referred to as Elliptic Curve Cryptography (ECC). In relation to cryptocurrencies, two commonly used (and standardized) curves are curve 25519 and Secp256k1. Secp256k1 is used by Bitcoin, and 25519 by Monero and many other cryptocurrencies (and other applications). The curves are commonly visualized like this:

Elliptic curves Secp256k1 and 25519 visualized. Image by author.

Generally, the actual elliptic curve is described as a more abstract structure, but this visualization serves as a way to get a bit more concrete idea of what is being discussed. The elliptic curve’s curve points are used as a basis for many operations in ECC. The red dots in the following figures illustrate such points:

Line and curve points over Secp256k1 and 25519. Image by author.

The red lines in these figures also illustrate an important concept related to ECC operations, such as adding curve points together: finding a line that connects two dots on the curve, intersecting the curve at a third point. So, a line and three points it passes through.

The code I used to plot these base curves (and some of the following examples) is available on my Github.

Basic Commitment

So what is a commitment? We could look at the dictionary for a general definition, but here it refers to defining (binding) a value in a way that it cannot be changed, while hiding it. Since the value cannot be changed, we can say we are committed to it. At a later time, we can reveal the value by providing a secret to open its container. This leads to the two defining factors commonly used for (cryptographic) commitments:

  • hiding: until the committed value is revealed, it cannot be discovered
  • binding: once a commitment is made, its value cannot be changed

There can be different levels of hiding and binding, typically the tradeoff is between being perfectly hiding/binding and computationally hiding/binding. But that is more details than needed for this article at this point.

A typical example of a simple commitment is putting a piece of paper (e.g., a voting ballot), with a number, in a locked box, and providing the key later:

A basic commitment scheme example: Big chef, Little chef, and a ballot commitment scheme. Chefs, chest, and key from Pixabay. Thanks for the art! Combined story image by author.

In the above example, the big chef puts a voting ballot (with number 5 written on it) into the locked chest. The chest is then given to the little chef, who has no key. The big chef has now committed to a vote. It is inside a box, with no visibility, thus it is hidden. It can no longer be changed as the box is locked and given to someone (little chef) who has no key. Thus, the commitment has become binding. But little chef does not yet know what the vote is. At a later time, the key is given to little chef, and the vote (commitment) is revealed.

Basic Commitment with ECC

Like the above example with the chefs, Elliptic Curves can be used to build cryptographic commitment schemes. I will try to illustrate this here, starting from a simple example using a single curve point, and leading to the Pedersen Commitment. I borrow some basic ideas from the example in the Grin documentation. Thanks for the ideas!

As for notation, I will use a to represent the committed value. Mainly because if you read about Monero, this seems to be the general notation used. I guess referring to the committed amount of Monero.

Using a Single Curve Point

A curve point named H is defined and made public. We call this the public base point. This base point is actually so public that it is generally defined in the curve’s standard specification, such as for curve 25519 and for secp256k1. With a as the committed value (amount), the commitment c based on H becomes c = a * H. This c defines another point on the same Elliptic Curve, based on the base point H. The following figure illustrates this process:

Basic commitment with Elliptic Curves, and a simplified visualization. Image by author.

Here, the blue point is the base point H on the elliptic curve (not the real one for 25519, I just picked something for illustration). The commitment would be the red point, which equals a*H; in this case, with a value a of 5, c=5*H.

Disclaimer: The below figure illustrates the simplified (fake) notation of Elliptic Curve math I use in this article, with values increasingly moving on the curve in the direction of the dotted red line. For example, 5*H in this figure is 5 line segments from H in the direction of the arrow. I use such increases for more intuitive illustration; real Elliptic Curve math is more complex and behaves differently. I will describe the actual math briefly at the end of this article for anyone interested, with some links to find out more. My simplification here is just for illustrative purposes.

Illustration of the simplified visualization I use, points moving in direction of red line. Image by author.

Curves vs Chefs

Let's consider the above simple curve commitment as the chefs example. The big chef calculates c=5*H, and publishes c. The little chef does not see the value a (5) being committed; he only sees curve point c. To reveal the commitment, the big chef then reveals a=5, and the little chef can verify that a*H=c. Due to the properties of Elliptic Curves and the related mathematics, finding any other value x, other than a, where x*H=c, is considered very difficult, and computationally unfeasible (outside Quantum computing). Thus, this commitment is considered computationally binding.

Once a is revealed, the verification of c is trivial. Because H is a public value, the little chef knows it. Because c was published earlier, he can check a*H=c when a is revealed. If it matches, the commitment and the given a are verified.

If we compare this to the chefs example with the chest earlier, the following concepts should match:

  • chest = published (commitment) curve point c
  • ballot = committed value a
  • key = curve point equation a*H. Or just a, since H is known.

The Problem Here

The problem with the above example is that this process is good at binding the commitment value a, but very poor at hiding it. Let's assume we have 10 choices of values we could put on the chef's ballot, from 1 to 10. It would be trivial to brute force all these point values: just calculate a*H for all a=1–10, and compare the result to the commitment c.
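This brute force is easy to sketch in code. Since my curve illustrations are simplified, I use the classic multiplicative-group analogue of c = a*H here: exponentiation modulo a prime, where "adding H to itself a times" becomes pow(H, a, p). The prime and base below are arbitrary demo values, not real curve parameters:

```python
# Brute-forcing an unblinded commitment c = a*H, using the
# multiplicative-group analogue: c = H^a mod P.
P = 2**61 - 1   # a Mersenne prime, standing in for the group modulus
H = 5           # public base element (stand-in for the base point H)

def commit(a):
    """Unblinded commitment to value a."""
    return pow(H, a, P)

c = commit(5)   # the big chef commits to ballot value 5

# The little chef only sees c, but with just 10 possible ballot
# values, trying them all reveals the committed value immediately.
recovered = next(a for a in range(1, 11) if commit(a) == c)
print(recovered)  # -> 5
```

This is exactly why the blinding factor introduced next is needed.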

To address this, something called the blinding factor is used.

The Blinding Factor

The blinding factor is commonly referred to as r. This r is just a (secure) random number. Generally in range 1-2²⁵⁶, so very large. A slightly naive approach is to add this to a as in (r+a)*H. The following figure illustrates this with r=11 and a=5, resulting in published c=(11+5)*H=16*H:

Commitment to 5, with a blinding factor 11. Image by author.

Here, r is shown as the green dot. With this, there is no way to tell what a is without knowing what r is. However, this adds a new problem: any combination of a+r summing to the same value could be revealed as the commitment. Thus this approach would lose the binding property, as the commitment could be changed after publishing it. For example, in the above case we could publish any of (a=4, r=12), (a=5, r=11), (a=6, r=10), or many others where the sum of r and a is 16. The commitment needs to be binding.

Pedersen Commitment

Solving this problem finally brings us to the Pedersen Commitment. For this, another base point G on the same curve is used. This leads to the final commitment form of r*G+a*H=c. This is what is called the Pedersen Commitment (with EC). Using my made-up EC notation, we could visualize this like so:

Pedersen commitment on Elliptic Curve 25519, to value 5, and blinding factor 11*G. Image by author.

Now we are back to computationally binding, and at the same time hiding commitment. As I mentioned earlier, due to properties of elliptic curves, and the associated mathematics, it is considered extremely difficult to forge a commitment with different values of r and a, so that it would still match the published c=r*G+a*H. And since both r and a now use a different base point, they cannot be changed as when just summing (a+r)*H above. That’s what they tell me, and it’s enough for my goal in this article.
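The Pedersen Commitment is easy to try out in its original multiplicative-group form, where c = g^r * h^a mod p corresponds to r*G + a*H on a curve. The parameters below are small demo values; in a real system, h must be generated so that nobody knows its discrete log with respect to g:

```python
# A Pedersen commitment sketch in multiplicative-group form:
# c = g^r * h^a mod p, corresponding to r*G + a*H on a curve.
p = 2**61 - 1   # demo prime modulus
g, h = 3, 5     # demo bases, standing in for curve points G and H

def commit(a, r):
    return (pow(g, r, p) * pow(h, a, p)) % p

def verify(c, a, r):
    return commit(a, r) == c

c = commit(5, 11)        # commit to a=5 with blinding factor r=11
print(verify(c, 5, 11))  # True: opening with the committed values
print(verify(c, 6, 10))  # False: unlike (a+r)*H, swapped sums fail
```

Note how (a=6, r=10), which would have opened the naive (a+r)*H commitment above, no longer verifies, because a and r now sit on different bases.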

Homomorphic sums: Hiding the Values in Monero

Monero applies the Pedersen Commitment to hide the actual amounts in transaction inputs and outputs, while at the same time verifying that the sums of amounts in inputs and outputs match. Sounds like sorcery, so let's see how this works.

The homomorphic property of the Pedersen Commitment means that we can actually perform mathematical operations on the encrypted values, and it works just as if the values were not encrypted. In this case, this is based on the homomorphic nature of the Elliptic Curve math. We can simply add curve points together, and compare the results, without ever revealing the actual transaction Monero amounts. This is what the magic of matching Monero sums without knowing the values is based on. But I need concrete examples, so let's make one.
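The homomorphic property is simple to demonstrate with the same multiplicative-group form of the Pedersen Commitment used above (demo parameters, not real Monero values). Multiplying two commitments yields a commitment to the sum of the amounts, under the sum of the blinding factors:

```python
# Homomorphic addition of Pedersen commitments (multiplicative form):
# C(a1, r1) * C(a2, r2) = C(a1 + a2, r1 + r2)
p = 2**61 - 1   # demo prime modulus
g, h = 3, 5     # demo bases, standing in for curve points G and H

def commit(a, r):
    return (pow(g, r, p) * pow(h, a, p)) % p

# Combine two commitments without ever touching the amounts:
lhs = (commit(10, 14) * commit(30, 85)) % p
# ...and compare to a direct commitment to the summed values:
rhs = commit(10 + 30, 14 + 85)
print(lhs == rhs)  # True: the math works on the hidden values
```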

First, a brief description of how cryptocurrency transactions generally work. At a high level, a typical transaction looks something like this:

A cryptocurrency transaction at a high level. Image by author.

Each transaction takes a set of inputs that are spent (TxINs). Spending here means the cryptocurrency in those inputs are transferred to some output addresses on the blockchain, as transaction outputs (TxOUTs). An associated transaction fee is taken by the network, and treated as one of the TxOUTs. In Bitcoin, the transaction inputs and outputs, and their values are visible for anyone to see. In that case, the transaction looks pretty much just like the figure above, for anyone observing the blockchain.

Monero is a cryptocurrency especially focusing on privacy, and does many things to hide various parts of these transactions from the public view. One of those is to hide the amounts being transferred. Monero uses the Pedersen Commitment to hide the transaction amounts (values). In this case, the process now looks something like this:

A (Monero) transaction with amounts hidden. Image by author.

In this case, the input and output values are hidden from the public view. Only the fee is made visible. The only way to see the hidden values is if you have access to the required keys to decrypt the transaction. Thus the values are only visible to the sender and receiver. No other blockchain network nodes, or any other outsider, can see the concrete values that are in the transaction.

However, to maintain a valid ledger of transactions, the system must be able to validate that the total sum of the transaction outputs match its inputs. That is, no bigger amount of Monero is spent than is put in. Otherwise people could just spend some small amount of coins (Monero) in the transaction inputs, and create transaction outputs for much larger sums. This would effectively create Monero out of thin air, and break the entire system. Kind of like the central banks, and the associated monetary system in the “real world”, except we are still waiting for that pyramid to fall ;).

Matching input and output sums

As described earlier, EC Pedersen Commitments used by Monero use two public curve points, G and H. Each input and output Tx (TxIN and TxOUT) has its own Pedersen Commitment, so each one has its own a and r defined, like this:

Example transaction with all commitment values except fee blinding factor. Image by author.

In the above, a always matches the amount of Monero used in the TxIN or TxOUT. As I discussed earlier, the r is basically a large random number. Here I just picked some (small) numbers to illustrate the calculations.

The r in the above figure now becomes the G multiplier, and a the H multiplier. Generally, the numbers (especially r) would be bigger, but these work for an example.

TxIn Commitment

For the TxIN inputs in the above, the commitments would now be (using r and a values from image above):

  • 10$: c = 14*G + 10*H
  • 30$: c = 85*G + 30*H
  • 10$: c = 45*G + 10*H

Using the homomorphic addition property, the total commitment of all the TxIN’s is then:

  • Cin: 14*G + 10*H + 85*G + 30*H + 45*G + 10*H = (14*G + 85*G + 45*G) + (10*H + 30*H + 10*H)

TxOut Commitment

For the TxOUTs, the commitments would be:

  • 8$: c = 33*G + 8*H
  • 40$: c = 28*G + 40*H
  • (fee) 2$: c = x*G + 2*H

The blinding factor for the fee is still undefined (x) for now. To calculate x, let’s sum the other parts first.

Using homomorphic addition, the total commitment of all the TxOUT’s is:

  • 33*G + 8*H + 28*G + 40*H + x*G + 2*H = (33*G+28*G+x*G) + (8*H + 40*H + 2*H)

Visualizing the Amount on the Elliptic Curve

It is perhaps simpler to look at the a and r calculations here separately (for points H and G respectively). First the a and H:

  • TXin: 10*H+30*H+10*H=50*H
  • TXout: 8*H + 40*H + 2*H=50*H

Matching the sum of input vs output amounts, via their commitments. Image by author.

Both of these end up at the final curve point of 50*H. The final curve point is the part that is published in the transaction (as part of the commitment, combined with the blinding factor part r*G). The homomorphic nature of it allows comparing the final points to see that the sums match in input and output. This gives the binding property, as finding other matching a and r values would be computationally unfeasible, while still allowing the amounts to be matched.

Next, we need to add all the blinding factors to gain the hiding property.

Visualizing the Blinding Factor on the Elliptic Curve

As for amount H above, for the blinding factor G we can do the same:

  • TxIN: 14*G+85*G+45*G=144*G
  • TxOUT: 33*G+28*G+x*G=61*G+x*G

As noted above, the x in these equations is the r for the fee. Using my simplified EC visualization approach, it looks like this without the fee:

Initial setup of the random blinding factors for the Pedersen Commitment. Image by author.

Now we also need to make the TxIN and TxOUT curve points match here. The TxIN final curve point is 144*G, so we need to make the TxOUT final curve point match that.

Matching the TxIN curve to the TxOUT Curve Using Fee Blinding Factor

As you may have guessed, we can match the TxIN and TxOUT curve points for G like this:

  • r for fee (x above) = (sum of TxIN multipliers for G) – (sum of TxOUT multipliers for G, without fee) = (14+85+45)–(33+28)=83

The commitment for the fee then becomes:

  • 2$: c = 83*G + 2*H

Filling in the last missing spot (x) in the transaction, we get:

Using the fee blinding factor to balance the curve point sums. Image by author.

And when we plug this into the total output G calculation and figure above, we get the following:

  • TxIN: 14*G+85*G+45*G=144*G
  • TxOUT: 33*G+28*G+x*G=61*G+(144–61)*G=61*G+83*G=144*G

And we can now see that both input and output end up at the same curve point for G just like for H:

Using the fee to match the sums (and EC end points) of Pedersen Commitment blinding factors. Image by author.

The total summed commitments are now:

  • TxIN: 144*G + 50*H
  • TxOUT: 144*G + 50*H
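The whole worked example above can be replayed with plain scalar arithmetic on the G and H multipliers, since the shared base points drop out of the comparison. A small sketch using the numbers from the figures:

```python
# Scalars from the worked example: r values are the G multipliers
# (blinding factors), a values are the H multipliers (amounts).
r_in,  a_in  = [14, 85, 45], [10, 30, 10]   # TxIN side
r_out, a_out = [33, 28],     [8, 40]        # TxOUT side, fee not included yet
fee_a = 2                                   # the public fee amount

# Solve the fee blinding factor x so that the G sums balance:
fee_r = sum(r_in) - sum(r_out)
print(fee_r)                          # -> 83

# Both sides now end on the same total curve points:
print(sum(r_in), sum(r_out) + fee_r)  # 144 144 (the G part)
print(sum(a_in), sum(a_out) + fee_a)  # 50 50  (the H part)
```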

We can now subtract the summed input and output commitments, and check if they match the commitment to zero.

Commitment to Zero

To prove that the total transaction inputs match the outputs, we can compare the sum of their commitments to a commitment to zero. Besides my bank account balance, commitment to zero here simply means:

  • z = 0 * H + 0 * G

And if we calculate the difference of the input and output commitments from the above, we get:

  • TxIN commitment (c_in): 144*G + 50*H
  • TxOut commitment (c_out): 144*G + 50*H
  • c_in-c_out: (144*G+50*H) – (144*G+50*H) = 0*G + 0*H
  • This is the same as commitment to zero above (z = 0*G + 0*H)
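The same zero-check can be sketched homomorphically. Here I again use the multiplicative-group form of the Pedersen Commitment (c = g^r * h^a mod p corresponds to r*G + a*H), with made-up demo parameters; subtracting commitments on the curve corresponds to dividing here:

```python
# Commitment-to-zero check: c_in - c_out on the curve corresponds to
# c_in * c_out^-1 here, and the commitment to zero (0*G + 0*H) is
# g^0 * h^0 = 1 in this notation.
p = 2**61 - 1   # demo prime modulus
g, h = 3, 5     # demo bases, standing in for curve points G and H

def commit(a, r):
    return (pow(g, r, p) * pow(h, a, p)) % p

c_in  = commit(50, 144)   # total TxIN commitment from the example
c_out = commit(50, 144)   # total TxOUT commitment (incl. fee)

diff = (c_in * pow(c_out, -1, p)) % p   # modular inverse, Python 3.8+
print(diff == 1)  # True: the sums match, amounts never revealed
```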

And so we have successfully proven that our total sum of inputs matches the total sum of outputs, without revealing anything about the actual transaction amounts. This is true, since only the single, final curve points are included in the transaction data. There is no r or a in any form included in the public commitment. The only thing included is the final curve point that is the sum of 144*G+50*H.

This should reveal no information about the amounts, and, I believe, provides a practical example of a zero knowledge proof with more cryptographic constructs. It proves the input and output amounts match, but tells you nothing (you gain zero knowledge) about the actual amounts.

The code I used to make the above calculations, and to verify that the input and output commitments and their difference match the commitment to zero, is available on my Github.

I am still no in-depth expert on cryptography, but for anyone interested to read more on homomorphic encryption and its relation to ZKP, a good start seems to be this post on Stack Exchange (as usual..). However, a brief look into actual EC math is in order after all the simplified examples I did above.

EC Addition and Multiplication, Achtually

In all the above examples of elliptic curve addition and multiplication, I showed the points simply moving along the curve by x line segments, where x was the base point multiplier (x*G or x*H). For example, for 5*G I simply moved the point 5 line segments forward on the curve, starting from G. This is not how real EC math works.

First, a simple addition of two points. Adding two points on an Elliptic Curve gives you a third point. The following figure illustrates this in four steps:

Adding two points to get a third (or fourth..) point on an Elliptic Curve. Image by author.

In the above figure the four steps are the following:

  • Step 1: Define the points you want to add (P1+P2).
  • Step 2: Find a line connecting these two points.
  • Step 3: The line should always intersect the curve at a third point. Find that third point (P3).
  • Step 4: Reflect P3 across the x-axis to find P4. P4 is now the result of adding P1+P2. So P1+P2=P4 in the above figure.
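The four steps can be sketched as code over a toy curve y² = x³ + 7 mod 17 (the secp256k1 equation over a tiny field, picked purely for illustration). This sketch only handles the chord case of two distinct points with different x coordinates:

```python
# Chord-rule point addition on the toy curve y^2 = x^3 + 7 mod 17.
P = 17  # field modulus (a tiny demo field, not a real curve field)

def ec_add(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    # Step 2: slope of the line through the two points
    s = (y2 - y1) * pow(x2 - x1, -1, P) % P
    # Steps 3-4: third intersection, reflected over the x-axis
    x3 = (s * s - x1 - x2) % P
    y3 = (s * (x1 - x3) - y1) % P
    return (x3, y3)

# (1,5) and (2,10) are both on this curve; so is their sum:
print(ec_add((1, 5), (2, 10)))  # -> (5, 9)
```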

Why is the Elliptic Curve math done like that? I have no idea, none of the materials I read explained that part. But for me it is enough to have a good enough idea of what is happening and discussed for this topic.

Elliptic Curve Cryptography takes a slightly specialized approach to this. Instead of adding two different points together, a public base point is defined, and added to itself N times. Using G to describe a base point, the following figure again illustrates this process in another 4 steps:

Adding a base point to itself on an Elliptic Curve. Image by author.

Again, G here is just a random point I picked for illustration, not any official point. In the above figure, the four steps are now:

  • Step 1: Find the defined public base point G. Since we are adding G to itself, it is used as both P1 and P2 from the previous example on adding P1+P2. Instead of P1+P2, we now do G+G.
  • Step 2: Find the line that passes through P1 and P2, or G and G in this case. For a single point, you could draw any line that passes through it (i.e., infinite possible lines). Someone smart has then decided that the tangent line is to be used. It is the line that just touches the curve at point G.
  • Step 3: Find the point where the tangent line crosses the curve, this matches P3 from the previous example.
  • Step 4: Reflect P3 across the x-axis to get P4, similar to the previous example. This results in G+G, or 2*G. Continue adding G to this to get 3*G, or add 2*G to itself to get 4*G. Repeat until you get the point you desired. For example, in the Monero example above I used values such as 14*G, 85*G, and so on. Just iterate this process to get to the x in x*G.
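Both the tangent (doubling) and chord cases, plus the repeated addition that yields x*G, can be sketched over the same toy curve y² = x³ + 7 mod 17. This ignores some edge cases (such as doubling a point with y = 0), and the base point G = (15, 13) is just a point I picked that lies on this tiny curve, not any standard point:

```python
# Point addition, doubling, and repeated addition (x*G) on the
# toy curve y^2 = x^3 + A*x + 7 mod P, with A = 0.
P, A = 17, 0

def ec_add(p1, p2):
    if p1 is None: return p2          # None stands for the point at infinity
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                   # P + (-P) = infinity
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, P) % P         # chord slope
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)

def ec_mul(k, point):
    """k*G by adding G to itself k times, exactly as described above."""
    result = None
    for _ in range(k):
        result = ec_add(result, point)
    return result

G = (15, 13)  # a point on this toy curve, used as the base point
print(ec_mul(2, G) == ec_add(G, G))  # True
```

Real implementations use double-and-add rather than k separate additions, so even huge multipliers like those in the Monero example take only a few hundred curve operations.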

Additionally, there are other related points, such as use of prime numbers and modulos, for cases such as where the results would not fit in the N bits used. This gets called EC over finite fields, and other such fancy terms. However, the level I presented in this article is enough for me to have a basic understanding to read the blockchain articles, understand the topics, and possibly experiment a bit with the topic.

While all this sounds a bit complicated, luckily other people have already implemented all this in different libraries. For prototyping the topics in this article, I used an Elliptic Curve library for Golang, where the operations and calculations are implemented and verified for me. But I do find it very useful to understand what I am doing in those operations, and why.

As I noted before, you can find some example code where I use a library for these calculations on my Github.

Computational vs Information Theoretical

Earlier I discussed how a commitment should be both hiding and binding. Until the committing party reveals the committed value, it should remain hidden. And it should not be possible to change the committed value after committing to it. In other words, the commitment should be binding. I believe I already mentioned the terms information theoretically hiding, and computationally binding.

The Pedersen Commitment is considered to be information theoretically hiding, meaning that no matter how much computing power you have, it is not possible to 100% determine the original committed value, if it is not revealed. As far as I understand, this is because there are always multiple possible values that could fulfill the commitment.

For the same reason, the Pedersen Commitment is considered to be only computationally binding. Because there are multiple possible values that could fulfill the EC equations, with sufficient computing power you could find a different (and thus a “false”) commitment that would satisfy the verifier. In practice, for the Pedersen Commitment, finding such a point is known as the EC discrete logarithm problem, which is considered too hard to solve without Quantum computing.

This appears to be a common tradeoff between the hiding and binding properties. Perfect (information theoretical) hiding results in computational binding, and perfect binding results in computational hiding.

Back to the Roots: What is Zero Knowledge Proof?

After the lengthy detours, back to the original question. What is Zero Knowledge Proof? Looking back at the Ali Baba example, and the use of Pedersen commitment in Monero, I will try to summarize my take.

  • We know a lot about the problem, and are happy to share a lot of that knowledge.
  • We have a specific piece of information we want to keep secret.
  • We wish to share zero knowledge of that secret, but prove we know it, or that something related to it is true.
  • The proof comes from applying a specific process according to the domain in question. This results in (statistically) sufficiently strong evidence that we hold the proof as verified.
  • There are likely some properties that need to be assumed true, such as having true sources of randomness, or lack of Quantum computing.

In the Ali Baba cave, the process is that of Alice always arriving via the correct path to Bob, repeated so many times that the combined probability of Alice always picking the right path by chance is incredibly small. The assumptions include no cheating (no path to bypass the door in the cave), and no broken randomness (you cannot predict which path Bob calls).

In the Monero proof of matching input and output amounts via Pedersen Commitments, the process lies in the homomorphic calculations holding true. While this case is only computationally binding, the associated discrete logarithm problem is considered so hard that the chance of “faking” the proof is also considered incredibly small. The assumptions here include no solution to the discrete logarithm problem (i.e., no Quantum computing), and again true sources of randomness (you cannot guess r).


There are plenty of examples of the Ali Baba cave, and resources such as Zero to Monero that are filled with Greek symbols and formulas. I wrote this piece in order to find for myself the middle ground on what ZKP means, how conceptual examples like Ali Baba translate to the cryptographic world, and what they mean at a higher level. I hope this will be useful to someone else. At least I feel I got a better understanding of my questions.

This article also allowed me to get familiar with the Pedersen Commitment, which I had often seen mentioned, but wondered about the real meaning. In the end, I find ZKP, as most topics, is not too complicated once you get past the terminology. Just a lot of new concepts for me, which takes time.

In the case of Monero and the way it verifies and anonymizes its transactions, there are also various other features I have not touched here. These include range proofs (a.k.a. bulletproofs in the current implementation) to ensure the Pedersen Commitments are always using positive numbers, ring signatures to hide the sender addresses, stealth addresses for hiding recipients, key images to identify transactions, and (Diffie-Hellman) key exchanges to communicate the r and a of the Pedersen Commitment from sender to receiver. But these are out of the scope of this article.

Just to repeat the general disclaimer, I am not a cryptographer. So do your own research if you really need to understand the details. Hope this was useful for someone anyway. Cheers.

Just to remind once more, the codes I used to trial this article are available on my Monero Scraper Github project. At the time of writing, under src/golang/pedersen and src/python/experiments/ellipticcurves.

PoW vs PoS. Whut?

Recently I have tried to understand blockchains and how they actually work. It seems the internet is full of high-level descriptions that always stop before actually explaining how the different blockchain variants really work. What is proof-of-work in practice, and what does proof-of-stake actually mean? What actually happens when you deploy a blockchain node in different variants, and how does the blockchain “network” reach “consensus”? What is the blockchain “network”? What is “consensus”? What is “byzantine fault tolerance” and how does it relate to blockchains?

I started by looking into Bitcoin, since it is the most popular and has the most information available. There is even a nice book about it, with reasonable amounts of implementation-level details as well. The book is named “Mastering Bitcoin: Programming the Open Blockchain, 2nd Edition”, just in case the Amazon link breaks. I am sure there are many other great books and resources, but I really liked that one. The Naivecoin tutorial I discussed before is also good stuff. So generally, the resources are there for understanding Bitcoin-style PoW blockchains, also at a more detailed level. But let's see what I think, and you can then correct me. I start with PoW and then give some PoS assumptions.

Proof of Work (PoW)

Proof of Work simply refers to running a specific hash function for long enough to find a set of values that produces a hash value matching some given criterion. An example using a generic block structure, mostly imitating Bitcoin block structure:

  • prev_hash: Previous block hash
  • merkle_root: Hash for data in this block (Merkle root)
  • timestamp: Current timestamp
  • diff_target: Difficulty target
  • nonce: A value that is changed to try to find different hashes
  • transaction_list: List of transactions included in this block

I use the above properties as an example for hashing. Additionally, there would be a number of more generic metadata fields, such as the magic number, block size, and version. Such metadata is used in parsing to find the expected data size of a block (e.g., bytes to read from a socket), and to verify the block data is valid (correct magic number, version number, ..). For more details on actual Bitcoin block elements, see e.g., here.

To build a hash for PoW, the above list of data can be all hashed together. So it becomes something like this:

block_hash = SHA256(prev_hash+merkle_root+timestamp+diff_target+nonce+transaction_list)

This is slightly simplified, but serves the purpose. Here SHA256 is a specific hash function; various other hash functions are often used as well. The resulting block_hash can be used to verify the contents of the block, since to get the same hash value you need to have the exact same block contents as input for the hash function. This protects a processed block from being changed by anyone (e.g., modifying existing transactions already in the blockchain), since the attacker would have to be able to re-calculate a new hash for this block, as well as for all blocks coming after it. So the longer/deeper the block is in the chain, the “safer” it is from modification, as it is buried deeper under new blocks.
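The hashing formula can be sketched in Python like this. The field values are made-up placeholders, and real Bitcoin hashes a specific binary serialization of the block header (with double SHA256) rather than concatenated strings:

```python
# A minimal sketch of the block hash formula, following the
# simplified block structure from the text.
import hashlib

def block_hash(prev_hash, merkle_root, timestamp, diff_target, nonce, tx_list):
    data = f"{prev_hash}{merkle_root}{timestamp}{diff_target}{nonce}{tx_list}"
    return hashlib.sha256(data.encode()).hexdigest()

# Hash the same block twice, with one transaction amount changed:
h1 = block_hash("00ab...", "9f3c...", 1624000000, 4, 42, ["alice->bob:5"])
h2 = block_hash("00ab...", "9f3c...", 1624000000, 4, 42, ["alice->bob:6"])
print(h1 != h2)  # True: any change to block contents changes the hash
```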

The blockchain itself is just a long list of blocks, each referencing the previous block (with prev_hash). The term chain refers to each block linking to the previous one this way. This connection is enforced by including the prev_hash in the next block, along with the block_hash of the block itself. So you cannot just change the order of the blocks in the chain, as the linked prev_hash values would no longer match.

Look, I made a pretty picture:


Here, the block itself contains the block_hash, prev_hash, and block_data. So block_hash is calculated over all the rest of the data in the block (including prev_hash). Prev_hash links the block to the previous block in the chain, as it is the block_hash of the previous block. Block_data contains all the rest of the data, and is of course included in the block_hash, meaning block_hash verifies the integrity of all this data: changing block_data (or prev_hash) would change block_hash, and thus invalidate the whole chain (the links would no longer be valid, as prev_hash would differ from the block_hash of the previous block). Cute.
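The link structure and its validation can be sketched like this (a minimal toy structure, not real block serialization):

```python
# Toy blockchain: each block stores prev_hash, block_data, and its
# own block_hash computed over both.
import hashlib

def calc_hash(prev_hash, block_data):
    return hashlib.sha256((prev_hash + block_data).encode()).hexdigest()

def make_block(prev_hash, block_data):
    return {"prev_hash": prev_hash, "block_data": block_data,
            "block_hash": calc_hash(prev_hash, block_data)}

def chain_is_valid(chain):
    for prev, block in zip(chain, chain[1:]):
        if block["prev_hash"] != prev["block_hash"]:
            return False   # broken link between blocks
        if block["block_hash"] != calc_hash(block["prev_hash"],
                                            block["block_data"]):
            return False   # tampered block data
    return True

genesis = make_block("0" * 64, "genesis")
b1 = make_block(genesis["block_hash"], "tx set 1")
chain = [genesis, b1]
print(chain_is_valid(chain))   # True
b1["block_data"] = "tx set 1 (modified)"
print(chain_is_valid(chain))   # False: the stored hash no longer matches
```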

Anyway. What is proof of work? Anyone can calculate a block hash using the above hashing formula. But not just any hash will be accepted by the blockchain network. The acceptance criterion is defined in the software designed to run the blockchain network. The most common one is the difficulty-based criterion. The Naivecoin tutorial has a good and practical description of one. The difficulty is calculated using an algorithm with several parameters. A typical example:

  • current_difficulty: Current mining (PoW) difficulty.
  • adjustment_interval: Number of blocks to produce until difficulty should be adjusted.
  • target_interval: The target time that should be spent between generating blocks.

The blockchain always targets to produce blocks with an interval time of target_interval between them. For example, Bitcoin uses a target_interval value of 10 minutes, and an adjustment_interval of 2016 blocks. With a good difficulty value, 2016 blocks should take 2 weeks to generate. This is 2016*10/60/24=14 days: 2016 is the number of blocks in the adjustment interval, 10 is for 10 minutes per block, 60 is for minutes in an hour, and 24 for hours in a day. If mining 2016 blocks takes less time than 14 days, the difficulty will be adjusted higher. If the mining time is more than 14 days, the difficulty is adjusted lower. This way the blockchain network adjusts its parameters over time to keep the rate of coins mined close to the specified rate. Many blockchains use a target_interval shorter than 10 minutes and an adjustment_interval shorter than 14 days. But generally, the idea is again the same. This is the basic idea, which can be built on by other difficulty adjustment algorithms, such as LWMA, which seems to be quite popular with many altcoins currently.
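The adjustment logic above can be sketched as follows. Real implementations (e.g., Bitcoin) also clamp how much the difficulty may change in one step, which this sketch omits:

```python
# Interval-based difficulty adjustment with Bitcoin-like numbers.
ADJUSTMENT_INTERVAL = 2016   # blocks between adjustments
TARGET_INTERVAL = 10 * 60    # target seconds per block

def adjust_difficulty(current_difficulty, actual_seconds):
    # Expected mining time for the whole interval: 2016 * 10 min = 14 days
    expected = ADJUSTMENT_INTERVAL * TARGET_INTERVAL
    # Blocks came too fast -> raise difficulty; too slow -> lower it.
    return current_difficulty * expected / actual_seconds

print(adjust_difficulty(1000, 14 * 24 * 3600))  # 1000.0, right on target
print(adjust_difficulty(1000, 7 * 24 * 3600))   # 2000.0, mined twice too fast
```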

The process of mining in PoW systems is to try hashing different block configurations (thus different hash inputs), until you get one that passes the difficulty check. The nonce parameter is there specifically for this purpose: to let miners try different values to get a hash that passes the difficulty check. Additionally, one can play to some extent with the choice of transactions to include, as well as with the exact timestamp used. These all change the input to the hash function, and thus the produced hash value. But nonce is the intended main value to change. Additional considerations can also be applied, such as an extra_nonce, but the main concept is the same.

What is the difficulty exactly? It is simply a reference to a hash value that needs to be found. With higher difficulty, the target hash value is set to be harder to find. With lower difficulty, it is set to be easier to find. This is based on the fact that a hash value is just a very large number, even if usually represented as a hex-string. Examples:

  • Difficulty target set to 100k: You need to find a hash value that is less than 100k.
  • Difficulty target set to 1M: You need to find a hash value that is less than 1M.

Out of these two, the 100k difficulty target is higher in difficulty, because there are much fewer possible accepted hash values (100,000). 1M is a much “easier” difficulty target, since there are 10x the possible accepted hash output values (100,000 vs 1 million). Thus, a higher difficulty target value means a lower difficulty (i.e., it is easier to find a suitable hash and thus create a valid block). Most confusing, I am sure. There can be variants of this, such as requiring a number of leading zeroes in the hash output instead of a value below a specific number, but the concept is still the same.

The hashing algorithm presented earlier (block_hash=…) produces a single value. And we cannot control what value it produces, since hash functions are one-way, and no one has been able to figure out a way to reverse them (to directly generate an input that gives a specific hash output). So the only way to find a hash matching the target difficulty is to keep trying different input combinations until we hit one which produces a suitable hash value (below the difficulty target). The “Proof of Work” is then just proof that you have generated huge numbers of these hashes, enough to find one matching the difficulty criterion. Hashing is the work, the proof is producing a hash matching the difficulty target.
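The mining loop then boils down to something like this sketch, with a deliberately easy difficulty target so it terminates in a few iterations:

```python
# A toy mining loop: bump the nonce until the block hash, read as a
# number, falls below the difficulty target.
import hashlib

def mine(block_data, target):
    nonce = 0
    while True:
        h = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if int(h, 16) < target:   # lower target = harder to satisfy
            return nonce, h
        nonce += 1

# A very easy target (top 4 bits zero), so this finds a hash quickly:
nonce, h = mine("prev_hash+merkle_root+timestamp", 2**252)
print(int(h, 16) < 2**252)  # True: this nonce/hash pair is the proof of work
```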

Peer to Peer (P2P) consensus

Once someone finds a block (i.e., produces an accepted hash), the block needs to be added to the blockchain. This happens when the finding node propagates the new block with the valid block hash to its peers, which propagate it further to their peers, and so on for the whole network. The peers can all check that the data in the block produces a valid hash in relation to the difficulty target, and that the data in the block is verified (prev_hash matches the block_hash of the previous block added to the chain, transactions in transaction_list are valid, etc.). This of course requires all nodes to be connected in a peer-to-peer fashion.

The peer-to-peer network is established and continuously updates itself over time. When a blockchain node boots up, it typically establishes a connection to the blockchain network via a pre-configured set of seed nodes. These can be static IP addresses stored in the client, or a set of pre-defined DNS server addresses. Once the node has connected this way to its first set of peers, these peers provide it with additional peer nodes to connect to. The node can request more peers from these peers, and so on. Connections to nodes may drop over time, and the node can use the peer lists it receives from others to find new ones.

The initial synchronization of the blockchain state happens by the nodes sending their confirmed blocks, transactions, and other blockchain state information to each other and to any new peer joining the network. Once the nodes are synchronized, newly mined blocks are added to the blockchain as soon as some node finds them. New “money” is created by the mining node adding a “coinbase” transaction to any new block it finds (gets an accepted hash for), giving itself the number of coins agreed in the network specification for finding a new block.

So this is how distributed consensus is achieved in PoW networks. But how about PoS networks? Not too different (or so I think), but the way of creating and assigning new blocks has to be different. That is the whole point, of course..

Proof of Stake (PoS)

The above information on Proof of Work is actually relatively simple to find on the internet and in books. The Bitcoin book I linked above is one great source for such information. This is maybe because Bitcoin is such a well-known and long-standing project that it has a huge market for information, and so there is a lot of information about it. For Proof of Stake, there is much less information, and I cannot name a single coin on the PoS side that would be such a “canonical” project. Ethereum is talking about making the switch from PoW to PoS, and some proposals for this in the form of “Casper” proposals are available. However, these are written in a very theoretical manner, and digress all over the place. Not so great for someone like me, who just wants a good understanding at a bit more detailed level.

So what do I think Proof of Stake is? At a high level it is always described as distributing new coins based on the amount of coins you already have. So each round (staking interval) gives a set of new coins to existing owners, based on their “stake” in the network. The stake would be the number of coins you already have, and are willing to “stake”. And quite often this is described as requiring the user's wallet to be online to be “staking”, that is, to be able to receive these rewards. So one might expect that the wallet is connected to other peers in a similar peer-to-peer network as for PoW, but the network somehow agrees to give new coins periodically to someone “staking” instead of using a PoW hashing contest. But how does this staking actually happen? How does a distributed and presumably decentralized network ever agree on who gets the coins?

I found some Reddit post (which I lost the link for) and a couple of StackExchange posts (post1, post2). And the Blackcoin and Peercoin whitepapers. These all seem to describe the same thing, even if each with a bit of a different twist. In any case, what they describe makes sense, so I will go with that. There is also the Ouroboros whitepaper, the Tendermint whitepaper, and the Ethereum PoS FAQ. But these drift off in a way too academic direction for me.

The algorithm I get from these is very similar to that of PoW hashing. The main difference is that there is no field in the hashed block data that can be freely varied to the extent done in the PoW implementations. Instead, only a selected set of core values from the block is given as input to the hash function. This set of core values is called the “kernel”. Since the data selected for this input has only a limited number of possible values, and thus hash outputs, it makes no sense for the “miner” to spend a lot of effort and energy on trying different values. Mainly because there are only a few values to try at each point in time. This is how you try to escape the hashing contest of PoW coins, and save the planet by reducing energy consumption.

So in this scenario, I would assume each “staker” takes their set of kernel values, along with a timestamp, hashes these, and if they get a value that fits the difficulty target of the network, they can be selected to “mint” new coins. The timestamp is masked to remove the lowest bits, in order to make it infeasible to try a large number of close-to-each-other timestamps. The number of coins owned is used to weight the hash outcome, increasing the likelihood of “minting” the new coins on each round based on the number of coins staked. A number of previous block hashes are also included in the equation to make it more difficult to try to come up with different hash values for future blocks.

From the posts I linked above, I get the more sensible explanation of how the node to create a block is selected. So the hash you end up with from your kernel data is compared against the network difficulty (similar to PoW), and the difficulty is also adjusted at similar intervals as with PoW. Some of the early coins (e.g., the Peercoin whitepaper) also discuss adding coin age into the equation, to favor coins that have not been selected recently. Later on, the Blackcoin whitepaper sees this as a potential problem (even a possible attack vector) and removes age from the equation. And from there I guess it has diverged into all sorts of academic exercises that are simple but made to look complex.

In any case, I guess there are other ways to do different parts of PoS. But the process in general, as I described above, seems like it would make sense. Please do correct me and tell me how it really works.. 🙂

Something I wonder after all this is: why is it always said you need to have your wallet online to “stake”? I don't see a need, other than to have something around to calculate your kernel hash. You could just grab the kernel input data from the wallet, and keep a separate program online to calculate the PoS hashes. This way there would be much less risk of the wallet being compromised, which is often cited as a risk of PoS staking. I must be missing something again?


Porting Naivecoin tutorial to Golang (first parts)

Recently I got interested in blockchains, and how they work in practice. I found a nice tutorial called Naivecoin: a tutorial for building a cryptocurrency. There is also a nice book on the topic called Mastering Bitcoin: Programming the Open Blockchain. Had to get that too, didn’t I.

So to get a better understanding of the topic, I tried implementing the tutorial in Go. Using a different programming environment requires me to look up the actual implementation and try to understand it, as opposed to just copy-pasting all the code. I implemented the parts up to part 3, and set my code up on Github. Hopefully I will update it with the remaining parts of the Naivecoin tutorial, and other related topics, sometime later.

The Naivecoin first part focuses on building the basic blockchain. To start, a block structure is needed to describe the blockchain:

type Block struct {
	Index        int           //the block index in the chain
	Hash         string        //hash for this block
	PreviousHash string        //hash for previous block
	Timestamp    time.Time     //time when this block was created
	Data         string        //the data in this block. could be anything. not really needed since real data is transactions, but for fun..
	Transactions []Transaction //the transactions in this block
	Difficulty   int           //block difficulty when created
	Nonce        int           //nonce used to find the hash for this block
}

A blockchain is a long list (or a chain) of blocks. The Block structure above has various fields to describe its position in the chain, its contents, and metadata used to provide assurance over the chain validity.

  • Index: The number of the block on the blockchain. So 1st block is 1, 10th block is 10.
  • Hash: Cryptographic hash of all the fields in the block together. Verifies the block has not been modified since creation (e.g., by changing a transaction address).
  • PreviousHash: Should match the Hash value of the previous block in the blockchain. Useful to verify chain integrity I am sure.
  • Timestamp: Time when this block was created. Used for hash variation, and to keep the creation of blocks within a defined time limit (to make some attacks harder).
  • Data: The blockchain is supposed to store data in an “immutable ledger”. I guess the typical blockchain application is still cryptocurrencies, where the data is the Transactions part. I put this extra data field in just to play with other data if I feel like it.
  • Transactions: List of transactions included in this block. Moving coins from one person to the other.
  • Difficulty: The mining difficulty when this block was created. Useful to check the hash validity etc.
  • Nonce: A value to vary in trying to get a hash that matches the set difficulty. A bigger difficulty may require bigger nonce values. But for this tutorial implementation I guess this is fine.

The blockchain is composed of these blocks, each referencing the previous one, in a chain. Thus the blockchain, woohoo. So, first how to calculate the hash for a block:

//calculate hash string for the given block
func hash(block *Block) string {
	indexStr := strconv.Itoa(block.Index)
	timeStr := strconv.FormatUint(uint64(block.Timestamp.Unix()), 16) //base 16 output
	nonceStr := strconv.Itoa(block.Nonce)
	diffStr := strconv.Itoa(block.Difficulty)
	txBytes, _ := json.Marshal(block.Transactions)
	txStr := string(txBytes)
	//this joins all the block elements into one long string, with all elements appended after one another, to produce the hash
	blockStr := strings.Join([]string{indexStr, block.PreviousHash, timeStr, diffStr, block.Data, txStr, nonceStr}, " ")
	bytes := []byte(blockStr)
	hash := sha256.Sum256(bytes)
	s := hex.EncodeToString(hash[:]) //encode the hash as a hex string. the [:] is slicing to match datatypes in args
	return s
}

I think Bitcoin uses a Merkle tree to combine the transactions in a block. But in the above, all the elements in a block are simply turned into strings, concatenated into a single long string, and this string is hashed. So we could maybe do better, but it works well for a tutorial such as Naivecoin.

Now, to create the blocks. As visible in the structure above, each block has to reference the previous ones to create the chain. And to provide assurance over the overall chain, hashes of the block and of the previous block are provided. But the chain has to start somewhere. The first block in the chain is called the genesis block. It has no previous hash, as it has no previous block. So we create it specifically first:

//create genesis block, the first one on the chain to bootstrap the chain
func createGenesisBlock(addToChain bool) Block {
	genesisTime, _ := time.Parse("Jan 2 15:04 2006", "Mar 15 19:00 2018")
	block := Block{1, "", "0", genesisTime, "Teemu oli täällä", nil, 1, 1}
	block.Hash = hash(&block)
	if addToChain {
		globalChain = append(globalChain, block)
	}
	return block
}

More generally, to create non-genesis blocks:

//create a block from the given parameters, and find a nonce to produce a hash matching the difficulty
//finally, append the new block to the current chain
func createBlock(txs []Transaction, blockData string, difficulty int) Block {
	chainLength := len(globalChain)
	previous := globalChain[chainLength-1]
	index := previous.Index + 1
	timestamp := time.Now().UTC()
	nonce := 0
	newBlock := Block{index, "", previous.Hash, timestamp, blockData, txs, difficulty, nonce}
	for {
		newBlock.Hash = hash(&newBlock)
		if verifyHashVsDifficulty(newBlock.Hash, difficulty) {
			addBlock(newBlock)
			return newBlock
		}
		nonce++
		newBlock.Nonce = nonce
	}
}

The above code takes the block transactions, data and the current difficulty of the chain as parameters. The rest of the information required to create a block is taken from the previous block (block index, previous block hash) or system clock (current timestamp). Finally, it loops the different nonce values until it finds a hash matching the current difficulty level. Like I mentioned before, it might be a bit simplified, but this is certainly good enough for a tutorial such as the Naivecoin.

So, in this case, to verify the block difficulty:

func verifyHashVsDifficulty(hash string, difficulty int) bool {
	prefix := strings.Repeat("0", difficulty)
	return strings.HasPrefix(hash, prefix)
}

This is quite simple: it just checks that the given hash string starts with a given number of zeroes. In this case, the measure of difficulty is the number of zeroes the hash should start with. In a more general case, I believe the difficulty can be a number under which the hash value should fall (2nd top answer at the time of writing this). That gives you much more granularity to define the difficulty.. But, again, no need to go overly complex on things in a Naivecoin tutorial.

Similarly, adding a block to the blockchain is quite simple:

//add a new block to the existing chain
func addBlock(block Block) {
	chainLength := len(globalChain)
	previousBlock := globalChain[chainLength-1]
	block.PreviousHash = previousBlock.Hash
	globalChain = append(globalChain, block)
	for _, tx := range block.Transactions {
		addTransaction(tx)
	}
	//todo: check block hash matches difficulty
}

So as I commented above, I should also check that the new block matches the required difficulty. Would not be too hard but I will leave that update for the next version.

Adding the block also requires adding all the transactions within that block to the list of active transactions:

func addTransaction(tx Transaction) {
	oldTx := findUnspentTransaction(tx.Sender, tx.Id)
	if oldTx >= 0 {
		print("transaction already exists, not adding: ", tx.Id)
		return
	}
	allTransactions = append(allTransactions, tx)
	for _, txIn := range tx.TxIns {
		deleteUnspentTransaction(tx.Sender, txIn.TxId)
	}
	for idx, txOut := range tx.TxOuts {
		utx := UnspentTxOut{tx.Id, idx, txOut.Address, txOut.Amount}
		unspentTxOuts = append(unspentTxOuts, utx)
	}
}

So, as the Naivecoin tutorial so nicely explains, each transaction has two types of “sub-transactions” as I call them here. Not an official term or anything. But these are:

  • TxIn: Transaction inputs.
  • TxOut: Transaction outputs.

I found the explanations related to how these work very confusing to start with. Eventually I got it, but it is not the most intuitive thing, especially with the terminology of Transaction (Tx), TxIn, TxOut, … Especially since a TxIn and a TxOut are mostly the same thing, just expressed in a different way. Well, that is my understanding anyway. Please do correct me.

A TxOut (would it be correct to call it a transaction output?) is what creates “coins” to spend. Sending coins to someone creates a TxOut. Since a single transaction can contain multiple TxOut instances, it just means you can send coins to multiple people in a single transaction. Or multiple wallets that is. The most common seems to be to send coins to someone else and back to yourself. Why is this?

To create TxOut’s you need a matching amount of TxIn’s. Or rather, the amount of coins referenced in each should match. This check should actually be a part of the addTransaction function above: checking that the transaction is valid by verifying that the input and output amounts add up to the same sums. But you can only add existing TxOut instances to a transaction as TxIn’s. So if you got a TxOut giving you 100 coins, you can only add that TxOut to your next transaction as a TxIn if you want to send someone coins. So the TxIn has to be 100 coins in this case. What if you want to send only 50 coins? Then you put in the single TxIn with 100 coins, but create 2 TxOut’s for that transaction. One to send 50 coins to the receiver, and another to send the remaining 50 coins back to yourself. This way the balances match again. Confusing? Yes. Check the pretty pictures some way down this post.

Of course, if you send coins to an address that is not used, you will “burn” those coins. No-one can ever access them, because no-one knows the private key to match the public key used to sign that TxOut. You could maybe make a lot of money if you could find a key to some of the bigger coin-burn addresses. But I digress.

How do the coins initially come into existence if you can only send coins by referencing an existing TxOut? Where does the first TxOut come from? It has to all be bootstrapped somehow, right? This is what is called a Coinbase transaction. No, not according to the cryptocurrency selling company. They took the name from the term, not the other way around. A coinbase transaction might look something like this:


func createCoinbaseTx(address string) Transaction {
	var cbTx Transaction

	var txIn TxIn
	txIn.TxIdx = len(globalChain)
	cbTx.TxIns = append(cbTx.TxIns, txIn)

	var txOut TxOut
	txOut.Address = address
	txOut.Amount = coinbaseAmount //the block reward from the coin specification
	cbTx.TxOuts = append(cbTx.TxOuts, txOut)

	cbTx.Id = calculateTxId(cbTx)
	cbTx.Signature = "coinbase"

	return cbTx
}

The above code creates the special transaction called the coinbase transaction. In proof-of-work type solutions, such as in the Naivecoin tutorial, this is added to each mined block. So when a miner finds a hash for a new block (thus “creating” it), they get the reward for the block. This reward is paid as the coinbase transaction. Each block can have one, and the miner can create it in the block. I would expect the miner then puts their own address in the transaction as the receiver. For the TxOut address. The TxIn in this case comes from nowhere, from thin air, or from a made up input. This is how the money is made here..

To my understanding, the way coinbase transactions are verified is simply by each node in the network accepting only a single coinbase transaction in a block, with specific parameters. These should match the number of coins to be rewarded according to the coin specification, and the defined coinbase signature. I believe in Bitcoin you can set many of the other fields as you like. For example, people can hide messages in the TxIn address or some other fields of the coinbase transaction. Because they do not need to match anything real. Once the coinbase amount is issued to a miner, they can then send coins using the coinbase TxOut as part of their new transaction.

The related data structures:

type TxOut struct {
	Address string //receiving public key
	Amount  int    //amount of coin units to send/receive
}

type TxIn struct {
	TxId  string //id of the transaction inside which this TxIn should be found (in the list of TxOut)
	TxIdx int    //index of the TxOut this refers to inside the transaction
}

type UnspentTxOut struct {
	TxId    string //transaction id
	TxIdx   int    //index of the TxOut in the transaction
	Address string //public key of owner
	Amount  int    //amount of coin units that was sent/received
}

//a transaction is from a single person/entity
//inputs from that entity use funds, outputs show how much and where
//all the time there should be some list kept and updated based on this
type Transaction struct {
	Id        string
	Sender    string //the address/public key of the sender
	Signature string //signature including all txin and txout. in this case we sign Transaction.Id since that already has all the TxIn and TxOut
	TxIns     []TxIn
	TxOuts    []TxOut
}

Each block contains a set of transactions, which contain a set of TxIn and TxOut.

  • The TxOut defines the recipient's address (an encoding of their public key), and the amount of coins to send.
  • The TxIn defines a reference to a previous TxOut. So an identifier to find the transaction which contains it, and the index of the TxOut within that transaction.
  • The transaction itself contains the address (public key) of the sender. Since each TxIn and TxOut is contained in the transaction, and references that transaction, the owner can be checked against the address of the sender in the transaction. So the recipient in the “spent” TxOut should match the sender of this new transaction.
  • A prespecified way of calculating the transaction id is defined in the coin specification. This is then used to create and encode a hash value for the transaction id. I guess most real coins would use some form of Merkle-trees as linked high above. In this case the naivecoin simply adds the information on the sender and contained TxIn and TxOut instances into a long string and hashes that.
  • The signature is used to sign the transaction id (already containing all the transaction data in the hash) with the sender private key, using ECDSA signatures, as described in my previous post. The authenticity of the transaction and all its contents can then be verified by using the public key of the sender as embedded in the sender address to verify the signature. Brilliant stuff.

In pictures this looks something like this:


In the above example (in the figure above), Bob is sending 550 coins to Alice. To do this, Bob needs TxIn’s that sum up to a minimum of 550. He has a set that sums up to 600. So these are used as the TxIn’s in this transaction. As 600 is more than 550, there are two TxOut’s created. One assigns the 550 coins to Alice, and the other 50 coins back to Bob. The blockchain tracks coin amounts using the TxOut instances stored within transactions, so it can only “spend” a full TxOut. This is why the “change” (50 coins in this example) generates a completely new TxOut.

The wording in the figure above for “deleting” inputs simply refers to the TxIn’s being marked as used, so they cannot be used again. So the previous TxOut they refer to (from previous transactions that gave coins to Bob) are marked as used. As the wallet application and blockchain nodes track which TxOut are unused (not referenced by any accepted TxIn), they can count the balance of a user as the sum of their unspent TxOut.

The wording for “creating” new TxOut in the figure above refers to adding new TxOut to the list of unspent TxOut. In this case it adds one for Bob (50 coins) and one for Alice (550 coins).

The code for TxOut, TxIn, and UnspentTxOut higher above shows the structure of each of these. The TxId field ties each TxOut and TxIn to the specific transaction where the associated TxOut was created. So a TxIn also refers to a TxOut, but serves as a way to mark that TxOut as “spent”. The reference consists of the transaction id (each transaction in the blockchain has a unique id), and TxIdx, the index of the TxOut within that transaction. Since a transaction can have multiple TxOut, the index identifies exactly which TxOut is meant.

Each TxOut is assigned to a specific address, and the address holds the recipient's public key. Each transaction is signed using the sender's private key, and this signature can be verified using the signer's public key. Thus each TxIn can be verified to be spendable by the person who signed the transaction containing the TxIn. Following the TxIn reference to the TxOut it refers to, and looking up the address that TxOut was assigned to, gives the public key. By checking that this public key matches the private key used to sign the new transaction wanting to spend that TxIn, it can be verified that the creator of the transaction is in possession of the private key associated with that public key.

Proper transaction verification should check that all TxIn match the transaction signature, that there is a sufficient number of TxIn, and that the TxIn total sum matches the total sum of the TxOut. Probably much more too, but that is a good start.

Much confusing all this is. Hope you, dear reader, if you exist, find more information to understand my ramblings and to correct me.

After the transaction in the figure above, the end result looks something like this:


With all this in place, we can write the code to send coins:

func sendCoins(privKey *ecdsa.PrivateKey, to string, count int) Transaction {
	from := encodePublicKey(privKey.PublicKey)
	//TODO: error handling
	txIns, total := findTxInsFor(from, count)
	txOuts := splitTxIns(from, to, count, total)
	tx := createTx(privKey, txIns, txOuts)
	return tx
}

func findTxInsFor(address string, amount int) ([]TxIn, int) {
	balance := 0
	var unspents []TxIn
	for _, val := range unspentTxOuts {
		if val.Address == address {
			balance += val.Amount
			txIn := TxIn{val.TxId, val.TxIdx}
			unspents = append(unspents, txIn)
		}
		if balance >= amount {
			return unspents, balance
		}
	}
	return nil, -1
}

func splitTxIns(from string, to string, toSend int, total int) []TxOut {
	diff := total - toSend
	txOut := TxOut{to, toSend}
	var txOuts []TxOut
	txOuts = append(txOuts, txOut)
	if diff == 0 {
		return txOuts
	}
	txOut2 := TxOut{from, diff}
	txOuts = append(txOuts, txOut2)
	return txOuts
}

func createTx(privKey *ecdsa.PrivateKey, txIns []TxIn, txOuts []TxOut) Transaction {
	pubKey := encodePublicKey(privKey.PublicKey)
	tx := Transaction{"", pubKey, "", txIns, txOuts}

	signTxIns(tx, privKey)

	tx.Id = calculateTxId(tx)
	tx.Signature = signData(privKey, []byte(tx.Id))
	return tx
}

The full code for my Go port of the naivecoin tutorial (first parts) is on my Github.

There are various other concepts I would still like to look into. Ways to efficiently search the whole blockchain and all transactions in it (to validate transactions, build block explorers, etc.) would be interesting. I understand some type of embedded databases may be used. And endlessly argued about. Ah, just when you thought the programming language flame-wars of the past are no longer, you find it all again. Much stability in argument, such peace it brings. In my Rock’n chair I kek at all things past and present. Sorry, just trying to learn to talk like the internet and crypto people do. On a lighter note, other topics of interest to look into include robust peer-to-peer connections, ring signatures, proof of stake, etc.

That about sums it up for my experiment for now. Do let me know what are all the things I got wrong.

Trying to learn ECDSA and GoLang

Recently I have been looking at the Naivecoin tutorial, and trying to implement it in Go to get an idea of how blockchains really work, and to learn some more Go as well. The tutorial code is in Javascript, and translating it to Go has been mostly straightforward. However, porting the third part, with transactions, was giving me some issues. I had trouble figuring out how to port the signature part. This is the tutorial code in Javascript:

    const key = ec.keyFromPrivate(privateKey, 'hex');
    const signature: string = toHexString(key.sign(dataToSign).toDER());

This is nice and simple, just suitable for a high-level framework and tutorial. However, to implement it myself in Go, I first had to figure out what it actually does.

The JS code above takes the private key as a hex formatted string, and parses that into a Javascript PrivateKey object. This key object is then used to sign the “dataToSign”, the signature is formatted as something called “DER”, and the result is formatted as a hex string. What does all that mean?

The tutorial refers to Elliptic Curve Digital Signature Algorithm (ECDSA). The data to sign in this case is the SHA256 hash of the transaction ID. So how to do this in Go? Go has an ecdsa package with private keys, public keys, and functions to sign and verify data. Sounds good. But the documentation is quite sparse, so how do I know how to properly use it?

To try it, I decided to first write a program in Java using ECDSA signatures, and use it to compare to the results of the Go version I would write. This way I would have another point of reference to compare my results to, and to understand if I did something wrong. I seemed to find more information about the Java implementation, and since I am more familiar with Java in general..

So first to generate the keys to use for signatures in Java:

	public static PrivateKey generateKey() throws Exception {
		KeyPairGenerator keyGen = KeyPairGenerator.getInstance("EC");
		SecureRandom random = SecureRandom.getInstance("SHA1PRNG");

		keyGen.initialize(256, random); //256 bit key size

		KeyPair pair = keyGen.generateKeyPair();
		ECPrivateKey priv = (ECPrivateKey) pair.getPrivate();
		PublicKey pub = pair.getPublic();

		//actually also need the public key, but lets get to that later...
		return priv;
	}

The above code starts by getting an “EC” key-pair generator, EC referring to Elliptic Curve. It then gets a secure random number generator instance, in this case one based on the SHA1 hash algorithm. Apparently this is fine, even if SHA1 is not recommended for everything these days. Not quite sure about the key size of 256 given, but maybe I have to look at that later.. First to get this working.

The “priv.getEncoded()” part turns the private key into a standard encoding format as a byte array. Base64 encode it for a character representation, to copy to the Go version..

Next, to sign the data (or message, or whatever we want to sign..):

	public static byte[] signMsg(String msg, PrivateKey priv) throws Exception {
		Signature ecdsa = Signature.getInstance("SHA1withECDSA");
		ecdsa.initSign(priv);

		byte[] strByte = msg.getBytes("UTF-8");
		ecdsa.update(strByte);

		byte[] realSig = ecdsa.sign();

		System.out.println("Signature: " + new BigInteger(1, realSig).toString(16));

		return realSig;
	}
The above starts with getting a Java instance of the ECDSA signature algorithm, with type “SHA1withECDSA”. I spent a good moment wondering what all this means, to be able to copy the functionality into the Go version. So, long story short, first the data is hashed with SHA1 and then this hash is signed with ECDSA. Finally, the code above prints the signature bytes as a hexadecimal string (byte array->BigInteger->base 16 string). I can then simply copy-paste this hex-string to Go to see if I can get signature verification to work in Go vs Java. Brilliant.

First I tried to see that I can get the signature verification to work in Java:

	private static boolean verifySignature(PublicKey pubKey, String msg, byte[] signature) throws Exception {
		byte[] message = msg.getBytes("UTF-8");
		Signature ecdsa = Signature.getInstance("SHA1withECDSA");
		ecdsa.initVerify(pubKey);
		ecdsa.update(message);
		return ecdsa.verify(signature);
	}

The code above takes the public key associated with the private key that was used to sign the data (called “msg” here). It creates the same type of ECDSA signature instance as the signature creation previously. This is used to verify the signature is valid for the given message (data). So signed with the private key, verified with the public key. And yes, it returns true for the signed message string, and false otherwise, so it works. So now knowing I got this to work, I can try the same in Go, using the signature, public key, and private key that was used in Java. But again, the question. How do I move these over?

Java seems to provide functions such as key.getEncoded(). This gives a byte array. We can then Base64 encode it to get a string (I believe Bitcoin etc. use Base58, but it is the same idea). So something like this:

		byte[] pubEncoded = pub.getEncoded();
		String encodedPublicKey = Base64.getEncoder().encodeToString(pubEncoded);
		String encodedPrivateKey = Base64.getEncoder().encodeToString(priv.getEncoded());

Maybe I could then take the output I just printed, and decode that into the key in Go? But what is the encoding? Well, the JDK docs say getEncoded() “Returns the key in its primary encoding format”. And what might that be? Well some internet searching and debugger runs later I come up with this (which works to re-create the keys in Java):

	public static PrivateKey base64ToPrivateKey(String encodedKey) throws Exception {
		byte[] decodedKey = Base64.getDecoder().decode(encodedKey);
		PKCS8EncodedKeySpec spec = new PKCS8EncodedKeySpec(decodedKey);
		KeyFactory factory = KeyFactory.getInstance("EC");
		PrivateKey privateKey = factory.generatePrivate(spec);
		return privateKey;
	}

	public static PublicKey base64ToPublicKey(String encodedKey) throws Exception {
		byte[] decodedKey = Base64.getDecoder().decode(encodedKey);
		X509EncodedKeySpec spec = new X509EncodedKeySpec(decodedKey);
		KeyFactory factory = KeyFactory.getInstance("EC");
		PublicKey publicKey = factory.generatePublic(spec);
		return publicKey;
	}

So the JDK encodes the private key in PKCS8 format, and the public key in some kind of X509 format. X509 seems to be related to certificates, and PKCS refers to “Public Key Cryptography Standards”, of which there are several. Both of these seem a bit complicated, as I was just looking to transfer the keys over. Since people can post those online for various crypto tools as short strings, it cannot be that difficult, can it?

I tried to look for ways to take PKCS8 and X509 data into Go and transform those into private and public keys. Did not get me too far with that. Instead, I figured there must be only a small part of the keys that is needed to reproduce them.

So I found that the private key has a single large number that is the important bit, and that the public key can be calculated from the private key. The public key itself consists of two parameters, the x and y coordinates of a point (I assume on the elliptic curve). I browsed all over the internet trying to figure this all out, but did not keep records of all the sites I visited, so my references are kind of lost. However, here is one description that states exactly this integer and point structure. Anyway, please let me know of any good references for a non-mathematician like me to understand it, if you have any.

To get the private key value into suitable format to pass around in Java:

	private static String getPrivateKeyAsHex(PrivateKey privateKey) {
		ECPrivateKey ecPrivateKey = (ECPrivateKey) privateKey;
		byte[] privateKeyBytes = ecPrivateKey.getS().toByteArray();
		String hex = bytesToHex(privateKeyBytes);
		return hex;
	}

The “hex” string in the above code is the big integer value that forms the basis of the private key. This can now be passed, backed up, or whatever we desire. Of course, it should be kept private so no posting it on the internet.

For the public key:

	private static String getPublicKeyAsHex(PublicKey publicKey) {
		ECPublicKey ecPublicKey = (ECPublicKey) publicKey;
		ECPoint ecPoint = ecPublicKey.getW();

		byte[] affineXBytes = ecPoint.getAffineX().toByteArray();
		byte[] affineYBytes = ecPoint.getAffineY().toByteArray();

		String hexX = bytesToHex(affineXBytes);
		String hexY = bytesToHex(affineYBytes);

		return hexX+":"+hexY;
	}

The above code takes the X and Y coordinates that make up the public key and combines them into a single string, from which the X and Y can later be recovered. A more sensible option would likely be to create a single byte array, with the length of the first part in the first byte or two: something like [byte count for X][bytes of X][bytes of Y]. But the string concatenation works for my simple example.

And then there is one more thing that needs to be encoded and passed between the implementations: the signature. Far above, I wrote the “signMsg()” method to build the signature, and printed the signature bytes out as a hex-string. But what format is the signature in, and how do you translate it to another platform and verify it? It turns out Java gives the signatures in ASN.1 (DER) format. There is a good description of the format here. It’s not too complicated, but how would I import that into Go? I did not find any mention of this in the ECDSA package for Go. By searching for ASN.1 I did finally find an ASN.1 package for Go. But is there a way to do this without these (poorly documented) encodings?

Well, it turns out that ECDSA signatures can also be described by using just two large integers, which I refer to here as R and S. To get these in Java:

	public static byte[] signMsg(String msg, PrivateKey priv) throws Exception {
		Signature ecdsa = Signature.getInstance("SHA1withECDSA");
		ecdsa.initSign(priv);

		byte[] strByte = msg.getBytes("UTF-8");
		ecdsa.update(strByte);

		byte[] realSig = ecdsa.sign();

		System.out.println("R: "+extractR(realSig));
		System.out.println("S: "+extractS(realSig));

		return realSig;
	}

	public static BigInteger extractR(byte[] signature) throws Exception {
		int startR = (signature[1] & 0x80) != 0 ? 3 : 2;
		int lengthR = signature[startR + 1];
		return new BigInteger(Arrays.copyOfRange(signature, startR + 2, startR + 2 + lengthR));
	}

	public static BigInteger extractS(byte[] signature) throws Exception {
		int startR = (signature[1] & 0x80) != 0 ? 3 : 2;
		int lengthR = signature[startR + 1];
		int startS = startR + 2 + lengthR;
		int lengthS = signature[startS + 1];
		return new BigInteger(Arrays.copyOfRange(signature, startS + 2, startS + 2 + lengthS));
	}

The above code takes the byte array of the signature and parses the R and S from it, matching the ASN.1 specification I linked above. So with that, another alternative is again to just turn the R and S into hex-strings or Base58-encoded strings, combine them into a single byte array and hex or Base58 encode that, or whatever. Just those two values need to be passed to capture the signature.

Now, finally to parse all this data in Go and to verify the signature. First to get the private key from the hex-string:

	func hexToPrivateKey(hexStr string) *ecdsa.PrivateKey {
		bytes, err := hex.DecodeString(hexStr)
		if err != nil {
			fmt.Println(err)
		}

		k := new(big.Int)
		k.SetBytes(bytes)

		priv := new(ecdsa.PrivateKey)
		curve := elliptic.P256()
		priv.PublicKey.Curve = curve
		priv.D = k
		priv.PublicKey.X, priv.PublicKey.Y = curve.ScalarBaseMult(k.Bytes())
		//this print can be used to verify we got the same parameters as in the Java version
		fmt.Printf("X: %d, Y: %d", priv.PublicKey.X, priv.PublicKey.Y)

		return priv
	}

The above code takes the hex-string, parses it into a byte array, creates a Go big integer from it, and sets the result as the value of the private key. The other part that is needed is the elliptic curve definition. In practice, one of a predefined set of curves is usually used, and the same curve is always used for a specific purpose, so it can be defined as a constant, whichever curve is selected for the blockchain. For example, Bitcoin uses the Secp256k1 curve; in this case it is always the P256 curve, both in the Java and Go versions. So I just set the curve and the big integer to create the private key. The public key (the X and Y parameters) is calculated here from the private key, using a scalar base multiplication with the private key’s big integer.

To build the public key straight from the X and Y values passed in as hex-strings:

	func hexToPublicKey(xHex string, yHex string) *ecdsa.PublicKey {
		xBytes, _ := hex.DecodeString(xHex)
		x := new(big.Int)
		x.SetBytes(xBytes)

		yBytes, _ := hex.DecodeString(yHex)
		y := new(big.Int)
		y.SetBytes(yBytes)

		pub := new(ecdsa.PublicKey)
		pub.X = x
		pub.Y = y

		pub.Curve = elliptic.P256()

		return pub
	}

Again, Base58 or similar would likely be a more efficient representation. The above code makes it possible to pass around just the public key and not the private key, which is how it should be done: only the X and Y parameters are transferred, with the curve defined as a constant choice.

To create and verify the signature from the passed values:

	type ecdsaSignature struct {
		R, S *big.Int
	}

	func verifyMySig(pub *ecdsa.PublicKey, msg string, sig []byte) bool {
		digest := sha1.Sum([]byte(msg))

		var esig ecdsaSignature
		asn1.Unmarshal(sig, &esig)
		//we can use these prints to compare to what we had in Java...
		fmt.Printf("R: %d , S: %d", esig.R, esig.S)
		return ecdsa.Verify(pub, digest[:], esig.R, esig.S)
	}

The above version reads the actual ASN.1 encoded signature that is produced by the Java default signature encoding. To get the functionality matching the Java “SHA1withECDSA” algorithm, I first have to hash the input data with SHA1 as done here. Since the Java version is a bit of a black box with just that string definition, I spent a good moment wondering about that. I would guess the same approach would apply for other choices such as “SHA256withECDSA” by just replacing the hash function with another. Alternatively, I can also just pass in directly the R and S values of the signature:

	func verifyMySig(pub *ecdsa.PublicKey, msg string, sig []byte) bool {
		digest := sha1.Sum([]byte(msg))

		var esig ecdsaSignature
		esig.R = new(big.Int)
		esig.S = new(big.Int)
		esig.R.SetString("89498588918986623250776516710529930937349633484023489594523498325650057801271", 10)
		esig.S.SetString("67852785826834317523806560409094108489491289922250506276160316152060290646810", 10)
		fmt.Printf("R: %d , S: %d", esig.R, esig.S)
		return ecdsa.Verify(pub, digest[:], esig.R, esig.S)
	}

So in the above, the R and S are set directly from the hardcoded number strings (note that the R and S fields have to be initialized with new(big.Int) before SetString, or the code panics on a nil pointer). Normally they would be encoded more efficiently and given as parameters, but this works as a demonstration. The two long strings are the R and S integers I printed out in the Java version.

Strangely, printing the R and S via the ASN.1 parsing and via the directly passed numbers gives different values for R and S, yet both verify the signature fine. I read somewhere that certain transformations can be done on the signature numbers while keeping the signature valid. Maybe something like that happens as part of the encoding? I have no idea. But it works. Much trust such crypto skills I have.

	func TestSigning(t *testing.T) {
		xHexStr := "4bc55d002653ffdbb53666a2424d0a223117c626b19acef89eefe9b3a6cfd0eb"
		yHexStr := "d8308953748596536b37e4b10ab0d247f6ee50336a1c5f9dc13e3c1bb0435727"
		ePubKey := hexToPublicKey(xHexStr, yHexStr)

		sig := "3045022071f06054f450f808aa53294d34f76afd288a23749628cc58add828e8b8f2b742022100f82dcb51cc63b29f4f8b0b838c6546be228ba11a7c23dc102c6d9dcba11a8ff2"
		sigHex, _ := hex.DecodeString(sig)
		ok := verifyMySig(ePubKey, "This is string to sign", sigHex)
		if !ok {
			t.Error("signature did not verify")
		}
	}
And finally, it works! Great 🙂