Almost bare metal Part II

In my previous post I mentioned I have been working on moving a routine that handles many hashing tasks at once to the GPU. The preliminary results are encouraging, although they are not an apples-to-apples comparison with the actual task at hand.

To get some simple numbers, I simplified the task. There is a stream of bytes that must be hashed using the SHA256 algorithm. The stream of bytes includes a part that is somewhat static and a part that changes. The part that changes could be one of about 13,000 combinations. The task is to hash those 13,000 byte streams and check which (if any) produce a hash value below a certain threshold, similar to the process used by Bitcoin to publish blocks.

Because we are now dealing with moving blocks of memory across a bus, the actual process was adjusted to do this in more of a “batch” mode. The 13,000 byte streams are easily created on the CPU. Each byte stream is put in an array (also on the CPU).

Now for the fun part. The entire array is hashed one element at a time. The results are placed in another array. This is done once on the CPU and once on the GPU, with the timing results compared.
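As a rough CPU-side illustration of the batch flow described above, here is a minimal Python sketch. It is not the C++/SYCL code behind the timings. The function names, the 4-byte counter standing in for "the part that changes", and the threshold value are all assumptions made for the example.

```python
import hashlib


def build_streams(static_part: bytes, count: int) -> list[bytes]:
    """Create `count` byte streams: a shared static part plus a part that changes.

    Assumption: the changing part is modeled as a 4-byte little-endian counter.
    """
    return [static_part + i.to_bytes(4, "little") for i in range(count)]


def hash_batch(streams: list[bytes]) -> list[bytes]:
    """Hash the input array one element at a time into a result array."""
    return [hashlib.sha256(s).digest() for s in streams]


def below_threshold(digests: list[bytes], threshold: int) -> list[int]:
    """Return the indexes of digests whose big-endian integer value is below threshold."""
    return [i for i, d in enumerate(digests) if int.from_bytes(d, "big") < threshold]


if __name__ == "__main__":
    digests = hash_batch(build_streams(b"somewhat static header", 13_000))
    # Hypothetical threshold: the top 8 bits of the hash must be zero.
    hits = below_threshold(digests, 1 << 248)
    print(f"{len(hits)} of {len(digests)} hashes are below the threshold")
```

Keeping the inputs and outputs in flat arrays is what makes the same job easy to hand to a GPU in one transfer, which is the "batch" adjustment mentioned earlier.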

The initial results were not good:

Building CPU IDs: 3345344ns total, average : 261ns.
Building GPU IDs: 142561855ns total, average : 11137ns.

But running it a second time yielded:

Building CPU IDs: 3297128ns total, average : 257ns.
Building GPU IDs: 795204ns total, average : 62ns.

And a third run yielded:

Building CPU IDs: 3304968ns total, average : 258ns.
Building GPU IDs: 788838ns total, average : 61ns.

I would guess that the first run had to set up some things on the GPU that cost time. Subsequent runs are much more efficient. To support that thesis, note that the time it took the CPU was fairly consistent across all three runs, and the GPU runs after the initial run were also very consistent.

Runs beyond the first 3 show consistent times on both CPU and GPU.

The synopsis is: Bulk hashing on an NVIDIA GeForce 2060 (notebook version) using SYCL provides about a 4X improvement in hashing speed once the first-run setup cost is out of the way. Moving to CUDA provides similar results, so the language does not seem to matter here.
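The "about 4X" figure can be checked directly from the totals printed above. The short Python sketch below only redoes that arithmetic; the function names are made up for the example and nothing here was measured separately.

```python
def speedup(cpu_ns: int, gpu_ns: int) -> float:
    """How many times faster the GPU run was than the CPU run."""
    return cpu_ns / gpu_ns


def implied_items(total_ns: int, average_ns: int) -> float:
    """Approximate batch size implied by a reported total and (truncated) average."""
    return total_ns / average_ns


# (CPU total ns, GPU total ns) for the three runs reported above.
RUNS = [
    (3345344, 142561855),
    (3297128, 795204),
    (3304968, 788838),
]

if __name__ == "__main__":
    for number, (cpu_ns, gpu_ns) in enumerate(RUNS, start=1):
        print(f"run {number}: CPU/GPU time ratio = {speedup(cpu_ns, gpu_ns):.2f}")
    print(f"items per batch: about {implied_items(3345344, 261):.0f}")
```

The first run comes out at a ratio of roughly 0.02 (the GPU was about 43 times slower), while the second and third runs give roughly 4.15 and 4.19. The totals and averages also imply a batch of roughly 12,800 items, consistent with "about 13,000".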

More questions to be answered:

  • Comparing the hashes to a “difficulty” level must also be done. Would the GPU be more efficient here as well? Should it be done in the same process (memory efficient) or in a separate process (perhaps pipe efficient)?
  • What would happen if we ran this on an AMD GPU with similar specifications?
  • What resources were used on the GPU? Are we only using a fraction of memory/processing abilities, or are we almost maxed out?
  • At what point are the arrays that are passed to or retrieved from the card too big to handle?

(Almost) bare metal

I love writing in C++. The reasons are performance (I am a speed junkie) and learning opportunities. If necessary, I can tune a routine to seek out the fastest way to achieve the result. Each time I do, I learn a lot. I have been learning constantly since I picked up the book on BASIC that I received with my Timex-Sinclair 1500.

But what if my computer is simply not fast enough to perform the routine I need it to? If the task can be broken into smaller, somewhat independent pieces, we can use threading. My machine has 8 cores and 16 hardware threads. My latest project included a process that took 20 seconds on 1 thread, and 6 seconds with 16 threads. And it wasn’t that hard to do.
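A rough way to read the 20-second and 6-second figures is through Amdahl's law. The Python sketch below is a back-of-the-envelope model, not a measurement: it assumes the 16 threads behave like 16 equal workers and that the routine splits cleanly into a serial part and a perfectly parallel part. The function names are invented for the example.

```python
def observed_speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Speedup actually seen when going from one thread to many."""
    return serial_seconds / parallel_seconds


def parallel_fraction(speedup: float, workers: int) -> float:
    """Solve Amdahl's law, S = 1 / ((1 - p) + p / n), for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / workers)


def amdahl_speedup(p: float, workers: int) -> float:
    """Predicted speedup for parallel fraction p on `workers` equal workers."""
    return 1 / ((1 - p) + p / workers)


if __name__ == "__main__":
    s = observed_speedup(20, 6)
    p = parallel_fraction(s, 16)
    print(f"observed speedup: {s:.2f}x, implied parallel fraction: {p:.2f}")
    print(f"speedup limit with unlimited workers: {1 / (1 - p):.2f}x")
```

Under those assumptions about three quarters of the work runs in parallel, which would cap the speedup a little under 4x no matter how many CPU threads are added. That is one way to see why more CPU threads alone are unlikely to get far below 6 seconds.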

But 6 seconds is not fast enough. And during those 6 seconds my CPUs are pegged at 100%. The routine is tuned to the best of my ability. What now? Well, the process is some number crunching. I have a video card that loves to crunch numbers. What will it take?

I have coded the routine in CUDA as well as SYCL and will have some results soon (Update: See Part II). I do not get to do this stuff often, so I am enjoying re-learning CUDA and this is my first round with SYCL. SYCL looks like a winner here, as I will be able to compare results between some NVIDIA and AMD cards with (hopefully) the same codebase.

But GPUs are not the only option for such tasks. There are ASICs available for my current number-crunching necessity. And as I walked down the ASIC road I was able to explore another avenue that I had only glossed over in some prior projects: FPGAs.

ASICs (application-specific integrated circuits) are purpose-built chips. They are designed to be fed a certain way, and spit out the answer in a certain way. You can think of it as getting a new gadget that has the sticker “no user serviceable parts inside”. You cannot get that chip to do anything beyond what it was meant to do.

FPGAs are in the middle between microprocessors and ASICs. In fact, you can use an FPGA as a way to prototype an ASIC. But here, we are not talking about making it do what we want by manipulating C or C++. We are talking about manipulating transistors and wires. It is digital circuit building without the breadboard and soldering iron (well, at least in some cases).

FPGAs have been around for a while. But until fairly recently they were simply too expensive for anything but the big budget projects. Think government, Wall Street, Google, and Amazon. Today, you can experiment with an FPGA for $40 or less.

As a software developer, I had to twist my brain to begin to reason about how to get these things to do what I want. I have done some microprocessor work. I have written software for embedded devices. I know what a transistor is and when to use a NAND. But FPGAs are requiring me to rewire my brain as I rewire that chip.

Where To Begin

I am just getting started. So here is what I have learned so far. The big hardware players in this arena are Xilinx (now owned by AMD), Altera (now owned by Intel) and Lattice. There are other players, but for what I am doing and where I am at in my exploration, I want to examine these suppliers before getting into the nuanced specifics. I am at the “I don’t know what I don’t know” stage.

When digging through the interwebs, there are a number of development boards and chips that keep popping up. So if you want to follow the tutorials, you will need to get your hands on some hardware. My first board will not be the latest and greatest (one I looked at on Mouser was just over USD$17,000.00). It will be modest and with a manufacturer that seems to often be used in the industry I am working with. Here are the 2 I am looking at:

The Digilent Basys 3

This is often used in FPGA tutorials. The development board includes a good amount of toys to play with to build your knowledge: pushbuttons, switches, LEDs, a seven-segment display, a VGA port, etc. It costs around US$160.

The Digilent Arty A7/S7

This also appears in some tutorials. The S7 seems to be the low-cost version of the A7. Both come in two flavors, so 4 different chips for 4 different prices. The A7 includes an Ethernet port, whereas the S7 does not. The A7 with the lower cost chip is around US$160.

Honorable Mentions

If you are looking to get into FPGAs on the cheap, check out the Lattice iCEstick. But don’t stop there. There are plenty of others with varying sizes of “maker” communities around them.

Debugging in VI

I switched back from Visual Studio Code to vi (I tire of switching from keyboard to mouse). As my memory is not very good, here is my cheat-sheet for using :Termdebug.

Starting a debug session: :packadd termdebug and then :Termdebug [executable_name]

Navigating the 3 windows: CTRL + w and then j or k.

Set / remove a breakpoint: :Break / :Clear

Step / Next: s or n in the gdb window (or :Step / :Over from the source window)

Run / Continue: run or c in the gdb window (or :Run / :Continue from the source window)

Pros and Cons of Centralization

One of the subjects that many in the crypto universe speak of is centralization. Most speak of it as something that must be destroyed. But it does provide services. Here are my thoughts:

Pros of Centralization

I have several bank and investment accounts. The majority of them are insured by either FDIC or SIPC. While the US government does not explicitly say they back these insurance institutions, these are (IMO) truly “too big to fail”.

That means that the cash I have deposited is “mostly” secure against the bank failing. The (very) small percentage chance that the FDIC or SIPC fails is small enough that I do not worry about it. Should they fail, the world will have bigger problems than money.

Another pro is security. We rely on the fact that banks know how to secure the deposited funds. They take on the risk of the funds I deposit. The bank will fail if the risks they take fail. So they put guards in place to prevent their own failure. They are basically in the business of protecting my money so they can make money with it.

You can say “that is not fair. They are charging a much higher percentage interest than what they are paying me.” True. But if you think you can do better, check out the real returns of some of the P2P lending platforms out there. I believe you will find that the due diligence of a risk-averse lender can easily be a full-time job.

To me, the biggest benefit is the lack of threat of robbery. If I kept all my money in my home, someone is eventually going to know about it. That opens me up to a physical threat of someone taking it. I realize someone can always attempt a robbery, regardless of whether they know I store money there. But once they learn that I keep all my money there, the risk of my being robbed greatly increases.

Also, contrary to the belief of many, the central bankers (a.k.a. “The Fed” in the US) do perform a vital service. They have powerful tools at their disposal to manipulate the market. Without those tools, the US economy would have completely failed many times over. So, preventing crashes is a good thing, right? Well…

Cons of centralization

I am not advocating that crypto is worthless because the banks perform a service. I believe the technology is great. But there are some trends in crypto that remove the advantages of banks because they too are becoming more centralized. More on that a little later.

In the secular world, the “golden rule” changes to “he who hath the gold makes the rules.” And for societies that run on some form of money (in other words, practically all societies), controlling the money supply puts power into the hands of a few.

Throughout history, centralization of power has always led to problems (the lack of centralization has also led to problems, but I digress). Having money is an easy, powerful, and (often) legal way to control others. Centralization is not always bad, but it easily leads to abuse of power.

So we go back to The Fed. When they push buttons and pull levers to attempt to steer the US economy, their effects ripple through the world economy. Those bankers are not concerned about any economy other than the US economy. The US government is (arguably) concerned with world stability, but not The Fed. Remember, the decisions of The Fed are (arguably) not controlled by politicians. So while they may serve a purpose for the US, they affect the world. And the majority of the world is not their concern.

Concluding thoughts

I have many more thoughts on the matter. But the above forms the crux of what is right and wrong with crypto IMO.

Providing the security without the centralization is a near impossible task. When you have something someone wants, you risk having it taken by force. If it is a government you worry about, a decentralized cryptocurrency may be the answer. Just remember that at that point, you are now responsible for all the services that banks provide.

Do you want to get involved? Great! Do you not know where to start? Be careful! Remember, you are now your own bank. You need to protect your own money. Start small, learn carefully. Things will continue to move fast. But that does not mean you are forced to take unnecessary risks.

Uses for Komodo’s komodostate file

The komodostate file contains “events” that happened on the network. The file provides a quick way for the daemon to get back to where it was after a restart.

Some of the record types are for features that are no longer used. In this document I hope to detail the different record types and which functions use this data.

The komodo_state object

The komodo_state struct holds much of the “current state” of the chain: things like height, notary checkpoints, etc. This object is built from the komodostate file, and then kept up-to-date by incoming blocks. Note that the komodo_state struct is not involved in writing the komodostate file; it only holds that file’s contents in memory while the daemon is running.

Within the komodo_state struct are two collections that we are interested in. A vector of events, and a vector of notarized checkpoints.


komodo::event is the base class for all record types within the komodostate file.

event_rewind – if the height of the KMD chain goes backwards, an event_rewind object is added to the event collection. The following call walks backwards through the events, and when an event_kmdheight is found, it adjusts the komodo_state.SAVEDHEIGHT. That same routine also removes items from the events collection. So an event_rewind is added, and then immediately removed in the next function call (komodo_event_rewind).

event_notarized – when a notarization event happens on the asset chain of this daemon, this event is added to the collection. The subsequent method call does some checks to verify validity, and then updates the collection of notarization points called NPOINTS.

event_pubkeys – for chains that have KOMODO_EXTERNAL_NOTARIES turned on, this event modifies the active collection of valid notaries.

event_u – no longer used, unsure what it did.

event_kmdheight – Added to the collection when the height of the KMD chain advances or regresses. This eventually modifies values in the komodo_state struct that keeps track of the current height.

event_opreturn – Added to the collection when a transaction with an OP_RETURN is added to the chain. NOTE: While certain things happen when this event is fired, these are not stored in the komodostate file. So no “state” is updated relating to opreturn events on daemon restart. The komodo_opreturn function shows these are used to control KV & PAX Issuance/Withdrawal functionality.

event_pricefeed – This was part of the PAX project, which is no longer in use.
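The rewind behaviour described for event_rewind can be pictured with a small model. This Python sketch only illustrates the description above; it is not a port of komodo_event_rewind. The Event and State classes, their fields, and the exact rule used to adjust the saved height are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str            # e.g. "kmdheight", "notarized", "rewind" (hypothetical labels)
    height: int          # chain height at which the event was recorded
    kmd_height: int = 0  # only meaningful for "kmdheight" events


@dataclass
class State:
    saved_height: int = 0
    events: list = field(default_factory=list)


def rewind(state: State, height: int) -> int:
    """Walk the events backwards, removing everything recorded at or above `height`.

    Returns the number of events removed.
    """
    removed = 0
    while state.events and state.events[-1].height >= height:
        event = state.events.pop()
        removed += 1
        if event.kind == "kmdheight" and event.kmd_height <= state.saved_height:
            # Assumption: undoing a kmdheight event pulls the saved height
            # back to the KMD height that event carried.
            state.saved_height = event.kmd_height
    return removed
```

Because the rewind pops from the end of the collection, anything appended just before the call (such as the rewind marker itself) is among the first items removed, which matches the "added, and then immediately removed" behaviour noted above.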


NPOINTS is another collection within komodo_state. These are notary checkpoints. They prevent chain reorgs beyond the previous notarization.

The collection is built from the komodostate file, and then maintained by incoming blocks.

Unlike most of the events mentioned earlier, several pieces of functionality search through this collection to find notarization data (i.e. which notarization “notarized” this block).
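To make the two uses of NPOINTS concrete (finding the notarization that covers a block, and refusing reorgs below the last notarization), here is a minimal Python sketch. It is a simplified model, not Komodo's implementation: the NotarizedCheckpoint class, its fields, the function names, and the assumption that the collection is sorted by ascending height are all hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class NotarizedCheckpoint:
    notarized_height: int
    notarized_hash: str


def covering_checkpoint(npoints, height):
    """Return the earliest checkpoint whose notarized height is >= height, or None.

    Assumption: `npoints` is sorted by ascending notarized_height.
    """
    heights = [p.notarized_height for p in npoints]
    i = bisect.bisect_left(heights, height)
    return npoints[i] if i < len(npoints) else None


def reorg_allowed(npoints, fork_height):
    """A reorg is only allowed if it forks above the latest notarized height."""
    return not npoints or fork_height > npoints[-1].notarized_height
```

A lookup that lands in a gap in the collection simply returns a later checkpoint or nothing, which is the kind of hole in history discussed in the corruption section that follows.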

komodostate corruption

An error in version 0.7.1 of Komodo caused new records written to the komodostate file to be written incorrectly. Daemons that are restarted will only have the internal events and NPOINTS collections up to the point that the corruption starts. The impact is somewhat mitigated by the following:

New entries may be corrupted in the file, but are fine in memory. A long-running daemon will not notice the corruption until restart.

Most functionality does not need the events collection beyond the last few entries, which will be fine after the daemon runs for a time.

Data queried for within the NPOINTS collection will have a gap in history. As the daemon runs, the chances of needing data within that gap are reduced.

Reindexing Komodo ( -reindex ) will recreate the komodostate file based on the data within the blockchain. The fix for the corruption bug will hopefully be released very soon (in code review).

Komodo Notarization OP_RETURN parsing

An issue recently popped up that required a review of notarization OP_RETURNs between chains. As I hadn’t gone through that code as yet, it was a good time to jump in.

NOTE: Consider this entry a DRAFT, as there are a few areas I am still researching.


For notarizations, the OP_RETURN will be a 0-value output in vOut position 1. vOut position 0 will be the output that contains the value.

The scriptPubKey of vOut[1] will start with 6a, which is the OP_RETURN opcode. Immediately after comes the size: a single byte holding a value up to 75, OP_PUSHDATA1 (4c) followed by a 1-byte size for values over 75, or OP_PUSHDATA2 (4d) followed by 2 bytes of size for values that do not fit in 1 byte. And then the “real” data begins.
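The size rules above can be sketched in a few lines of Python. This is an illustration rather than Komodo's parser: the function name opreturn_payload is made up, and the opcode values are the standard Bitcoin script ones (0x6a for OP_RETURN, 0x4c for OP_PUSHDATA1, 0x4d for OP_PUSHDATA2, with the 2-byte size stored little-endian).

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
OP_PUSHDATA2 = 0x4D


def opreturn_payload(script: bytes) -> bytes:
    """Return the data pushed after OP_RETURN, honoring the three size encodings."""
    if len(script) < 2 or script[0] != OP_RETURN:
        raise ValueError("not an OP_RETURN script")
    opcode = script[1]
    if opcode <= 75:
        # The opcode itself is the size.
        size, start = opcode, 2
    elif opcode == OP_PUSHDATA1:
        if len(script) < 3:
            raise ValueError("missing 1-byte size")
        size, start = script[2], 3
    elif opcode == OP_PUSHDATA2:
        if len(script) < 4:
            raise ValueError("missing 2-byte size")
        size, start = int.from_bytes(script[2:4], "little"), 4
    else:
        raise ValueError("unsupported push opcode")
    payload = script[start:start + size]
    if len(payload) != size:
        raise ValueError("script is shorter than its declared size")
    return payload
```

For example, opreturn_payload(bytes([0x6a, 3]) + b"abc") returns b"abc". Everything described in the following sections lives inside that payload.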

Block / Transaction / height

The next 32 bytes are a block hash, followed by 4 bytes of the block height. This should correspond to an existing block at a specific height on the chain that wishes to be notarized.

If the chain that wishes to be notarized is the chain that we are currently the daemon of, the next 32 bytes will be the transaction on the foreign “parent” chain that has the OP_RETURN verifying this notarization. <– Note to self: Verify this!

Next comes the coin name. This will be ASCII characters up to 65 bytes in length, and terminated with a NULL (ASCII 0x00). Note that this is what determines if the transaction hash is included. So we actually have to look ahead, attempt to get the coin name, and if it doesn’t match our current chain, look for the coin name 32 bytes prior.

Sidebar: What are the chances that the same sequence of bytes coincidentally ends up in the wrong spot so we get a “false positive” of our own chain? Answer: mathematically possible, but so unlikely that it can be ignored in practice.

MoM and MoM depth

The next set of data is the MoM hash and its depth. This is 32 bytes followed by 4 bytes, and corresponds to the data in the chain that wishes to be notarized.

Note that the depth is the 4 byte value filtered with “& 0xffff”. I am not sure why just yet.
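Putting the layout described so far together, a parser for the start of the payload might look like the Python sketch below. It is a reading of this post, not of komodo_voutupdate itself: the function names, the returned dictionary, little-endian integers, and the signedness of the fields are assumptions. The look-ahead for the coin name follows the rule given above.

```python
import struct

MAX_NAME = 65  # upper bound on the coin name length, per the text above


def read_name(data: bytes, offset: int):
    """Read a NUL-terminated ASCII name; return (name, next_offset) or None."""
    end = data.find(b"\x00", offset, offset + MAX_NAME + 1)
    if end == -1:
        return None
    return data[offset:end].decode("ascii", errors="replace"), end + 1


def parse_notarization(data: bytes, our_symbol: str) -> dict:
    """Parse block hash, height, optional txid, coin name, MoM and MoM depth."""
    block_hash = data[0:32]
    (height,) = struct.unpack_from("<i", data, 32)
    offset = 36
    txid = None
    # Look ahead 32 bytes: if our own coin name is there, a txid precedes it.
    ahead = read_name(data, offset + 32)
    if ahead is not None and ahead[0] == our_symbol:
        txid = data[offset:offset + 32]
        symbol, offset = ahead
    else:
        here = read_name(data, offset)
        if here is None:
            raise ValueError("no coin name found")
        symbol, offset = here
    mom = data[offset:offset + 32]
    (raw_depth,) = struct.unpack_from("<I", data, offset + 32)
    return {
        "block_hash": block_hash,
        "height": height,
        "txid": txid,
        "symbol": symbol,
        "mom": mom,
        "mom_depth": raw_depth & 0xFFFF,  # the "& 0xffff" filter mentioned above
        "next_offset": offset + 36,       # where any CCid data would begin
    }
```

The mask keeps only the low 16 bits of the 4-byte depth value, so whatever sits in the upper 16 bits is ignored by this sketch.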

CCid Data

Finally, we get to any CCid data that the asset chain that wishes to be notarized includes. The KMD chain will never include this when notarizing its chain to the foreign “parent” chain (currently LTC).

CCid data contains a starting index (4 bytes), ending index (4 bytes), an MoMoM hash (32 bytes), depth (4 bytes), and the quantity of ccdata pairs to follow (4 bytes).

Afterward come the ccdata pairs, each of which is a height (4 bytes) and an offset (4 bytes).
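The CCid layout above maps naturally onto a fixed header followed by a counted list. The Python sketch below shows one way to read it. It is an illustration under assumptions: the function name and returned keys are made up, and the integers are taken to be little-endian, with signed indexes, heights, and offsets.

```python
import struct


def parse_ccid(data: bytes, offset: int = 0) -> dict:
    """Parse start/end index, MoMoM hash, depth, pair count, and the pairs."""
    start_index, end_index = struct.unpack_from("<ii", data, offset)
    momom = data[offset + 8:offset + 40]
    depth, pair_count = struct.unpack_from("<II", data, offset + 40)
    offset += 48
    if pair_count * 8 > len(data) - offset:
        raise ValueError("pair count exceeds the remaining data")
    pairs = [struct.unpack_from("<ii", data, offset + 8 * i) for i in range(pair_count)]
    return {
        "start_index": start_index,
        "end_index": end_index,
        "momom": momom,
        "depth": depth,
        "pairs": pairs,  # list of (height, offset) tuples
    }
```

Checking the pair count against the remaining bytes before looping matters here, since the count comes straight from data found on a chain.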

Additional Processing

If all of that works, and we are processing our own notarization, the komodo_state struct and the komodostate file are updated with the latest notarization data.


To follow along, look for the komodo_voutupdate function within komodo.cpp.

Komodo Hardfork History

This post is mainly for me to keep track of which hardforks did what.

The main hardforks in Komodo are for the election of notaries. But there are other purposes. I will add to the list as I learn more. Eventually these should be documented within the code (comments near the declaration of a #define or something).

Note that the KMD chain hardforks are normally based on chain height. Asset chains are normally based on time. Hence season hardforks have both.

  • nStakedDecemberHardforkTimestamp – December 2019
    • Modifies block header to include segid if it is a staked chain (chain.h)
    • Many areas use komodo_newStakerActive(), which uses this hardfork
  • nDecemberHardforkHeight – December 2019
    • Many areas use komodo_hardfork_active() which uses this hardfork and the above
    • Disable the nExtraNonce in the miner (miner.cpp)
    • Add merkle root check in CheckBlock() for notaries (main.cpp)
  • nS4Timestamp / nS4HardforkHeight – 2020-06-14 Season 4
    • Only for notary list
  • nS5Timestamp 2021-06-14 Season 5
    • ExtractDestination fix (see komodo_is_vSolutionsFixActive() in komodo_utils.cpp)
    • Notary list updated
  • nS5HardforkHeight 2021-06-14 Season 5
    • Add merkle root check in CheckBlock() for everyone (main.cpp)
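The height-versus-time split mentioned above (KMD hardforks keyed on chain height, asset chain hardforks keyed on time) can be sketched as a single check. This Python example is illustrative only: the Hardfork class, the hardfork_active function, the strict "greater than" comparison, and the numbers are assumptions, not values from the Komodo source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardfork:
    name: str
    height: int     # activation height, used on the KMD chain
    timestamp: int  # activation time in Unix seconds, used on asset chains


def hardfork_active(fork: Hardfork, is_kmd_chain: bool, height: int, block_time: int) -> bool:
    """KMD decides by chain height; asset chains decide by block time."""
    if is_kmd_chain:
        return height > fork.height
    return block_time > fork.timestamp


# Placeholder numbers for illustration, not the real season values.
EXAMPLE_SEASON = Hardfork("example season", height=1_000_000, timestamp=1_600_000_000)
```

A season hardfork needs both fields because the same notary list change has to switch on for the KMD chain and for every asset chain.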

Many areas have hardfork changes without much detail. Here are some heights found by searching the code base for “height >”:

  • 792000 does some finagling with notaries on asset chains in komodo_checkPOW (komodo_bitcoind.cpp). See also CheckProofOfWork() in pow.cpp and komodo_is_special() in komodo_bitcoind.cpp.
  • 186233 komodo_eligiblenotary() (komodo_bitcoind.cpp)
  • 82000 komodo_is_special() (komodo_bitcoind.cpp)
  • 792000 komodo_is_special() and komodo_checkPOW()
  • 807000 komodo_is_special()
  • 34000 komodo_is_special()
  • limit is set to different values based on being under 79693 or 82000 in komodo_is_special()
  • 225000 komodo_is_special()
  • 246748 komodo_validate_interest() starts to work (komodo_bitcoind.cpp)
  • 225000 komodo_commission() (komodo_bitcoind.cpp)
  • 10 komodo_adaptivepow_target() (komodo_bitcoind.cpp)
  • 100 (KOMODO_EARLYTXID_HEIGHT) komodo_commission() (komodo_bitcoind.cpp)
  • 2 PoS check in komodo_checkPOW() and komodo_check_deposit()
  • 100 komodo_checkPOW() (komodo_bitcoind.cpp)
  • 236000 komodo_gateway_deposits() (komodo_gateway.cpp)
  • 1 komodo_check_deposit() (komodo_gateway.cpp) (a few places)
  • 800000 komodo_check_deposit() (komodo_gateway.cpp)
  • 814000 (KOMODO_NOTARIES_HEIGHT1) fee stealing check in komodo_check_deposit() as well as another check a little below.
  • 1000000 komodo_check_deposit() (2 places in that method)
  • 195000 komodo_opreturn() (komodo_gateway.cpp)
  • 225000 komodo_opreturn() (komodo_gateway.cpp)
  • 238000 komodo_opreturn()
  • 214700 komodo_opreturn()

I need to continue searching for “height >” in komodo_interest.cpp, komodo_kv.cpp, komodo_notary.cpp, komodo_nSPV_superlite.h, komodo_nSPV_wallet.h, komodo_pax.cpp, komodo.cpp, main.cpp, metrics.cpp, miner.cpp, net.cpp, pow.cpp, rogue_rpc.cpp, cc/soduko.cpp as well as more searches like “height <” and “height =” to catch more.

Komodo and Notaries in Testnet

I recently deployed an unofficial testnet for Komodo. This will allow me to perform system tests of notary functionality without affecting the true Komodo chain. The idea is to change as little code as possible, test the notary functionality from start to finish, and gain knowledge of the code base and intricacies of notarizations within Komodo.

The plan is to set up a notary node, wire it to Litecoin’s test chain, and do actual notarizations that can be verified on both chains. The first step will focus on the Komodo-Litecoin interaction. Later I will look at how asset chains use the Komodo chain to notarize their chain.

If you wish to follow along, the majority of the changes are in this PR.

A write-up of some of the technical details of becoming a notary can be found here.

Details to test / learn:

  • Notary pay
  • Difficulty reduction
  • Irreversibility
  • Checks and balances (how does a notary know which fork to notarize?)

Unit tests will reside in my jmj_testutils_notary branch for now.

NodeContext in Bitcoin Core

<disclaimer> These notes are my understanding by looking at code. I may be wrong. Do not take my word for it, verify it yourself. Much of what is below is my opinion. As sometimes is the case, following good practices the wrong way can make a project worse. That is not what I’m talking about here.</disclaimer>

Recent versions (0.20ish) of Bitcoin Core attempt to reorganize logic into objects with somewhat related functionality. In my opinion, this is one of the best ways to reduce complexity. External components communicate with the object through a well-defined interface. This can lead to decoupling, which makes unintended side effects less likely.

I am focusing on the NodeContext object, as I believe this would be a big benefit for the Komodo core code. I am using this space for some notes on how Bitcoin Core broke up the object model. I hope this post will help remove some notes scribbled on paper on my desk. And who knows, it may be useful for someone else.

Breakdown of NodeContext

The NodeContext object acts as a container for many of the components needed by the bitcoin daemon. The majority of the members are std::unique_ptr. This is a good choice, as ownership is explicit and a unique_ptr cannot be copied by accident. Below are the components within NodeContext:

  • addrman – Address Manager, keeps track of peer addresses
  • connman – Connection Manager – handles connections, banning, whitelists
  • mempool – Transaction mempool – transactions that are not yet part of a block
  • fee_estimator
  • peerman – Peer manager, includes a task scheduler and processes messages
  • chainman – Chain Manager, manages chain state. References ibd, snapshot, and active (which is either the ibd or the snapshot)
  • banman – Ban manager
  • args – Configuration arguments
  • chain – The actual chain. This handles blocks, and has a reference to the mempool.
  • chain_clients – clients (i.e. a wallet) that need notification when the state of the chain changes.
  • wallet_client – A special client that can create wallet addresses
  • scheduler – helps handle tasks that should be delayed
  • rpc_interruption_point – A lambda that handles any incoming RPC calls while the daemon is in the process of shutting down.

One of the largest refactoring challenges in Komodo is the large number of global variables. Pushing them into their respective objects (especially the chainman and args objects) will help get rid of many of those globals. This will help compartmentalize code and make unit tests easier.
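The benefit of a context object over globals is easy to show in miniature. The sketch below uses Python purely as a language-neutral illustration; the real NodeContext is a C++ struct whose members are mostly std::unique_ptr. The toy Args and ChainManager classes and the describe_tip function are hypothetical stand-ins, not Bitcoin Core or Komodo code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Args:
    """Toy stand-in for the configuration arguments component."""
    values: dict = field(default_factory=dict)

    def get(self, name: str, default: str) -> str:
        return self.values.get(name, default)


@dataclass
class ChainManager:
    """Toy stand-in for the chain manager: it only tracks a height."""
    height: int = 0

    def connect_block(self) -> None:
        self.height += 1


@dataclass
class NodeContext:
    """Container handed to the code that needs it, replacing global variables."""
    args: Optional[Args] = None
    chainman: Optional[ChainManager] = None


def describe_tip(node: NodeContext) -> str:
    """Reads state only through the context it is given, never from globals."""
    if node.args is None or node.chainman is None:
        raise ValueError("node context is not fully initialised")
    network = "testnet" if node.args.get("testnet", "0") == "1" else "mainnet"
    return f"{network} tip at height {node.chainman.height}"
```

Because every function receives its state explicitly, a unit test can build two independent contexts side by side, something that is awkward when the same information lives in process-wide globals.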

Komodo Core and Bitcoin v0.20.0 refactoring

Note: The following are my thoughts as a developer of Komodo Core. This is not a guarantee the work will be done in this manner, or even be done at all. This is me “typing out loud”.

The Komodo Core code has diverged quite a bit from the Bitcoin codebase. Of course, Komodo is not a clone. There is quite a bit of functionality added to Komodo which requires the code to be different.

However, there are areas where Komodo could benefit from merging in the changes that Bitcoin has since made to their code base. Some of the biggest benefits are:

  • Modularization – Functionality within the code base is now more modular. Interfaces are better defined, and some of the ties between components have been eliminated.
  • Reduction in global state – Global state makes certain development tasks difficult. Modularity, testability, and general maintainability improve when state is pushed down into the components that use it instead of being exposed for application-wide modification.
  • Testability – When large processes are broken into smaller functions, testing individual circumstances (i.e. edge cases) becomes less cumbersome, and sometimes trivial.
  • Maintainability – With improvements in the points above, modifications to the code base are often more limited in scope and easier to test. This improves developer productivity and code base stability.

Plan of Attack

“Upgrading” Komodo to implement the changes to the Bitcoin code base sounds great. But a simple git merge will not work. The extent of the changes is too great.

My idea is to merge in these changes in smaller chunks. Looking at the NodeContext object (a basic building block of a Bitcoin application), we can divide functionality into 3 large pieces.

  • P2P – The address manager, connection manager, ban manager, and peer manager
  • Chain – The chain state, persistence, and mempool
  • Glue – Smaller components that wire things together. Examples are the fee estimator, task scheduler, config file and command line argument processing, client (i.e. wallet) connectivity and functionality.

Building a NodeContext object will be the first step. Each piece will have access to an object of this type. This will provide the state of the system, as well as be where components update state when necessary.

The “glue” components often require resources from both “p2p” and “chain” components. Hence they should probably be upgraded last.

The task of upgrading the “chain” piece is probably smaller in scope than upgrading “p2p”. I will attempt to attack that first.

The “p2p” pieces do not seem to be too difficult. Most of the work seems to be in wrapping much of the functionality of main.cpp and bitcoind.cpp into appropriate classes. The communication between components is now better defined behind interfaces. The pimpl idiom is also in use in a few places.

Note: Bitcoin 0.22 requires C++17. For more information, click here.