Testing, probing and profiling the code

This is part 3 of a series on improving performance. Click here for part 1, and here for part 2.

Now that the basics are done, it is time to start thinking more about testing. Some may say “but what happened to test first?” Well that is a long discussion, but suffice it to say that there must be a balance between testing and productivity.

Those that subscribe to assuring 100% test coverage I applaud your efforts. But unfortunately, your code will still contain bugs. Testing is VERY important. But apply it in a broader context and you will be more productive. I built tests that assure this very naive system works.

So what do we need to test here? Firstly, we must exercise as much of the code as possible. Then we can profile to see where the bottlenecks are. After all, this project is about profiling.

The rules of the game are “throw as many transactions at this thing as rapidly as possible.” But we also must be concerned about the quality of the data. In a real-world matching engine, the majority of transactions happen real close to the best bid and offer. But not everything happens there. So we need to generate a large amount of test data that will be processed in order. The end result should be a somewhat repeatable time. That data should include:

  • Most (but not all) transactions close to the bid and ask
  • A varying amount of quantity and price
  • Price should not fall below zero (an upward bias perhaps?)
  • Orders that partially fill
  • Orders that take liquidity from several existing orders

There is functionality that does not exist. For example, this book does not provide for cancelling orders. It does not handle rounding (yet). So the test data will not include those things.

A very large CSV file was created with randomized data. Different tests will be created that limit the number of records processed. From that test, we can begin profiling to see which areas of code would benefit the most from further examination.

Tools

I will be using valgrind’s callgrind output to measure performance. kcachegrind will be used to help read the callgrind output.

Initial Results

Each time I profile a piece of code, I am somewhat surprised at the results. “No premature optimization” is alive and well. I had several assumptions before profiling. Some were valid, others do not seem to be so.

Firstly, the method “placeOnBook” took a good chunk of time. After a small amount of evaluation, it simply places an entry in the map. This area should be examined. I am thinking either there are copies or constructors that could be optimized or avoided altogether.

Secondly, the CSV Reader was high on the list. This is an example of how your test framework can show up in performance results. It doesn’t matter too much, as such results can be somewhat ignored. Optimization there is a waste of effort. But it does serve as an indication that the matching engine code is fairly robust.

Conclusion of Round 1

The std::map used is where effort should be put in. Keying differently, or replacing std::map with something faster will provide the best improvement. Here is a screenshot of the output, formatted by kcachegrind.

Order matching and container types

This is part two of my order book exercise. To start at the beginning, click here.

The initial commit of my order book uses a std::map as a collection of bids and asks. That works, but it has a problem. The key is the price. More than 1 order with the same price and oops! Bad things happen. In this case, the previous order disappears.

Well, is std::map the best choice? We could immediately say no, but that wouldn’t be any fun, now would it?

Our goal here is profiling, but I want a complete, working example. So something must be down to allow for two orders to be on the book at the same price. In addition, the oldest order should be used first (FIFO). How is that to be accomplished?

One way is create an object to be used as a key. The plan is to create an AssetOrderKey object that works with comparison operators. Let’s see that in action.

If we add an AssetKey object, and then expand it into two objects, we can have a different comparison operator for the two collections. Therefore, a begin() call on either collection will give us the best bid or ask.

To see these changes, take a look at this GitHub commit.

Another way to do that would be to pass a comparison operator to the std::map. I did not do that here.

Matching Engine Requirements

As an academic exercise, I wanted to take on building a matching engine in C++. The purpose here is to iterate through the process of measuring and improving performance.

I imagine the initial requirements as naive, with later iterations including removal of floating point calculations, variable precision, 128-bit integers, “dust” handling (may be more of an implementation question than a performance one).

This engine will be strictly single-threaded and purposeful. It is hoped that the input will be clean and optimized (which could be offloaded) to improve throughput.

Tooling will be sparse, on purpose. The idea here is not a discussion of the intricacies of the tools, but how tweaks to code affects speed.

The Idea

The engine will receive limit orders that specify the asset held, the asset to be bought, and the desired price. It will include a sequential id, externally guaranteed to be unique (such a key could be analyzed later… we’ll see…).

Once received, the order is processed and if not immediately filled, what is left over is placed on the order book.

Simple, right? There are many details yet to be sorted out. So we will get started! I will edit this post with links to my future posts on the subject. Stay tuned!

For the first cut of the order book, see this GitHub commit.

To see what I tackled next, see part two.

Bitshares Asset Terminology

The Bitshares Core code distinguishes between assets in the following manner:

  1. CORE – A base asset. Only one exists on the chain, and is created within the Genesis Block. On the BitShares mainnet, this is BTS. On the BitShares testnet, this is TEST.
  2. User Issued Asset (UIA) – An asset issued by a BitShares account.
  3. BitAsset – An asset that is backed by another. The backing asset is either CORE or an asset that itself is backed by CORE.

Some BitAssets have their parameters controlled by the BitShares Committee. These are distinguished by the ‘bit’ prefix(i.e. bitUSD, bitCNY, bitEUR, bitBTC). The price feeds for these assets come from committee members or witness members.

BitAssets could also be split into two types:

  1. Market Pegged Asset (MPA) – Assets who’s price is based on external price feeds (as opposed to the internal DEX market), and backed by the CORE asset or another asset that itself is backed by CORE.
  2. Prediction Market (PM) – Specialized BitAsset where total debt and total collateral are equal. Once a price feed (which will be between 0 and 1) is published, the market is globally settled.

Note: “Smartcoin” is an industry term with a few definitions. Those that refer to “smartcoins” on the BitShares platform are probably referring to Market Pegged Assets.

Note that assets must have some sort of exchange rate to calculate fees. This rate is called the Core Exchange Rate (CER).

BitShares Margin Terminology

BitShares has many controls around market pegged assets (MPAs) and their ability to be “shorted into existence.” Some of those fields have long names and difficult to sort out abbreviations. I am hoping to prepare this document to help sort it out.

MCR – Maintenance Collateral (or Margin Call) Ratio

How it is used:

For the sake of simplicity, I will not include fees in this discussion.

Scenario 1: Extreme Risk Taker

MYTOKEN is currently valued at 20 BTS. MCR is currently set at 1.75. I have 100 BTS, and wish to create 1 MYTOKEN. I would need to put up at least 35 BTS to create 1 MYTOKEN ( 20BTS * 1.75MCR = 35 ), which I do.

So now I have 1 MYTOKEN, I also have 75 BTS, and 35 BTS tied up as collateral.

If the value of MYTOKEN rises in relation to BTS, my collateral ratio is now below the 1.75 minimum. Let’s say the value of MYTOKEN rises to 25BTS. My collateral ratio is now (collateral / ( debt * current price) =) 1.4, well below the maintenance level I need of 1.75. I will be forced to sell (a.k.a. margin called).

Scenario 2: A Conservative Trader

MYTOKEN is currently valued at 20BTS. MCR is currently set at 1.75. I have 100 BTS, and wish to create 1 MYTOKEN. I would need to put up at least 35 BTS to create 1 MYTOKEN (this is the same as Scenario 1). I put up 50 BTS as collateral to received 1 MYTOKEN.

With such a trade, I can calculate at what price MYTOKEN must rise (or BTS must fall) to in order to be margin called. That formula is (collateral / (debt * MCR). Therefore the price at which I would get margin called is (50 / (1 * 1.75)) = 28.5714BTS

MSSR – Maximum Short Squeeze Ratio

How it is used:

When margin calls happen, orders to purchase the asset I “shorted into existence” could dry up the current liquidity on the “sell” side of the market. If there are no sellers, the price of the asset could skyrocket, thereby leaving me to lose even more money.

MSSR helps with this. It is a safety mechanism for orderly markets. We will run the margin call of Scenario 2 above through the market. For this scenario, the MSSR is set at 1.1. That means that I will have to pay up to 10% above market price in order to lower my exposure. We calculated that the price at which I would get margin called was 28.5714BTS. With an MSSR of 1.1 that means that I will pay up to 31.4281BTS.

The order book currently has an order to sell 0.25 MYTOKEN at 29 BTS, and another order to sell 0.25 MYTOKEN at 32 BTS. That means the system will purchase the 0.25 MYTOKENs at 29BTS. My account will now have 0.75 MYTOKEN, and will have used 7.25BTS of my collateral for the purchase. That leaves my collateral balance at 42.75BTS.

After that purchase, the price of MYTOKEN would need to continue to rise to 32.5714 for my collateral ratio to again be low enough for yet another margin call.

How MCR and MSSR are adjusted

At the moment, the settlement price, MCR, and MSSR are provided by price feeds. The different feeds are used to derive an average that is used for all market participants at that time.

Creating a Coin/Token on the BitShares Blockchain

Do you want your own token? Who doesn’t? Here is how you can create your own token on the BitShares blockchain. This is called a “User Issued Asset” or UIA.

For this tutorial, I will be using the BitShares public testnet. I suggest you do the same. Just ask for some TEST tokens so that you can play around. I will also be using the command line based reference wallet that comes with BitShares Core, the cli_wallet.

Firstly, I must start my wallet and make sure it connects to a testnet node. I can then unlock my wallet, and as long as I have some TEST tokens to pay the fee, I can create my token. Here is the syntax:

create_asset <issuer> <symbol> <precision> <options> <bitasset_options> <send>

We will be replacing each of these <parameters> with a value. Let’s begin.

Issuer

This is normally you. You can use your account name or your account ID. This is who will sign the transaction, and also who will be responsible for issuing tokens.

Symbol

Give your token a name. Tokens with a 3 character symbol are expensive. 4 character names are less so, and 5 or more characters are the cheapest.

Precision

This is how many decimal places you want your token to have. Zero is valid, 8 is the maximum.

Options

This is some JSON text, where you specify many options. An example is below.

{
   "max_supply" : 100000,
   "market_fee_percent" : 30,
   "max_market_fee" : 1000, 
   "issuer_permissions" : 1,
   "core_exchange_rate" : {
       "base": {
         "amount": 21, 
         "asset_id": "1.3.0" 
       },
       "quote": {
         "amount": 76399,       
         "asset_id": "1.3.1"     
       }
   },
   "whitelist_authorities" : [],
   "blacklist_authorities" : [],
   "whitelist_markets" : [],
   "blacklist_markets" : [],
   "description" : "My New Token"
}

BitAsset Options

This is a special type of token, which I will not discuss here. Pass a ‘null’ in place of this parameter.

Send

This is the final parameter. It says that you would like to transmit this to the blockchain. Passing true here will process your request. Passing false will run what you sent through some checks to make sure it is valid, and then return you the results as if you had sent true.

Issuer Permissions

Notice the value of 1 in the field issuer_permissions above. For the desired value of issuer_permissions you will need to do a little math.

Start with 0. If you want to charge a fee when the token is traded on the exchange, add 1. If you want the asset to only be distributable to those in the white list, add 2. If you want the Issuer to be able to take back the token from any account, add 4. If you want the issuer to be the only entity that can transfer the token, add 8. If you do not want to give the token holders the ability to do blind transfers, add 40. Place the total in issuer_permissions. In the example above, we only want to charge a market fee. So the total is 1.

The Final Product

create_asset jmjatlanta4 JMJATLANTA 8 { "max_supply" : 100000, "market_fee_percent" : 30, "max_market_fee" : 1000, "issuer_permissions" : 1, "core_exchange_rate" : { "base": { "amount": 21, "asset_id": "1.3.0" }, "quote": { "amount": 76399, "asset_id": "1.3.1" } }, "whitelist_authorities" : [], "blacklist_authorities" : [], "whitelist_markets" : [], "blacklist_markets" : [], "description" : "My New Token" } null false

The above will create my new token JMJATLANTA that I can then give to whomever has an account on the BitShares blockchain.

Stay Tuned!

But what do all of those other bits of data mean? That will be the subject of my next post about User Issued Assets.

Boost Test Command Line Favorites

The BitShares-Core project uses Boost Test features to exercise the product. When starting the test, here are some interesting command line options:

–run_test=module/test

This will run a specific test. We organize them into modules, that usually match the filename for more easily finding a failing test.

–log_level=message

This provides more detail about a test

–report_level=detailed

If you have a lot of output, this will give you a report at the end that details passing and failing tests. It can be parsed to find the failing tests, and associate that failure with the test that caused it.

BitShares Testnet Notes

Here are some notes about the BitShares testnet. I just needed a single place to keep these:

When syncing from scratch, it will seem to hang with the following message in the default log:

1753845ms th_a application.cpp:569 handle_block ] Got block: #6690000 006614d0b54c9b1d1aa4f3e9e4af9fbdf55d5b26 time: 2017-03-07T12:32:40 transaction(s): 4 latency: 76463793845 ms from: f0x irreversible: 6689981 (-19)

This is due to the testing that took place in March of 2017. Be patient. It will go through eventually.

Also, the chain id of the public testnet is 39f5e2ede1f8bc1a3a54a7914414e3779e33193f1f5693510e73cb7a87617447

If there is a problem with network latency, it can flounder when trying to catch up on blocks, especially around March of 2017. To increase the timeout, adjust this line.

HTLC and Wallet Security

As Hash and Time Lock Contracts become more popular, they begin to be seen by the end user. For instance, wallets are now being built with this functionality. And that exposes the user to some unscrupulous actors that could cause financial harm.

This is intended to be a document aimed at software developers. What an HTLC is and how an atomic swap is conducted will not be discussed. But interested end-users may also gain some knowledge within.

Hashlock attacks

A message digest looks very random, and it is. An important item to remember is that it is a one-way hash. The hash itself tells you nothing about the original preimage (a.k.a seed). We do not know what it was, how large it was, or when the hash was generated, or even if it was generated.

Only if we have the preimage and algorithm can we verify that the hash was generated by that preimage. In other words: if we apply a message digest algorithm to a given preimage, it will always generate the same output.

Note in the statement above that there are no guarantees about a different preimage not generating the same output as well. Therefore, using a strong algorithm is important. At the time of this writing, SHA256 is the most common, and is considered secure.

Most hashing algorithms have a fixed size output. For instance, SHA256 generates 256 bits (32 bytes) of output. No matter the input, the output will be at least and at most 32 bytes.

An Oversized Preimage Attack

There are limits to the size of a preimage that can be used. That limit is set by the blockchain itself. But when dealing with atomic swaps across blockchains, an attacker could use a preimage that is too large for one side of the 2 sided transaction. That means that an HTLC can be created on both chains, but only redeemed on one chain, and not the other.

This can be mitigated by specifying the preimage size as part of the contract. This feature is available on many bitcoin-based chains among others, but is not standard. Should the preimage size be used with an atomic swap, both sides of the transaction should include the preimage size in their contract.

Timelock Attacks

It is standard practice for HTLCs that the timelock should be long enough so that the block that contains the contract can be considered irreversible. This makes it possible for the redeemer to expose the preimage and still be guaranteed that the contract itself will not be reversed.

This becomes even more important with atomic swaps. The shorter duration (a.k.a “inner”) contract should allow time to achieve its own irreversibility. And the longer duration (a.k.a. “outer”) contract must allow time for irreversibility of both.

In addition, with an atomic swap both sides must consider the redemption time necessary. The creator of the inner contract must decide that should the redeemer redeem at the last moment, is there enough time to redeem the outer contract before the timelock expires.

One final consideration of the timelock portion of the contract is the use of capital. Should the other party not accept, the funds in the contract are held until the timelock expires. Long expiration times could result in missed opportunity costs.

The Technical Hurdle for Wallets and Users

Hash and Time Lock Contracts are implemented on many chains. Following the general guidelines of BIP199 is not the hard part. Providing an interface where an end-user is as protected as possible will require effort. With the current wallet user base, such protections for HTLC creation, verification, and execution will be the responsibility of the wallet developer.

Other notes on HTLCs

It is generally recommended that the “outer” contract be on the slower chain. This can minimize the length of the timelock, should there be the desire to use the shortest possible time.

While it may be obvious, the “inner” contract must expire before the “outer” contract.

While the Bitcoin blockchain provides the functionality for specifying the preimage length, BIP199 does not include that logic in its proposal. It is my opinion that if both chains support it, it should be used. Most chains that derive from Bitcoin should support it. BitShares also supports it.

BitShares TESTNET Binaries

These are version 3.1.0 binaries that connect to the BitShares TESTNET. I compiled these mainly for use by the BUIDL Boston hackathon, but there is nothing special about them. You can build them yourself from source by building the tags/test-3.1.0 tag from https://github.com/bitshares/bitshares-core.

The file cli_wallet_mac.tar.gz has the sha 256 checksum hash of 5e0d89576115e4ece3ec2fa99221a2a9ebcadfdd0d22e309f2739ba38259a787

The file BitShares-Core-test-3.1.0-Linux-cli-tools.tar.gz has the sha 256 checksum hash of bdc9159a617b56e560e565a633012f739d20170f68aebd79744be1655a5acb02

The file BitShares-Core-testnet-3.1.0-Windows-x64-cli-tools.zip has the sha256 checksum hash of 3b8c26ce8955bcd81b716ff4f6236a5123648a3eb0009010ac5c9f5aeb0b69a8

Note: The Windows version of cli_wallet.exe must know where your certificates are stored if you wish to connect to secure websockets. The environment variable SSL_CERT_FILE should point to a .pem file that contains those certificates. If you need a certificate file, I recommend you download it from the curl website by clicking here.

If you place the above cacert.pem file in the same directory as cli_wallet.exe, you can easily set the environment variable and start the cli_wallet.

set SSL_CERT_FILE=./cacert.pem
cli_wallet.exe -s wss://testnet.dex.trading

Below are some URLs to BitShares Testnet nodes:

  • wss://testnet.dex.trading
  • wss://node.testnet.bitshares.eu
  • wss://testnet.nodes.bitshares.ws