An Algorithmic Trading Framework

I’m dreaming here. But if I were to build the ultimate framework for algorithmic trading, what would it look like?

Purpose

One purpose would be to concentrate more on the algorithm, and less on the mechanics that every strategy needs. But this is tricky. Every strategy needs something slightly different, with different parameters.

Another purpose would be to have a consistent way to write complete algorithms. Once the framework is learned, adding new algorithms and updating old ones becomes easier.

Hurdles

There are a myriad of options for implementing trading algorithms. A simple idea can quickly turn into a long list of questions.

Risk measures must be looked at. How is the optimum trade size calculated? Are other instruments involved in this calculation (e.g. would this trade push exposure to a particular index or industry above an allowed threshold)?

We must consider order entry. Will it be a market order? Do we attempt to make the spread? Which ECN will be used? What do we do if the order is not immediately filled? What if the order is partially filled?

Then there is order management. At what point is the trade closed at a loss? Is there a strategy for adjusting risk if the trade moves for/against the entry price? How are profits taken?

There are also brokerage and data feed questions. Are these decisions already made? Is there the possibility they will be changed in the future?

With such questions answered (or at least partially answered), we begin to look at frameworks that can help build the infrastructure.

If we’re sticking to a particular broker or software package, the framework decision becomes easy. If we’re talking FOREX, and the broker mainly works with MetaTrader, there would need to be a strong reason to choose another platform.

A trading business must also look at in-house experience that is available. A hedge fund that has a staff of Python developers may not want to work with a C++ framework.

Conclusion

A “one size fits all” trading platform will never be created. A platform that works well for a particular situation is often available. I would like to build a platform that is somewhere in the middle of those two situations. I would like to hide the complexities of portfolio management, broker connectivity and data feed connectivity by providing a (somewhat) generic interface to these items.

Strategies that connect through an API to these resources would be somewhat more portable between brokerages, data providers, and changes to portfolio management rules. This would also hopefully allow for backtesting without rewriting.
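
Something like the following illustrates the idea. This is a minimal sketch with assumed names, not the actual interfaces in my repository: the strategy depends only on abstract broker and data feed interfaces, so swapping a live brokerage for a backtesting simulator does not require touching strategy code.

#include <string>

struct Order { std::string symbol; double quantity; double limit_price; };

// Hides broker connectivity behind one interface.
class Broker {
public:
    virtual ~Broker() = default;
    virtual void place(const Order& order) = 0;
};

// Hides data feed connectivity the same way.
class DataFeed {
public:
    virtual ~DataFeed() = default;
    virtual double last_price(const std::string& symbol) = 0;
};

// A strategy written against Broker and DataFeed can be pointed at a live
// brokerage, a different data vendor, or a backtesting simulator unchanged.
class Strategy {
public:
    Strategy(Broker& broker, DataFeed& feed) : broker_(broker), feed_(feed) {}
    virtual ~Strategy() = default;
    virtual void on_tick() = 0;
protected:
    Broker& broker_;
    DataFeed& feed_;
};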

A while back, I started down the road of building such a framework. A rough implementation is at GitHub. And by rough I mean pre-pre-alpha. There are plenty of areas that need work, or even need to be redone. But it is a start.

Data Storage Formatting

I have been battling an internal war. The question: in which format should data be stored? Twenty years ago, the question was fairly simple. Much was moving from proprietary storage systems to relational database management systems. That was a great move, or so I thought at the time. Everything was sorted. Simple queries were simple. Hard queries were hard. Queries that needed to be performant could be tuned.

But data came in different formats, and had to be parsed. What could easily be parsed was placed in columns. The big stuff was placed in blobs. Specialized parsers could then be built to look at the blobs when needed.

There were plenty of issues to resolve. How do you keep multiple databases in sync? How do you handle changes in the incoming metadata? What do you do with old, rarely used, but perhaps important data? The questions kept coming.

Those questions and many more have answers that are not easy. And like most things in life, the answer is often “it depends.”

The current situation is the need to pore over mountains of different data, looking for specific things that happened over a specific time.

Time-Series Market Data

Sample frequency is an issue here. The typical OHLC data that comes from the financial markets is in a recognizable format, easily parsed, placed in rows and columns, and indexed. With management systems like kdb+ you can capture data at a fast sample rate, or even down to the tick, and summaries are quickly calculated.
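
To make that concrete, here is a minimal sketch (field names are assumptions) of why OHLC data maps so naturally onto rows and columns: a bar is just four prices and a volume per interval, which any tick stream can be summarized into.

#include <algorithm>
#include <cstdint>
#include <vector>

struct Tick { int64_t timestamp; double price; double size; };
struct Bar  { double open, high, low, close, volume; };

// Summarize one interval's worth of ticks (assumed non-empty) into a single bar.
Bar to_bar(const std::vector<Tick>& ticks) {
    Bar bar{ticks.front().price, ticks.front().price,
            ticks.front().price, ticks.back().price, 0.0};
    for (const Tick& t : ticks) {
        bar.high   = std::max(bar.high, t.price);
        bar.low    = std::min(bar.low, t.price);
        bar.volume += t.size;
    }
    return bar;
}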

Report Filings

Is a typical RDBMS a good fit for financial reports, such as what comes from EDGAR? Not so much. A “big data” system may be a better fit, or perhaps an API that knows how to parse directly from EDGAR itself.

Market News

Sentiment is very difficult to quantify. The “mood” of the market may be extrapolated after the fact by certain indicators, but how is the oil sector affected when good Tesla numbers come out the same day as a bad US Manufacturing report? How do you build that query?

Summary

There are many more categories and sub-categories. But the basic result is: there are many factors to consider when looking at data. Attempt to pick the correct way, and be prepared to change it if it doesn’t work. Don’t be afraid to use more than one solution at the same time. Avoid complexity, but remember that not all systems are small. Consider microservices. Keep an open mind, but avoid “analysis paralysis”.

Once again, “it depends.”

C++ vs Python in Algorithmic Trading

I am very much a C++ person. I warmed up to the language as it was warring with C for programmers and popularity. Although I used neither for my “day job” except on rare occasions, I jumped at opportunities to play with them.

After C and C++, I learned many other languages. I took deep dives into Java, C#, JavaScript, etc. But there were plenty that were (some still are) popular that I never had the opportunity to use much. Perl, awk/sed, regular expressions are some that come to mind.

Eventually I switched to using mainly C++, and then nearly 100% for a good while. After escaping the confines of my hard-core enterprise software experiences, I found the language world had twisted again. But this time was a bit different.

In areas of finance, vendors often had the upper hand. When some big name used some tool within their development area, the smaller ones followed suit, if they could afford it. If you were the lucky vendor, you sat back and waited for the orders to roll in.

But as the trading world evolved, the smaller, nimble organizations had unprecedented access to new tools at little to no cost. Metrics like performance and accuracy were comparable. And the young graduates were leaving their academic life with these skills already under their belts.

Language wars are far from over. But the ferocity is dying down. Most developers understand that you should pick the right tool for the job. Consider what you are building up front, and make sure to leave room for flexibility.

C++ is still my language of choice. I use it daily. But it has flaws. And sometimes it is simply not the best language for the job. At the moment I am prototyping, and building some algorithms that crunch data from multiple sources in multiple ways. The sources, elements, and methods that I write tomorrow will probably not be used in the final product. But these cycles are necessary to figure out what the final product should do. Should I write that in C++? Not in this case.

Much of what I’m looking at has been looked at by others. Many used Python to grind through it. There are scripts, designs, and visualizations written in Python that help me understand the data that is running through the system. Do I want to rewrite that in C++? No thank you. The final product may be written in C++. But for now, I’m happily tweaking other people’s code for my own purpose.

I must give props to the mighty data scientists and financial wizards that wrote some of these Python libraries. Of course, Python permits C libraries to be used from within the language. So some of them are simply using Python to access libraries written in other languages. But still, Python has carved out a good market share in the big-data areas of trading systems.

Will Python take over everything in the trading world? The odds are very slim. Entire trading systems have been written in it. Many of them in fact. But there are still good reasons to choose other tools that do a better job at some aspect of the cycle.

I have added Python to my language stack. I will probably never dive deep into the language. But after only a short while using it, I feel comfortable with it.

Should your system be written in Python? Perhaps. I wouldn’t hesitate if it fits. But I also wouldn’t recommend forcing it to fit when it doesn’t belong. Standardize across the board on it? Nope. Pick and choose. Choose wisely. Hybrid models are common, and often best.

That is my $0.02. YMMV

Algorithmic detection of chart patterns

I have been building out some algorithms that rely on detecting “consolidation”. Chartists know what that looks like, but the term by itself is extremely vague to a quant. What to do?

I have often thought about the method used in creating “The Encyclopedia of Chart Patterns” and wondered if such a method could be used as part of an automated strategy. I read somewhere that the source code for the application used to build the data in that book was written in VB, and was for the author’s personal use. Sad, but I would have probably done the same thing.

Today I began reading “Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation” (2000) by Lo, Mamaysky, and Wang. What a great paper. It details some of the technical challenges of building such algorithms.

What I like about the paper is that it picked one smoothing algorithm, and basically said “we picked it and went with it, right or wrong”. My interpretation of that is “you can take this further, but we wanted to give you enough information to get you started”. Kudos to the authors.
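
For the curious, the paper’s smoothing tool is kernel regression. A minimal sketch of a Nadaraya–Watson smoother with a Gaussian kernel looks like the following; the hard part the paper spends its effort on, choosing the bandwidth h, is left to the caller here.

#include <cmath>
#include <cstddef>
#include <vector>

// Smoothed price at point x: a weighted average of observed prices ys at
// positions xs, where nearer observations get exponentially more weight.
double smooth_at(double x, const std::vector<double>& xs,
                 const std::vector<double>& ys, double h) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        double u = (x - xs[i]) / h;
        double w = std::exp(-0.5 * u * u);  // Gaussian kernel weight
        num += w * ys[i];
        den += w;
    }
    return num / den;
}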

Is it worth the effort to implement? Time will tell. But it certainly helped me get my head around some of the intricacies of smoothing estimators and finding optimal values for them.

Testing, probing and profiling the code

This is part 3 of a series on improving performance. Click here for part 1, and here for part 2.

Now that the basics are done, it is time to start thinking more about testing. Some may say “but what happened to test first?” Well that is a long discussion, but suffice it to say that there must be a balance between testing and productivity.

To those who subscribe to assuring 100% test coverage, I applaud your efforts. But unfortunately, your code will still contain bugs. Testing is VERY important. But apply it in a broader context and you will be more productive. I built tests that assure this very naive system works.

So what do we need to test here? Firstly, we must exercise as much of the code as possible. Then we can profile to see where the bottlenecks are. After all, this project is about profiling.

The rules of the game are “throw as many transactions at this thing as rapidly as possible.” But we also must be concerned about the quality of the data. In a real-world matching engine, the majority of transactions happen very close to the best bid and offer, but not everything happens there. So we need to generate a large amount of test data that will be processed in order (a sketch of such a generator follows the list below). The end result should be a somewhat repeatable time. That data should include:

  • Most (but not all) transactions close to the bid and ask
  • A varying amount of quantity and price
  • Price should not fall below zero (an upward bias perhaps?)
  • Orders that partially fill
  • Orders that take liquidity from several existing orders
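
Here is a rough sketch of such a generator. The CSV layout and field names are assumptions for illustration, not the actual format used by my tests.

#include <fstream>
#include <random>

int main() {
    std::mt19937 rng(42);                                  // fixed seed for repeatability
    std::normal_distribution<double> near_mid(0.0, 0.5);   // most prices land near the mid
    std::uniform_real_distribution<double> qty(1.0, 100.0);
    std::ofstream out("orders.csv");

    double mid = 100.0;
    for (long id = 1; id <= 1000000; ++id) {
        bool is_buy = (id % 2) == 0;
        double price = mid + near_mid(rng) + (is_buy ? -0.01 : 0.01);
        if (price <= 0.0) price = 0.01;                    // price never falls below zero
        out << id << ',' << (is_buy ? "BUY" : "SELL") << ','
            << price << ',' << qty(rng) << '\n';
        mid += 0.0001;                                     // slight upward bias
    }
}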

Some functionality does not yet exist. For example, this book does not provide for cancelling orders. It does not handle rounding (yet). So the test data will not include those things.

A very large CSV file was created with randomized data. Different tests will be created that limit the number of records processed. From that test, we can begin profiling to see which areas of code would benefit the most from further examination.

Tools

I will be using valgrind’s callgrind output to measure performance. kcachegrind will be used to help read the callgrind output.
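
The invocation is nothing exotic (the binary name here is hypothetical); callgrind writes a callgrind.out.<pid> file that kcachegrind can open directly:

valgrind --tool=callgrind ./order_book_tests
kcachegrind callgrind.out.<pid>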

Initial Results

Each time I profile a piece of code, I am somewhat surprised at the results. “No premature optimization” is alive and well. I had several assumptions before profiling. Some were valid, others do not seem to be so.

Firstly, the method “placeOnBook” took a good chunk of time. After a small amount of evaluation, I found that it simply places an entry in the map. This area should be examined. I suspect there are copies or constructor calls that could be optimized or avoided altogether.
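
As an illustration of the kind of change I have in mind (names here are assumptions, not the code in the repository), constructing the map entry in place, or moving the order into it, avoids the temporary pair and the extra copy that a plain insert can incur:

#include <map>
#include <string>
#include <utility>

struct Order { std::string id; double price; double quantity; };

std::map<double, Order> book;

// Builds a temporary pair and copies the Order into the map node.
void placeOnBookCopy(double price, const Order& order) {
    book.insert(std::make_pair(price, order));
}

// Constructs the node's pair in place and moves the Order, avoiding the copy.
void placeOnBookMove(double price, Order&& order) {
    book.emplace(price, std::move(order));
}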

Secondly, the CSV Reader was high on the list. This is an example of how your test framework can show up in performance results. It doesn’t matter too much, as such results can be somewhat ignored. Optimization there is a waste of effort. But it does serve as an indication that the matching engine code is fairly robust.

Conclusion of Round 1

The std::map is where the effort should be spent. Keying it differently, or replacing std::map with something faster, will provide the best improvement. Here is a screenshot of the output, formatted by kcachegrind.

Order matching and container types

This is part two of my order book exercise. To start at the beginning, click here.

The initial commit of my order book uses a std::map as a collection of bids and asks. That works, but it has a problem: the key is the price. Add more than one order with the same price and oops! Bad things happen. In this case, the previous order disappears.

Well, is std::map the best choice? We could immediately say no, but that wouldn’t be any fun, now would it?

Our goal here is profiling, but I want a complete, working example. So something must be done to allow two orders to be on the book at the same price. In addition, the oldest order should be used first (FIFO). How is that to be accomplished?

One way is to create an object to be used as a key. The plan is to create an AssetOrderKey object that works with comparison operators. Let’s see that in action.

If we add such a key object, and then expand it into two objects (one for bids, one for asks), we can have a different comparison operator for the two collections. Therefore, a begin() call on either collection will give us the best bid or ask.
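
A minimal sketch of that idea is below (class and member names are assumptions, not necessarily those in the commit). Each key carries the price plus an arrival sequence so that equal prices no longer collide, and the two comparison operators sort the two books in opposite directions.

#include <cstdint>
#include <map>
#include <string>

struct Order { std::string id; double price; double quantity; };

// Highest price first; earliest arrival first within a price level (FIFO).
struct BidKey {
    double price;
    uint64_t sequence;
    bool operator<(const BidKey& o) const {
        if (price != o.price) return price > o.price;
        return sequence < o.sequence;
    }
};

// Lowest price first; earliest arrival first within a price level (FIFO).
struct AskKey {
    double price;
    uint64_t sequence;
    bool operator<(const AskKey& o) const {
        if (price != o.price) return price < o.price;
        return sequence < o.sequence;
    }
};

std::map<BidKey, Order> bids;  // bids.begin() is the best bid
std::map<AskKey, Order> asks;  // asks.begin() is the best ask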

To see these changes, take a look at this GitHub commit.

Another way to do that would be to pass a comparison operator to the std::map. I did not do that here.
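
For completeness, that alternative would look roughly like this: std::greater makes begin() the highest bid, and a queue per price level preserves FIFO among equal prices.

#include <deque>
#include <functional>
#include <map>
#include <string>

struct Order { std::string id; double quantity; };

std::map<double, std::deque<Order>, std::greater<double>> bids;  // best bid at begin()
std::map<double, std::deque<Order>> asks;                        // best ask at begin()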

Next step: testing and profiling. Click here for part 3.

Matching Engine Requirements

As an academic exercise, I wanted to take on building a matching engine in C++. The purpose here is to iterate through the process of measuring and improving performance.

I imagine the initial requirements as naive, with later iterations adding the removal of floating point calculations, variable precision, 128-bit integers, and “dust” handling (which may be more of an implementation question than a performance one).

This engine will be strictly single-threaded and purpose-built. It is hoped that the input will arrive clean and optimized (work that could be offloaded) to improve throughput.

Tooling will be sparse, on purpose. The idea here is not a discussion of the intricacies of the tools, but how tweaks to the code affect speed.

The Idea

The engine will receive limit orders that specify the asset held, the asset to be bought, and the desired price. It will include a sequential id, externally guaranteed to be unique (such a key could be analyzed later… we’ll see…).

Once received, the order is processed and if not immediately filled, what is left over is placed on the order book.
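
In code, the incoming order might look something like this (a sketch only; the names and numeric types in the repository may differ):

#include <cstdint>
#include <string>

struct LimitOrder {
    uint64_t    id;           // sequential, externally guaranteed to be unique
    std::string asset_held;   // the asset the trader is offering
    std::string asset_wanted; // the asset the trader wishes to buy
    double      price;        // desired price (floating point, in the naive first cut)
    double      quantity;     // amount desired
};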

Simple, right? There are many details yet to be sorted out. So we will get started! I will edit this post with links to my future posts on the subject. Stay tuned!

For the first cut of the order book, see this GitHub commit.

To see what I tackled next, see part two.

Bitshares Asset Terminology

The Bitshares Core code distinguishes between assets in the following manner:

  1. CORE – A base asset. Only one exists on the chain, and is created within the Genesis Block. On the BitShares mainnet, this is BTS. On the BitShares testnet, this is TEST.
  2. User Issued Asset (UIA) – An asset issued by a BitShares account.
  3. BitAsset – An asset that is backed by another. The backing asset is either CORE or an asset that itself is backed by CORE.

Some BitAssets have their parameters controlled by the BitShares Committee. These are distinguished by the ‘bit’ prefix (e.g. bitUSD, bitCNY, bitEUR, bitBTC). The price feeds for these assets come from committee members or witnesses.

BitAssets could also be split into two types:

  1. Market Pegged Asset (MPA) – An asset whose price is based on external price feeds (as opposed to the internal DEX market), and which is backed by the CORE asset or another asset that is itself backed by CORE.
  2. Prediction Market (PM) – Specialized BitAsset where total debt and total collateral are equal. Once a price feed (which will be between 0 and 1) is published, the market is globally settled.

Note: “Smartcoin” is an industry term with a few definitions. Those that refer to “smartcoins” on the BitShares platform are probably referring to Market Pegged Assets.

Note that assets must have some sort of exchange rate to calculate fees. This rate is called the Core Exchange Rate (CER).

BitShares Margin Terminology

BitShares has many controls around market pegged assets (MPAs) and their ability to be “shorted into existence.” Some of those fields have long names and abbreviations that are difficult to sort out. I hope this document helps sort them out.

MCR – Maintenance Collateral (or Margin Call) Ratio

How it is used:

For the sake of simplicity, I will not include fees in this discussion.

Scenario 1: Extreme Risk Taker

MYTOKEN is currently valued at 20 BTS. MCR is currently set at 1.75. I have 100 BTS, and wish to create 1 MYTOKEN. I would need to put up at least 35 BTS to create 1 MYTOKEN (20 BTS * 1.75 MCR = 35), which I do.

So now I have 1 MYTOKEN, I also have 65 BTS free, and 35 BTS tied up as collateral.

If the value of MYTOKEN rises in relation to BTS, my collateral ratio can fall below the 1.75 minimum. Let’s say the value of MYTOKEN rises to 25 BTS. My collateral ratio is now collateral / (debt * current price) = 35 / (1 * 25) = 1.4, well below the 1.75 maintenance level I need. I will be forced to sell (a.k.a. margin called).

Scenario 2: A Conservative Trader

MYTOKEN is currently valued at 20 BTS. MCR is currently set at 1.75. I have 100 BTS, and wish to create 1 MYTOKEN. I would need to put up at least 35 BTS to create 1 MYTOKEN (this is the same as Scenario 1). I put up 50 BTS as collateral to receive 1 MYTOKEN.

With such a trade, I can calculate the price to which MYTOKEN must rise (or BTS must fall) in order for me to be margin called. That formula is collateral / (debt * MCR). Therefore the price at which I would be margin called is 50 / (1 * 1.75) = 28.5714 BTS.
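
That formula is simple enough to sketch directly (a hypothetical helper, not BitShares code):

#include <iostream>

// Price at which a position gets margin called.
double margin_call_price(double collateral, double debt, double mcr) {
    return collateral / (debt * mcr);
}

int main() {
    // Scenario 1: 35 / (1 * 1.75) = 20, so any rise above 20 BTS triggers the call.
    // Scenario 2: 50 / (1 * 1.75) = 28.5714 BTS.
    std::cout << margin_call_price(50.0, 1.0, 1.75) << " BTS\n";
}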

MSSR – Maximum Short Squeeze Ratio

How it is used:

When margin calls happen, orders to purchase the asset I “shorted into existence” could dry up the current liquidity on the “sell” side of the market. If there are no sellers, the price of the asset could skyrocket, causing me to lose even more money.

MSSR helps with this. It is a safety mechanism for orderly markets. We will run the margin call of Scenario 2 above through the market. For this scenario, the MSSR is set at 1.1. That means that I will have to pay up to 10% above market price in order to lower my exposure. We calculated that the price at which I would get margin called was 28.5714 BTS. With an MSSR of 1.1, that means I will pay up to 28.5714 * 1.1 = 31.4286 BTS.

The order book currently has an order to sell 0.25 MYTOKEN at 29 BTS, and another order to sell 0.25 MYTOKEN at 32 BTS. That means the system will purchase the 0.25 MYTOKEN at 29 BTS. My remaining debt is now 0.75 MYTOKEN, and 7.25 BTS of my collateral will have been used for the purchase. That leaves my collateral balance at 42.75 BTS.

After that purchase, the price of MYTOKEN would need to continue rising to 32.5714 BTS before my collateral ratio is again low enough to trigger another margin call.
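
The arithmetic of that example, written out as a sketch (again, illustrative helpers rather than BitShares code):

// Maximum price the margin call will pay.
double max_short_squeeze_price(double call_price, double mssr) {
    return call_price * mssr;  // 28.5714 * 1.1 = 31.4286 BTS
}

// After buying 0.25 MYTOKEN at 29 BTS:
//   collateral = 50 - 0.25 * 29 = 42.75 BTS, remaining debt = 0.75 MYTOKEN
//   next call price = 42.75 / (0.75 * 1.75) = 32.5714 BTS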

How MCR and MSSR are adjusted

At the moment, the settlement price, MCR, and MSSR are provided by price feeds. The different feeds are used to derive an average that is used for all market participants at that time.

Creating a Coin/Token on the BitShares Blockchain

Do you want your own token? Who doesn’t? Here is how you can create your own token on the BitShares blockchain. This is called a “User Issued Asset” or UIA.

For this tutorial, I will be using the BitShares public testnet. I suggest you do the same. Just ask for some TEST tokens so that you can play around. I will also be using the command line based reference wallet that comes with BitShares Core, the cli_wallet.

Firstly, I must start my wallet and make sure it connects to a testnet node. I can then unlock my wallet, and as long as I have some TEST tokens to pay the fee, I can create my token. Here is the syntax:

create_asset <issuer> <symbol> <precision> <options> <bitasset_options> <send>

We will be replacing each of these <parameters> with a value. Let’s begin.

Issuer

This is normally you. You can use your account name or your account ID. This is who will sign the transaction, and also who will be responsible for issuing tokens.

Symbol

Give your token a name. Tokens with a 3-character symbol are expensive, 4-character symbols are less so, and symbols of 5 or more characters are the cheapest.

Precision

This is how many decimal places you want your token to have. Zero is valid, 8 is the maximum.

Options

This is some JSON text, where you specify many options. An example is below.

{
   "max_supply" : 100000,
   "market_fee_percent" : 30,
   "max_market_fee" : 1000, 
   "issuer_permissions" : 1,
   "core_exchange_rate" : {
       "base": {
         "amount": 21, 
         "asset_id": "1.3.0" 
       },
       "quote": {
         "amount": 76399,       
         "asset_id": "1.3.1"     
       }
   },
   "whitelist_authorities" : [],
   "blacklist_authorities" : [],
   "whitelist_markets" : [],
   "blacklist_markets" : [],
   "description" : "My New Token"
}

BitAsset Options

This is a special type of token, which I will not discuss here. Pass a ‘null’ in place of this parameter.

Send

This is the final parameter. It says that you would like to transmit this to the blockchain. Passing true here will process your request. Passing false will run what you sent through some checks to make sure it is valid, and then return the results as if you had passed true, without actually broadcasting the transaction.

Issuer Permissions

Notice the value of 1 in the field issuer_permissions above. For the desired value of issuer_permissions you will need to do a little math.

Start with 0 and add the value for each permission you want:

  • If you want to charge a fee when the token is traded on the exchange, add 1.
  • If you want the asset to only be distributable to those on the whitelist, add 2.
  • If you want the issuer to be able to take back the token from any account, add 4.
  • If you want the issuer to be the only entity that can transfer the token, add 8.
  • If you do not want to give the token holders the ability to do blind transfers, add 40.

Place the total in issuer_permissions. In the example above, we only want to charge a market fee, so the total is 1.
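
The same math expressed as a sketch (the flag names are illustrative and the values are copied from the list above, not checked against the BitShares source):

#include <cstdint>

enum IssuerPermission : uint32_t {
    charge_market_fee       = 1,  // fee when the token is traded on the exchange
    white_list              = 2,  // only whitelisted accounts may hold the token
    override_authority      = 4,  // issuer may take the token back from any account
    transfer_restricted     = 8,  // only the issuer may transfer the token
    disable_blind_transfers = 40, // no blind transfers (value as listed above)
};

// The example above only charges a market fee, so the total is 1.
uint32_t issuer_permissions = charge_market_fee;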

The Final Product

create_asset jmjatlanta4 JMJATLANTA 8 { "max_supply" : 100000, "market_fee_percent" : 30, "max_market_fee" : 1000, "issuer_permissions" : 1, "core_exchange_rate" : { "base": { "amount": 21, "asset_id": "1.3.0" }, "quote": { "amount": 76399, "asset_id": "1.3.1" } }, "whitelist_authorities" : [], "blacklist_authorities" : [], "whitelist_markets" : [], "blacklist_markets" : [], "description" : "My New Token" } null false

The above will create my new token JMJATLANTA that I can then give to whomever has an account on the BitShares blockchain.

Stay Tuned!

But what do all of those other bits of data mean? That will be the subject of my next post about User Issued Assets.