IoT – The marriage of software and hardware

I have always enjoyed electronics as a spectator. It continues to fascinate me. But the maturing trend of IoT captures my imagination because it blends electronics and software.

My career has mainly revolved around the major platforms: Windows, macOS, and Linux. I wrote large applications that ran on large servers or powerful desktops. Performance was a concern, but the cost of development often meant worrying about hardware constraints only when something didn’t work.

Smartphones and other mobile devices drove developers to once again consider the hardware. Sure, there are development tools that hide some of the complexities of particular hardware. But when you get down to a board that has few microprocessors, limited RAM, little storage, and must run on a watch battery, the idea of adding a tool to hide complexities often means the tool is too big to fit on the device, let alone your code.

Robots have been used as a STEM tool in schools for a while now. There are people in the workplace now who have a much more solid base in electronics than I do. Imagine, kids: my school had one computer in the library that had to be shared among the entire student body. Of course, I was one of the few who knew how to turn it on. But I digress.

The idea that the latest generation (and even a few before this one) are being exposed to IoT makes me happy. Learning such things while your mind is open to explore (and you have the time to do it) means more capable people with imaginations beyond the basics.

But that doesn’t mean that people from my generation can’t contribute. Just as the generation before us, we can certainly understand that we are where we are because of the shoulders we stood on. The question I ask myself is “How can I help?”

I’m no writer, and I’m no teacher. But I do have a good amount of experience, much of which only exists as anecdotes in my head. So it is time I do more thinking, learning, and writing.

My hope is that by writing more, I will explore more. I am looking toward IoT to expand my electronics knowledge, increase my coding knowledge, and learn how to best explain concepts.

These are lofty goals, but I’m hoping for the best.

Side project: Marine Weather Radar

I am researching open-source projects for marine weather radar.

The weather radar hardware manufacturers are proprietary. I understand why, and I’m not against it. But I would like to have my data in one place if I can. To do that, I need integration. That will affect my choice in purchasing hardware.

To research:

Hardware manufacturers of marine radar for pleasure craft. Do they provide an API? What equipment is necessary?

Open-source projects used for integration of systems in the marine environment.

What I’ve found:

OpenCPN seems to be the open-source software of choice for chart plotter navigation. It seems they have AIS (traffic) integrated, among others. I have also seen integration with older radomes as part of the radar_pi project.

The industry standard for networking devices together is NMEA 2000, a protocol far too slow for radar data, but one that seems to work well for things like wind speed, GPS position, depth, etc.

Signal K is an open-source project for integrating NMEA 2000 data into the PC. From what I’ve seen so far, they’ve put a lot of work into it.

The Ultimate:

A PC connecting to the boat’s WiFi network that connects to a server (perhaps running on a Raspberry Pi) that talks to boat systems.

Related links:

GnuRadar: Need to explore

Cruiser’s Forum: Post about someone wanting to do the same, but dated.

SI-TEX MDS 8R was (is?) a Radome that plugged into a PC’s ethernet port. Interesting, not sure if it continues to be in production.

After more research, it seems radar is going the ethernet route. More of them are going headless, and I am hopeful that standardization is coming soon.

Update 16-September-2020

radar_pi has done a lot of work on this already. It seems the tight grip that manufacturers have on their hardware limits (but does not eliminate) the usefulness of writing such code.

Manufacturers do sell the antenna separately from their head units. Whether replacing their head unit with a generic one would cause things like voided warranties remains to be seen.

To truly reverse engineer, build, and test such software would require the actual hardware. This post talks about someone working on such a thing with Furuno hardware.

An Algorithmic Trading Framework

I’m dreaming here. But if I were to build the ultimate framework for algorithmic trading, what would it look like?

Purpose

One purpose would be to concentrate more on the algorithm, and less on the mechanics that all strategies need. But this is tricky. All strategies need something different, with different parameters.

Another purpose would be to have a consistent way to write complete algorithms. Once the framework is learned, adding new algorithms and updating old ones becomes easier.

Hurdles

There are a myriad of options for implementing trading algorithms. A simple idea can quickly turn into a long list of questions.

Risk measures must be looked at. How is the optimum trade size calculated? Are other instruments involved in this calculation (e.g. would this trade create a risk in a particular index or industry that is above an allowed threshold)?
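To make the trade size question concrete, here is a minimal sketch of one common rule of thumb, fixed-fractional sizing: risk at most a fixed fraction of account equity between the entry price and a stop price. It is only one of many possible answers and says nothing about index or industry exposure. The function name and parameters are hypothetical, invented for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: the number of units to trade so that being stopped
// out loses at most riskFraction of equity. Returns 0 for unusable inputs.
std::int64_t fixedFractionalSize(double equity, double riskFraction,
                                 double entryPrice, double stopPrice) {
    const double riskPerUnit = std::fabs(entryPrice - stopPrice);
    if (equity <= 0.0 || riskFraction <= 0.0 || riskPerUnit <= 0.0) {
        return 0;
    }
    // Round down so the realized risk never exceeds the budget.
    return static_cast<std::int64_t>(
        std::floor(equity * riskFraction / riskPerUnit));
}
```

With 100,000 in equity, a 1% risk budget, an entry at 50 and a stop at 48, this yields 500 units. A portfolio-level check (the index or industry threshold) would sit on top of a rule like this.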

We must consider order entry. Will it be a market order? Do we attempt to make the spread? Which ECN will be used? What do we do if the order is not immediately filled? What if the order is partially filled?

Then there is order management. At what point is the trade closed at a loss? Is there a strategy for adjusting risk if the trade moves for/against the entry price? How are profits taken?

There are also brokerage and data feed questions. Are these decisions already made? Is there the possibility they will be changed in the future?

With such questions answered (or at least partially answered), we begin to look at frameworks that can help build the infrastructure.

If we’re sticking to a particular broker or software package, the framework decision becomes easy. If we’re talking FOREX, and the broker mainly works with MetaTrader, there would need to be a strong reason to choose another platform.

A trading business must also look at in-house experience that is available. A hedge fund that has a staff of Python developers may not want to work with a C++ framework.

Conclusion

A “one size fits all” trading platform will never be created. A platform that works well for a particular situation is often available. I would like to build a platform that is somewhere in the middle of those two situations. I would like to hide the complexities of portfolio management, broker connectivity and data feed connectivity by providing a (somewhat) generic interface to these items.

Strategies that connect through an API to these resources would be somewhat more portable between brokerages, data providers, and changes to portfolio management rules. This would also hopefully allow for backtesting without rewriting.
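As a rough sketch of what such a generic interface might look like, the code below defines a data feed, a broker, and a strategy that only talks to those two abstractions. Every type and function name here is hypothetical and invented for illustration; none of it is taken from the GitHub project. The backtest stand-ins (ReplayFeed, PaperBroker) show how the same strategy could run against stored data without being rewritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message types for this sketch.
struct Quote {
    std::string symbol;
    double bid;
    double ask;
};

struct OrderRequest {
    std::string symbol;
    bool isBuy;
    std::int64_t quantity;
    double limitPrice;
};

// Anything that can supply prices: a live feed or a historical file.
class DataFeed {
public:
    virtual ~DataFeed() = default;
    virtual bool next(Quote& out) = 0;  // false when there is no more data
};

// Anything that can accept orders: a real broker or a simulator.
class Broker {
public:
    virtual ~Broker() = default;
    virtual std::uint64_t submit(const OrderRequest& order) = 0;  // order id
};

// A strategy sees only the two interfaces above.
class Strategy {
public:
    virtual ~Strategy() = default;
    virtual void onQuote(const Quote& quote, Broker& broker) = 0;
};

// Backtest stand-in: replays stored quotes in order.
class ReplayFeed : public DataFeed {
public:
    explicit ReplayFeed(std::vector<Quote> quotes) : quotes_(std::move(quotes)) {}
    bool next(Quote& out) override {
        if (pos_ >= quotes_.size()) return false;
        out = quotes_[pos_++];
        return true;
    }
private:
    std::vector<Quote> quotes_;
    std::size_t pos_ = 0;
};

// Backtest stand-in: records orders instead of sending them anywhere.
class PaperBroker : public Broker {
public:
    std::uint64_t submit(const OrderRequest& order) override {
        orders.push_back(order);
        return orders.size();
    }
    std::vector<OrderRequest> orders;
};

// Toy strategy: buy one unit whenever the ask is below a fixed threshold.
class BuyBelow : public Strategy {
public:
    explicit BuyBelow(double threshold) : threshold_(threshold) {}
    void onQuote(const Quote& quote, Broker& broker) override {
        if (quote.ask < threshold_) {
            broker.submit(OrderRequest{quote.symbol, true, 1, quote.ask});
        }
    }
private:
    double threshold_;
};

// The same loop would drive a live feed and a real broker.
void run(DataFeed& feed, Broker& broker, Strategy& strategy) {
    Quote quote;
    while (feed.next(quote)) strategy.onQuote(quote, broker);
}
```

Swapping ReplayFeed for a live feed implementation, or PaperBroker for a real broker adapter, would leave BuyBelow untouched. Portfolio management rules could be layered in as a decorator around Broker, although that is a design guess rather than something this sketch proves.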

A while back, I started down the road of building such a framework. A rough implementation is at GitHub. And by rough I mean pre-pre-alpha. There are plenty of areas that need work, or even need to be redone. But it is a start.

Data Storage Formatting

I have been battling an internal war. The question: In which format should data be stored? Twenty years ago, the question was fairly simple. Much was moving from proprietary storage systems to relational database management systems. That was a great move, or so I thought at the time. Everything was sorted. Simple queries were simple. Hard queries were hard. Queries that needed to be performant could be tuned.

But data came in different formats, and had to be parsed. What could easily be parsed was placed in columns. The big stuff was placed in blobs. Specialized parsers could then be built to look at the blobs if they wanted to.

There were plenty of issues to resolve. How do you keep multiple databases in sync? How do you handle changes in the incoming metadata? What do you do with old, rarely used, but perhaps important data? The questions kept coming.

Those questions and many more have answers that are not easy. And like most things in life, the answer is often “it depends.”

The current situation is the need to pore over mountains of different data, looking for specific things that happened over a specific time.

Time-Series Market Data

Sample frequency is an issue here. The typical OHLC data that comes from the financial markets is in a recognizable format, easily parsed, placed in rows and columns, and indexed. With management systems like kdb+, you can capture data at a fast sample rate, or even down to the tick, and summaries are still quickly calculated.
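To illustrate the relationship between tick data and OHLC summaries, here is a minimal sketch that rolls ticks up into fixed-width bars. It has nothing to do with how kdb+ does this internally. The struct layouts, the use of seconds for timestamps, and the function name are assumptions made for the example, and the ticks are assumed to arrive sorted by time with non-negative timestamps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Tick {
    std::int64_t timestamp;  // seconds since some epoch (assumed)
    double price;
    std::int64_t volume;
};

struct Bar {
    std::int64_t start;  // timestamp of the start of the interval
    double open;
    double high;
    double low;
    double close;
    std::int64_t volume;
};

// Groups time-sorted ticks into bars of intervalSeconds each.
// Intervals with no ticks produce no bar.
std::vector<Bar> toBars(const std::vector<Tick>& ticks,
                        std::int64_t intervalSeconds) {
    assert(intervalSeconds > 0);
    std::vector<Bar> bars;
    for (const Tick& t : ticks) {
        const std::int64_t start = t.timestamp - (t.timestamp % intervalSeconds);
        if (bars.empty() || bars.back().start != start) {
            // First tick of a new interval sets all four prices.
            bars.push_back(Bar{start, t.price, t.price, t.price, t.price, t.volume});
        } else {
            Bar& bar = bars.back();
            bar.high = std::max(bar.high, t.price);
            bar.low = std::min(bar.low, t.price);
            bar.close = t.price;
            bar.volume += t.volume;
        }
    }
    return bars;
}
```

Storing the ticks and deriving bars on demand keeps every sample frequency available. Storing only the bars is cheaper but throws away whatever happened inside each interval.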

Report Filings

Is a typical RDBMS a good fit for financial reports, such as what comes from EDGAR? Not so much. A “big data” system may be a better fit, or perhaps an API that knows how to parse directly from EDGAR itself.

Market News

Sentiment is very difficult to quantify. The “mood” of the market may be extrapolated after the fact by certain indicators, but how is the oil sector affected when good Tesla numbers come out the same day as a bad US Manufacturing report? How do you build that query?

Summary

There are many more categories and sub-categories. But the basic result is: there are many factors to consider when looking at data. Attempt to pick the correct way, and be prepared to change it if it doesn’t work. Don’t be afraid to use more than one solution at the same time. Avoid complexity, but remember that not all systems are small. Consider micro services. Keep an open mind, but avoid “analysis paralysis”.

Once again, “it depends.”

C++ vs Python in Algorithmic Trading

I am very much a C++ person. I warmed up to the language as it was warring with C for programmers and popularity. Although I used neither for my “day job” except on rare occasions, I jumped at opportunities to play with them.

After C and C++, I learned many other languages. I took deep dives into Java, C#, JavaScript, etc. But there were plenty that were (some still are) popular that I never had the opportunity to use much. Perl, awk/sed, regular expressions are some that come to mind.

Eventually I switched to using mainly C++, and then nearly 100% for a good while. After escaping the confines of my hard-core enterprise software experiences, I found the language world had twisted again. But this time was a bit different.

In areas of finance, vendors often had the upper hand. When some big name used some tool within their development area, the smaller ones followed suit, if they could afford it. If you were the lucky vendor, you sat back and waited for the orders to roll in.

But as the trading world evolved, the smaller, nimble organizations had unprecedented access to new tools at little to no cost. Metrics like performance and accuracy were comparable. And the young graduates were leaving their academic life with these skills already under their belts.

Language wars are far from over. But the ferocity is dying down. Most developers understand that you should pick the right tool for the job. Consider what you are building up front, and make sure to leave room for flexibility.

C++ is still my language of choice. I use it daily. But it has flaws. And sometimes it is simply not the best language for the job. At the moment I am prototyping, and building some algorithms that crunch data from multiple sources in multiple ways. The sources, elements, and methods that I write tomorrow will probably not be used in the final product. But these cycles are necessary to figure out what the final product should do. Should I write that in C++? Not in this case.

Much of what I’m looking at has been looked at by others. Many used Python to grind through it. There are scripts, designs, and visualizations written in Python that help me understand the data that is running through the system. Do I want to rewrite that in C++? No thank you. The final product may be written in C++. But for now, I’m happily tweaking other people’s code for my own purpose.

I must give props to the mighty data scientists and financial wizards that wrote some of these Python libraries. Of course, the language itself permits C libraries to be used within the language. So some of them are simply using Python to access libraries written in other languages. But still, Python has carved out a good market share in the big-data areas of trading systems.

Will Python take over everything in the trading world? The odds are very slim. Entire trading systems have been written in it. Many of them in fact. But there are still good reasons to choose other tools that do a better job at some aspect of the cycle.

I have added Python to my language stack. I will probably never dive deep into the language. But after only a short while using it, I feel comfortable with it.

Should your system be written in Python? Perhaps. I wouldn’t hesitate if it fits. But I also wouldn’t recommend forcing it to fit when it doesn’t belong. Standardize across the board on it? Nope. Pick and choose. Choose wisely. Hybrid models are common, and often best.

That is my $0.02. YMMV

Algorithmic detection of chart patterns

I have been building out some algorithms that rely on detecting “consolidation”. Chartists know what that looks like, but the term by itself is extremely vague to a quant. What to do?

I have often thought about the method used in creating “Encyclopedia of Chart Patterns” and wondered if it could be used as part of an automated strategy. I read somewhere that the application used to build the data in that book was written in VB, and was for the author’s personal use. Sad, but I would have probably done the same thing.

Today I began reading “Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation” (2000) by Lo, Mamaysky, and Wang. What a great paper. It details some of the technical challenges of building such algorithms.

What I like about the paper is that it picked one smoothing algorithm, and basically said “we picked it and went with it, right or wrong”. My interpretation of that is “you can take this further, but we wanted to give you enough information to get you started”. Kudos to the authors.

Is it worth the effort to implement? Time will tell. But it certainly helped me get my head around some of the intricacies of smoothing estimators and finding optimal values for them.
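For reference, the smoothing approach the paper settles on is kernel regression (the Nadaraya-Watson estimator) with a Gaussian kernel: each smoothed value is a weighted average of all prices, with weights that fall off with distance in time. Below is a minimal sketch of that idea, not a reproduction of the paper's implementation. The function name is hypothetical, the time axis is simply the sample index, and the bandwidth is left as a plain parameter rather than being selected from the data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nadaraya-Watson kernel regression over the sample index, Gaussian kernel.
// smoothed[t] = sum_s K((t - s) / h) * prices[s] / sum_s K((t - s) / h)
// The kernel's normalizing constant cancels in the ratio, so it is omitted.
std::vector<double> kernelSmooth(const std::vector<double>& prices,
                                 double bandwidth) {
    assert(bandwidth > 0.0);
    const std::size_t n = prices.size();
    std::vector<double> smoothed(n);
    for (std::size_t t = 0; t < n; ++t) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (std::size_t s = 0; s < n; ++s) {
            const double u =
                (static_cast<double>(t) - static_cast<double>(s)) / bandwidth;
            const double weight = std::exp(-0.5 * u * u);
            weightedSum += weight * prices[s];
            weightTotal += weight;
        }
        smoothed[t] = weightedSum / weightTotal;
    }
    return smoothed;
}
```

A small bandwidth follows the prices closely and keeps the noise. A large one flattens everything, including the patterns being searched for. That trade-off is why finding a good value takes so much of the effort. This version is O(n²) and is meant for understanding, not for speed.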

Testing, probing and profiling the code

This is part 3 of a series on improving performance. Click here for part 1, and here for part 2.

Now that the basics are done, it is time to start thinking more about testing. Some may say “but what happened to test first?” Well that is a long discussion, but suffice it to say that there must be a balance between testing and productivity.

To those who subscribe to ensuring 100% test coverage: I applaud your efforts. But unfortunately, your code will still contain bugs. Testing is VERY important. But apply it in a broader context and you will be more productive. I built tests that ensure this very naive system works.

So what do we need to test here? Firstly, we must exercise as much of the code as possible. Then we can profile to see where the bottlenecks are. After all, this project is about profiling.

The rules of the game are “throw as many transactions at this thing as rapidly as possible.” But we also must be concerned about the quality of the data. In a real-world matching engine, the majority of transactions happen very close to the best bid and offer. But not everything happens there. So we need to generate a large amount of test data that will be processed in order. The end result should be a somewhat repeatable time. That data should include:

  • Most (but not all) transactions close to the bid and ask
  • A varying amount of quantity and price
  • Price should not fall below zero (an upward bias perhaps?)
  • Orders that partially fill
  • Orders that take liquidity from several existing orders

There is functionality that does not exist. For example, this book does not provide for cancelling orders. It does not handle rounding (yet). So the test data will not include those things.

A very large CSV file was created with randomized data. Different tests will be created that limit the number of records processed. From that test, we can begin profiling to see which areas of code would benefit the most from further examination.
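A generator along the following lines could produce data with those properties. This is a sketch of the general approach, not the code that built the actual CSV file: the struct, the function name, and every distribution parameter (the 10% share of far-away orders, the 3-tick spread, the drift rate) are assumptions picked for illustration. A fixed seed makes runs repeatable, at least on the same toolchain.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct TestOrder {
    std::uint64_t id;         // sequential, so rows are processed in order
    bool isBuy;
    std::int64_t priceTicks;  // integer ticks, never below 1
    std::int64_t quantity;
};

std::vector<TestOrder> generateOrders(std::size_t count, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> nearOffset(0.0, 3.0);           // most orders
    std::uniform_int_distribution<std::int64_t> farOffset(-200, 200);  // the rest
    std::uniform_int_distribution<std::int64_t> quantityDist(1, 500);
    std::bernoulli_distribution isFar(0.1);
    std::bernoulli_distribution isBuyDist(0.5);

    std::int64_t mid = 10000;  // reference price around which orders cluster
    std::vector<TestOrder> orders;
    orders.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::int64_t offset =
            isFar(rng) ? farOffset(rng)
                       : static_cast<std::int64_t>(std::llround(nearOffset(rng)));
        // Clamp so the price can never reach zero or go negative.
        const std::int64_t price = std::max<std::int64_t>(1, mid + offset);
        assert(price >= 1);
        const bool isBuy = isBuyDist(rng);
        const std::int64_t quantity = quantityDist(rng);
        orders.push_back(
            TestOrder{static_cast<std::uint64_t>(i + 1), isBuy, price, quantity});
        if (i % 100 == 99) ++mid;  // gentle upward bias
    }
    return orders;
}
```

Because buys and sells cluster around the same reference price with widely varying quantities, many orders should cross, some filling only partially and some consuming several resting orders. Nothing here guarantees a particular mix, so it would be worth counting those cases in the generated file.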

Tools

I will be using valgrind’s callgrind output to measure performance. kcachegrind will be used to help read the callgrind output.

Initial Results

Each time I profile a piece of code, I am somewhat surprised at the results. “No premature optimization” is alive and well. I had several assumptions before profiling. Some were valid, others do not seem to be so.

Firstly, the method “placeOnBook” took a good chunk of time. After a small amount of evaluation, it simply places an entry in the map. This area should be examined. I am thinking either there are copies or constructors that could be optimized or avoided altogether.

Secondly, the CSV Reader was high on the list. This is an example of how your test framework can show up in performance results. It doesn’t matter too much, as such results can be somewhat ignored. Optimization there is a waste of effort. But it does serve as an indication that the matching engine code is fairly robust.

Conclusion of Round 1

The std::map used is where effort should be put in. Keying differently, or replacing std::map with something faster will provide the best improvement. Here is a screenshot of the output, formatted by kcachegrind.

Order matching and container types

This is part two of my order book exercise. To start at the beginning, click here.

The initial commit of my order book uses a std::map as a collection of bids and asks. That works, but it has a problem. The key is the price. More than 1 order with the same price and oops! Bad things happen. In this case, the previous order disappears.
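The sketch below shows the problem in isolation. It is not the code from the commit; the Order struct and function names are made up for the example. Which order gets lost depends on how the entry is written: assigning through operator[] overwrites the earlier order, while emplace (or insert) silently ignores the later one. Either way, only one order survives at a given price.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Order {
    std::uint64_t id;
    std::int64_t quantity;
};

using PriceTicks = std::int64_t;

// Two orders at the same price, written with operator[].
// Returns the id of the order that is left on the book.
std::uint64_t survivorAfterAssign() {
    std::map<PriceTicks, Order> asks;
    asks[10050] = Order{1, 100};
    asks[10050] = Order{2, 40};  // same key: order 1 is overwritten
    return asks.at(10050).id;
}

// Two orders at the same price, written with emplace.
// Returns the id of the order that is left on the book.
std::uint64_t survivorAfterEmplace() {
    std::map<PriceTicks, Order> asks;
    asks.emplace(10050, Order{1, 100});
    asks.emplace(10050, Order{2, 40});  // same key: order 2 is dropped
    return asks.at(10050).id;
}
```

Here survivorAfterAssign() returns 2 and survivorAfterEmplace() returns 1. A std::multimap, or a map from price to a queue of orders, would be other ways around this besides changing the key.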

Well, is std::map the best choice? We could immediately say no, but that wouldn’t be any fun, now would it?

Our goal here is profiling, but I want a complete, working example. So something must be done to allow for two orders to be on the book at the same price. In addition, the oldest order should be used first (FIFO). How is that to be accomplished?

One way is to create an object to be used as a key. The plan is to create an AssetOrderKey object that works with comparison operators. Let’s see that in action.

If we add an AssetKey object, and then expand it into two objects, we can have a different comparison operator for the two collections. Therefore, a begin() call on either collection will give us the best bid or ask.
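A minimal sketch of the two-key idea follows. The names AskKey, BidKey, and RestingOrder are hypothetical and are not the ones used in the commit. Each key carries the price plus a sequence number, and each key type has its own operator<, so the first element of either map is the best price and, within a price, the oldest order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Key for the ask side: lowest price first, then oldest order first.
struct AskKey {
    std::int64_t priceTicks;
    std::uint64_t sequence;  // arrival order; assumed unique
    bool operator<(const AskKey& other) const {
        if (priceTicks != other.priceTicks) return priceTicks < other.priceTicks;
        return sequence < other.sequence;
    }
};

// Key for the bid side: highest price first, then oldest order first.
struct BidKey {
    std::int64_t priceTicks;
    std::uint64_t sequence;
    bool operator<(const BidKey& other) const {
        if (priceTicks != other.priceTicks) return priceTicks > other.priceTicks;
        return sequence < other.sequence;
    }
};

struct RestingOrder {
    std::int64_t quantity;
};

using AskBook = std::map<AskKey, RestingOrder>;
using BidBook = std::map<BidKey, RestingOrder>;
```

Because the sequence number is part of the key, two orders at the same price are distinct entries and neither one disappears.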

To see these changes, take a look at this GitHub commit.

Another way to do that would be to pass a comparison operator to the std::map. I did not do that here.
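For comparison, that alternative might look something like the sketch below: a single key type, with the ordering supplied as the third template argument of std::map. The type names are hypothetical, and the mapped value is reduced to a bare quantity to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// One key type shared by both sides of the book.
struct OrderKey {
    std::int64_t priceTicks;
    std::uint64_t sequence;  // arrival order; assumed unique
};

// Lowest price first, then oldest first.
struct BestAskFirst {
    bool operator()(const OrderKey& a, const OrderKey& b) const {
        return std::tie(a.priceTicks, a.sequence) <
               std::tie(b.priceTicks, b.sequence);
    }
};

// Highest price first, then oldest first. The price operands are swapped
// so higher prices sort earlier; the sequence comparison stays ascending.
struct BestBidFirst {
    bool operator()(const OrderKey& a, const OrderKey& b) const {
        return std::tie(b.priceTicks, a.sequence) <
               std::tie(a.priceTicks, b.sequence);
    }
};

using Asks = std::map<OrderKey, std::int64_t, BestAskFirst>;  // value: quantity
using Bids = std::map<OrderKey, std::int64_t, BestBidFirst>;
```

The trade-off is mostly one of taste. Separate key types make it impossible to put a bid key into the ask book by mistake, while a shared key with two comparators means less duplicated code.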

Matching Engine Requirements

As an academic exercise, I wanted to take on building a matching engine in C++. The purpose here is to iterate through the process of measuring and improving performance.

I imagine the initial requirements as naive, with later iterations including removal of floating point calculations, variable precision, 128-bit integers, “dust” handling (may be more of an implementation question than a performance one).
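As a hint of what removing floating point could look like, here is a small fixed-point sketch in which a price is stored as an integer number of ticks. The scale of four decimal places and all of the names are assumptions made for illustration. Integer ticks add and compare exactly, which is not true of binary floating point, where 0.1 + 0.2 does not equal 0.3.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scale for this sketch: one tick is 1/10000 of a unit.
constexpr std::int64_t kScale = 10000;

struct Price {
    std::int64_t ticks;  // price multiplied by kScale
};

// Builds a price from a whole part and a fractional part given in ticks,
// e.g. fromParts(12, 3400) represents 12.3400.
constexpr Price fromParts(std::int64_t whole, std::int64_t fractionTicks) {
    return Price{whole * kScale + fractionTicks};
}

constexpr Price operator+(Price a, Price b) { return Price{a.ticks + b.ticks}; }
constexpr bool operator==(Price a, Price b) { return a.ticks == b.ticks; }

// Value of quantity units at a price, still expressed in ticks.
constexpr std::int64_t notionalTicks(Price price, std::int64_t quantity) {
    return price.ticks * quantity;
}
```

The multiplication in notionalTicks is where 64 bits can run out once both prices and quantities carry many decimal places, which is presumably where the 128-bit integers and variable precision come in.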

This engine will be strictly single-threaded and purposeful. It is hoped that the input will be clean and optimized (which could be offloaded) to improve throughput.

Tooling will be sparse, on purpose. The idea here is not a discussion of the intricacies of the tools, but how tweaks to code affect speed.

The Idea

The engine will receive limit orders that specify the asset held, the asset to be bought, and the desired price. It will include a sequential id, externally guaranteed to be unique (such a key could be analyzed later… we’ll see…).

Once received, the order is processed and if not immediately filled, what is left over is placed on the order book.
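The processing step can be sketched roughly as follows. This is a simplified illustration and not the code in the GitHub commit: it assumes a single asset pair with plain buy and sell sides instead of "asset held / asset to be bought", integer prices, and hypothetical names throughout. An incoming order consumes resting orders from the opposite side, best price and oldest first, while the prices cross, and whatever is left over is placed on the book.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using Ticks = std::int64_t;
using Key = std::pair<Ticks, std::uint64_t>;  // (price, sequential order id)

// Bid ordering: highest price first, then lowest id (oldest) first.
struct BidOrdering {
    bool operator()(const Key& a, const Key& b) const {
        return a.first != b.first ? a.first > b.first : a.second < b.second;
    }
};

struct Book {
    std::map<Key, std::int64_t> asks;               // lowest price first
    std::map<Key, std::int64_t, BidOrdering> bids;  // highest price first
};

// Fills the incoming order against the opposite side while prices cross,
// then rests any remainder on its own side. Returns the quantity filled.
template <typename Opposite, typename Same, typename Crosses>
std::int64_t matchAgainst(Opposite& opposite, Same& same, Key key,
                          std::int64_t quantity, Crosses crosses) {
    std::int64_t filled = 0;
    while (quantity > 0 && !opposite.empty() &&
           crosses(opposite.begin()->first.first)) {
        auto best = opposite.begin();
        const std::int64_t traded = std::min(quantity, best->second);
        best->second -= traded;  // partial fill of the resting order
        quantity -= traded;
        filled += traded;
        if (best->second == 0) opposite.erase(best);  // fully consumed
    }
    if (quantity > 0) same.emplace(key, quantity);  // leftover goes on the book
    return filled;
}

std::int64_t submitBuy(Book& book, std::uint64_t id, Ticks limit,
                       std::int64_t quantity) {
    return matchAgainst(book.asks, book.bids, Key{limit, id}, quantity,
                        [limit](Ticks askPrice) { return askPrice <= limit; });
}

std::int64_t submitSell(Book& book, std::uint64_t id, Ticks limit,
                        std::int64_t quantity) {
    return matchAgainst(book.bids, book.asks, Key{limit, id}, quantity,
                        [limit](Ticks bidPrice) { return bidPrice >= limit; });
}
```

For example, with sells resting at 100 (two orders of 30) and 101 (one order of 50), a buy of 70 at 101 fills 30 + 30 + 10 and leaves 40 resting at 101. Trade reporting, cancels, and rounding are all left out.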

Simple, right? There are many details yet to be sorted out. So we will get started! I will edit this post with links to my future posts on the subject. Stay tuned!

For the first cut of the order book, see this GitHub commit.

To see what I tackled next, see part two.

Bitshares Asset Terminology

The Bitshares Core code distinguishes between assets in the following manner:

  1. CORE – A base asset. Only one exists on the chain, and is created within the Genesis Block. On the BitShares mainnet, this is BTS. On the BitShares testnet, this is TEST.
  2. User Issued Asset (UIA) – An asset issued by a BitShares account.
  3. BitAsset – An asset that is backed by another. The backing asset is either CORE or an asset that itself is backed by CORE.

Some BitAssets have their parameters controlled by the BitShares Committee. These are distinguished by the ‘bit’ prefix (e.g. bitUSD, bitCNY, bitEUR, bitBTC). The price feeds for these assets come from committee members or witness members.

BitAssets could also be split into two types:

  1. Market Pegged Asset (MPA) – Assets whose price is based on external price feeds (as opposed to the internal DEX market), and backed by the CORE asset or another asset that itself is backed by CORE.
  2. Prediction Market (PM) – Specialized BitAsset where total debt and total collateral are equal. Once a price feed (which will be between 0 and 1) is published, the market is globally settled.

Note: “Smartcoin” is an industry term with a few definitions. Those that refer to “smartcoins” on the BitShares platform are probably referring to Market Pegged Assets.

Note that assets must have some sort of exchange rate to calculate fees. This rate is called the Core Exchange Rate (CER).