(Almost) bare metal

I love writing in C++. The reasons are performance (I am a speed junkie) and learning opportunities. If necessary, I can tune a routine to seek out the fastest way to achieve the result. Each time I do, I learn a lot. I have been learning constantly since I picked up the book on BASIC that I received with my Timex-Sinclair 1500.

But what if my computer is simply not fast enough to perform the routine I need it to? If the task can be broken into smaller, somewhat independent pieces, we can use threading. My machine has 8 cores and 16 hardware threads. My latest project included a process that took 20 seconds on 1 thread, and 6 seconds with 16 threads. And it wasn’t that hard to do.
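To make the "smaller, somewhat independent pieces" idea concrete, here is a minimal sketch of one way to split a loop across threads with std::thread. It is not the routine from my project: the sum-of-squares workload and the names sum_of_squares and parallel_sum_of_squares are made up for illustration. Each thread works on its own slice of the input and writes to its own slot, so no locking is needed until the results are combined at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical workload: sum of squares over the half-open range [begin, end).
double sum_of_squares(const std::vector<double>& data,
                      std::size_t begin, std::size_t end) {
    double total = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        total += data[i] * data[i];
    }
    return total;
}

// Splits the input into one contiguous chunk per thread, runs the chunks
// concurrently, then adds up the per-thread partial results.
double parallel_sum_of_squares(const std::vector<double>& data,
                               unsigned thread_count) {
    thread_count = std::max(1u, thread_count);
    std::vector<double> partial(thread_count, 0.0);  // one slot per thread
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    // Round up so every element lands in some chunk.
    const std::size_t chunk = (data.size() + thread_count - 1) / thread_count;

    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = sum_of_squares(data, begin, end);
        });
    }
    for (auto& worker : workers) {
        worker.join();  // wait for every chunk to finish
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Calling parallel_sum_of_squares(data, std::thread::hardware_concurrency()) would use one thread per hardware thread the machine reports. How much faster it runs depends on the workload; the speedup is rarely linear in the thread count.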

But 6 seconds is not fast enough. And during those 6 seconds my CPUs are pegged at 100%. The routine is tuned to the best of my ability. What now? Well, the process is some number crunching. I have a video card that loves to crunch numbers. What will it take?

I have coded the routine in CUDA as well as SYCL and will have some results soon (Update: See Part II). I do not get to do this stuff often, so I am enjoying re-learning CUDA and this is my first round with SYCL. SYCL looks like a winner here, as I will be able to compare results between some NVIDIA and AMD cards with (hopefully) the same codebase.

But GPUs are not the only option for such tasks. There are ASICs available for my current number-crunching necessity. And as I walked down the ASIC road I was able to explore another avenue that I had only glossed over in some prior projects: FPGAs.

ASICs (application-specific integrated circuits) are purpose-built chips. They are designed to be fed a certain way, and spit out the answer in a certain way. You can think of one as a new gadget that has the sticker “no user serviceable parts inside”. You cannot get that chip to do anything beyond what it was meant to do.

FPGAs sit in the middle between microprocessors and ASICs. In fact, you can use an FPGA as a way to prototype an ASIC. But here, we are not talking about making it do what we want by manipulating C or C++. We are talking about manipulating transistors and wires. It is digital circuit building without the breadboard and soldering iron (well, at least in some cases).

FPGAs have been around for a while. But until fairly recently they were simply too expensive for anything but big-budget projects. Think government, Wall Street, Google, and Amazon. Today, you can experiment with an FPGA for $40 or less.

As a software developer, I had to twist my brain to begin to reason about how to get these things to do what I want. I have done some microprocessor work. I have written software for embedded devices. I know what a transistor is and when to use a NAND. But FPGAs are requiring me to rewire my brain as I rewire that chip.

Where To Begin

I am just getting started. So here is what I have learned so far. The big hardware players in this arena are Xilinx (now owned by AMD), Altera (now owned by Intel) and Lattice. There are other players, but for what I am doing and where I am at in my exploration, I want to examine these suppliers before getting into the nuanced specifics. I am at the “I don’t know what I don’t know” stage.

When digging through the interwebs, there are a number of development boards and chips that keep popping up. So if you want to follow the tutorials, you will need to get your hands on some hardware. My first board will not be the latest and greatest (one I looked at on Mouser was just over US$17,000). It will be modest and from a manufacturer that seems to be used often in the industry I am working with. Here are the 2 I am looking at:

The Digilent Basys 3

This is often used in FPGA tutorials. The development board includes a good number of toys to play with while you build your knowledge: pushbuttons, switches, LEDs, a seven-segment display, a VGA port, etc., for around US$160.

The Digilent Arty A7/S7

This also appears in some tutorials. The S7 seems to be the low-cost version of the A7. Both come in two flavors, so 4 different chips for 4 different prices. The A7 includes an Ethernet port, whereas the S7 does not. The A7 with the lower-cost chip is around US$160.

Honorable Mentions

If you are looking to get into FPGAs on the cheap, check out the Lattice iCEstick. But don’t stop there. There are plenty of others with varying sizes of “maker” communities around them.
