tornado & websockets

April 26, 2014

The core of my POC prototype is a server engine, which doesn’t really make for a great demo. Generally, people grasp concepts quicker if they can see a tangible realization. So I needed a realistic way to show live ticking data getting cranked out by the server. A browser GUI seemed a natural candidate. And being a Pythonista I wanted to do the server coding in Python.

Until recently getting live ticking data pushed up to a browser was a big deal, requiring sophisticated server products like the Caplin Liberator, and rich GUI toolkits like Caplin Trader. Fortunately, it’s now possible to hack some demoware in the form of a live, ticking webpage using some really simple jQuery & websockets in the browser, and tornado on the server side. JavaScript and browser GUIs are not my forte, so I won’t comment any further, except to note how much easier it seems than five years ago.

On the server side, though, I do have more experience. Back in 2000 I was doing server side web dev in Python using Zope. Zope is a very powerful system, featuring a built in Object DB and an inheritance by instance rather than class mechanism called acquisition. Consequently it has a rather steep learning curve. In recent years Plone has had some traction as a CMS built on top of Zope.

In 2001/2 I discovered Twisted Matrix, a general networking toolkit you can use to build any IP based networking functionality. Again there’s a steep learning curve, but it’s much lighter than Zope, and is now very mature. I will be using Twisted to build a general socket server capability for my core product: I’ve got C++ and Python APIs, but I’ll need a socket server for Java support.

But what I needed for my demo purpose was real time server push to the browser. And tornado proved to be a good choice. Simple, lightweight, lots of worked examples and focused entirely on websockets. It didn’t take long to get ticking data into a webpage. Recommended!

Nodally Xenograte

April 10, 2014

Well, xenograte from nodally sounds pretty cool: loosely coupled software components in the cloud. Brad Cox’s dream of snap together building blocks, finally realised. Yahoo Pipes, anyone? I’ve even hacked around similarly motivated code myself, but never got so far of course. The problem with these new paradigms is that they ask you to throw away all your old software assets so you can rebuild them again in the new framework. A bit like media companies asking us to buy the same content over and over on different formats: LPs, tapes, CDs, audio DVDs, downloads, pono, VHS, DVD, bluray… Why can’t someone find a way to breathe fresh life into existing assets without reengineering them? Why not, indeed?

POC M1 running

February 16, 2014

I’ve been building a proof of concept for a new product since last August. I’ve just got the first milestone running, which is a big, big step forward. When I’ve got M2 done it will be time to come out of stealth mode, get the message out, do some demos, and start fund raising so I can put a team together…


November 9, 2013

I’ve long been a fan of Mark Russinovich’s sysinternals utilities, especially procmon and procexp. Today I discovered that in procmon, when browsing filtered file system events, you can get a stack trace by right clicking on the event. Wow!  I didn’t know that. Very powerful diagnostic technique.


April 5, 2013

Just fixed one of those infuriating Visual Studio link errors. There’s lots of good stuff on stackoverflow, of course, but this one was a bit more subtle. It’s still probably quite common though, if you’re integrating code with differing build systems. The root of the problem is linking code built with Visual C++’s own STL with code using STLport. There are various tools and options in Visual Studio that will help you find the root of the error.

dumpbin: command line utility to dump all the symbols defined and referenced by a binary. You can use it on .exe, .lib and .obj files.

undname: command line utility to undecorate C++ names. Given a mangled name it will demangle it.

/P: compiler option to write pre-processor output to file as a .i file. Useful for seeing which headers are actually pulled into each unit of compilation.

/VERBOSE: linker option which will tell you which libraries and objects are searched, and which symbols they resolve.


Going off topic here, as this isn’t etrading related, but I’ll blog it as I suspect others might be having problems with Windows 7 and 8 PCs dropping connections to BT Home Hub 2 modem routers. If your connection only lasts for a few minutes, and then shows as “limited connectivity”, it could be because recent Windows does DHCP differently than the Home Hub. I found an MS KB article that suggests adding a registry flag in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{GUID}. The flag is called DhcpConnEnableBcastFlagToggle, it should be a DWORD, and be set to 1. There may be several GUIDs under Interfaces, depending on how many Wifi hubs your PC talks to. Look inside each and examine IP addresses to figure out which is the Home Hub.

CPU cache size

September 21, 2012

Here’s some code I wrote back in 2009 to figure out the L1 and L2 cache sizes on a Windows host. It works by populating an area of memory with randomized pointers that point back into that same area. The randomization defeats stride based attempts by CPUs to predict the next cache access. The size of the memory region is stepped up in powers of two, and timings taken. The timings should show when the region size exhausts the L1 and L2 caches. This code illustrates why control of memory locality is so important in writing cache friendly code.

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <windows.h>

typedef unsigned __int64 ui64;
typedef int* iptr;

const int _1K = 1 << 10;
const int _16M = 1 << 24;

iptr aray[_16M/sizeof(iptr)];

inline ui64 RDTSC( ) {
   __asm {
      XOR eax, eax        ; Input 0 for CPUID, faster than mov
      CPUID            ; CPUID flushes pipelined instructions
      RDTSC            ; Get RDTSC counter in edx:eax matching VC calling conventions
   }
}

int main( int argc, char** argv) {
    // Ensure we're running on only one core
    DWORD proc_affinity_mask = 0;
    DWORD sys_affinity_mask = 0;
    HANDLE proc = GetCurrentProcess( );
    GetProcessAffinityMask( proc, &proc_affinity_mask, &sys_affinity_mask);
    std::cout << "proc_affinity_mask=" << proc_affinity_mask << ", sys_affinity_mask=" << sys_affinity_mask << std::endl;
    if ( proc_affinity_mask > 1) {
        if ( proc_affinity_mask & 2)
            proc_affinity_mask = 2;
        else
            proc_affinity_mask = 1;
        SetProcessAffinityMask( proc, proc_affinity_mask);
    }

    // avoid interrupts 
    SetPriorityClass( proc, REALTIME_PRIORITY_CLASS);
    // stepping up thru the candidate cache sizes
    for ( int bytes = _1K; bytes <= _16M; bytes *= 2) {

        // populate the array with ptrs back into the array
        int    slots = bytes/sizeof(iptr);
        int    slot = 0;
        iptr    start_addr = reinterpret_cast<iptr>( &aray[0]);
        iptr    end_addr = start_addr + slots;
        iptr    rand_addr = 0;
        iptr    addr = 0;

        std::cout << "slots=" << std::dec << slots << ", start_addr=" 
            << std::hex << start_addr << ", end_addr=" << end_addr << std::endl;

        // clear memory first so we can spot unpopulated slots below
        for ( addr = start_addr; addr < end_addr; addr++)
            *addr = 0;

        for ( addr = start_addr; addr < end_addr; addr++) {
            // pick a random slot in the array
            slot = int( float( slots) * float( rand( ))/ float( RAND_MAX));
            rand_addr = start_addr + ( slot == slots ? 0 : slot);

            // look for the next empty slot - nb we may need to wrap around
            while ( *rand_addr) {
                rand_addr++;
                if ( rand_addr >= end_addr)
                    rand_addr = start_addr;
            }
            *rand_addr = reinterpret_cast<int>( addr);
        }

        // sanity check
        for ( addr = start_addr; addr < end_addr; addr++) {
            if ( !*addr)
                std::cout << "empty slot at " << std::hex << addr << std::endl;
        }

        // now we're ready to ptr chase thru the array
        int accesses = int( 1e6);
        addr = aray[0];
        ui64 start_time = RDTSC( );
        for ( int i = 0; i < accesses; i++)
            addr = reinterpret_cast<iptr>( *addr);
        ui64 end_time = RDTSC( );

        ui64 cycles = end_time - start_time;
        float rw_cost = float( cycles)/float( accesses);
        std::cout << "size=" << std::dec << bytes << ", cycles=" 
                    << cycles << ", cost=" << rw_cost << std::endl;
    }

    return 0;
}

C++ Disruptor

September 20, 2012

I’ve collected some handy links on the Disruptor and lock free programming in general here. I’m coding up a Windows specific C++ Disruptor implementation at the moment, using the Interlocked* family of Win32 API functions. I’m using the volatile keyword, in the Microsoft sense, to enforce the code ordering intent on the compiler, and the Interlocked functions to address cache coherence when there are multiple writers to, for instance, the RingBuffer cursor.

Penny jumping on EBS

September 20, 2012

EBS is an FX ECN owned and operated by ICAP, a cash FX equivalent to Bloomberg or TradeWeb. It’s been in decline for some time as flow shifts to single dealer platforms. Here’s an interesting NY Times article on how introducing an extra decimal place in their prices resulted in a fall in trading volumes. It’s an interesting illustration of how market microstructure choices can affect participant behaviour. A key phrase in the article is: “But that fifth decimal attracted super-fast computer traders, often disrupting the flow of liquidity on the EBS platform”

So how did the high speed traders disrupt the flow of liquidity? Note that EBS’s cash FX trading is organised as a limit order book. Introducing an extra decimal creates 10 times as many price levels in the order book, which in turn makes penny jumping much easier for quick automated trading strategies. Which is obviously going to piss off the other market participants, who end up paying the spread earned by the penny jumpers. Larry Harris has an excellent discussion of penny jumping in his seminal “Trading and Exchanges”, but the Wikipedia link has an adequate explanation.

No doubt considerations like the ones above motivated the rates dealers making govt bond markets on MTS when they decided not to admit hedge funds.

Progress ?

June 6, 2012

prog21 blogs that progress in software tech is slower than one might think. I suggest it’s slower than even prog21 thinks, possibly even negative. Xerox had the Alto running in 1973, and it had a GUI & mouse, Ethernet networking and a Smalltalk-72 programming environment. So how much fundamental progress has there been since 1973?