Nodally Xenograte

April 10, 2014

Well, Xenograte from Nodally sounds pretty cool: loosely coupled software components in the cloud. Brad Cox’s dream of snap-together building blocks, finally realised. Yahoo Pipes, anyone? I’ve even hacked around similarly motivated code myself, though of course I never got as far. The problem with these new paradigms is that they ask you to throw away all your old software assets so you can rebuild them in the new framework. A bit like media companies asking us to buy the same content over and over in different formats: LPs, tapes, CDs, audio DVDs, downloads, Pono, VHS, DVD, Blu-ray… Why can’t someone find a way to breathe fresh life into existing assets without reengineering them? Why not, indeed?

POC M1 running

February 16, 2014

I’ve been building a proof of concept for a new product since last August. I’ve just got the first milestone running, which is a big, big step forward. When I’ve got M2 done it will be time to come out of stealth mode, get the message out, do some demos, and start fund raising so I can put a team together…


November 9, 2013

I’ve long been a fan of Mark Russinovich’s sysinternals utilities, especially procmon and procexp. Today I discovered that in procmon, when browsing filtered file system events, you can get a stack trace by right clicking on the event. Wow!  I didn’t know that. Very powerful diagnostic technique.


April 5, 2013

Just fixed one of those infuriating Visual Studio link errors. There’s lots of good stuff on stackoverflow, of course, but this one was a bit more subtle. It’s still probably quite common though, if you’re integrating code with differing build systems. The root of the problem is linking code built with Visual C++’s own STL against code using STLport. There are various tools and options in Visual Studio that will help you track down the error.

dumpbin: command line utility to dump all the symbols defined and referenced by a binary. You can use it on .exe, .lib and .obj files.

undname: command line utility to undecorate (demangle) C++ mangled names.

/P: compiler option to write pre-processor output to a .i file. Useful for seeing which headers are actually pulled into each compilation unit.

/VERBOSE: linker option which will tell you which libraries and objects are searched, and which symbols they resolve.


CPU cache size

September 21, 2012

Here’s some code I wrote back in 2009 to figure out the L1 and L2 cache sizes on a Windows host. It works by populating an area of memory with randomized pointers that point back into that same area. The randomization defeats stride based attempts by CPUs to predict the next cache access. The size of the memory region is stepped up in powers of two, and timings taken. The timings should show when the region size exhausts the L1 and L2 caches. This code illustrates why control of memory locality is so important in writing cache friendly code.

#include <iostream>
#include <cstdlib>
#include <windows.h>

typedef unsigned __int64 ui64;
typedef int* iptr;

const int _1K = 1 << 10;
const int _16M = 1 << 24;

iptr aray[_16M/sizeof(iptr)];

inline ui64 RDTSC( ) {
   __asm {
      XOR eax, eax        ; Input 0 for CPUID, faster than mov
      CPUID               ; CPUID flushes pipelined instructions
      RDTSC               ; RDTSC counter in edx:eax matches the VC calling convention for ui64
   }
}

int main( int argc, char** argv) {
    // Ensure we're running on only one core
    DWORD proc_affinity_mask = 0;
    DWORD sys_affinity_mask = 0;
    HANDLE proc = GetCurrentProcess( );
    GetProcessAffinityMask( proc, &proc_affinity_mask, &sys_affinity_mask);
    std::cout << "proc_affinity_mask=" << proc_affinity_mask << ", sys_affinity_mask=" << sys_affinity_mask << std::endl;
    if ( proc_affinity_mask > 1) {
        if ( proc_affinity_mask & 2)
            proc_affinity_mask = 2;
        else
            proc_affinity_mask = 1;
        SetProcessAffinityMask( proc, proc_affinity_mask);
    }

    // avoid interrupts 
    SetPriorityClass( proc, REALTIME_PRIORITY_CLASS);
    // stepping up thru the candidate cache sizes
    for ( int bytes = _1K; bytes <= _16M; bytes *= 2) {

        // populate the array with ptrs back into the array
        int     slots = bytes/sizeof(iptr);
        int     slot = 0;
        iptr    start_addr = reinterpret_cast<iptr>( &aray[0]);
        iptr    end_addr = start_addr + slots;
        iptr    rand_addr = 0;
        iptr    addr = 0;

        std::cout << "slots=" << std::dec << slots << ", start_addr=" 
            << std::hex << start_addr << ", end_addr=" << end_addr << std::endl;

        // clear memory first so we can spot unpopulated slots below
        for ( addr = start_addr; addr < end_addr; addr++)
            *addr = 0;

        for ( addr = start_addr; addr < end_addr; addr++) {
            // pick a random slot in the array
            slot = int( float( slots) * float( rand( ))/ float( RAND_MAX));
            rand_addr = start_addr + ( slot == slots ? 0 : slot);

            // look for the next empty slot - nb we may need to wrap around
            while ( *rand_addr) {
                rand_addr++;
                if ( rand_addr >= end_addr)
                    rand_addr = start_addr;
            }
            // nb 32 bit build assumed: we store the ptr in an int slot
            *rand_addr = reinterpret_cast<int>( addr);
        }

        // sanity check
        for ( addr = start_addr; addr < end_addr; addr++) {
            if ( !*addr)
                std::cout << "empty slot at " << std::hex << addr << std::endl;
        }

        // now we're ready to ptr chase thru the array
        int accesses = int( 1e6);
        addr = aray[0];
        ui64 start_time = RDTSC( );
        for ( int i = 0; i < accesses; i++)
            addr = reinterpret_cast<iptr>( *addr);
        ui64 end_time = RDTSC( );

        ui64 cycles = end_time - start_time;
        float rw_cost = float( cycles)/float( accesses);
        std::cout << "size=" << std::dec << bytes << ", cycles=" 
                    << cycles << ", cost=" << rw_cost << std::endl;
    }
    return 0;
}

C++ Disruptor

September 20, 2012

I’ve collected some handy links on the Disruptor and lock free programming in general here. I’m coding up a Windows specific C++ Disruptor implementation at the moment, using the Interlocked* family of Win32 API functions. I’m using the volatile keyword, in the Microsoft sense to enforce the code ordering intent on the compiler, and the Interlocked functions to address cache coherence when there are multiple writers to, for instance, the RingBuffer cursor.
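To make the cursor publish idea concrete, here’s a minimal single-producer sketch – not my actual implementation, and all the names are illustrative – using std::atomic release/acquire ordering as a portable stand-in for the Win32 Interlocked functions and Microsoft volatile semantics. The producer writes the slot first, then publishes by advancing the cursor, so a consumer that acquires the cursor is guaranteed to see the slot contents.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal single-producer ring buffer sketch. The cursor publish uses a
// release store paired with an acquire load; in the Win32 version the
// Interlocked* family plays the same role. Illustrative names only.
class RingBuffer {
public:
    explicit RingBuffer( int64_t size)
        : mask_( size - 1), slots_( size), cursor_( -1) {
        assert( size > 0 && ( size & mask_) == 0);  // size must be a power of two
    }
    // producer: write the entry first, then publish by advancing the cursor
    void publish( int64_t value) {
        int64_t seq = cursor_.load( std::memory_order_relaxed) + 1;
        slots_[ seq & mask_] = value;
        cursor_.store( seq, std::memory_order_release);  // publish
    }
    // consumer: highest published sequence, -1 if nothing published yet
    int64_t cursor( ) const {
        return cursor_.load( std::memory_order_acquire);
    }
    int64_t get( int64_t seq) const { return slots_[ seq & mask_]; }
private:
    const int64_t         mask_;
    std::vector<int64_t>  slots_;
    std::atomic<int64_t>  cursor_;
};
```

A consumer simply polls cursor( ) and reads every sequence up to it; no locks are taken on the hot path, which is the whole point of the Disruptor design.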


May 23, 2012

So it looks like Lodestone is an Open Source project backed by Deutsche Bank, and involving the estimable Martin Thompson of LMAX Disruptor fame. There have been investment bank (IB) backed OSS projects before, notably A+ and OpenAdaptor. And there have been collaborative specification efforts for APIs and message formats, like AMQP, FIX and FpML, which have used OSS-like licenses for their intellectual property, and have often adopted reference implementations. However, within the IB vertical none of them have achieved the impact of canonical OSS projects like the Linux kernel, Apache or Mozilla. Why not? And why is OSS being touted as a silver bullet for trading systems software again, fifteen years after Eric Raymond’s The Cathedral and the Bazaar first alerted the mainstream to a different way of doing software?

My initial answers to the two questions are a little cynical, but when you’ve spent nearly fifteen years working in investment bank technology, cynical has to be your default mindset. As always, let’s follow the money. I suspect a large part of the motivation for open sourcing codebases like A+ and OpenAdaptor is cost driven. As proprietary codebases age inside banks they come to be viewed not as sources of competitive advantage or agility, but as irksome recurring costs preventing investment in other areas. OSS raises the tantalising prospect of getting the much vaunted Open Source “community” to shoulder a lot of the maintenance and testing burden. And at this point in the economic cycle, driving down in house development costs by any means makes sense.

So driving down costs is part of the motivation for OSS. But why didn’t A+ and OpenAdaptor win universal adoption by investment banks? Because they don’t provide sufficient incentive to adopt. Take A+, a niche APL based programming language. Why would any bank adopt it if they don’t have a large existing A+ codebase? OpenAdaptor is a middleware abstraction layer. Its raison d’être is to prevent coupling to underlying commercial vendor ware. Most banks had already built such layers themselves – that’s what IB software architecture teams are for! Why would they switch from an in house layer that supports all their programming language and OS choices to one that doesn’t, and moreover isn’t controlled by them? If there’s no incentive to adopt such codebases, then the originating organisation reaps none of the maintenance and testing benefits that motivated them to open source in the first place. Ergo no cost savings, and a moribund project.

Which leads us to the question of what a bank led OSS project would need to do to incentivise adoption. Let’s bear in mind that the software stack used to build trading applications has components that range from completely generic, like databases, to highly specific, like quant libraries. IBs won’t open source codebases like quant libraries or algo trading; that stuff really is the special sauce for competitive advantage. And there’s no point them open sourcing very generic stuff like programming languages or middleware abstraction layers, where there’s no strong motivator for adoption. But there is a spectrum of code in between that a project like Lodestone should appraise. They need to be codebases where the benefits of cooperation outweigh the competitive edge given up. Those codebases must be vertical specific because the wider OSS world is already addressing truly generic requirements like programming languages and databases, so there’s no significant benefit from open sourcing bank codebases there. For examples of IB cooperation we should look to successful consortia efforts. Standards efforts like FpML and FIX meant IBs sharing some of their IP, but that cost is outweighed by the interoperability benefits those standards bring to vendor and proprietary software. Another successful consortium was Tradeweb. The IBs sold their stake to Thomson Reuters, and then realised that they’d lost control of IRS etrading. They launched LiquidityHub in an attempt to shift swap liquidity away from Bloomberg and Tradeweb onto an ECN they controlled. Eventually Tradeweb responded to the LiquidityHub threat by letting the IBs buy back in, and giving them board seats. LiquidityHub was then quietly dropped. IB cooperation had restored dealer control to the swap etrading landscape, and dealers could compete for client inquiries in a trading venue they control.

So we’re looking for codebases that are specific to the IB vertical, that are not special sauce, and where the benefits of cooperation outweigh the cost of any IP give up. My rates etrading background suggests the ECN gateway space as a fruitful one to address. Here there’s an incumbent vendor, ION Trading, with a successful suite of rates etrading software. ION’s suite includes the MarketView gateways that present the same API to all the fixed income ECNs: Bloomberg, Tradeweb, BondVision, MTS, BrokerTec, eSpeed etc. Almost all the leading rates dealers use them. In my experience almost all the leading rates dealers see them as an irksome recurring cost. The time to market and developer headcount savings gained by adopting these gateways has long been discounted, and all the dealers can see are hefty recurring license fees. But the cost of building a set of gateways alone is too high, so nobody does. Why has no third party stepped in and built an OSS competitor to MarketView? One reason is the barriers to entry. To build ECN gateways you need access to ECN test systems and APIs. The ECNs control that access; it’s a cost they want to minimise as they are trading venues, not software vendors, so they only give access to IBs.

An IB led OSS consortium like Lodestone could develop open source ECN gateways, sharing the cost of development including ECN test system and API access, and freeing themselves from ION’s license fees. They’d also gain more control: one frequent IB complaint is that new features requested from ION become available to all their competitors a few months later, when they’re rolled into the main codebase. Of course it’s unrealistic to expect anything else from a vendor operating a traditional closed source commercial licensing software model. But this point does highlight another benefit of applying an open source model to the ECN gateway space. Open access to gateway source code means that an individual IB can develop its own proprietary extensions, and then choose to give those back to the community, or to keep them secret and bear the cost of porting and merging going forward. The choice remains with the IB – and that’s a key benefit of free software in the Stallman sense of the word free.

So I believe the Lodestone project should be asking which codebases are IB specific, are not special sauce, and offer strong cost benefits from cooperation. I’ve given an example from etrading above. Can you give one from another asset class, or another part of the IB stack, risk management or STP perhaps?

Magic Ink II

March 22, 2012

I posted earlier on Bret Victor’s Magic Ink. Now I’ve finished reading this quite lengthy paper. In the second half Victor goes into more detail on how information applications – which includes most financial apps – should be implemented. Info apps need to learn from the history of user behaviour; Victor uses his own BART scheduler app as the example. He points out that decades of research have been done on machine learning, but we still don’t have neatly packaged abstractions around the results of that research that would make it usable for Joe Average app developer. This shouldn’t be the case, as we have nice usable abstractions for all sorts of other areas of comp sci research: file systems, sort algorithms, GUIs etc.

The other major pillar of info apps in the Victor scheme is context sensitivity. He makes a compelling case for achieving this via a dynamically bound component model that sounds similar to the one advocated by Brad Cox 20 years ago. Plus ça change!

Finally, Victor discusses the kind of device that info apps should run on. It reads like he’s describing the iPad. Bear in mind that the iPad launched in 2010, and Victor wrote Magic Ink in 2006. So he was remarkably prescient.

Magic Ink

March 1, 2012

Thanks to reddit I’ve just discovered Bret Victor. I watched the Invention video, and enjoyed the whole theme on tightening the feedback loop between changing code and seeing results. The later part on moral crusading was interesting if not entirely convincing. So I checked out the web site, and am reading Magic Ink. Wow! This is a full blown vision of doing software differently. Back in the 90s I got really excited by, in turn, Brad Cox’s vision, Patterns, and Open Source. About 10 years ago I discovered dynamically typed languages with Python and Smalltalk. And that’s the last time I had a real rush of excitement about some new approach in software. Sure, I’ve dabbled in functional languages like F#, and played with various OSS projects. But for the most part my attention has been on the trading topics that fascinate me, like electronic limit order books.

So what’s Magic Ink about? Victor divides software into three categories: information, manipulation and communication software. He focuses on information software, which is most apps really. And that includes most financial and trading apps. And then he proceeds to argue that there’s too much interactivity, and that interaction is bad. The way forward is context sensitivity combined with history and graphic design. Counterintuitive, and utterly convincing. A joy to read!

I can’t help wondering what the UX crew over at Caplin think of this? I haven’t seen them blogging on it. Victor’s views have radical implications for how etrading apps should work. I’d expect Sean Park to be pushing this angle with his portfolio companies too…

Early vs late binding

January 29, 2012

I agree with pretty much everything Jeff has to say on strongly typed, statically bound languages vs weakly typed late binding systems. But I don’t agree that we must decide in favour of the former. There is no one size fits all language, and I believe the right approach is to use a mix of early and late binding. Early binding when we want efficiency, compactness and speed. And late binding when we want expressiveness and rapid time to market. Electronic trading systems are a perfect example of a class of problem where we want all those contradictory virtues in the same system. A combination of languages is the only way to satisfy all those requirements. Brad Cox laid out this approach with great prescience in Planning the Software Industrial Revolution in 1990. What we need are development environments that allow us to mix languages at will. Debuggers that cope seamlessly when we examine a stack with different languages in different frames. Consistent threading models across runtimes. And consistent memory management models. Microsoft have come closest to achieving this with the .Net CLR. The Java VM can host multiple languages too, but not with the same quality tooling.

Personally I like the combination of C++ and Python. But there are tensions implicit in building systems with that mix. The first is the threading model. If you’re doing server engineering you must be GIL aware. A single C++ thread works well with Python. A limited number of C++ threads on dedicated tasks, with just one of them invoking Python, works well too. Multiple C++ threads as pooled workers, using locking to cooperate in executing identical logic and each invoking Python, will not work well.

Another tension is the different memory management models. The Python C runtime organises its own memory pool of PyObject* allocations. There are separate sub pools for different types, with different rules for the management of strings, integers and larger objects. Python’s memory pool tends only to grow, unlike a C++ program whose memory profile we can see fall when the C RT lib hands memory back to the OS.

So if we have multiple languages in the same runtime, one of the biggest challenges is making the right architectural decisions so that those languages cooperate despite drawing on the same OS provided resources in different ways.