Spreadsheets are code

August 13, 2015

Felienne Hermans has made it her mission to point out that “spreadsheets are code”. She’s most definitely right about that, and a whole host of the other consequences that she draws from that insight, specifically that we should apply the techniques developed by mainstream software engineering to spreadsheets: version control, testing and design guidelines for clean structure, like the FAST standard. Whenever you create a sheet with formulae in it you’re programming. Ignoring that fact is one of the reasons spreadsheet disasters keep happening. I couldn’t agree more with Prof Hermans on that score. But I think we need to go further in the comparison of spreadsheets with code, and point out some major differences.

  1. Conventional code, when deployed to its production runtime environment does not come with an IDE that enables any user to change the implementation! A trader can’t reach inside his Bloomberg or TradeWeb terminal and change its implementation. But Excel allows any user to change any formula in a financial model.
  2. Well structured conventional code separates user interface from business logic. This separation of concerns is often called the MVC pattern. A typical modern web application has an HTML/CSS/JavaScript user interfaces running in browsers talking to a server hosted backend coded in Python on top of Django and an RDBMS. Excel makes it impossible to decouple the user interface from the logic expressed in formulae and VBA. Modern applications present a choice for their architects; does a given piece of code belong in the browser, the server tier or the DB ?
  3. Conventional code enables reuse through components. Each Excel spreadsheet is like an island, and monolithic. How can spreadsheets be composed together to draw input and feed output to each other? Only with manual, error prone operations.
  4. Unit testing: the unit testing philosophy calls for any significant component to have a set of separate test code that proves compliance with pre and post conditions as well as yielding specified results. Also required is the ability to run a set of tests automatically and record the results. All of that is a capability that Excel simply doesn’t have.

To realise points 1 to 4 for spreadsheets we need an alternate run time that can host spreadsheets on a server, and decouple the financial logic expressed in worksheet formulae, VBA & XLLs from the user interface. In the next post I’ll give more detail on how SpreadServe solves all the issues raised above.

In yesterday’s post I promised to give more detail on the Yield Curve Bootstrapping sheet running on the Amazon hosted SpreadServe instance. If you’d like to try running the sheet on your own desktop you can download it from the repository; just click on ycb_quandl_pub.xls. To run the sheet in your own Excel you’ll need to download the QuantLib and SpreadServe addins. ycb_quandl_pub.xls is based on one of QuantLibXL’s example spreadsheets, YieldCurveBootstrapping.xls, which gives a sample QuantLib Excel solution to a common fixed income rates maths problem: bootstrapping a yield curve. If you look at the original sheet you’ll see that all input data is present as simple cell values. To change it you must rekey it. Ideally this would be automated, so that deposit, futures and swap rates could be regularly pulled from a clean data source, and the bootstrapping results recalculated and published. ycb_quandl_pub.xls uses the SpreadServe Addin to pull the depo, futures and swap rates from quandl. Look at the top left block on the Quandl sheet within the ycb_quandl_pub workbook to see the invocations of the s2quandl function that pull the rates into the sheet from quandl.com. Lower down on the same sheet you can see the s2cron invocation that schedules a timer to go off every 5 minutes and trigger a new download of the same data. The same trigger is used as input to QuantLib’s qlPieceWiseYieldCurve function on the Bootstrapping sheet to force a recalculation when freshly downloaded data arrives. All that is great for automating an Excel spreadsheet. With SpreadServe we can take it one step further and get the sheet off the desktop and onto a server. The whole process is then automated, centralised and freed from possible manual disruption on the desktop.

NB QuantLib date calcs mean the results of this sheet are only good on weekdays, Mon-Fri, and not Sat or Sun.

The readthedocs github workflow is so smooth I had to knock together some docs for the SpreadServe Addin. Here they are

SpreadServe resources

August 7, 2015

In preparation for the launch of SpreadServe‘s beta program I’ve added a page of resources to this blog. I’ve just finished moving the documentation on to readthedocs.org. It’s very cool to be able to edit the docs on my laptop, push the changes to github, and have them appear automatically, via webhook, on readthedocs. The source ReStructured Text docs are on the SpreadServe github repository. Also on github is the SpreadServe Addin which extends Excel with background thread quandl queries and cron like scheduled triggers. And there’s a link to the Amazon hosted instance running a yield curve bootstrapping sheet that automatically pulls depo, futures and swap rates from quandl. More on that in another post. Finally, there’s a link to the Google Group for SpreadServe. Please join the group if you’d like to download the SpreadServe beta and kick the tyres.

Python 3 & PyCharm

July 1, 2015

I’ve been coding in Python since 2000, and for a long time my dev env preferences haven’t changed. Like many I used Python 1.5.2 with a basic text editor, often vim, for a long time. Once the 2.x series of Python releases started I held off and stuck with 1.5.2 for a long time. I never used 1.6.x. I can’t remember whether I made the jump to 2.1 or 2.2, but I’ve been using 2.x for a long time now, usually with notepad++ as my editor. Part of the reason is that it takes time for the extensive Python ecosystem to catch up and port all the libraries and frameworks. Anyway, I’ve just finished a contract where I used Python 3.3 and the PyCharm IDE, and it was a breath of fresh air. I’d never consider development in Java or C++ without an IDE, and my preferences are IntelliJ & MS Visual C++ respectively. Previously I’d felt an IDE was unnecessary in Python, mainly because the rapid cycle time is so quick. Unlike C++ the cycle is not edit, compile, link, test. In Python one just edits and tests, which makes the printf style of debugging far more effective. PyCharm turbocharges the debugging process with breakpoints and visual object graph traversal. And during coding it interactively highlights syntax errors and variable references. That’s a big time saver too, since it makes code run at the first attempt without throwing syntax errors. +1 for PyCharm!

So what about the shift from Python 2.x to 3.x ?  For me the important points have been the move to more iterator based coding. The iteritems( )/iterkeys( )/itervalues( ) methods no longer exist as items( )/keys( )/values( ) no longer return lists, they return iterable view objects. Those view objects are not stand in replacements for lists. And I had to get used to using the next( ) operator with generators. And, of course, print is now a function and no longer a statement. But apart from that it was straightforward.

Update 2015-07-01: I’ve just been pinged by an old coding compadre who downloaded PyCharm on my recommendation, and needs a tip on fixing up interpreter paths to pick up libs. I had to read a couple of StackOverflow articles to figure this out too, so I though I’d document it here. I’m using PyCharm Community Edition 4.5.2, and to add libraries to my interpreter search path I go to the File/Settings dialog. In the left had tree control, under the Project: <myproj> node I select Project Interpreter. Then I click on the cog icon in the top right, next to the selected interpreter, and choose the More… option. This throws up another dialog: Project Interpreters. On the right are several icons. The bottom one is a mini tree control that shows a pop up tooltip saying “show paths for the selected interpreter”. Click on that, and finally you get the Interpreter Paths dialog, and you can add your library. Phew!! Could this config be buried any deeper? IntelliJ: sort it out! PyCharm is very, very good, but this is quite a useability flaw….

I’ve been doing a lot of Excel RTD addin coding recently, as I’ve been adding RTD support to SpreadServe. As part of that work I’ve developed two new addins, both of which I’ve posted on github. Of course, both addins work in Excel and SpreadServe. The first, SSAddin, supports quandl.com queries and Unix cron style timer events on background threads. Both these things can be done with VBA of course, and that’s how quandl’s existing Excel addin does it. However, SSAddin gives you the means to achieve automated, scheduled downloads from quandl with no Visual Basic and no manual keystrokes into a GUI. The second addin, kkaddin, is based on Kenny Kerr‘s example C# RTD code. While I was researching RTD I read Kenny’s excellent material on the topic. John Greenan also has some quality content on his blog too. However, I wasn’t able to find a single, simple, download with C# boilerplate code that would build and run; so that’s what kkaddin addresses.

quandl badly formed URL

April 20, 2015

I’ve started working on some new code that pulls data from quandl, and I was getting this error…

 { "error":"Unknown api route."}

I was using the first example from quandl’s own API page


and googling didn’t turn up any answers. Fortunately the quandl folk responded on Twitter, and all’s well. The URL should be…


So I’m recording the issue here for any others that get stuck. Looks like “unknown api route”==”badly formed url”.

Excel industrialisation

April 3, 2015

John Greenan has produced an excellent series of posts on Excel VBA Industrialisation on his blog. It’s a topic dear to me, so I figured I’d better respond. In his posts JG presents a series of VB Extensions based techniques to enable the export of embedded VB from a spreadsheet, so it can be version controlled, as well as techniques for error logging and reporting. The code is out there on github, and it’s a valuable addition to the public domain, especially since there are several commercial offerings addressing this space. For instance, spreadgit, ClusterSeven and Finsbury Solutions. JG kicks off his discussion in part one by observing that VBA is in the doldrums, and that the cool kids are using MEAN, Scala, OCaml or Haskell. Sure, the cool kids are never going to use VBA. But that’s not just because other languages are cooler, it’s because VBA and the latest programming languages are aimed at completely different audiences. Scala, OCaml & Haskell are for developers, and Excel is for non developers, end users, business users. The very reason for Excel’s phenomenal success and ubiquity is because it enables end users to create software solutions. Apparently there are eleven million professional software developers in the world. But even those eleven million can’t meet the world’s demand for software, so end users have to generate their own solutions, and they use Excel to do it. The result is, as JG points out in the comments to part six in his series: “In many cases the requirement for Excel Industrialisation is for a firm with an existing portfolio of ‘000s of spreadsheets that cannot all, in a cost-effective manner, be manually rewritten to conform to a coding standard.”

A version control system is an important part of controlling those portfolios of end user developed spreadsheets. However, it solves only part of the problem. Another major underlying factor that causes so many spreadsheet problems is their manual, desktop operation. Since Excel is a desktop application, Excel spreadsheets must be manually operated by their users. Users have to start up Excel, load the sheet, key in unvalidated data, hit F9, and then copy & paste or email the results out. All of that is error prone. And all of this manual operation is a major factor preventing any organised, systematic testing. All of these problems were writ large with the London Whale. All these problems could be resolved if we could decouple Excel as a development environment from Excel as a runtime. It’s great that end users can develop their own solutions in Excel, but it’s burdensome and error prone for this solutions to be operated manually on desktop PCs. Those solutions should be automated, resilient and scalable, and hosted by a server side rumtime. That, of course, is SpreadServe.

I’ve been heads down working on SpreadServe recently, so haven’t paid so much attention to the etrading topics that I used to blog about so much. Thanks to an update from mdavey, I’ve been catching up on the excellent, thought provoking content that jgreco has been posting on his plans for a new US Treasury trading venue, organised as a limit order book, with buy and sell side trading on an equal footing. I enjoyed the post on internalization and adverse selection. His points about single dealer platforms are well founded too, though my own experience in rates trading is that it’s difficult to get client flow on to SDPs as by their very nature they can’t offer multi dealer RFQs, which are critical for real money clients that must prove best execution for regulatory reasons. Of course, if the inter dealer prices from BrokerTec, eSpeed and EuroMTS were public in the same way as equity prices from major exchanges are public, then more solutions to the best execution problem would be possible. As jgreco rightly points out, transparency is key.

Now I want to raise a few questions prompted by jgreco’s posts, both pure tech, and market microstructure…

  • Java? Really? I wonder if it’s inspired by LMAX’s Java exchange implementation, their custom collections and Disruptor. I would have expected C++, but then I’m an old school C++er.
  • Is that really the workflow ? That must be a tier 2 or 3 bank. All my experience has been at tier 1 orgs where all pricing and RFQ handling is automated. If a trader quotes a price by voice, it’s a price made by the bank’s own pricing engines. Those engines will be coded in C++, driven by Eurex futures or UST on the runs, and showing ticking prices on the trader desktop. Mind you, obfuscation techniques were used to frustrate step 2: copy and paste quote. After you’ve spent a fortune building a rates etrading infrastructure, you don’t want everyone backing out your curve from your Bloomberg pages.
  • Will DirectMatch have decimal pricing, or are you going to perpetuate that antiquated 1/32nd stuff?
  • How will you handle settlement/credit risk? Will each trade result in two, with counterparties facing off with a clearing house?
  • How do you shift liquidity? When liquidity is concentrated at a single venue, it’s difficult to move. The only case I know of is German Govt Futures moving from Liffe to Eurex. I guess UST liquidity is fragmented across D2D and D2C venues, so it’s not concentrated all in one place, improving DirectMatch’s chances of capturing some of the flow.

The old ones are the best ones, and I spent too many hours yesterday butting my head against a classic Windows bug. I was testing an optimisation in the SpreadServeEngine‘s handling of nested invocations of XLL supplied functions. I was using QuantLib‘s YieldCurveBootstrapping spreadsheet, and the QuantLib 1.4.0 XLL addin. The sheet invokes the qlPiecewiseYieldCurve function, and the fourth parameter, RateHelpers, is supplied by an invocation of ohPack. Normally in this scenario, the XLOPER returned by ohPack would be marshalled to the engine’s internal representation, before being marshalled back to an XLOPER for handing into qlPiecewiseYieldCurve. I added a shortcut that made the XLOPER available as well as the internal representation, so the marshalling process could proceed by shortcut if possible. Unwittingly I ended up newing an object in one DLL which was released by another DLL when qlPiecewiseYieldCurve returned and released it’s stack frame. My debug build threw a run time assert on _BLOCK_TYPE_IS_VALID. Cue a bug hunt with me toothcombing through the codebase for a double delete or a buffer overrun. Spreadsheet engines are complex beasts since they’re basically a runtime for a functional programming language, so must maintain an object graph and dispatch imperatively at the nodes to tokenised code and XLL supplied C/C++ functions. There’s a lot of memory pool and stack management to do in all that! So I was sure there was a subtle memory management bug somewhere that had been exposed by my optimisation for the nested XLL functions corner case. While I was looking for the bug I did get some good reading on Windows heap debugging done: this and this are recommended. When debugging it’s easy to keep going deeper and deeper in pursuit of a supposedly subtle bug. Better to take a breath and try and widen one’s focus. It was this article that made me think again about the fact that each DLL has it’s own C run time linked in, and if their memory management implementations differ, there will be problems. So it was in this case. Adding a static allocator function to the DLL responsible for freeing the object ensured that the new & delete were done by the same DLL, and the bootstrapping model started running through smoothly. Of course what I need to do now is revisit my build system’s link model and ensure all DLLs and EXEs are linking the same CRT implementation…