SpreadServe AMI part II

January 17, 2017

The core component in a SpreadServe deployment is the SpreadServeEngine, a headless C++ server binary that implements the Excel compatible calculation engine. The engine discovers its hostname through the win32 API using GetComputerNameExA( ComputerNameDnsFullyQualified, …). On AWS this was giving me hostnames like WIN-THU4IQNRN6F, when what I wanted was the fully qualified domain name, like ec2-54-186-184-85.us-west-2.compute.amazonaws.com. Harry Johnston helpfully advised on StackExchange that, since the host is not joined to a domain, GetComputerNameExA will only return the FQDN if I explicitly set it via Control Panel. Naturally I want to avoid manual fixes on a SpreadServe AMI so I settled on using Amazon’s EC2 instance metadata. The FQDN hostname can be discovered with an HTTP GET on this URL from any EC2 host: I built a small helper server process to query instance metadata using Tornado’s async HTTPClient and write it to the localFS, where SpreadServeEngine can read it. Result: any new SpreadServe AMI will automatically discover its public DNS.

The forthcoming cloud version of SpreadServe uses a Tornado based server to persist a breakdown of all formulae used in a spreadsheet loaded by SpreadServe. For complex sheets I found that the insertion of many formulae in the formula table could be timeconsuming. In one test scenarion a multi-formula insert took 5 minutes. So I checked out the RethinkDB’s troubleshooting page where there are some useful performance tips. Batch insertions with the recommended batch size of 200 brought the insert time down from 5 mins to 21 secs. Further improvements came from using soft durability and noreply, bringing the insert time down to ~3.5 secs. However, I found that my Tornado server couldn’t respond to incoming HTTP GETs while the insert coroutine was looping on the insert batches. I figured that noreply meant that the yield in the loop resumed immediately, without waiting for the reply IO from the DB. Taking out noreply allowed the single threaded server to handle HTTP GETs in the middle of an insert. If improved performance is necessary in future, splitting the Tornado server into two processes may be the way to go, but for current test scenarios performance is acceptable.

Recently I’ve been rediscovering the fact that threading is hard. I’ve been extending the SpreadServe Addin to support Tiingo‘s IEX market data feed. Real live ticking market data is usually only found inside investment banks, brokers and big hedge funds as it takes a lot of cash and infrastructure to connect to exchanges directly or to subscribe via Reuters. Even newer internet contenders like xignite are very expensive too. Tiingo’s IEX feed provides live ticking equity top of book data at an unprecedented price point. That is an exciting new development that I want to support in SSAddin. Coding it up has renewed my appreciation of how tricky multithreaded code can be. The SSAddin is implemented in C# packaged as an XLL using ExcelDNA. As with any Excel XLL, the worksheet functions it defines are executed on the main Excel thread. If they are long running, then they’ll block the GUI. So the worksheet functions pass off their work to a background thread. This means that SSAddin can do quandl and tiingo historical data queries without blocking the main Excel thread. Query results are cached, and there’s a set of worksheet functions to pull results out of the cache. So far so good. However, adding subscriptions to Tiingo’s IEX market data adds more complexity. In .net callbacks for web socket events are dispatched on pool threads. Ticking data is pushed back into Excel via RTD. So lots of lock statements are necessary to coordinate access to the queue for passing work from the Excel thread to the background thread, and for coordinating access to subscription management data structures and the RTDServer between the background thread and the pool threads that dispatch the socket callbacks. All good fun which has prompted a few thoughts. Firstly, threading is hard! Secondly, I must get round to learning Rust and understanding the borrow checker. Thirdly, thanks heavens for lock reentrancy in .net!

Windows Containers

October 13, 2015

Recently Microsoft has added support for containers to Windows Server. It’s available on the Azure cloud on VMs running Windows Server 2016 Tech Preview 3. I’ve been playing with it and I’ve got SpreadServe running inside a container. There’s much more detail here. But to summarise I found three workarounds were necessary…

  • A two step process to build images as Windows container doesn’t like SpreadServe’s NSIS installer
  • Web server inside the container should be on port 80 only internally
  • A one line launch script that sets up environment variables is necessary

Recently I’ve been using the excellent Very Sleepy profiler to performance tune the SpreadServeEngine’s loader, compiler and interpreter. One of our beta users had helpfully supplied a very large spreadsheet which was causing very long load and calc cycles. Back in the 90s you had to spend serious money on licenses for Pure Software‘s Purify and Quantify tools for this kind of work. Now tools like Dr Memory and Very Sleepy are free and OSS. It’s a while since I did a real, systematic performance profiling and tuning exercise, and I was soon reminded of how quickly preconceptions about which parts of the code might be CPU hogs can be shattered. It wasn’t long before I was soon nose to nose with one of the eternal truths of C++ development, or indeed development in any language: malloc & free are expensive. That’s why the LMAX team coded their own Java Collections. Printf isn’t cheap either. The answer was to introduce memory pooling for many of the most heavily used compiler and interpreter classes, and to set up config switches for the interpreter tracing. Interpreter tracing needs to be available in release builds as it a tremendously useful way of looking inside the execution of your spreadsheet. The result was a thirty fold improvement in load time, and much snappier calc cycles on very large sheets.

Upgrading the SpreadServe GUI

September 19, 2015

Recently I’ve been updating SpreadServe‘s JavaScript web GUI. The edit facility presented on the livesheet page was quite crude. It used a bootstrap-editable to pop up an edit field in a message box. Not really a seamless user experience, so I’ve upgraded to a JavaScript grid that supports in place editing. The end result is a user experience closer to Excel itself. I chose to use webismymind’s editablegrid since it can attach to an existing HTML table. In the case of SpreadServe the HTML table is generated in C++ code inside the SpreadServeEngine whenever a running spreadsheet is published to the RealTimeWebServer. I changed the C++ code to add attributes to the table and tr elements that editablegrid uses to latch on to. I made a couple of small changes to the editablegrid JavaScript so that the isEditable and modelChanged callbacks passed through more information. My isEditable implementation needed to get hold of the DOM element as well as the row column address so it could check for an ssedit attribute. And modelChanged needed the element ID so it could POST the changed data back to the RealTimeWebServer with a unique ID. The RealTimeWebServer sends the updated value back in to the SpreadServeEngine. I’ve forked editablegrid on github here with my changes. While I was doing this JS dev I found two resources very helpful: Oscar Rotero’s jQuery cheat sheet, and the Chrome debugger. I was a Firefox advocate for a long time, as Firebug was the first JavaScript debugger I used. The Chrome debugger is just faster, snappier, breakpointing is easier, and examining the DOM and variables seems better too.

Blockspring symphony

September 10, 2015

Blockspring is a cool startup that enables users to invoke web service APIs from Excel. There are ready made integrations with Amazon, Bing, Dropbox, Facebook, Github, Google, linkedin, Quandl, Slack, Twitter and many more. The beauty of Blockspring is that it normalises all of these diverse web APIs so they can be invoked from a single Excel worksheet function: =BLOCKSPRING(…) As we all know, the great strength of Excel is that it enables users who are not developers to build all kinds of solutions independently. Blockspring has dramatically extended the range of what you can do in Excel. To make a service available via =BLOCKSPRING(…) you code a ‘Block’ in Python, Ruby, JavaScript, PHP or R. The blocks are cloud hosted on Amazon servers. All well and good, but this means that you still need to be a developer to create a block. What if Excel users could implement blocks in Excel itself? To do that you’d need a cloud hosted server that can execute a spreadsheet automatically. And, of course, that’s exactly what SpreadServe does! All that’s necessary is some code for a block that can talk to a SpreadServe instance. Here’s the source on github, and here it is deployed on Blockspring. Using this SpreadServe block, a client spreadsheet can use any SpreadServe hosted spreadsheet as a web service. Here’s an example. InvestExcel‘s BlackScholes spreadsheet can calculate Call and Put prices for an option, given the usual inputs: stock and strike prices, volatility, risk free rate and tenor. Here’s that same spreadsheet deployed on an Amazon hosted SpreadServe instance, which makes it’s calculation functionality available to Blockspring client spreadsheets. This spreadsheet calculates the Put and Call prices of a series of options at different strike prices. Make sure you have Formulas/Calculation Options set to automatic when you run it. There are nine different strike prices in the series, so there are nine round trips to the back end spreadsheet, one for each call & put calculation. Try changing one of the inputs; stock or strike price. It will cause all the put and call prices to recalculate. If you point your browser at the SpreadServe host’s RealTimeWebServer while the recalculation is happening you’ll see all the inputs get pumped through, and all the outputs calculate. So there you have it: Blockspring and SpreadServe – two great tastes that taste great together!

In yesterday’s post I promised to give more detail on the Yield Curve Bootstrapping sheet running on the Amazon hosted SpreadServe instance. If you’d like to try running the sheet on your own desktop you can download it from the repository; just click on ycb_quandl_pub.xls. To run the sheet in your own Excel you’ll need to download the QuantLib and SpreadServe addins. ycb_quandl_pub.xls is based on one of QuantLibXL’s example spreadsheets, YieldCurveBootstrapping.xls, which gives a sample QuantLib Excel solution to a common fixed income rates maths problem: bootstrapping a yield curve. If you look at the original sheet you’ll see that all input data is present as simple cell values. To change it you must rekey it. Ideally this would be automated, so that deposit, futures and swap rates could be regularly pulled from a clean data source, and the bootstrapping results recalculated and published. ycb_quandl_pub.xls uses the SpreadServe Addin to pull the depo, futures and swap rates from quandl. Look at the top left block on the Quandl sheet within the ycb_quandl_pub workbook to see the invocations of the s2quandl function that pull the rates into the sheet from quandl.com. Lower down on the same sheet you can see the s2cron invocation that schedules a timer to go off every 5 minutes and trigger a new download of the same data. The same trigger is used as input to QuantLib’s qlPieceWiseYieldCurve function on the Bootstrapping sheet to force a recalculation when freshly downloaded data arrives. All that is great for automating an Excel spreadsheet. With SpreadServe we can take it one step further and get the sheet off the desktop and onto a server. The whole process is then automated, centralised and freed from possible manual disruption on the desktop.

NB QuantLib date calcs mean the results of this sheet are only good on weekdays, Mon-Fri, and not Sat or Sun.

The readthedocs github workflow is so smooth I had to knock together some docs for the SpreadServe Addin. Here they are

Going off topic here, as this isn’t etrading related, but I’ll blog it as I suspect others might be having problems with Windows 7 and 8 PCs dropping connections to BT HomHub 2 modem routers. If you’re connection only lasts for a few minutes, and then shows as “limited connectivity” it could be because recent Windows does DHCP differently than Homehub. I found an MS KB article that suggests adding a registry flag in HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/Tcpip/Parameters/Interfaces/{GUID}. The flag is called DhcpConnEnableBcastFlagToggle, it should be a DWORD, and be set to 1. There may be several GUIDs under Interfaces, depending on how many Wifi hubs your PC talks to. Look inside each and examine IP addresses to figure out which is the Homehub.