March 6, 2017
The SpreadServe 0.4.2b AMI is now available in the US West Oregon region: just search for SpreadServe in public images. There’s a now YouTube video on launching a SpreadServe AMI.
Two points of interest came up in preparing the AMI: Admin password persistence, and the region bound nature of AMIs. This blog from 2013 suggests that Admin passwords don’t persist in AMIs, but I’ve found they do now. So the Administrator password for a SpreadServe 0.4.2b AMI is SpreadServe042b. If you launch your own SpreadServe instance using the AMI, then I suggest you change the password!
The SpreadServe 0.4.2b AMI is only published in the US West Oregon region. Only AMI owners can copy to another region, so if you want a SpreadServe AMI in another region you’ll need to do a simple workaround: start an image in US West Oregon, stop it and create your own image, then copy to your preferred region.
February 23, 2017
This week I’ve been testing the SpreadServe addin with Tiingo’s IEX market data. I was checking performance on my sscalc0.online AWS host for a group of SpreadServeEngines executing various test and demo spreadsheets, including one that subscribes to IEX tickers for AAPL & SPY, via Tiingo API websockets. That API gives us real time top of book as well as last trade price and size for the cash equity traded on IEX. In my test scenario I was running five engines, two of them idle, three running spreadsheets, one of which was a simple IEX market data subscriber. Using Process Explorer I saw some odd CPU spiking on the idle engines. Zooming in with Process Explorer I could see the busyness was on a thread that should have been idle, sleeping inside a WaitForSingleObject call, waiting for a signal to check its input queue. The event object waited upon was created by some generic code invoking win32’s CreateEvent and also used in another thread. Reading the docs I found that CreateEvent’s fourth param, the event object name, implies that the caller will get a handle to a previously created event object if the names match. And I was using a hardwired name! So my thread was being repeatedly woken by events from another thread. A quick fix to make the names unique produced idling engines with no unnecessary CPU burn. All very instructive, partly because running on AWS makes one very aware of paying by the CPU hour.
January 17, 2017
The core component in a SpreadServe deployment is the SpreadServeEngine, a headless C++ server binary that implements the Excel compatible calculation engine. The engine discovers its hostname through the win32 API using GetComputerNameExA( ComputerNameDnsFullyQualified, …). On AWS this was giving me hostnames like WIN-THU4IQNRN6F, when what I wanted was the fully qualified domain name, like ec2-54-186-184-85.us-west-2.compute.amazonaws.com. Harry Johnston helpfully advised on StackExchange that, since the host is not joined to a domain, GetComputerNameExA will only return the FQDN if I explicitly set it via Control Panel. Naturally I want to avoid manual fixes on a SpreadServe AMI so I settled on using Amazon’s EC2 instance metadata. The FQDN hostname can be discovered with an HTTP GET on this URL from any EC2 host: http://169.254.169.254/latest/meta-data/public-hostname. I built a small helper server process to query instance metadata using Tornado’s async HTTPClient and write it to the localFS, where SpreadServeEngine can read it. Result: any new SpreadServe AMI will automatically discover its public DNS.
May 26, 2016
Recently I’ve been rediscovering the fact that threading is hard. I’ve been extending the SpreadServe Addin to support Tiingo‘s IEX market data feed. Real live ticking market data is usually only found inside investment banks, brokers and big hedge funds as it takes a lot of cash and infrastructure to connect to exchanges directly or to subscribe via Reuters. Even newer internet contenders like xignite are very expensive too. Tiingo’s IEX feed provides live ticking equity top of book data at an unprecedented price point. That is an exciting new development that I want to support in SSAddin. Coding it up has renewed my appreciation of how tricky multithreaded code can be. The SSAddin is implemented in C# packaged as an XLL using ExcelDNA. As with any Excel XLL, the worksheet functions it defines are executed on the main Excel thread. If they are long running, then they’ll block the GUI. So the worksheet functions pass off their work to a background thread. This means that SSAddin can do quandl and tiingo historical data queries without blocking the main Excel thread. Query results are cached, and there’s a set of worksheet functions to pull results out of the cache. So far so good. However, adding subscriptions to Tiingo’s IEX market data adds more complexity. In .net callbacks for web socket events are dispatched on pool threads. Ticking data is pushed back into Excel via RTD. So lots of lock statements are necessary to coordinate access to the queue for passing work from the Excel thread to the background thread, and for coordinating access to subscription management data structures and the RTDServer between the background thread and the pool threads that dispatch the socket callbacks. All good fun which has prompted a few thoughts. Firstly, threading is hard! Secondly, I must get round to learning Rust and understanding the borrow checker. Thirdly, thanks heavens for lock reentrancy in .net!
August 13, 2015
Felienne Hermans has made it her mission to point out that “spreadsheets are code”. She’s most definitely right about that, and a whole host of the other consequences that she draws from that insight, specifically that we should apply the techniques developed by mainstream software engineering to spreadsheets: version control, testing and design guidelines for clean structure, like the FAST standard. Whenever you create a sheet with formulae in it you’re programming. Ignoring that fact is one of the reasons spreadsheet disasters keep happening. I couldn’t agree more with Prof Hermans on that score. But I think we need to go further in the comparison of spreadsheets with code, and point out some major differences.
- Conventional code, when deployed to its production runtime environment does not come with an IDE that enables any user to change the implementation! A trader can’t reach inside his Bloomberg or TradeWeb terminal and change its implementation. But Excel allows any user to change any formula in a financial model.
- Conventional code enables reuse through components. Each Excel spreadsheet is like an island, and monolithic. How can spreadsheets be composed together to draw input and feed output to each other? Only with manual, error prone operations.
- Unit testing: the unit testing philosophy calls for any significant component to have a set of separate test code that proves compliance with pre and post conditions as well as yielding specified results. Also required is the ability to run a set of tests automatically and record the results. All of that is a capability that Excel simply doesn’t have.
To realise points 1 to 4 for spreadsheets we need an alternate run time that can host spreadsheets on a server, and decouple the financial logic expressed in worksheet formulae, VBA & XLLs from the user interface. In the next post I’ll give more detail on how SpreadServe solves all the issues raised above.