RethinkDB batch insert performance

November 21, 2016

The forthcoming cloud version of SpreadServe uses a Tornado-based server to persist a breakdown of all formulae used in a spreadsheet loaded by SpreadServe. For complex sheets I found that inserting many formulae into the formula table could be time-consuming. In one test scenario a multi-formula insert took 5 minutes. So I checked out RethinkDB's troubleshooting page, which has some useful performance tips. Batching the insertions with the recommended batch size of 200 brought the insert time down from 5 mins to 21 secs. Further improvements came from using soft durability and noreply, bringing the insert time down to ~3.5 secs. However, I found that my Tornado server couldn't respond to incoming HTTP GETs while the insert coroutine was looping over the insert batches. I figured that noreply meant the yield in the loop resumed immediately, without waiting for the reply IO from the DB, so the coroutine never handed control back to the event loop for long enough to service other requests. Taking out noreply allowed the single-threaded server to handle HTTP GETs in the middle of an insert. If better performance is needed in future, splitting the Tornado server into two processes may be the way to go, but for current test scenarios performance is acceptable.
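To make the batching step concrete, here is a minimal sketch of splitting rows into batches of 200 before handing each batch to an insert call. It is not the SpreadServe code: the names `chunked`, `insert_in_batches` and `insert_batch` are made up for illustration, and the RethinkDB call in the trailing comment assumes a table called `formula` and an already open connection `conn`.

```python
def chunked(rows, size=200):
    """Yield successive lists of at most `size` rows from the list `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(rows, insert_batch, size=200):
    """Call `insert_batch` once per chunk; return the number of calls made.

    `insert_batch` is a hypothetical callable standing in for whatever
    actually writes one batch to the database.
    """
    calls = 0
    for batch in chunked(rows, size):
        insert_batch(batch)
        calls += 1
    return calls


# Hypothetical wiring inside a Tornado coroutine (not executed here).
# Soft durability is kept, noreply is left out so each yield waits for
# the DB reply and the event loop can serve other requests meanwhile:
#
#     for batch in chunked(formulae, 200):
#         yield r.table("formula").insert(batch, durability="soft").run(conn)
```

With 450 rows and the default size, `chunked` produces batches of 200, 200 and 50 rows, so the database sees three insert calls rather than 450.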
