New thinking on spreadsheets

It’s been a while since I blogged about spreadsheets. This is mainly because nothing very interesting was happening: the spreadsheet governance vendors like Cimcon, Finsbury Solutions and Cluster 7 were doing their thing – and it is a good thing – but I haven’t seen any game changing developments from any of those vendors. However, it turns out that some interesting things have been happening in unexpected places.

In the first case there is Synapse (www.synapseinformation.com) in Birmingham, which is providing cloud-based spreadsheet integration and synchronisation through NanoApps that resolve shared spreadsheet issues. The basic idea is that you share a single version of the data, but you have your own NanoApp which does exactly what you want it to do for your purposes, without affecting the underlying data against which colleagues may be want to construct their own NanoApps.

Secondly, there is Infotron (www.infotron.nl) which is a spin-off from Delft University (see www.spreadsheetlab.org) and the research of Dr Felienne Hermans. This is particularly interesting on a couple of fronts. Dr Hermans’ doctoral thesis was around the visualisation of spreadsheets. This allows you to see how cells are inter-related, how workbooks and sheets are related, and how formulae are derived and related to one another. This visualisation not only allows you to understand how spreadsheets are constructed in a user-friendly fashion but also helps to detect errors, recognise broken links and so forth.

In addition, Dr Hemans has extended her work to incorporate the concept of code smells. While spreadsheet users will probably not be familiar with this concept it is well known within the development world and is an idea that is usually associated with Martin Fowler, even if he did not actually invent the term. The idea of code smells is akin to the saying “no smoke without fire”: if you detect something that doesn’t seem quite right in your code, or your spreadsheet, then that is probably a sign that there is a deeper underlying problem that needs to be resolved. In other words, this is applying software engineering concepts to spreadsheets but in a way that is potentially easy for spreadsheet users to understand and deploy rather than needing to rely on IT.

In practice, the way that the detection and investigation of code smells are supported is that the spreadsheets are loaded into a database and analysed (effectively, applying business intelligence and analytics to the spreadsheets). Dr Hermans’ tried to implement this using a relational database. This didn’t work. Where you have multiple links between spreadsheets or workbooks the number of joins you need to analyse the cross-references can escalate rapidly. In order to get her analyses to run with acceptable performance she has implemented her software on top of a graph database (specifically, Neo4j). As readers will know I am a big fan of graph databases, so this is an unexpected use case.

Returning to code smells specifically, this is a big deal. While managing spreadsheet proliferation is a major issue and detecting and correcting errors (and potential frauds) is very important, there are plenty of tools that can do these things. What, historically, has been the problem with spreadsheets is that when they are really applications they do not go through the testing processes that conventional applications do. This is because there have been no simple, self-service tools for spreadsheet developers to use. Dr Hermans’ research now makes this a practical proposition.