Future starting here? - Looking at DevOps processes beyond simple software development

I’ve just been to “The Future Starts Here”, an exhibition around “100 Projects shaping the world of tomorrow” at the V&A [The Future Starts Here]. Interesting stuff, and it supports the idea of Mutable Businesses, in a constant state of evolution, rather well, I think.

But there is a subtext in the exhibition. Just because we can do something, should we do it? And if we should, can we manage the impact of the change? Change always has consequences, and these consequences have to be managed – and if they can’t be managed easily, it’s best to find out before you start.

Over to DevOps. It’s very probably a great way for the Mutable Business to put its evolutionary mutations for the “Future that is Starting Here” into production systems. But it is essential that all stakeholders are involved in the process and that consequences are explored as well as the expected positive use case.

If the system, or a user, makes a mistake in a process with millions of users and lots of interacting micro-services, there needs to be a “remediation process” that corrects the results of changes made in error. This is very likely to be non-trivial: data that shouldn’t be there can be used in other transactions before you can remove it, and remediation processes may well activate paths through the system that the designers didn’t anticipate (and therefore haven’t tested).
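To make the “used in other transactions” point concrete, here is a minimal sketch of the first step of a remediation process: working out everything that a bad record has touched. It assumes a lineage table (the hypothetical `derived_from` mapping, not something any particular system is known to provide) that records which inputs each record was computed from; the record names are invented for illustration.

```python
from collections import defaultdict, deque


def find_affected(bad_ids, derived_from):
    """Return every record id that is bad or was derived, directly or
    indirectly, from a bad record.

    derived_from maps a record id to the ids it was computed from
    (a hypothetical lineage table, assumed to exist for this sketch).
    """
    # Invert the lineage: for each source, which records consumed it?
    consumers = defaultdict(set)
    for record, sources in derived_from.items():
        for source in sources:
            consumers[source].add(record)

    # Breadth-first walk outwards from the known-bad records.
    affected = set(bad_ids)
    queue = deque(bad_ids)
    while queue:
        current = queue.popleft()
        for child in consumers[current]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Invented example data: one erroneous deposit feeds a payment,
# which in turn feeds a statement.
lineage = {
    "payment-2": ["deposit-1"],
    "statement-3": ["payment-2"],
    "payment-4": ["deposit-9"],
}
print(sorted(find_affected(["deposit-1"], lineage)))
# ['deposit-1', 'payment-2', 'statement-3']
```

Even in this toy form, removing one erroneous deposit means revisiting a payment and a statement as well, and each of those corrections is a path through the system that someone needs to have designed and tested.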

Remediation needs to be designed – and tested – as part of the original design. I think that the recent TSB debacle [UK ‘meltdown’ bank TSB’s owner: Our IT migration was a ‘success’] is a graphic example of further problems (not just system failures, but fraud and other criminal activity) arising when a process doesn’t go exactly as anticipated. And the TSB’s remediation efforts appear to have been woefully under-designed (or under-tested).

Cutting over from one “working well enough” system to a better system, apparently working well in a different bank, is largely an “Ops” process. Nevertheless, it has possible Dev, testing and customer impacts and (in these days of Ops/Dev collaboration) should be run as a DevOps-style process – with a realistic implementation target; with “fail small” incremental implementation; with automated validation (service virtualisation is your friend); with all of its stakeholders involved in validating it; and with all of them fully (and knowledgeably) committed to making it work. The evidence suggests that the TSB migration project skipped a lot of that – and that the impact of failure was far wider than anyone anticipated (going well beyond mere IT into an increase in fraud and a loss of customer confidence).
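As a rough illustration of what “fail small” with automated validation might look like, here is a minimal sketch of a staged cutover loop. Everything in it is an assumption made for the example: the `migrate`, `validate` and `rollback` hooks are hypothetical callables supplied by the caller, and the batch names are invented. It is not a description of how any real bank migration was, or should be, run.

```python
def staged_cutover(batches, migrate, validate, rollback):
    """Migrate one batch at a time; stop and undo on the first failed check.

    migrate, validate and rollback are caller-supplied callables
    (hypothetical hooks for this sketch). Returns the batches that
    were migrated and passed validation.
    """
    completed = []
    for batch in batches:
        migrate(batch)
        if not validate(batch):
            rollback(batch)  # only the failing batch is undone
            break            # halt before the failure can spread
        completed.append(batch)
    return completed


# Toy run: the third batch fails validation, so the rollout halts there.
migrated = set()
batches = ["staff", "pilot-1%", "region-A", "everyone"]
done = staged_cutover(
    batches,
    migrate=migrated.add,
    validate=lambda b: b != "region-A",
    rollback=migrated.discard,
)
print(done)  # ['staff', 'pilot-1%']
```

The point of the design is that a validation failure only ever affects the batch in flight: “everyone” is never touched, which is the opposite of a big-bang cutover.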

So, if DevOps – agile collaboration between Dev, Ops and the Business using transparency, automated validation and feedback loops – is as effective as people say it is, perhaps it should have been used for this TSB migration.

Or perhaps people thought that agile, collaborative, DevOps-style processes were being used but (because of time pressures?) actually skipped key aspects: the involvement of all stakeholders; transparency and feedback; incremental delivery and “fail small”; some or all of these, and maybe more – thus turning it into “DevOps in name only”. DevOps can fail if it is done wrong, or by the wrong people. Perhaps it is harder to make DevOps fail than old-style big-bang implementation processes, although that won’t be much consolation if it does.