Hi, I’m Alex Manning, Technology Director at Tecknuovo. I’ve spent the last 20+ years collecting scars by helping people fix pretty thorny technology issues. I’m one of those people who run towards problems, and I have had to deal with bigger and more interesting (scarier?) challenges as my career has progressed.
Some of the highlights include dealing with scalability issues on payments platforms processing over £70bn worth of transactions on peak days for a high street bank, migrating one of the UK’s largest mobile banking platforms to alleviate outages that would appear on Sky News, as well as building, managing, migrating, and rescuing services of all shapes and sizes!
These (and hopefully more in the future) are reflections on cloud-based platforms I have seen over the past 5ish years, and common themes I see emerging based on personal experience.
I’ve been fortunate to get a lot of first-hand experience with cloud-based platforms over the past few years – using both private and public services – from bare metal startups to blue chip enterprise-level implementations.
One common theme from all of these – no matter what services, platform type, or size of organisation – is that every client I have spoken to has underestimated the need to plan and build the infrastructure layer, or the “plumbing”. This covers all of the tools, services, and processes required to run your platform efficiently, including the mundane things like cost management (apologies in advance to any finance people).
It should also be said that the majority of the companies where I have seen these implementations were relatively new to the cloud and the services offered. In almost every single instance, the architecture function had a good working knowledge of the services available and how to consume them at the application layer. However, there was little to no understanding of how any of the Infra-type services worked, or the need to actually design and implement them like any other critical service.
As a consequence, all of the implementations I have seen suffered from a large amount of technical debt, causing wider issues that many senior leaders did not understand, including:
Overspend – inefficient designs meant too many resources were deployed, and a lack of visibility made them impossible to manage. Adding tagging into the design upfront, creating ceilings and limits on services with alerting, and building out dashboards can prevent this – but it takes time.
Standardisation – not enforcing the use of pipelines, with automated testing built into them, to manage ALL environments, coupled with:
Automation – frequently believed to be something that “we can do later”. Build it, test it, automate it! There are economies of scale to be had in efficiency here, so listen to your DevOps teams…
And finally, the most common sin:
Product over platform – not allowing time to iterate on your pipelines, templates or dev tooling because of the “need to ship product”. Your pipelines, and the artefacts that support them, are just as important as the product, and if not dealt with upfront they will magnify cost and wasted effort later.
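To make the overspend point a little more concrete, here is a minimal sketch in Python of the kind of check that could sit in a pipeline or a scheduled job. It is not tied to any cloud provider's API: the Resource class, the REQUIRED_TAGS policy, the ceilings and the 80% warning threshold are all hypothetical, and in practice the inventory and cost figures would come from your provider's billing and tagging data.

```python
from dataclasses import dataclass, field

# Hypothetical tagging policy: every resource must carry these tags.
REQUIRED_TAGS = {"owner", "environment", "cost-centre"}


@dataclass
class Resource:
    """A deployed resource as it might appear in an inventory export (assumed shape)."""
    name: str
    service: str
    monthly_cost: float
    tags: dict = field(default_factory=dict)


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the set of required tags the resource does not have."""
    return set(required) - set(resource.tags)


def spend_by_service(resources):
    """Sum monthly cost per service."""
    totals = {}
    for res in resources:
        totals[res.service] = totals.get(res.service, 0.0) + res.monthly_cost
    return totals


def budget_alerts(resources, ceilings, warn_ratio=0.8):
    """Compare spend per service with its ceiling and return alert messages."""
    alerts = []
    for service, spend in sorted(spend_by_service(resources).items()):
        ceiling = ceilings.get(service)
        if ceiling is None:
            alerts.append(f"{service}: no ceiling defined")
        elif spend > ceiling:
            alerts.append(f"{service}: over ceiling ({spend:.2f} > {ceiling:.2f})")
        elif spend >= warn_ratio * ceiling:
            alerts.append(f"{service}: approaching ceiling ({spend:.2f} of {ceiling:.2f})")
    return alerts


if __name__ == "__main__":
    full = {"owner": "payments", "environment": "prod", "cost-centre": "cc-101"}
    inventory = [
        Resource("api-db", "database", 450.0, dict(full)),
        Resource("scratch-vm", "compute", 120.0, {"owner": "data"}),
        Resource("batch-vm", "compute", 300.0, dict(full)),
    ]
    for res in inventory:
        gaps = missing_tags(res)
        if gaps:
            print(f"{res.name}: missing tags {sorted(gaps)}")
    for alert in budget_alerts(inventory, {"database": 500.0, "compute": 400.0}):
        print(alert)
```

The point is less the code than the order of events: the tag policy and the ceilings have to exist in the design before anything is deployed, otherwise there is nothing to check against.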
“It’s just a drop down box like in Excel, right?”
A good example of this misunderstanding of cloud happened whilst building out a greenfield platform for a client. I was trying to get NFRs for data recovery objectives from a product team’s architect – for the third time. The impatient reply I received was, “What’s the problem here? Surely this is all done using a drop-down menu like Excel”… Err no, sorry, it really isn’t. Needless to say, there was a lot of backtracking once we did a demo and outlined the criticality of what we were trying to document, and why it was important to the design.
All of the items above need to be carefully thought through, discussed and designed before embarking on a build, just as you would for any critical business-facing application that actually sits on the platform. Of course, cloud-based services accelerate deployments and are more efficient than traditional servers. But they have to be carefully thought out and designed, especially for platforms that have multiple, complex services, integrations with other systems, or that move data around – which, let’s face it, is nearly every cloud-based service…
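As an illustration of why those recovery NFRs matter to the design, here is a small Python sketch. It assumes the objectives in question are the usual recovery point objective (RPO, how much data you can afford to lose) and recovery time objective (RTO, how long you can afford to be down). The class names, fields and figures are hypothetical, and it makes the simplifying assumption that with periodic backups the worst-case data loss is roughly one backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """NFRs agreed with the product team (hypothetical structure)."""
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime


@dataclass(frozen=True)
class PlatformDesign:
    """What the platform design actually provides (hypothetical structure)."""
    backup_interval: timedelta      # time between backups
    tested_restore_time: timedelta  # restore duration measured in a rehearsal


def design_gaps(objectives, design):
    """Return a list of ways in which the design fails the stated objectives."""
    gaps = []
    # Simplification: worst-case data loss is taken to be one full backup interval.
    if design.backup_interval > objectives.rpo:
        gaps.append(
            f"RPO not met: backups every {design.backup_interval}, "
            f"objective is {objectives.rpo}"
        )
    if design.tested_restore_time > objectives.rto:
        gaps.append(
            f"RTO not met: restore takes {design.tested_restore_time}, "
            f"objective is {objectives.rto}"
        )
    return gaps


if __name__ == "__main__":
    objectives = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
    nightly = PlatformDesign(
        backup_interval=timedelta(hours=24),
        tested_restore_time=timedelta(hours=2),
    )
    for gap in design_gaps(objectives, nightly):
        print(gap)
```

Without the two numbers from the product team there is nothing to compare the design against, which is why they cannot be picked from a drop-down at the end.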
Proper Planning Prevents Poor Performance (and waste!)
On the flip side, I helped a client implement a lightweight proof of concept on AWS. It sat over multiple platforms to give them the flexibility to migrate applications from legacy systems and so reduce the risk profile. We took the approach of highly automating the environments and pipelines, as only a small number of people had the relevant skills and we wanted to reduce operational overhead. It involved a lot of negotiating and out-of-hours work to land, given the client’s level of AWS knowledge. This proof of concept acted as a facade across multiple different geo-located businesses. It then allowed them to integrate an acquisition in six weeks rather than the usual two years.
I hope you have found this blog interesting, even useful. One of the things I like to think I bring to my engagements is a pragmatic approach to solving problems. In summary, the one key takeaway I hope you get from this is to treat your DevOps functions as you would the product function. Take the time to understand clearly what the requirements are, and iterate on them to make sure the platform keeps maturing and improving over time.
Top tip: when deciding what needs sorting out, listen to the person who keeps getting woken up at 3am to fix stuff!