Category Archives: Cloud Computing

Captain Nemo’s Data Centre Under the Sea

Enthusiasts have been water-cooling PCs and even enterprise servers for years. 

So surely water-cooling an entire data centre by dropping it into the ocean can’t really be that ridiculous an idea… can it?

Well, it’s not.

Because that’s exactly what Microsoft did when they deployed Project Natick Phase 2 off the coast of the Orkney Islands in Scotland.

I was intrigued and naturally wanted to learn more, to understand:

  • Why?
  • How?
  • What are the technical and ecological benefits? 
  • Is it a viable business model?
  • Can it become a repeatable solution?

… and those are the questions that this blog post will aim to answer.

With some help from Jules Verne

Twenty Thousand Leagues Under the Sea

Something instinctively stood out for me. I couldn’t help but draw parallels to Jules Verne’s Twenty Thousand Leagues Under the Sea. The pivotal character in this well-known science fiction novel is known to us as Captain Nemo, though he was later identified as Prince Dakkar.

Prince Dakkar was an Indian prince who journeyed to the depths of the ocean in his submarine – the Nautilus.

The parallels between the Prince, his submarine and Project Natick are central to this post. It’s not just the commonality between the names – Nautilus vs. Natick – but the parallels in the underlying political motivations that drove the creation of both projects:

  • Captain Nemo’s mission was driven by his aversion to imperialism; the social injustice of the British Empire was his primary antagonist
  • Project Natick’s mission is driven by our aversion to global warming; our ever-increasing carbon footprint is this project’s primary antagonist

Both creations were born from a drive to change the world, deliver benefit to those around them and leave a lasting footprint on the globe.

And where did they both exist? In the depths of the ocean. 

Why does Project Natick exist?

  • Data centres have a global annual energy consumption of between 200 TWh and 500 TWh – that’s quite a range, but it covers the disparity in the reporting and estimation
  • This represents between 1% and 2.5% of the world’s energy consumption, which is between 0.3% and 0.5% of the world’s carbon emissions footprint
  • When you fold the lower end of these estimates into the entire ICT sector – networking, digital devices, televisions and cellular comms – the industry today accounts for approx. 2% of global emissions. That’s equivalent to the carbon output of the airline industry! (A quick sanity check on these numbers follows the list.)
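
As a rough sanity check on those figures (using a round number I believe to be in the right ballpark – global electricity consumption of roughly 23,000 TWh per year around the time of writing – and noting the quoted percentages line up best when “energy” is read as electricity rather than total primary energy):

```python
# Sanity-check the data centre share of global electricity.
# WORLD_ELECTRICITY_TWH is my own assumed round figure, not a sourced statistic.
WORLD_ELECTRICITY_TWH = 23_000
DC_LOW_TWH, DC_HIGH_TWH = 200, 500  # the reported range from above

low = DC_LOW_TWH / WORLD_ELECTRICITY_TWH
high = DC_HIGH_TWH / WORLD_ELECTRICITY_TWH
print(f"Data centres: {low:.1%} to {high:.1%} of global electricity")
# -> roughly 0.9% to 2.2%, consistent with the 1% to 2.5% range quoted above.
```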

Cooling a vast array of servers, storage and networking equipment is the largest energy burn for data centres.

Which is why Project Natick exists: to deploy data centres in locations where the requirement for cooling is not just reduced but eliminated, and where power can be sourced from renewable means.

Future Energy Projections

Our data centre energy consumption is bound to increase, but there are two schools of thought here:

  • Data centre providers argue that compute is becoming more efficient and energy demand will be steady as our data requirements grow
  • Environmentalists are projecting an 8-fold increase in power consumption in as little as 5 years

It’s not surprising that there is a polarised view between these two communities, nor that there is such a disparity in the current actuals.

However, whichever side of the spectrum you lean towards, it’s largely irrelevant. Just measuring ourselves against today’s emissions – even at the lower end – is enough to justify why we need to act now.

What’s the specification of the submarine-like vessel?

  • The unit comprises two core components: a pressure vessel and a subsea docking structure
  • It’s approximately the size of an ISO shipping container – the kind we typically see on the back of a lorry
  • The payload in this pressurised vessel is 12 racks with 864 standard Microsoft data centre servers and 27.6 PB of storage
  • It has a maintenance-free life span of 5 years and carries the data centre designation Northern Isles – SSDC-002

What are the environmental benefits of the Project?

  • It’s purposefully positioned at the EMEC – the European Marine Energy Centre – around the Orkney Islands
  • The EMEC is the world’s largest site for wave- and tidal-based power, so the data centre runs entirely on renewable energy
  • The vessel uses a saltwater cooling system adapted from a submarine

The operational power demand for the project is entirely carbon neutral.

What are the business benefits of the Project?

Business Benefit #1 – Latency

With more organisations moving and deploying services into the cloud, the physical distance from networking hubs/offices to cloud-based applications or data stores can be problematic:

  • The fibre optic cables that transmit data are limited by the speed of light – in glass, roughly two-thirds of its speed in a vacuum. The further the data has to travel, the longer it takes.
  • You might ask why anything on earth needs to travel faster than the speed of light. Well, in many data scenarios – especially synchronous data processing, where data writes must be acknowledged at the receiving end before processing can continue – that latency becomes performance limiting (see the back-of-the-envelope estimate after this list).
  • The advent of Machine Learning and Artificial Intelligence, coupled with the fact that around 50% of the world’s population lives by the sea, will increase the demand for data to be physically closer to people and remote devices.
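
To put rough numbers on this, here’s a back-of-the-envelope latency estimate (my own illustrative figures – light in fibre travels at roughly two-thirds of its vacuum speed, and this ignores routing and switching overhead entirely):

```python
# Back-of-the-envelope fibre latency estimate (illustrative assumptions only).
SPEED_OF_LIGHT_KM_S = 299_792   # speed of light in a vacuum, km/s
FIBRE_VELOCITY_FACTOR = 0.67    # light travels ~2/3 as fast in glass fibre

def round_trip_ms(distance_km: float) -> float:
    """Best-case round trip over fibre, ignoring routing/switching delays."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * FIBRE_VELOCITY_FACTOR
    return 2 * distance_km / speed_km_s * 1000  # two one-way trips, in ms

for distance_km in (100, 1_000, 5_000):
    print(f"{distance_km:>5} km -> {round_trip_ms(distance_km):.1f} ms round trip")
```

Every synchronous write pays that round trip, so shaving thousands of kilometres off the path – by sinking the data centre just offshore from the people using it – translates directly into milliseconds saved.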

Business Benefit #2 – Time to Deploy

This is where the story gets really interesting. This project took only 90 days to build and drop into the ocean. Deploying a whole data centre in less than 3 months is an incredible turnaround time. It’s often taken me 3 months just to get servers purchased, racked and deployed into an existing data centre. This solves two problems for cloud providers:

  • Planning consent is likely to be quicker and easier to achieve in comparison to the planning and build of a new on-site facility
  • Acquisition time and variable purchasing costs are eliminated, as many hyperscalers have been increasing their cloud footprint by negotiating the procurement of existing data centres

Business Benefit #3 – Reliability

This model is going to drive more focus on the reliability and redundancy of hardware. For the hardware geeks out there, MTTF – Mean Time to Failure – will have to be greatly increased.

  • As system architects, we usually design for a 5-year lifecycle
  • However, traditional deployments are rarely maintenance-free within that time frame
  • When a data centre in the ocean has a 5-year maintenance cycle, the equipment within must have sufficient reliability and redundancy to avoid re-floating, servicing and resubmerging
  • Having to physically maintain the payload within its lifecycle is unlikely to be economically viable (the toy model after this list illustrates the scale of the problem)
  • This entirely changes the focus around architecture design and reliability in the minds of engineers, architects and manufacturers
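
As a hedged illustration of the maths involved (a toy model with numbers I’ve made up – not Microsoft’s actual reliability figures): if each server fails independently at a constant rate, the expected attrition over a sealed 5-year term falls out of a simple exponential survival curve.

```python
import math

# Toy reliability model (assumed MTTF; not Microsoft's published figures).
MTTF_YEARS = 25    # hypothetical per-server mean time to failure
TERM_YEARS = 5     # the maintenance-free deployment term
SERVERS = 864      # servers in the Natick Phase 2 payload

# Constant failure rate -> exponential survival probability per server.
survival = math.exp(-TERM_YEARS / MTTF_YEARS)
expected_failures = SERVERS * (1 - survival)

print(f"Per-server 5-year survival: {survival:.1%}")
print(f"Expected failures out of {SERVERS}: {expected_failures:.0f}")
# The payload must carry enough spare capacity to absorb that attrition,
# because re-floating the vessel for repairs defeats the economics.
```

Even with a generous 25-year MTTF, this toy model loses over 150 servers across the term – which is exactly why the payload’s redundancy, and the MTTF itself, become first-class design goals.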

Other Benefits

Deployment of such vessels is not limited to cold locations. You only need to drop a container to 200m below sea level – even in tropical climates – to leverage the same cooling benefits.

It opens up a vast gateway into the unknown around data sovereignty. Using the UK as an example, the Crown Estate can only exercise territorial jurisdiction up to 12 nautical miles offshore.

The pandemic has shifted our use of real cash to digital, and Bitcoin alone consumes 0.33% of global electricity. If the trend towards digital currency continues – and I’m sure it will – then this is another requirement likely to drive energy demand upwards.

Is there a future business model here?

Absolutely…

However, if this data centre concept is going to be economically viable, then the payload will need to stand the test of time. Redundancy and reliability for maintenance-free operation are key requirements.

Ironically, I drafted this post before Microsoft re-floated the vessel last week. Though Microsoft stated a lower failure rate compared to equivalent land-based deployments, it’s far too early in the lifecycle of this programme to make predictions. There are no other independent studies – not at this scale – and there are indeed a set of corporate optics that the marketing team at Microsoft must align to.

Captain Nemo’s Nautilus, even for a work of fiction, was an engineering achievement of epic proportions. Verne described the Nautilus as “a masterpiece containing masterpieces” – a testament to his own imagination and scientific foresight, which were unprecedented at the time.

It’s the same degree of creativity that will drive creations similar to Project Natick: a containerised deployment model where cloud providers allow customers to design and deploy their own subsea data centres.

Imagine customising a payload, having it fitted, the vessel sealed shut and dropped to the bottom of the ocean, only to resurface for a payload refit in 5 years – all of this in 90 days, with a whole array of opportunities to scale out by simply buying more containers.

This concept as a commoditised product would be an achievement as grand as when Verne penned the Nautilus on paper. It’s an exciting space to watch, not only for the environmental benefits but to support the imminent increase in demand for edge computing. 

Subscribe to my mailing list and receive new blog posts straight into your inbox.

Unclouding – Do you need a cloud exit strategy?

The CEO of a leading cloud provider was recently quoted as saying that organisations not transforming their business to adopt the cloud were defying gravity, and that by not being “all-in”, toe-dippers risked giving their competitors the edge.

The benefits of cloud computing for ERP are immense and well publicised. However, I do feel we have a limited counterbalance to this view. This is largely fuelled by a growing partiality among industry research and advisory groups that are “supported” by cloud providers to push a cloud-first agenda, leaving little room to consider the merits of other options.

Putting the cart before the horse

The most common error in technology-enabled change is starting with the solution. It’s therefore not surprising that many cloud ERP implementations fail to deliver their intended benefits, whilst at the same time subjecting organisations to massive implementation and run costs.

Inevitably, there is a growing trend towards cloud exit – or, as I like to call it, unclouding. It’s a real thing, with big names like Dropbox having reversed their approach.

The importance of an exit strategy

Companies running ERP today fall into three categories when it comes to the cloud: all-in, partially in, or considering it. Whichever category you fall into, and regardless of which stage you are at in your cloud lifecycle, it helps to understand the reasons why organisations have unclouded, or are likely to. An exit can be both financially and politically costly; therefore, developing and maintaining the right strategy is key for exit avoidance and/or readiness.

Why are organisations exiting the cloud?

The core reasons for “unclouding” are security, regaining control and, most commonly, reducing cost – yes, that wasn’t a typo – reducing cost!

One of the consistent messages coming out of the cloud sales push is that you can’t run infrastructure as cheaply on-premise. But when you apply this to largely steady ERP workloads, it might surprise you to hear that you can – of course, there are numerous variables in this statement. We constantly hear that cloud requires a different operating model to leverage cost savings and operational benefits, but again – is that not putting the cart before the horse?

A common example is where cloud providers, implementation partners and advisory groups leverage the flexibility of the cloud to architect “on-demand” solutions for ERP environments. This influences the business case to reduce operational costs by shutting down test environments (when not in use) and scaling your Production environment up/down to meet demand. But there are some real issues with this approach (a rough cost sketch follows the list):

  1. If you power down systems, you still have to pay for storage costs
  2. Database servers require pay-by-the-hour database licensing, which is generally more expensive than purchasing perpetual licences. Licence exchange discounts pitched in the sales cycle (moving from one database vendor to another) do not usually apply to the pay-by-hour model.
  3. Most ERP businesses are global, and sizing/costing for peak demand is usually limited to month-end/year-end processing. This level of frequency is usually insufficient to leverage significant cost savings. True elasticity in your demand needs daily extremes, which are uncommon in most ERP solutions.
  4. System administrators are reluctant to shut down test environments or scale Production up/down, due to the disruption and associated risks. Automation tooling is still maturing and can be time-consuming to implement and maintain.
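
To illustrate issue #1 with some entirely hypothetical numbers (my own made-up rates, not any provider’s actual pricing): because storage bills around the clock, cutting an environment’s uptime by ~70% saves far less than 70%.

```python
# Hypothetical on-demand cost sketch (made-up rates, not real cloud pricing).
COMPUTE_PER_HOUR = 4.00     # assumed hourly rate for a large ERP test instance
STORAGE_PER_MONTH = 1500.0  # assumed storage bill, accrued even when powered off
HOURS_PER_MONTH = 730

def monthly_cost(hours_running: float) -> float:
    """Compute cost for the hours run, plus storage that bills regardless."""
    return hours_running * COMPUTE_PER_HOUR + STORAGE_PER_MONTH

always_on = monthly_cost(HOURS_PER_MONTH)
office_hours = monthly_cost(22 * 10)  # ~22 working days x 10 hours

print(f"Always on:         £{always_on:,.0f}/month")
print(f"Office hours only: £{office_hours:,.0f}/month")
print(f"Saving: {1 - office_hours / always_on:.0%}, despite a ~70% cut in uptime")
```

In this sketch the saving is around 46%, not 70% – and that’s before licensing (issue #2) and the operational reluctance to shut things down (issue #4) erode it further.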

Another cost factor that’s often built into a business case is the reduction of your infrastructure operations teams, but in reality that team just changes shape.

I know I am only focusing on a few examples here, and clearly business case modelling will continue to evolve. There are a multitude of reasons why cost comparison exercises between on-premise and cloud are likely to provide a distorted view of savings.

How do I develop an exit strategy?

An exit strategy is far wider than the mechanics of how you would perform the exit and where you are likely to land. We need to take a step back and identify the likely triggers (cost, performance, availability, security etc.), understand how we measure these and the levers to respond to them.

There is merit in devising an exit strategy whilst you develop your business case. The risks, impacts and assumptions that build a business case are key inputs into this strategy.

The key takeaways here are:

  1. Identify your exit criteria and levers, and ensure your governance model continues to monitor and measure these. Invest in an innovation team, as operations are always too busy keeping the lights on. Empower this team to drive benefit by executing these levers and identifying further ones, to keep the exit criteria in check.
  2. Have a high-level exit plan that covers what your landing options are likely to be, the impact to the business, the cost/risk of a migration etc. Also consider the architectural limitations that any future decisions will impose on the exit strategy. Your exit criteria may identify certain apps and scenarios, possibly even resulting in a hybrid cloud across multiple providers!
  3. Do your homework and due diligence by investing time and sourcing the right skills to develop a robust business case. This is a moving target and is constantly evolving, but that’s technology in general.
  4. Design to promote mobility – especially if you are greenfield. Try to avoid painting yourself into a corner with one cloud provider. Design your architecture to keep your options open and limit the impact of moving between providers/on-premise.

What are the benefits of an exit strategy?

Developing the exit strategy early not only provides those crucial checks and balances on the proposal; once you are in a run state, it also provides a framework to govern and drive operational efficiencies. In the unlikely event of an exit (partial or full), you are somewhat prepared and also have some collateral to support contract renegotiations.

Don’t underestimate the cost and impact of an exit. Some organisations will have undergone costly transformation projects to move to the cloud, and it is not uncommon for complex migrations to cost the equivalent of many years’ worth of cloud run costs.

If you have made a solid commitment to your cloud journey and an exit would be politically damaging, there is still enormous value to your business in developing and maintaining an exit strategy.

Finally, let’s not forget – without trying to contradict my opening sentiments in this post – we do have to applaud the early adopters. It is those organisations brave enough to take the plunge and learn the hard lessons that have enabled the rest of the industry to benefit and to develop more effective decision making.


Update – 8th August 2020: Not all organisations give in to the market pressures to accelerate digitisation. Listen to the soundbite from Episode 3 – “Staying behind the digital curve” – which explores the Aldi business model.

Full podcast episode is available here.


You can also leave feedback about this blog post on LinkedIn or Facebook


Launching your ERP into the clouds

The words “digital” and “cloud computing” seem to be embedded throughout every ERP presentation today. You can’t get away from promises of reducing risk, cost and faster deployments to enable your “digital transformation”. But when you lift the lid – what does this really mean?

When a mission-critical SAP/ERP implementation undergoes a major technology-enabled change (a move to the public cloud, a migration to HANA or a system upgrade), there is a common denominator. The main event – the production downtime window to go live!

The public cloud offers a multitude of benefits; however, lurking underneath the covers is also an array of risks and issues likely to sting you during the main event.

Don’t get me wrong, I totally support the cloud movement – though digital transformations can be supported on-premise too, as they have been for the last 20 years. My aim, however, is to educate and inform, to ensure the risk profile of production cutovers in the cloud is understood.

It’s all about visibility and control – these are the two key areas you lose.

The downtime window

Mission-critical ERP maintenance windows are often fixed and sometimes agreed with the business a year in advance (sometimes more). With such advance scheduling there is likely to be limited insight into how they will be utilised, and sometimes they prove insufficient for the change they are allocated to.

Even when you can influence the duration of the window, you will still be subject to business constraints and held to early, experience-based estimates.

Whatever environment you are operating in, there is a common challenge – execute a complex and often multi-dimensional change in a production downtime window that is never long enough!

So, we make the impossible possible by reducing the technical runtime, whilst at the same time reducing risk, eliminating variables and creating a repeatable recipe.

The approach

Over the years I have developed an approach to making that impossible possible. That itself deserves a separate blog, but in a nutshell: identify levers, variables, risks and benefits, then devise a strategy to rinse and repeat – prove it, break it and document it!

Cloud computing simplified and reduced the cost of the key on-premise constraint – compute and storage! The ability to stand up instances (at the right size/scale/config) and pay by the hour for the privilege became a real game changer. This allowed us to focus more on the creative levers to help solve the problem.

The team working on this will feel like the challenge is nothing short of launching a rocket into space… well, we like to think that!

The end product

Once we have mastered our recipe, our toolbox is equipped with a technical runbook and a detailed cutover plan. Somebody is assigned to ordering pizza, whilst some of us prepare for no sleep for the next few days (or to catch what we can on the office sofa, or even the floor – I’ve done both).

Authority to proceed

You secure a GO decision from the exec and the team quickly moves into executing the plan. People are mobilised, technical processes are running, governance checkpoints are governing, and we are now full steam ahead.

There are two types of event likely to occur when things go wrong: something breaks and everything stops; or, most painful of all, everything slows down! You are unable to achieve the benchmarks you recorded in your rehearsals, and now your plan and contingency are at serious risk.

It’s only then, in the dark of the night, that an exhausted team (with the eyes of the execs bearing down on them) realise how vulnerable they are, due to a lack of visibility and control.

A recent project involved a complex migration of a large SAP implementation to the cloud. Even though this migration involved a database and operating system change, we soon realised there was less risk in moving all of SAP Production in a single event. Data transfer was one of our biggest challenges; we addressed it via a complex set of daisy-chained events across several transfer links.
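
For a sense of the scale involved (hypothetical figures of my own, not the actual project’s): even a healthy dedicated link takes days to move a multi-terabyte database, which is why every link in the chain – and any slowdown on it – matters.

```python
# Rough data-transfer time estimate (hypothetical figures, not the project's).
DB_SIZE_TB = 20     # assumed production database size
LINK_GBPS = 1.0     # assumed sustained link throughput
EFFICIENCY = 0.7    # protocol/encryption overhead eats into raw bandwidth

def transfer_hours(size_tb: float, link_gbps: float, efficiency: float) -> float:
    """Hours to move size_tb over a link at the given effective throughput."""
    bits = size_tb * 8 * 10**12                    # terabytes -> bits
    effective_bps = link_gbps * 10**9 * efficiency
    return bits / effective_bps / 3600

hours = transfer_hours(DB_SIZE_TB, LINK_GBPS, EFFICIENCY)
print(f"~{hours:.0f} hours ({hours / 24:.1f} days) for {DB_SIZE_TB} TB")
# Halve the effective throughput anywhere in the chain and the estimate doubles.
```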

When it starts to go horribly wrong

and the adrenaline kicks in…

Once data hit the first staging area in the cloud, something started to smell wrong: everything was running much slower than planned. Incidentally, Hurricane Florence was battering the US East Coast at the same time. Even though our change was in Europe, news emerged after the event that cloud providers had been moving workloads from North America to Europe to ensure availability. Replicating huge volumes of data and shifting compute demand was likely stretching hypervisors and pushing even the most highly provisioned storage solutions to their limits. Yet no incidents, reports or status updates were declared by the cloud provider.

On another project, an application upgrade slowed down far below our benchmarks. There were no hurricanes or world disasters to blame on this occasion, but we never really identified the root cause. Our analysis (once out of the heat of battle) suspected it may have been unrelated to the cloud infrastructure.

Both examples bring us back to visibility and control. The decision for cloud providers to move workloads was entirely their own; they have availability SLAs and other customers to look after too. When under pressure, the lack of visibility across the full infrastructure platform severely impacts your ability to troubleshoot effectively. It soon becomes a distraction for the working team and stakeholders.

What did we learn?

You soon learn how to magic contingency from a plan that doesn’t seem to have any left – that in itself is an art.

Let’s not forget the cloud isn’t really some magical unified layer of compute that is always on and always performing, somewhere in the sky. It’s a complex amalgamation of data centres designed to provide an unprecedented degree of scale and availability. But when it comes to maintenance of mission-critical ERP, you have to understand that decisions and changes made by the cloud provider are not done with your go-live in mind. That lack of control is a real risk, and if an incident occurs, getting the right level of visibility to support troubleshooting can be a real challenge.

There are three key takeaways here:

  1. Plan sufficient contingency – and also know how/where you will find more if it becomes exhausted
  2. Set realistic expectations with the exec around the technical control/visibility risk of your go-live
  3. If you haven’t yet moved/deployed mission-critical ERP in the cloud, reflect on the potential impact that extended (if infrequent) planned downtimes may have on your business

Alternatively, you could argue the cloud provided a degree of resilience that enabled our change to be completed without a hardware failure, plus the ability to tap into an immense amount of pay-as-you-go compute. Even if these changes were performed on-premise, there is always a risk of hitting unexpected issues not witnessed during rehearsals (even a natural disaster). The argument can swing both ways, but again, this is all about visibility and control.

Mission-critical ERP systems are the crown jewels that run your business. The cloud is an incredible enabler, but it does come with inherent risks/challenges that we should be tuned into.

