Network scale, don’t get left behind

By Victor Kuarsingh

Picture: Carmo Convent ruins from the earthquake of 1755

What’s old is new again, or so some have said. It is interesting to see how even in 2020 we continued to hit capacity walls when demand suddenly shifted for one of the most precious resources on the Internet and inside networks: bandwidth. The trigger for this onslaught of additional bits hurtling across the networks is clear to most: COVID-19.

In February 2020 many knew that something was brewing as news continued to flow that the new virus known as SARS-CoV-2 was devastating Wuhan. What we did not know, at least not to any degree of certainty, was how suddenly demand would shift as a large swath of the office-going workforce hunkered down at home and began to use video conferencing as their main interface for meetings, interviews, and other work. Some may claim this was a surprise that none of us could have foreseen, but the phenomenon of a sudden shift in demand is not a new one.

Let’s enter our virtual DeLorean, turn on the time circuits and enter the date 1985. Oh wait, that’s too far back! Let’s set it for 20 years later, so 2005. Now that we are in 2005, our story begins. I recall many conversations as the rumbling began to emerge of this new phenomenon called YouTube. Until then, we had seen a rapid yet gradual increase of bandwidth within the ISP networks, throwing in caching infrastructure to help manage the customer’s experience while we applied the 2000s model of building out the broadband networks and peering needed to sustain what was normal for that time. However, YouTube changed all that. Suddenly, the capacity models broke. It was increasingly hard to keep up as each user started to consume much more bandwidth than before. This demand was really new, and before that, mass customer-generated content was not something we engineered around. Facebook had also hit the scene, but other than fighting for supremacy versus MySpace, it did not seem to push as many bits to the network ... yet. Many of us planning network scaling realized it was hard to predict what would drive demand for our networks, especially since the use case was new and not predictable.

We learned our lesson and everything has worked perfectly since then. Well, not really. Let’s climb back into our fictional metal time machine and advance to the year 2010. Although Netflix streaming had started in the US prior to that year, in Canada (where I was at the time) 2010 was when Netflix hit the wire. Again, there was a massive shift in demand for the network. Traditional content distributors were baffled by the love for this pay service that seemed to have benign content, but a large audience nonetheless. Compared with the models that worked for pay-per-view and traditional movie rentals streamed from more traditional ISP systems, this demand was different. People behaved differently, and the old world of sitting in front of the TV at defined times to consume movies was displaced by watching content on other devices at any time. This also seemed to spawn the new social act of binge watching. Around the same time tablets hit the market, and as the next few years rolled on, people started to consume streaming services on many devices and families began watching content in parallel, putting pressure on the network once again. So finally, we got it, nothing left to learn. Once again, not really.

We don’t need to climb back into our DeLorean again, since around the same time a different transition hit our wireless networks. Smartphones, driven by Apple’s push in 2007 with the first iPhone, had displaced older phones that often used WAP, and people were now using their phones for content. The old question of “how many T1s do we need for that cell tower” was no longer sufficient to address this insatiable demand for bandwidth. New technology had effectively allowed people to use the wireless network for new functions, which of course included content viewing. Once again, the race was on to upgrade the radio access networks, wireless core, transit uplinks, peering and other network services. Often older network designs did not cut it, and new topologies and designs needed to be rolled out to address the demand for network bandwidth.

What was the point of this stroll down memory lane? Coming back to 2020, the lesson is that we cannot predict demand as reliably as we think we can, because new demand is often based on new phenomena which we cannot model before we are exposed to them. If I had told you in January 2020 that the vast majority of office personnel were going to work from home and use video conferencing, you would likely have challenged me. So, what can we do if we can’t predict new demand based on existing models? Well, one way is to just accept that things will happen and that we need to shift and respond to new phenomena. But business leaders and designers can do something more. When we produce our network and system designs and network scaling plans, we should do so with the expectation that we may need to scale our infrastructure beyond what known models suggest. We should understand how we would scale for ‘what-if’ scenarios which look like 10x, 20x, 50x and even 100x of our current network and service infrastructure. Businesses may not have the funds to support an advance build-out of capacity; however, we should know how we will get there if we need to. Waiting for the next emergency to figure out how to scale outside of our known curve is a surefire way of getting left behind by those who can respond.

Leaders need to run these what-if scenarios and understand what technical constraints may exist, what business constraints may need to be overcome, and what time scale needs to be factored into achieving such goals. It is possible that some of the blockers cannot be addressed with current technology, funding or workforce, but knowing what needs to be done can be very helpful. Even knowing how to address one or two challenges, with others left for future solutions, can provide a head start when a sudden scaling crisis arrives. If leaders find that they cannot deal with scenarios such as a 10-100x increase in demand, then a hard look at what we have built may be in order. Making design changes and producing processes to scale even under extreme multipliers is best done before the emergency is upon us.
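As a rough illustration of such a what-if exercise, here is a minimal Python sketch. It assumes each constraint can be reduced to a current usage figure and a ceiling that the present design can supply, and that usage grows linearly with demand, which is a simplification. The `Constraint` class, the `what_if` function and every number below are hypothetical placeholders, not real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One resource that could block scaling (hypothetical model)."""
    name: str
    current_usage: float  # what today's demand consumes
    ceiling: float        # the most the current design can supply
    unit: str


def what_if(constraints, multipliers=(10, 20, 50, 100)):
    """For each demand multiplier, list the constraints that would be exceeded.

    Assumes usage scales linearly with demand, which real networks
    rarely do exactly; treat the result as a conversation starter.
    """
    report = {}
    for m in multipliers:
        report[m] = [
            c.name for c in constraints if c.current_usage * m > c.ceiling
        ]
    return report


# Placeholder figures, invented purely for illustration.
constraints = [
    Constraint("peering capacity", 400, 6_000, "Gbps"),
    Constraint("core router ports", 120, 2_000, "ports"),
    Constraint("site power", 0.8, 30, "MW"),
]

for multiplier, blockers in what_if(constraints).items():
    status = ", ".join(blockers) if blockers else "no blockers"
    print(f"{multiplier}x: {status}")
```

The value of even a crude table like this is that it shows which blocker appears first as the multiplier grows, and therefore which question (design, vendor, supply chain, power, labor) deserves attention earliest.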

So practically, what can get in your way? What should a leader be aware of? First, do the designs in place support scaling, even at one to two orders of magnitude? If not, does it require a clean-sheet design, or are changes sufficient to meet such needs? Second, do your vendors have equipment types that support such a large change in capacity? If you use your own hardware, you would ask yourself the same question. What would the supply chain need to look like, especially should extra or new equipment be needed? If you need to leverage a new vendor, have you put in the needed work to onboard that vendor to fit the provisioning, configuration, management and operational models? Is training needed, especially for any new hardware or software that may be deployed? If massive expansion is needed, can the power needs be satisfied? What about time to build? Labor is often a challenge, especially during periods of high demand. There are many other self-reflective questions one can ask within an organization, but the important point is that leadership asks these in advance and understands what is needed to provide a reasonable answer to them.

What will drive your business may be unique to what you offer, or it may be related to more global drivers and behaviors. Never assume we know what’s next, because the Internet and the industry in general have a weird way of teaching us that we can’t predict the demand of the future. If you ask me what’s next, 5G will likely continue to unleash new types of demand that are unlocked by rich bandwidth resources and the new application options they open up. Depending on where you play in the industry, you will want to figure out what that means for your business. Cloud adoption is also continuing to gain acceptance among CIOs and CTOs, many of whom were wary of such migrations just a few years ago. This impacts scale and performance. Ask hard questions now, before the next demand shift occurs.

I used the Carmo Convent ruins picture above, which I took while in Lisbon a few years ago, as a way to highlight crisis and preparedness. The initial impact of the 1755 quake was devastating. However, the ensuing inability to deal with fires throughout the city, and the failure to recognize that the receding waters meant a tsunami was coming, caused much more damage than necessary. We may not always be in a position to eliminate all impact, but preparedness can help us avoid the additional impact that can be managed with the right foresight and planning.
