When end users ask how best to manage and maintain their new edge computing systems, the answer is not always straightforward. Unlike data center workloads, most applications at the edge are deemed non-critical. If a system or network fails at a chain restaurant and customers can no longer view menus through the automated system, convenience drops, but business is not dramatically disrupted. However, if the local food preparation or billing systems stop working and customer meals cannot be delivered, the restaurant must close temporarily, and the resulting downtime costs are significant. As a result, each location that hosts an edge computing system faces its own set of factors that determine its cost of unplanned downtime.
Lost business is an example of the direct cost of downtime. In other edge computing environments, the indirect costs associated with unplanned downtime can also be substantial. Consider a wind turbine located in a remote area of Scandinavia. If a $300 on-site router fails, a technician will need at least a day to reach the location, repair or replace the faulty unit, and return. A helicopter may be required, and if the winter weather is severe, the risk to human safety increases. Replacing that single router could end up costing tens of thousands of dollars in maintenance fees alone.
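The router example is essentially an arithmetic exercise: the price of the failed part is small next to the labor and transport needed to reach the site. The Python sketch below makes that breakdown explicit. The class and field names (`SiteVisit`, `part_cost`, `technician_days`, `day_rate`, `transport_cost`) and the figures in the usage lines are hypothetical placeholders chosen for illustration, not quotes or field data; substitute your own rates.

```python
from dataclasses import dataclass


@dataclass
class SiteVisit:
    """Hypothetical inputs for one unplanned repair visit to a remote site."""
    part_cost: float        # price of the replacement unit
    technician_days: float  # travel plus on-site work, in days
    day_rate: float         # fully loaded cost of one technician-day
    transport_cost: float   # e.g. helicopter charter, if one is needed


def repair_cost(visit: SiteVisit) -> float:
    """Total cost of one visit: replacement part + labor + transport."""
    return visit.part_cost + visit.technician_days * visit.day_rate + visit.transport_cost


def cost_multiple(visit: SiteVisit) -> float:
    """How many times the price of the failed part the visit actually costs."""
    return repair_cost(visit) / visit.part_cost


# Placeholder figures for illustration only; they are not quotes or field data.
turbine_router = SiteVisit(part_cost=300.0, technician_days=1.0,
                           day_rate=1000.0, transport_cost=15000.0)
print(repair_cost(turbine_router))              # 16300.0
print(round(cost_multiple(turbine_router), 1))  # 54.3
```

Even with modest placeholder rates, the part itself is a small fraction of the total; the transport term usually dominates for hard-to-reach sites.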
Three approaches for managing edge computing system maintenance
Although issues such as human error and insufficient power protection contribute to edge computing system downtime, the biggest cause of downed systems at the edge is lack of maintenance. Devices such as basic Uninterruptible Power Supplies (UPSs) emit an audible alarm when they encounter a problem such as a spent battery or a power sag. If no one is on site to hear the alarm and address the issue, power protection is lost and the system is exposed to blackouts, brownouts and power surges (see APC white paper “The Seven Types of Power Problems” to learn more about how to combat these power issues).
End users who invest in edge computing systems generally choose one of three different approaches to manage and maintain their edge computing sites:
- Run-to-fail approach – This is an extreme approach that discount retailers, for example, often implement. Systems are allowed to deteriorate until a breakdown occurs that interrupts the flow of business. Response crews are brought in to rectify the situation as quickly as possible so that the resulting downtime can be kept to a minimum.
- Preventive approach – This is a marginally proactive approach in which an on-site, non-technical staff member (for example, a real estate manager) periodically monitors the edge locations. Regularly scheduled maintenance is performed based on the age of the assets and their historical failure rates (for example, a technician may stop by once a week to make minor adjustments and perform any necessary, obvious maintenance).
- Predictive approach – In this fully proactive maintenance model, assets are actively managed (often remotely) and monitored in real time so that unplanned downtime is kept to a minimum. Automatic alerts are generated when equipment shows signs of abnormal behavior. These alerts are acted on quickly, and repairs are performed before downtime occurs.
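The predictive approach depends on recognizing abnormal equipment behavior automatically. One simple way to do that, sketched below in Python, is to compare each new telemetry reading (for example, a UPS battery temperature) against a rolling baseline of recent readings and alert when it falls more than a set number of standard deviations away. The class name `RollingAnomalyDetector`, the window size, the threshold and the sample readings are all hypothetical; commercial monitoring platforms may use quite different detection logic.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags readings that deviate sharply from a rolling baseline (z-score test).

    Hypothetical sketch: window, threshold and min_samples are illustrative
    tuning parameters, not defaults of any vendor's monitoring product.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0, min_samples: int = 5):
        self.history = deque(maxlen=window)  # most recent readings only
        self.threshold = threshold           # alert beyond this many std devs
        self.min_samples = min_samples       # readings needed before judging

    def observe(self, value: float) -> bool:
        """Record a reading and return True if it should raise an alert."""
        alert = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                alert = abs(value - mu) / sigma > self.threshold
            else:
                # Perfectly flat baseline: any change at all is treated as abnormal.
                alert = value != mu
        # Every reading, including anomalous ones, joins the baseline.
        self.history.append(value)
        return alert


# Hypothetical UPS battery temperatures in degrees Celsius.
readings = [25.0, 25.2, 24.9, 25.1, 25.0, 25.1, 24.9, 31.5]
detector = RollingAnomalyDetector()
alerts = [detector.observe(t) for t in readings]
print([i for i, flagged in enumerate(alerts) if flagged])  # [7]
```

Because flagged readings still join the baseline, a persistent shift will eventually stop alerting; a production system might instead exclude flagged readings or escalate repeated alerts to a technician.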
Field data indicates that, although the run-to-fail and preventive maintenance models at first appear less costly than the predictive approach, the opposite is true. Run-to-fail and preventive costs accrue in small increments over time, but add up to a substantial amount each year. In addition, stakeholders who manage these environments live with a higher degree of fear, uncertainty and doubt regarding the uptime of their edge computing systems.
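The point about small increments adding up can be expressed as a simple expected-cost model: planned spending plus expected failures multiplied by the cost of each failure (the repair visit plus lost business while the site is down), scaled across every site in a fleet. The Python sketch below is a hypothetical model, not the methodology behind the field data cited above; the class name `MaintenanceModel` and all inputs in the usage example are placeholders.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceModel:
    """Hypothetical per-site inputs for one maintenance approach."""
    planned_cost_per_year: float       # scheduled visits, monitoring, service contracts
    failures_per_year: float           # expected unplanned failures at one site
    repair_cost_per_failure: float     # emergency visit, parts and labor
    downtime_hours_per_failure: float  # how long the site is down per failure
    downtime_cost_per_hour: float      # lost business while the site is down

    def annual_cost(self) -> float:
        """Expected yearly cost for one site: planned spend + expected unplanned cost."""
        cost_per_failure = (
            self.repair_cost_per_failure
            + self.downtime_hours_per_failure * self.downtime_cost_per_hour
        )
        return self.planned_cost_per_year + self.failures_per_year * cost_per_failure

    def fleet_cost(self, sites: int) -> float:
        """Expected yearly cost across a fleet of identical sites."""
        return self.annual_cost() * sites


# Placeholder inputs only; replace them with your own field data.
run_to_fail = MaintenanceModel(
    planned_cost_per_year=0.0,
    failures_per_year=2.0,
    repair_cost_per_failure=2000.0,
    downtime_hours_per_failure=8.0,
    downtime_cost_per_hour=500.0,
)
print(run_to_fail.annual_cost())    # 12000.0
print(run_to_fail.fleet_cost(200))  # 2400000.0
```

Running the same model with each approach's own inputs lets an operator check whether a higher planned spend is offset by fewer or shorter outages.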
Downtime mitigation methods
Regardless of the type of edge computing site and the management and maintenance approach selected, edge system downtime will always be perceived as a necessary evil that drives up cost. The question is, “How much cost?” Each installation will answer that question differently based on its budget and aversion to risk.
Today, a number of options exist for improving edge management and maintenance. One approach is to invest in monitoring software that provides visibility into remote assets in the field (for example, APC UPSs that ship ready for remote monitoring). A second approach is to make certain the edge solution is equipped with robust hardware so that downtime occurs less frequently. The third option is to mitigate downtime risk through service contracts, in which qualified third-party partners manage the monitoring and maintenance of edge computing assets at a reasonable cost.