My daughter was getting ready to make bread pudding and she realized that the milk had gone bad. She asked me: “Dad, how quickly can you get me a gallon of milk?” The closest grocery store is about 10 minutes from my house, so I calculated the time to go back-and-forth plus the time needed at the store and told her 25 to 30 minutes.
On the way, I realized it was game day at the high school, which unfortunately meant a lot of traffic. What I thought would take 10 minutes took 20 minutes each way. Once I reached the store, I ran into more challenges, such as finding a place to park and a long line at the counter. All in all, the stop at the store added another 22 minutes.
By the time I reached home with the milk, 62 minutes had passed since my daughter made the request. Obviously, her experience was not great with that level of latency.
Latency is defined as the round-trip time between making a request and receiving the response. In this case, the expected latency was 30 minutes, but the actual latency was 62 minutes.
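To make the definition concrete, here is a minimal Python sketch that times one request from start to finish. The `fake_request` function is a hypothetical stand-in (it only sleeps for about 50 ms); in a real application you would wrap an actual network or service call instead.

```python
import time

def measure_latency_ms(request_fn):
    """Call request_fn once and return (response, elapsed time in milliseconds)."""
    start = time.perf_counter()
    response = request_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response, elapsed_ms

def fake_request():
    # Hypothetical stand-in for a real request: wait about 50 ms, then respond.
    time.sleep(0.05)
    return "milk"

response, latency_ms = measure_latency_ms(fake_request)
print(f"response={response!r} latency={latency_ms:.1f} ms")
```

The measured value includes everything between sending the request and receiving the response, just as the 62 minutes included traffic, parking and the checkout line.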
Application performance is no different. Application developers build applications with amazing functionality and an expected latency in mind. However, many factors influence the actual latency and, in turn, the end-user experience.
Why bother understanding latency?
The impact of latency can vary significantly depending on the application. For an application like web browsing, high latency might lead to a poor user experience. For applications that enable autonomous cars and robotic surgery, however, latency could be the difference between life and death. Similarly, for applications that are focused on fraud detection, latency can result in missed opportunities to prevent bad things from happening.
While latency cannot be completely eliminated, it can be minimized with well-architected designs. In turn, this could enable excellent customer experiences, near-real time intervention, and delivery of new opportunities.
The five C’s of latency
While many factors influence latency, the most common ones, and the ones that matter in almost any design, are captured by the five C’s of latency.
Connection
Connection is the path between a client that makes a request and a server that processes the request. In a typical application flow, the end-to-end connection can consist of multiple hops. These can include the last-mile connection from the user to the wireless network, the connection to/through the internet, and connections between different servers in the path of execution. Every hop and processing stage is a potential contributor to latency. For example, if the last mile is a 4G wireless connection, you can experience a network latency of 40-50 ms, but if you are using a 5G connection, latency could drop to the range of 10 ms. On the backend, various processing components might be connected across wide area networks in different regions.
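As a rough illustration of how each hop adds to the total, the sketch below sums per-hop latencies for one request path. The last-mile figures (about 45 ms for 4G, about 10 ms for 5G) come from the ranges above; the backend hop names and values are assumptions chosen only for the example.

```python
def end_to_end_latency_ms(hops):
    """Sum the latency contributed by every (name, milliseconds) hop in the path."""
    return sum(ms for _, ms in hops)

# Hypothetical backend hops; the values are illustrative assumptions.
backend_hops = [
    ("internet transit", 20.0),
    ("load balancer to app server", 2.0),
    ("app server to database", 5.0),
]

path_4g = [("last mile (4G)", 45.0)] + backend_hops
path_5g = [("last mile (5G)", 10.0)] + backend_hops

print(end_to_end_latency_ms(path_4g))  # 72.0
print(end_to_end_latency_ms(path_5g))  # 37.0
```

Even with a faster last mile, the backend hops still account for most of the remaining latency in this made-up path, which is why every hop deserves attention.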
Closeness
Closeness is the distance between the client making the request and the server processing it. In a typical on-premises scenario, the client and server might be connected to the same local area network, so the distance is very short. However, as workloads are migrated to the cloud, the distance between the client and the server increases, thereby increasing the latency. It is important to consider not only the distance between the client and the compute server, but also other distances, such as the distance to the location of data assets and the distance between dependent internal processing nodes.
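Distance sets a floor on latency that no amount of tuning removes. The sketch below estimates that floor, assuming signals in optical fiber travel at roughly 200,000 km per second (about two thirds of the speed of light in a vacuum). The distances are hypothetical, and real paths are longer and slower because of routing, switching and processing.

```python
FIBER_KM_PER_MS = 200.0  # assumption: roughly 200,000 km/s in optical fiber

def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time from propagation delay alone."""
    return 2.0 * distance_km / FIBER_KM_PER_MS

# Hypothetical client-to-server distances.
for label, km in [("same building", 1), ("nearby edge site", 50), ("distant cloud region", 2000)]:
    print(f"{label}: at least {min_round_trip_ms(km):.2f} ms")
```

Under these assumptions, a server 2,000 km away costs at least 20 ms per round trip before any processing happens, while an edge site 50 km away costs about 0.5 ms.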
Capacity
Capacity is the availability of resources to process a request quickly and efficiently. In a typical application, many resources can be used, hence the capacity of each of these elements must be carefully planned and designed. The capacity of the network determines how quickly a request is transported, while the availability of compute servers determines how quickly processing can occur on the backend. Emerging wireless technologies such as 5G can significantly increase last mile capacity to users. This, combined with the ability to elastically increase or decrease capacity based on demands, is extremely important.
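One simple way to reason about elastic capacity is to size the server pool from current demand. The following is a sketch under stated assumptions, not a production autoscaler: the function name, the per-server throughput and the 70% target utilization are all hypothetical values chosen for illustration.

```python
import math

def servers_needed(requests_per_sec, per_server_capacity, target_utilization=0.7):
    """Servers required so that each one runs at or below the target utilization."""
    usable_per_server = per_server_capacity * target_utilization
    return max(1, math.ceil(requests_per_sec / usable_per_server))

# Hypothetical demand levels; each server is assumed to handle 100 requests/sec.
quiet = servers_needed(200, 100)
busy = servers_needed(900, 100)
print(quiet, busy)  # 3 13
```

Keeping utilization below 100% leaves headroom, so a sudden burst is absorbed by spare capacity rather than turning into a backlog.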
Contention
Contention is the backlog of requests waiting for a bottleneck to be cleared. Contention can occur for various reasons, such as not having enough capacity, not distributing the load effectively, and lack of policies to treat requests differently. For example, a critical request might be queued behind a set of low priority requests waiting to be processed. Understanding bottlenecks and effectively placing resources, load balancing, and defining policies can help mitigate contention of key resources, thereby ensuring faster responses.
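The effect of a policy that treats requests differently can be sketched with a single worker and a small backlog. In this illustrative example, all requests arrive at the same time; the request names, priorities and service times are made up, and a lower priority number means more urgent.

```python
import heapq

def completion_times(requests, use_priority):
    """Return {name: finish time in ms} for one worker draining the backlog.

    requests: list of (name, priority, service_ms) in arrival order.
    """
    if use_priority:
        # Order by priority; the arrival index breaks ties so equal priorities stay FIFO.
        heap = [(prio, idx, name, ms) for idx, (name, prio, ms) in enumerate(requests)]
        heapq.heapify(heap)
        ordered = []
        while heap:
            _, _, name, ms = heapq.heappop(heap)
            ordered.append((name, ms))
    else:
        ordered = [(name, ms) for name, _, ms in requests]

    clock, finished = 0, {}
    for name, ms in ordered:
        clock += ms
        finished[name] = clock
    return finished

backlog = [("report-1", 5, 100), ("report-2", 5, 100),
           ("report-3", 5, 100), ("fraud-check", 0, 10)]

fifo = completion_times(backlog, use_priority=False)
prio = completion_times(backlog, use_priority=True)
print(fifo["fraud-check"], prio["fraud-check"])  # 310 10
```

With first-in, first-out handling, the critical request waits behind 300 ms of low-priority work. With a priority policy it finishes in 10 ms, while the last low-priority request finishes at 310 ms instead of 300 ms.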
Consistency
Consistency is the ability to deliver the same performance characteristics over a period of time. Ensuring consistency can be one of the biggest challenges in latency design, since an application does not control all the resources used in the path of a request. In addition to a well-architected design, delivering consistency depends on understanding the runtime characteristics of dependent resources, knowing which requests are most commonly used, and dynamically adapting the path to reduce contention.
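Consistency is usually judged by comparing typical latency with tail latency. The sketch below uses a nearest-rank percentile on two made-up sets of latency samples (in milliseconds): one steady service and one that is faster on average but occasionally spikes. The sample values are assumptions for illustration only.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds.
steady = [50, 52, 51, 49, 50, 53, 48, 51, 50, 52]
spiky = [20, 22, 21, 19, 20, 23, 18, 21, 20, 300]

for name, samples in [("steady", steady), ("spiky", spiky)]:
    print(name, "p50 =", percentile(samples, 50), "p99 =", percentile(samples, 99))
# steady p50 = 50 p99 = 53
# spiky p50 = 20 p99 = 300
```

The spiky service looks better at the median but far worse in the tail. Which profile is acceptable depends on whether the application needs the lowest typical latency or predictable latency on every request.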
An interesting point about the five C’s is that the contribution of each C to the overall latency can vary based on application needs, workload type and architecture. As you design, it is important to start by understanding the true needs of your application. For example, do you need the lowest possible latency, or is a moderate but consistent latency what matters most? Based on the answers, you can start modeling resources that satisfy your connection, closeness, capacity, contention and consistency needs.
In “Six design principles to help mitigate latency,” we will look at strategies to help mitigate the impact of the five C’s of latency.
Learn more about 5G Edge to further understand how you can jumpstart your journey in delivering low-latency applications.
Rajesh Vargheese is a Technology Strategist & Distinguished Architect for Verizon's 5G/MEC Professional Services organization. Rajesh brings 20+ years of expertise in technology strategy, engineering, product management and consulting to help customers innovate and drive business outcomes.