Scaling & Dynamic Routing

As shown in the microservice list above, every microservice is categorized as either stateful or stateless based on its characteristics. Stateful microservices perform continuous real-time processing on video or audio streams, so they must keep their state in memory to avoid high latency and inconsistency. A stateful microservice is modeled as a StatefulSet resource in Kubernetes. Stateless microservices are generally more tolerant of latency and mainly serve memoryless REST requests, so they persist their state externally, either in a cache or in a database. A stateless microservice is modeled as a Deployment resource in Kubernetes.

Scaling a stateless microservice is as simple as increasing the replica count of its Deployment resource, while keeping all replicas of the same microservice in the same message bus consumer group so that each replica receives only a subset of the messages.
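The consumer-group behavior described above can be illustrated with a small in-memory sketch. It is not tied to any particular message bus: `ToyBus`, the group names, and the round-robin delivery policy are all assumptions made for illustration, and a real bus may distribute messages differently.

```python
from collections import defaultdict


class ToyBus:
    """In-memory stand-in for a message bus with consumer groups (illustrative only)."""

    def __init__(self):
        self._groups = defaultdict(list)  # group name -> list of member inboxes
        self._next = defaultdict(int)     # group name -> round-robin cursor

    def join(self, group):
        """Register one replica in a consumer group and return its inbox."""
        inbox = []
        self._groups[group].append(inbox)
        return inbox

    def publish(self, message):
        """Deliver the message to exactly one member of every group."""
        for group, inboxes in self._groups.items():
            inboxes[self._next[group] % len(inboxes)].append(message)
            self._next[group] += 1


bus = ToyBus()
# Three replicas of the same (hypothetical) stateless microservice share one group.
replicas = [bus.join("stateless-svc") for _ in range(3)]
# A different microservice uses its own group and therefore sees every message.
audit = bus.join("audit")

for n in range(6):
    bus.publish(f"msg-{n}")

for i, inbox in enumerate(replicas):
    print(f"replica {i}: {inbox}")
print(f"audit: {audit}")
```

Because the three replicas share a group, each message is handled by exactly one of them, so adding replicas spreads the load without duplicating work.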

However, things get trickier when we start scaling stateful microservices. A stateful microservice maintains its state internally for the duration of a session, so the state generated by a specific user session shares the same lifecycle as the stateful microservice. When more than one replica of the same microservice is running, the source event is picked up by one of the replicas, and subsequent traffic must be routed to that same replica for the lifetime of the user session.

As a result, we need to know the relationship between a workload object and its processing worker; in other words, a mapping from a specific stream ID to the StatefulSet replica index, e.g., Chat controller pod 0 is currently processing stream ID xxx. This information is needed to make dynamic routing decisions on incoming requests so that they land on the replica where the workload is currently being processed.
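As a rough illustration of such a mapping, the sketch below keeps a stream ID to replica index table in memory and resolves it to a pod address. It is a toy model under stated assumptions, not how SDR is implemented: the `StreamRouter` class, the least-loaded assignment policy, and the `chat-controller` names are hypothetical. The address format follows the standard Kubernetes convention for StatefulSet pods behind a headless Service.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StreamRouter:
    """Toy routing table: stream ID -> StatefulSet replica index (illustrative only)."""

    statefulset: str            # hypothetical StatefulSet name
    service: str                # hypothetical headless Service name
    replicas: int
    namespace: str = "default"
    assignments: Dict[str, int] = field(default_factory=dict)

    def assign(self, stream_id: str) -> int:
        """Pin a stream to the least-loaded replica; repeated calls are sticky."""
        if stream_id in self.assignments:
            return self.assignments[stream_id]
        load = [0] * self.replicas
        for index in self.assignments.values():
            load[index] += 1
        index = load.index(min(load))
        self.assignments[stream_id] = index
        return index

    def release(self, stream_id: str) -> None:
        """Forget a stream once its session ends."""
        self.assignments.pop(stream_id, None)

    def route(self, stream_id: str) -> str:
        """Return the stable DNS name of the pod that owns the stream."""
        index = self.assignments[stream_id]  # raises KeyError for unknown streams
        return (
            f"{self.statefulset}-{index}.{self.service}."
            f"{self.namespace}.svc.cluster.local"
        )


router = StreamRouter("chat-controller", "chat-controller", replicas=2)
first = router.assign("stream-a")
second = router.assign("stream-b")
again = router.assign("stream-a")  # same replica as before
print(first, second, again)
print(router.route("stream-b"))
```

Every request that carries a known stream ID resolves to the same pod until `release` is called, which is the property the routing layer has to guarantee for the lifetime of a session.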

Both workload distribution and traffic routing are handled by NVIDIA SDR, which hides the complexity of scaling from developers. To scale a stateful microservice, developers only have to integrate with SDR. Details are provided in the SDR documentation (Stream Distribution & Routing).