Performance

E2E and Component Latency

The following table shows the end-to-end latency breakdown for the reference workflow:

Configuration Details:

  • Speculative Speech Processing: On

  • TTS: ElevenLabs

  • Platform: g5.12xlarge

  • GPU: 4xA10

  • Number of streams: 1

  • LLM: llama-3.1-8b-instruct, self-hosted on L40

End-to-End Latency Performance Data

| KPI               | Unit | Average | P90  | P75  | P50  |
|-------------------|------|---------|------|------|------|
| E2E latency       | ms   | 2124    | 2408 | 2184 | 2058 |
| ASR latency       | ms   | 323     | 400  | 372  | 303  |
| LLM latency       | ms   | 789     | 940  | 809  | 756  |
| TTS latency       | ms   | 200     | 219  | 200  | 195  |
| Component latency | ms   | 1312    | 1483 | 1386 | 1260 |
| Other latency     | ms   | 812     | 934  | 845  | 796  |
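The latency rows are related by simple arithmetic: the component latency is the sum of the ASR, LLM, and TTS averages, and the "other" latency is the remainder of the end-to-end figure (network, buffering, and pipeline overhead). A minimal sketch of that relationship, using the average column above (the dictionary keys here are illustrative, not part of any shipped tool):

```python
# Average latencies (ms) from the table above.
avg_ms = {"e2e": 2124, "asr": 323, "llm": 789, "tts": 200}

# Component latency is the sum of the measured pipeline stages.
component = avg_ms["asr"] + avg_ms["llm"] + avg_ms["tts"]  # 1312 ms

# "Other" latency is whatever the end-to-end figure does not attribute
# to a specific component (transport, buffering, orchestration).
other = avg_ms["e2e"] - component  # 812 ms

print(component, other)  # 1312 812
```

The same decomposition holds approximately, but not exactly, for the percentile columns, since percentiles of a sum are not the sum of percentiles.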

Resource Usage per GPU for Different Concurrent Streams

Configuration Details:

  • Avatar: Aki at 1280x720 resolution

  • Platform: g5.12xlarge

  • GPU: 4xA10

  • GPU VRAM on each A10 GPU: 22.5 GB

Resource Usage per GPU for Different Concurrent Streams

| KPI                     | Unit               | GPU Index | 1 Stream | 3 Streams | 6 Streams |
|-------------------------|--------------------|-----------|----------|-----------|-----------|
| Average GPU VRAM usage  | GiB                | 0         | 7.0      | 7.0       | 7.0       |
|                         |                    | 1         | 4.2      | 4.1       | 8.4       |
|                         |                    | 2         | 0        | 4.1       | 8.3       |
|                         |                    | 3         | 0.4      | 4.5       | 8.8       |
| Average GPU utilization | %                  | 0         | 20.4     | 45.4      | 64.6      |
|                         |                    | 1         | 54.6     | 46.8      | 91.0      |
|                         |                    | 2         | 0        | 44.4      | 93.9      |
|                         |                    | 3         | 3.8      | 48.4      | 97.0      |
| Average CPU utilization | Logical cores used | N/A       | 4.2      | 11.0      | 21.9      |
| Average RAM usage       | GiB                | N/A       | 7.3      | 12.2      | 19.5     |
| Average Renderer FPS    | FPS                | N/A       | 30       | 30        | 29.8      |
| Average WebRTC FPS      | FPS                | N/A       | 30       | 30        | 29.8      |
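One practical question the VRAM rows answer is how much headroom each A10 retains at the highest measured load. A minimal sketch of that check, using the 6-stream VRAM figures from the table (the variable names are illustrative; note the table reports usage in GiB against a 22.5 GB capacity, so treat the comparison as approximate):

```python
# Per-A10 VRAM capacity on the g5.12xlarge, as stated in the configuration.
VRAM_PER_GPU = 22.5

# Measured average VRAM usage (GiB) per GPU at 6 concurrent streams.
vram_at_6_streams = {0: 7.0, 1: 8.4, 2: 8.3, 3: 8.8}

# Remaining headroom on each GPU; GPU 3 is the most loaded.
headroom = {gpu: round(VRAM_PER_GPU - used, 1)
            for gpu, used in vram_at_6_streams.items()}

print(headroom)  # {0: 15.5, 1: 14.1, 2: 14.2, 3: 13.7}
```

Since utilization on GPUs 1 through 3 already reaches 91 to 97 percent at 6 streams, compute, not VRAM, is the binding constraint on this platform.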