Run NVIDIA Speech NIM Microservices on Windows with WSL2#

This guide shows how to run NVIDIA Speech NIM microservices on Windows with WSL2. You will learn how to configure WSL2 for NIM containers and how to set up port forwarding when the client runs on a different machine than the WSL2 host.

Unlike the other tutorials in this section, this page is environment-specific. It covers the same deployment workflow (start a container, check readiness, send requests) but adapted for the WSL2 environment. Complete at least the ASR tutorial first to learn the core workflow on a standard Linux system.

What You Learn#

By completing this guide, you:

  • Configure WSL2 memory limits for NIM containers.

  • Deploy ASR and TTS NIMs on WSL2 using the same container workflow.

  • Set up Windows port forwarding so clients on other machines can reach the NIM running in WSL2.

  • Configure Windows Firewall rules for NIM ports.

Prerequisites#

Note

Support for NIMs on WSL is currently in Public Beta.

Before starting, complete the following:

  1. Follow NVIDIA NIM on WSL2 for WSL2 installation and GPU passthrough configuration.

  2. Check the support matrix to confirm your GPU and model are supported on WSL2. Not all models are supported in this environment.

  3. Configure WSL memory. By default, WSL can use up to half of system RAM. NIM containers need significant memory. To adjust the limit, create a .wslconfig file in your Windows user profile directory (C:\Users\<UserName>):

    [wsl2]
    memory=38GB
    

    Then restart WSL:

    wsl --shutdown
    

    If WSL memory is too low, the NIM container will fail during model loading or engine building. Set this to at least 32 GB for most models.

    For WSL issues, refer to WSL configuration and troubleshooting.

  4. On WSL2, you might need to use podman instead of docker. Substitute podman for docker in the commands below if your environment requires it.

  5. Complete NGC access setup and run docker login nvcr.io (or equivalent for podman).
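After restarting WSL, you can confirm the new memory limit took effect from inside a WSL2 terminal. This is a quick sketch using `free`; the value reported should be close to the limit set in `.wslconfig` (38 GB in the example above):

```shell
# Report the total memory visible to WSL2 in GiB. If this is well below the
# value configured in .wslconfig, the restart may not have taken effect.
total_gib=$(free -g | awk '/^Mem:/ {print $2}')
echo "Total memory visible to WSL2: ${total_gib} GiB"
```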

Same-Machine Setup (Client Inside WSL2)#

If both the NIM container and your client run inside the same WSL2 environment, the workflow is identical to standard Linux. The only difference is model selection: choose smaller profiles that fit within WSL2 memory constraints.

Step 1: Launch the NIM#

Open a WSL2 terminal and deploy your service.

Deploy the Parakeet 0.6b en-US streaming profile. This is a smaller model suited to WSL2 memory constraints.

export CONTAINER_ID=parakeet-0-6b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-0-6b-ctc-en-us,mode=str,bs=1"

docker run -it --rm --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -e NIM_TAGS_SELECTOR \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

The NIM_TAGS_SELECTOR specifies bs=1 (batch size 1), which reduces memory usage. On WSL2, prefer the smallest batch size and a single inference mode (mode=str or mode=offline).
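As an illustration of the same selector pattern for the offline case, the sketch below keeps the model name from this example; confirm the exact tags available for your model against the support matrix:

```shell
# Streaming profile at batch size 1 (lowest memory footprint):
export NIM_TAGS_SELECTOR="name=parakeet-0-6b-ctc-en-us,mode=str,bs=1"
# Offline (file-based) profile, also at batch size 1:
export NIM_TAGS_SELECTOR="name=parakeet-0-6b-ctc-en-us,mode=offline,bs=1"
```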

For other ASR models and modes, refer to the ASR tutorial and the support matrix.

Deploy Magpie TTS Flow (or another WSL-supported TTS model).

export CONTAINER_ID=magpie-tts-flow
export NIM_TAGS_SELECTOR=name=magpie-tts-flow

docker run -it --rm --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -e NIM_TAGS_SELECTOR \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

For other TTS models, refer to the TTS tutorial and the support matrix. Not all TTS models are supported on WSL.

Step 2: Run Client Inference#

When the NIM is ready (/v1/health/ready returns {"status":"ready"}), open a new WSL2 terminal and follow the inference steps from the corresponding tutorial:

Follow Step 5: Streaming Transcription in the ASR tutorial.

Note

With the streaming profile above, only streaming client examples work. For offline transcription, deploy a profile with mode=offline. See the support matrix.

Follow Step 4: Synthesize Speech in the TTS tutorial.
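If you want to script the wait for readiness rather than check by hand, a minimal polling sketch follows. The retry count and interval are illustrative only; first-time engine builds can take considerably longer, so increase them for real use.

```shell
# Poll the readiness endpoint until it responds, then proceed to inference.
# Assumes the NIM listens on localhost:9000 as configured above.
ready=false
for attempt in 1 2 3; do
  if curl -sf --max-time 5 http://localhost:9000/v1/health/ready | grep -q '"ready"'; then
    ready=true
    break
  fi
  sleep 2
done
echo "ready=${ready}"
```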

Remote Client Setup (Client on a Different Machine)#

If the client runs on a different machine than the WSL2 host, you need port forwarding because WSL2 uses NAT networking. The NIM binds to ports inside WSL2, but those ports are not directly accessible from the external network.

Use a Windows terminal with Administrator rights for the following steps.

Step 1: Get the WSL2 IP#

In the WSL2 terminal:

hostname -I

Use the first address (for example, 172.x.x.x) as <wsl2-ip> in the next steps.

WSL2 gets a private IP on a virtual network. Windows port forwarding maps the external Windows IP to this internal WSL2 IP.
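`hostname -I` can print more than one address (for example, when Docker adds its own interface inside WSL2). A small sketch of taking only the first field; the sample value is illustrative:

```shell
# Illustrative `hostname -I` output; on a real system use:
#   hostname_output=$(hostname -I)
hostname_output="172.20.34.5 172.17.0.1"
wsl2_ip=$(printf '%s' "$hostname_output" | awk '{print $1}')
echo "Using <wsl2-ip> = ${wsl2_ip}"
```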

Step 2: Forward Ports from Windows to WSL2#

In an elevated Windows command prompt or PowerShell, create port forwarding rules for both NIM ports.

Forward HTTP port 9000:

netsh interface portproxy add v4tov4 listenport=9000 listenaddress=0.0.0.0 connectport=9000 connectaddress=<wsl2-ip>

Forward gRPC port 50051:

netsh interface portproxy add v4tov4 listenport=50051 listenaddress=0.0.0.0 connectport=50051 connectaddress=<wsl2-ip>

Replace <wsl2-ip> with the value from Step 1.

Note

WSL2’s IP can change after each restart. Update the portproxy rules with the new IP when you restart WSL2.

Step 3: Verify Port Forwarding#

netsh interface portproxy show v4tov4

You should see entries for ports 9000 and 50051 connecting to <wsl2-ip>.

Step 4: Allow Traffic Through Windows Firewall#

In PowerShell as Administrator:

New-NetFirewallRule -DisplayName "Allow NIM HTTP on Port 9000" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 9000
New-NetFirewallRule -DisplayName "Allow NIM gRPC on Port 50051" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 50051

These commands create inbound firewall rules that allow external traffic on the NIM ports. Without these rules, the Windows firewall blocks incoming connections even though port forwarding is configured.

Step 5: Run the Client from the Remote Machine#

On the client machine, use the Windows host’s IP (not the WSL2 IP) in all commands. Replace <windows-host-ip> with the Windows machine’s network IP.

Verify the service is reachable from the remote machine:

curl -X 'GET' 'http://<windows-host-ip>:9000/v1/health/ready'

Run inference using the examples for your service:

python3 python-clients/scripts/asr/transcribe_file.py \
  --server <windows-host-ip>:50051 \
  --language-code en-US --automatic-punctuation \
  --input-file en-US_sample.wav
python3 python-clients/scripts/tts/talk.py --server <windows-host-ip>:50051 \
  --language-code en-US \
  --text "Hello from the client machine." \
  --voice Magpie-Multilingual.EN-US.Aria \
  --output output.wav

What You Learned#

In this guide, you accomplished the following:

  • Configured WSL2 memory to support NIM containers, understanding that insufficient memory causes model loading failures.

  • Selected smaller model profiles (bs=1, single inference mode) appropriate for the WSL2 environment.

  • Set up port forwarding from Windows to WSL2, understanding that WSL2’s NAT networking requires explicit forwarding for external access.

  • Configured firewall rules to allow external traffic on NIM ports.

  • Verified the complete path from a remote client through Windows port forwarding to the NIM running inside WSL2.