Quick Start Guide#
This guide walks you through building a simple speech-to-speech voice assistant pipeline with NVIDIA Pipecat and the Pipecat-AI library, and deploying it for testing.
Installing the NVIDIA Pipecat Library#
Create a Python virtual environment and use the pip command to install the nvidia-pipecat package.
python -m venv venv
source venv/bin/activate
pip install nvidia-pipecat
Note
The nvidia-pipecat package requires Python version 3.12. The source code of the nvidia-pipecat package is available in the ACE Controller GitHub repository.
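Optionally, you can verify the Python version of the environment you just activated before installing. This is our own quick check, not part of the package:

import sys

# Per the note above, nvidia-pipecat requires Python 3.12.
assert sys.version_info[:2] == (3, 12), f"Expected Python 3.12, found {sys.version.split()[0]}"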
Building a Basic Pipeline#
This section shows how to build a simple speech-to-speech voice assistant pipeline using nvidia-pipecat together with the pipecat-ai library, and how to deploy it for testing. The pipeline uses the WebSocket-based ACETransport, Riva ASR and TTS models, and the NVIDIA LLM Service. We recommend first reading the Pipecat documentation or the Pipecat Overview section to understand the core concepts.
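If you are new to Pipecat, the following minimal, self-contained sketch illustrates its core abstraction before any speech services are added: frames flow through an ordered list of processors, and each processor can inspect, modify, or forward frames. The UppercaseProcessor here is our own illustrative processor, not part of either library.

import asyncio

from pipecat.frames.frames import EndFrame, Frame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class UppercaseProcessor(FrameProcessor):
    # Uppercases the text of every TextFrame that passes through;
    # all other frames are forwarded unchanged.
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            frame.text = frame.text.upper()
        await self.push_frame(frame, direction)


async def main():
    pipeline = Pipeline([UppercaseProcessor()])
    task = PipelineTask(pipeline)
    # Queue a text frame, then an EndFrame so the runner shuts down cleanly.
    await task.queue_frames([TextFrame("hello pipecat"), EndFrame()])
    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())

The full bot.py below follows the same pattern, with the transport, ASR, LLM, and TTS services taking the place of UppercaseProcessor.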
Note
The source code for this example is available in the ACE Controller GitHub repository.
Configuring the Pipeline#
Create a Python file containing the pipeline code. Create a file named bot.py and import all necessary frame processors from the pipecat and nvidia_pipecat packages to set up the pipeline. Configure the pipeline as follows:
bot.py

# Import necessary libraries
...

async def create_pipeline_task(pipeline_metadata: PipelineMetadata):
    # Using a transport for audio input/output. ACETransport supports both RTSP
    # and WebSocket protocols for communication and enables pipeline scaling to
    # handle multiple streams simultaneously.
    transport = ACETransport(
        websocket=pipeline_metadata.websocket,
        params=ACETransportParams(
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
            vad_audio_passthrough=True,
        ),
    )

    # Setting up an ASR service. RivaASRService and RivaTTSService support both
    # NVCF-hosted and local Riva Skills deployment endpoints.
    stt = RivaASRService(api_key=os.getenv("NVIDIA_API_KEY"))

    # Setting up a TTS service
    tts = RivaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

    # Setting up an LLM service. NvidiaLLMService supports connecting to both
    # NIM-hosted models and locally deployed NIM LLMs.
    llm = NvidiaLLMService(api_key=os.getenv("NVIDIA_API_KEY"))

    # Create context for the LLM
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. Always answer as helpful, friendly and polite. "
                "Respond with one sentence or less than 75 characters. Do not respond with a "
                "bulleted or numbered list. Your output will be converted to audio so don't "
                "include special characters in your answers."
            ),
        },
    ]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    # Creating a simple processing pipeline
    pipeline = Pipeline(
        [
            transport.input(),  # WebSocket input from client
            stt,  # Speech-to-Text
            context_aggregator.user(),
            llm,  # LLM
            tts,  # Text-to-Speech
            transport.output(),  # WebSocket output to client
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline)

    # Handling client events and generating responses
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMMessagesFrame(messages)])

    return task


# Initialize the FastAPI application and configure routing
app = FastAPI()
app.include_router(websocket_router)
runner = ACEPipelineRunner(pipeline_callback=create_pipeline_task)

# Connecting to the WebUI client
app.mount(
    "/static",
    StaticFiles(directory=os.path.join(os.path.dirname(__file__), "../static")),
    name="static",
)

# Managing application lifecycle
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8100)
If you have deployed Pipecat pipelines before, note the following differences in the code above:
FastAPI Server: We use a FastAPI HTTP and WebSocket server so that pipelines can scale to multiple concurrent users.
ACEPipelineRunner: Instead of the Pipecat PipelineRunner, we use ACEPipelineRunner to manage multiple pipeline instances together with the FastAPI router APIs. ACEPipelineRunner expects as input a function that creates new Pipeline and PipelineTask instances on demand, as sketched below.
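To make the callback contract concrete, the following sketch shows the minimal shape ACEPipelineRunner expects: an async function that receives the PipelineMetadata for a new connection and returns a fresh PipelineTask. The pass-through pipeline and the default ACETransportParams() are assumptions made only to keep the sketch short; the real callback is the create_pipeline_task from the listing above.

# Imports elided, as in bot.py above.

async def create_pipeline_task(pipeline_metadata: PipelineMetadata):
    # Called once per client connection; must build and return a fresh task.
    transport = ACETransport(
        websocket=pipeline_metadata.websocket,
        params=ACETransportParams(),  # assuming defaults suffice for this sketch
    )
    # A pass-through pipeline: client audio in, client audio out, no processing.
    pipeline = Pipeline([transport.input(), transport.output()])
    return PipelineTask(pipeline)

runner = ACEPipelineRunner(pipeline_callback=create_pipeline_task)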
Refer to the NVIDIA Speech-to-Speech Example workflow for source code and detailed deployment steps.
Set up a Web UI interface. To create a simple Web UI interface for your application, use the provided frames.proto and index.html files. Copy these files into the ../static directory, or mount your own directory in the FastAPI app in the code above. These files are essential for establishing the WebSocket connection and handling audio data transmission between the client and server.

frames.proto defines the Protocol Buffers (protobuf) message structures used for encoding and decoding audio and text data.

index.html provides the web interface for capturing audio from the user's microphone and sending it to the server via WebSocket.
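Because a missing static file typically surfaces only as a browser error later, you can optionally confirm the files landed where the app.mount call in bot.py expects them. This is our own sanity-check snippet, meant to run from the same directory as bot.py:

import os

# Check that the static directory contains both required files.
static_dir = os.path.join(os.path.dirname(__file__), "../static")
for required in ("frames.proto", "index.html"):
    path = os.path.join(static_dir, required)
    assert os.path.exists(path), f"Missing {required}; copy it into {static_dir}"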
Running the Pipeline#
Set up the environment variables. Export all required environment variables for the necessary API keys so that the application can access the services it needs. Note that shell syntax allows no spaces around the equals sign. For example:

export NVIDIA_API_KEY=nvapi-…
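The application reads this key with os.getenv, as shown in bot.py above. If you want to fail fast when the key is absent, a small check of our own (assuming the nvapi- prefix from the example) can run before the pipeline starts:

import os

# Fail fast before starting bot.py if the key is absent or malformed.
api_key = os.getenv("NVIDIA_API_KEY")
if not api_key or not api_key.startswith("nvapi-"):
    raise RuntimeError("NVIDIA_API_KEY is missing or malformed; export it first")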
Start the application. Run the Python file to start the FastAPI application and initialize the pipeline.
python bot.py
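Once the server is running, you can confirm that the static files are being served before opening a browser. This quick check is our own snippet and assumes the default port 8100 from the code above:

import urllib.request

# Expect HTTP 200 and an HTML content type if the /static mount is working.
with urllib.request.urlopen("http://localhost:8100/static/index.html") as resp:
    print(resp.status, resp.headers.get("Content-Type"))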
Interacting with the WebUI Client#
Access the WebUI. Open your web browser and navigate to http://localhost:8100/static/index.html.
Configure your browser:
Grant the application permission to access your microphone and speakers to enable audio input and output.
If you encounter issues, ensure that you add http://localhost:8100 to your browser's allowed origins. This may require adjusting flags in Chrome or Edge.