# Patronus Lynx Deployment
## vLLM
Lynx is fully open source, so you can host it however you like. One simple way is using vLLM.
- Get access to Patronus Lynx on Hugging Face. See here for the 70B parameter variant, and here for the 8B parameter variant. The examples below use the 70B parameter model, but there's no additional configuration needed to deploy the smaller model, so you can swap the model name references from `70B` to `8B`.
- Log in to Hugging Face 
huggingface-cli login
- Install vLLM and spin up a server hosting Patronus Lynx 
pip install vllm
python -m vllm.entrypoints.openai.api_server --port 5000 --model PatronusAI/Patronus-Lynx-70B-Instruct
This will launch the vLLM inference server on http://localhost:5000/. The server follows the OpenAI API spec, so you can send it a cURL request to make sure it works:
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "PatronusAI/Patronus-Lynx-70B-Instruct",
  "messages": [
   {"role": "user", "content": "What is a hallucination?"},
  ]
}'
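vLLM's OpenAI-compatible server also exposes the standard models endpoint, so you can list the served models to confirm the model loaded (assuming the same port as above):

```bash
# Lists the models served by the local vLLM instance
curl http://localhost:5000/v1/models
```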
- Create a model called `patronus_lynx` in your `config.yml` file, setting the host and port to match the values above, as sketched below. If vLLM is running on a different server from `nemoguardrails`, replace `localhost` with the vLLM server's address. Check out the guide here for more information.
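Here is a sketch of what that entry might look like, assuming the `vllm_openai` engine and the port used above; treat the engine and parameter names as a starting point and adjust them to your deployment:

```yaml
models:
  ...
  # Assumes the vllm_openai engine; adjust the base URL and model name to your server
  - type: patronus_lynx
    engine: vllm_openai
    parameters:
      openai_api_base: "http://localhost:5000/v1"
      model_name: "PatronusAI/Patronus-Lynx-70B-Instruct"
```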
## Ollama
You can also run Patronus Lynx 8B on your personal computer using Ollama!
- Install Ollama: https://ollama.com/download. 
- Get access to a GGUF quantized version of Lynx 8B on Hugging Face. Check it out here.
- Download the GGUF model from the repository here. This may take a few minutes. A command-line alternative is sketched below.
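If you prefer the command line, here is a sketch using `huggingface-cli` (installed with the `huggingface_hub` package); the repository id below is a placeholder, so substitute the one from the repository linked above:

```bash
# Replace <gguf-repo-id> with the id of the GGUF repository linked above
huggingface-cli download <gguf-repo-id> patronus-lynx-8b-instruct-q4_k_m.gguf --local-dir .
```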
- Create a file called `Modelfile` with the following contents:
 FROM "./patronus-lynx-8b-instruct-q4_k_m.gguf"
 PARAMETER stop "<|im_start|>"
 PARAMETER stop "<|im_end|>"
 TEMPLATE """
 <|im_start|>system
 {{ .System }}<|im_end|>
 <|im_start|>user
 {{ .Prompt }}<|im_end|>
 <|im_start|>assistant
 """
Ensure that the `FROM` field correctly points to the `patronus-lynx-8b-instruct-q4_k_m.gguf` file you downloaded in Step 3.
- Run `ollama create patronus-lynx-8b -f Modelfile`.
- Run `ollama run patronus-lynx-8b`. You should now be able to chat with `patronus-lynx-8b`! A quick API check is sketched below.
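To confirm the model is also reachable over Ollama's HTTP API, which is what the `base_url` in the next step points at, you can send it a quick request; this assumes Ollama's default port 11434:

```bash
# Non-streaming generation request against the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "patronus-lynx-8b",
  "prompt": "What is a hallucination?",
  "stream": false
}'
```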
- Create a model called `patronus_lynx` in your `config.yml` file, like this:
models:
  ...
  - type: patronus_lynx
    engine: ollama
    model: patronus-lynx-8b
    parameters:
      base_url: "http://localhost:11434"
Check out the guide here for more information.