Generic Deployment

Generic deployment provides flexible configuration for deploying any custom server that isn’t covered by built-in deployment configurations.

Configuration

See configs/deployment/generic.yaml for all available parameters.

Basic Settings

Key arguments:

  • image: Docker image to use for deployment (required)

  • command: Command that starts the server; it may contain template variables (required)

  • served_model_name: Name of the served model (required)

  • endpoints: API endpoint paths (chat, completions, health)

  • checkpoint_path: Path to model checkpoint for mounting (default: null)

  • extra_args: Additional command line arguments

  • env_vars: Environment variables, given as a {name: value} dictionary
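
Before deploying, it can help to confirm that the three fields marked as required above are actually set. The following is a minimal sketch of such a check, not part of the tool itself: the function name missing_required_fields and every value in example_config (image name, command, model name, environment variable) are hypothetical placeholders. Only the key names and the null default for checkpoint_path come from the list above.

```python
from __future__ import annotations

# Keys marked "(required)" in the argument list above.
REQUIRED_KEYS = ("image", "command", "served_model_name")


def missing_required_fields(config: dict) -> list[str]:
    """Return the required keys that are absent, None, or empty in `config`."""
    return [key for key in REQUIRED_KEYS if config.get(key) in (None, "")]


# Hypothetical values for illustration only; substitute your own.
example_config = {
    "image": "example.registry/my-server:latest",
    "command": "python -m my_server",
    "served_model_name": "my-model",
    "checkpoint_path": None,          # documented default: null
    "env_vars": {"MY_VAR": "value"},  # {name: value} dictionary
}

print(missing_required_fields(example_config))  # prints []
```

An empty list means all required fields are present; any key names returned are the ones still to fill in.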

Best Practices

  • Ensure the server responds on its health check endpoint, and that the health entry under endpoints is set to the path the server actually exposes

  • Test configuration with --dry_run
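
One way to confirm the first point by hand is to poll the health endpoint once the server is up. The sketch below is an illustration under stated assumptions, not the tool's own health check: it assumes that a healthy server answers a GET request with HTTP 200, and the names wait_for_health and _http_status, as well as the host, port, and path in the usage comment, are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request


def _http_status(url):
    """Return the HTTP status code for a GET on `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 404 or 503).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, malformed response, etc.
        return None


def wait_for_health(url, timeout_s=60.0, interval_s=2.0, probe=None):
    """Poll `url` until it returns HTTP 200 (assumed to mean healthy).

    Returns True on the first 200 response, or False once `timeout_s`
    seconds have elapsed without one. `probe` can be replaced for testing.
    """
    probe = probe or _http_status
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (hypothetical host, port, and path; use your configured health endpoint):
# wait_for_health("http://localhost:8000/health", timeout_s=120)
```

A 404 from this probe usually points to a mismatched health path, while a None result from _http_status suggests the server is not reachable at all.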

Contributing Permanent Configurations

If you’ve successfully applied the generic deployment to serve a specific model or framework, contributions are welcome! We’ll turn your working configuration into a permanent config file for the community.