core.inference.apis.serve_config
Module Contents
Classes
ServeConfig | Programmatic configuration for MegatronAsyncLLM.serve(...).
API
- class core.inference.apis.serve_config.ServeConfig
Programmatic configuration for MegatronAsyncLLM.serve(...).
This dataclass also serves as the future source of truth for a megatron serve CLI. It controls only the HTTP serving surface; engine construction and coordinator addressing are configured separately via the MegatronLLM/MegatronAsyncLLM constructor.
- host: str
‘0.0.0.0’
HTTP bind host for the OpenAI-compatible frontend.
Distinct from the MegatronLLM/MegatronAsyncLLM constructor’s coordinator_host argument: coordinator_host is the internal/routable address used for coordinator ZMQ traffic, whereas host is the externally visible interface where the HTTP server accepts client connections.
- port: int
5000
HTTP bind port for the OpenAI-compatible frontend.
- parsers: list[str]
‘field(…)’
Response parser names to enable on the HTTP frontend.
Examples include ["json", "tool_use"]. Values are passed through to the underlying text-generation server unchanged.
- verbose: bool
False
Whether the HTTP frontend should log per-request detail.
- frontend_replicas: int
4
Number of HTTP frontend processes spawned on the primary rank.
The default of 4 matches the existing start_text_gen_server default of num_replicas=4.
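
The fields and defaults documented above can be summarized with a small stand-in dataclass. This is only a sketch: `ServeConfigSketch` is a hypothetical name, not the library's class, and the range checks in `__post_init__` are illustrative additions that the real `ServeConfig` may not perform. The default for `parsers` is rendered only as `field(…)` in this reference, so the empty-list factory used here is an assumption.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ServeConfigSketch:
    """Hypothetical stand-in that mirrors the documented ServeConfig fields."""

    host: str = "0.0.0.0"        # HTTP bind host (all interfaces by default)
    port: int = 5000             # HTTP bind port
    # Assumption: the real default is shown only as field(...); an empty
    # list is used here purely for illustration.
    parsers: list[str] = field(default_factory=list)
    verbose: bool = False        # per-request logging on the HTTP frontend
    frontend_replicas: int = 4   # HTTP frontend processes on the primary rank

    def __post_init__(self) -> None:
        # Illustrative sanity checks; not confirmed behavior of the real class.
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if self.frontend_replicas < 1:
            raise ValueError("frontend_replicas must be at least 1")


# Defaults as documented.
default_cfg = ServeConfigSketch()

# A local-only debugging setup: loopback bind, one frontend process,
# and the two parser names given as examples in the reference.
local_cfg = ServeConfigSketch(
    host="127.0.0.1",
    port=8080,
    parsers=["json", "tool_use"],
    verbose=True,
    frontend_replicas=1,
)

print(asdict(local_cfg))
```

Because `host` only controls the HTTP bind interface, switching it to a loopback address as in `local_cfg` would not affect coordinator ZMQ traffic, which is addressed through the constructor's `coordinator_host` argument. How a config object is handed to `MegatronAsyncLLM.serve(...)` is not shown in this section, so the sketch stops at construction.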