core.inference.async_stream#
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| AsyncStream | Class for encapsulating an asynchronous stream of InferenceRequest outputs. |
Data#
API#
- core.inference.async_stream.STOP_ITERATION#
‘Exception(…)’
Sentinel placed on the stream queue by finish() to signal that no further outputs will arrive
- class core.inference.async_stream.AsyncStream(
- request_id: int,
- cancel: Callable[[str], None],
- loop: Optional[asyncio.AbstractEventLoop] = None,
- )#
Class for encapsulating an asynchronous stream of InferenceRequest outputs.
Adapted from https://github.com/vllm-project/vllm/blob/eb881ed006ca458b052905e33f0d16dbb428063a/vllm/v1/engine/async_stream.py
Initialization
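A minimal construction sketch, assuming the module is importable as `megatron.core.inference.async_stream` (the fully qualified names above suggest it is) and that the stream is created on a running event loop. The `cancel` callback here is a hypothetical stand-in for an engine's abort hook:

```python
import asyncio

from megatron.core.inference.async_stream import AsyncStream

def cancel(request_id: str) -> None:
    # Hypothetical cancellation hook: a real engine would abort the
    # in-flight request identified by request_id here.
    print(f"cancelling request {request_id}")

async def main() -> None:
    # loop is optional and presumably falls back to the running event loop.
    stream = AsyncStream(request_id=0, cancel=cancel)
    print(stream.finished)  # False until finish() is called

asyncio.run(main())
```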
- put(
- item: Union[megatron.core.inference.inference_request.InferenceRequest, Exception],
- )#
Adds a new value to the stream
- finish(
- exception: Optional[Union[BaseException, Type[BaseException]]] = None,
- )#
Completes the stream by adding a sentinel value
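A producer-side sketch of how put() and finish() fit together: each output is enqueued with put(), and finish() appends the sentinel, or an exception to propagate to the consumer. The `drain_to_stream` helper and its `outputs` iterable are hypothetical stand-ins for whatever produces InferenceRequest objects:

```python
from typing import Iterable

from megatron.core.inference.async_stream import AsyncStream
from megatron.core.inference.inference_request import InferenceRequest

def drain_to_stream(stream: AsyncStream, outputs: Iterable[InferenceRequest]) -> None:
    # Hypothetical helper: forward engine outputs into the stream.
    try:
        for output in outputs:
            stream.put(output)
        # Append the sentinel so the consumer's iteration ends cleanly.
        stream.finish()
    except Exception as err:
        # finish() also accepts an exception (instance or type), which
        # generator() presumably re-raises on the consumer side instead
        # of stopping silently.
        stream.finish(err)
```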
- property finished: bool#
Whether the stream has finished
- async generator() → AsyncGenerator[megatron.core.inference.inference_request.InferenceRequest, None]#
Creates an AsyncGenerator over the stream queue
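A consumer-side sketch, assuming generator() yields each queued item until the sentinel added by finish() is reached, and that an exception passed to finish() is re-raised in the consumer. For illustration we enqueue plain strings as stand-ins for InferenceRequest objects, on the assumption that the underlying queue does not enforce the annotated type:

```python
import asyncio

from megatron.core.inference.async_stream import AsyncStream

async def consume(stream: AsyncStream) -> None:
    # Iterates until the sentinel added by finish() is reached; an
    # exception passed to finish() would presumably be re-raised here.
    async for item in stream.generator():
        print(item)

async def main() -> None:
    # cancel is a no-op stand-in; a real engine would abort the request.
    stream = AsyncStream(request_id=0, cancel=lambda request_id: None)
    consumer = asyncio.create_task(consume(stream))
    # Stand-in payloads; real usage would put() InferenceRequest objects.
    for text in ("partial output", "final output"):
        stream.put(text)
    stream.finish()
    await consumer

asyncio.run(main())
```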
- static _is_raisable(value: Any)#
Checks whether value is an exception instance or exception type that can be raised