core.inference.async_stream#

Module Contents#

Classes#

AsyncStream

Class for encapsulating an asynchronous stream of InferenceRequest outputs.

Data#

API#

core.inference.async_stream.STOP_ITERATION#

Exception(…)

Sentinel Exception instance enqueued by finish() to mark the end of the stream

class core.inference.async_stream.AsyncStream(
request_id: int,
cancel: Callable[[str], None],
loop: Optional[asyncio.AbstractEventLoop] = None,
)#

Class for encapsulating an asynchronous stream of InferenceRequest outputs.

Adapted from https://github.com/vllm-project/vllm/blob/eb881ed006ca458b052905e33f0d16dbb428063a/vllm/v1/engine/async_stream.py

Initialization
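A minimal construction sketch. The cancel_request callback is a hypothetical placeholder for the engine hook that aborts a request whose consumer has gone away; in real use the inference engine supplies both the request ID and the callback.

```python
import asyncio

from megatron.core.inference.async_stream import AsyncStream


def cancel_request(request_id):
    # Hypothetical engine hook (an assumption of this sketch): called when
    # the consumer abandons the stream so the request can be aborted upstream.
    print(f"Cancelling request {request_id}")


async def main():
    # The loop argument is optional per the signature above; here the
    # running loop is passed explicitly.
    stream = AsyncStream(
        request_id=0,
        cancel=cancel_request,
        loop=asyncio.get_running_loop(),
    )
    return stream


asyncio.run(main())
```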

put(
item: Union[megatron.core.inference.inference_request.InferenceRequest, Exception],
) → None#

Adds a new value to the stream

finish(
exception: Optional[Union[BaseException, Type[BaseException]]] = None,
) → None#

Completes the stream by adding a sentinel value
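A producer-side sketch, assuming the engine owns the stream: each intermediate InferenceRequest snapshot is pushed with put, and finish enqueues the stop sentinel, or an exception that the consumer's generator is expected to re-raise (following the vLLM code cited above). The outputs iterable is an assumption of this sketch.

```python
from megatron.core.inference.async_stream import AsyncStream


def produce(stream: AsyncStream, outputs) -> None:
    # `outputs` stands in for an iterable of InferenceRequest snapshots
    # produced by the engine (hypothetical in this sketch).
    try:
        for output in outputs:
            stream.put(output)  # consumer sees each snapshot as it arrives
        stream.finish()         # normal completion: enqueue the stop sentinel
    except Exception as err:
        stream.finish(err)      # abnormal completion: consumer re-raises err
```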

property finished: bool#

Whether the stream has finished

async generator() → AsyncGenerator[megatron.core.inference.inference_request.InferenceRequest, None]#

Creates an AsyncGenerator over the stream queue
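A consumer-side sketch: iterating the generator drains the stream queue until finish() has enqueued the stop sentinel, and re-raises any exception passed to finish(). Based on the vLLM implementation cited above, exiting the loop early is expected to fire the stream's cancel callback; treat that as an assumption rather than a documented guarantee.

```python
from megatron.core.inference.async_stream import AsyncStream


async def consume(stream: AsyncStream) -> None:
    # Each yielded item is an InferenceRequest snapshot.
    async for request in stream.generator():
        print(request)
```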

static _is_raisable(value: Any)#

Checks whether value can be raised as an exception
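A plausible implementation of this check, mirroring the vLLM file cited above and consistent with the Union[BaseException, Type[BaseException]] accepted by finish; treat it as an illustration, not the verbatim Megatron source.

```python
from typing import Any


def _is_raisable(value: Any) -> bool:
    # True for anything a `raise` statement accepts: a BaseException
    # instance or a BaseException subclass.
    return isinstance(value, BaseException) or (
        isinstance(value, type) and issubclass(value, BaseException)
    )
```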