core.inference.async_stream#

Module Contents#

Classes#

AsyncStream

Class for encapsulating an asynchronous stream of InferenceRequest outputs.

Data#

API#

core.inference.async_stream.STOP_ITERATION#

Exception(…)

Sentinel Exception instance enqueued by finish() to mark the end of the stream

class core.inference.async_stream.AsyncStream(
request_id: int,
cancel: Callable[[str], None],
loop: Optional[asyncio.AbstractEventLoop] = None,
)#

Class for encapsulating an asynchronous stream of InferenceRequest outputs.

Adapted from https://github.com/vllm-project/vllm/blob/eb881ed006ca458b052905e33f0d16dbb428063a/vllm/v1/engine/async_stream.py

Initialization
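A minimal construction sketch. The cancel_request callback is a hypothetical placeholder for the engine hook that aborts a request whose consumer has gone away; in real use the inference engine supplies both the request ID and the callback.

```python
import asyncio

from megatron.core.inference.async_stream import AsyncStream


def cancel_request(request_id):
    # Hypothetical engine hook (an assumption of this sketch): called when
    # the consumer abandons the stream so the request can be aborted upstream.
    print(f"Cancelling request {request_id}")


async def main():
    # The loop argument is optional per the signature above; here the
    # running loop is passed explicitly.
    stream = AsyncStream(
        request_id=0,
        cancel=cancel_request,
        loop=asyncio.get_running_loop(),
    )
    return stream


asyncio.run(main())
```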

put(
item: Union[megatron.core.inference.inference_request.InferenceRequest, Exception],
) → None#

Adds a new value to the stream

finish(
exception: Optional[Union[BaseException, Type[BaseException]]] = None,
) → None#

Completes the stream by adding a sentinel value
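A producer-side sketch, assuming the engine owns the stream: each intermediate InferenceRequest snapshot is pushed with put, and finish enqueues the stop sentinel, or an exception that the consumer's generator is expected to re-raise (following the vLLM code cited above). The outputs iterable is an assumption of this sketch.

```python
from megatron.core.inference.async_stream import AsyncStream


def produce(stream: AsyncStream, outputs) -> None:
    # `outputs` stands in for an iterable of InferenceRequest snapshots
    # produced by the engine (hypothetical in this sketch).
    try:
        for output in outputs:
            stream.put(output)  # consumer sees each snapshot as it arrives
        stream.finish()         # normal completion: enqueue the stop sentinel
    except Exception as err:
        stream.finish(err)      # abnormal completion: consumer re-raises err
```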

property finished: bool#

Whether the stream has finished

async generator() → AsyncGenerator[megatron.core.inference.inference_request.InferenceRequest, None]#

Creates an AsyncGenerator over the stream queue
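A consumer-side sketch: iterating the generator drains the stream queue until finish() has enqueued the stop sentinel, and re-raises any exception passed to finish(). Based on the vLLM implementation cited above, exiting the loop early is expected to fire the stream's cancel callback; treat that as an assumption rather than a documented guarantee.

```python
from megatron.core.inference.async_stream import AsyncStream


async def consume(stream: AsyncStream) -> None:
    # Each yielded item is an InferenceRequest snapshot.
    async for request in stream.generator():
        print(request)
```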

static _is_raisable(value: Any)#

Checks whether value can be raised as an exception
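A plausible implementation of this check, mirroring the vLLM file cited above and consistent with the Union[BaseException, Type[BaseException]] accepted by finish; treat it as an illustration, not the verbatim Megatron source.

```python
from typing import Any


def _is_raisable(value: Any) -> bool:
    # True for anything a `raise` statement accepts: a BaseException
    # instance or a BaseException subclass.
    return isinstance(value, BaseException) or (
        isinstance(value, type) and issubclass(value, BaseException)
    )
```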