aistore.sdk.batch.extractor.archive_stream_extractor

Module Contents

Classes

Name	Description
`ArchiveStreamExtractor`	Parent class for extracting batch archive streams from AIStore.

Data

logger

API

class aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor()

Abstract

Parent class for extracting batch archive streams from AIStore.

Integrates with MossReq/MossResp (Multi-Object Streaming Service) objects to provide proper metadata mapping.

_supported_fmts = tuple()

aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor._get_moss_out(
    index: int,
    content_length: int,
    moss_req: aistore.sdk.batch.types.MossReq,
    moss_resp: typing.Optional[aistore.sdk.batch.types.MossResp] = None
) -> aistore.sdk.batch.types.MossOut

Get MossOut for the current file being extracted.

In multipart mode (when moss_resp is provided), uses the actual response metadata. In streaming mode, only file content is streamed (no metadata), so MossOut is inferred from the request.

Parameters:

index

int

Index of the file in the batch

content_length

int

Length of file content in bytes (used to set size in streaming mode)

moss_req

MossReq

Original batch request

moss_resp

Optional[MossResp], defaults to None

Response metadata (None for streaming mode)

Returns: MossOut

Response metadata for this file
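
The two modes described above can be illustrated with a minimal sketch. The dataclasses below (SketchIn, SketchOut, SketchReq, SketchResp) and their fields are hypothetical stand-ins, not the real aistore.sdk.batch.types models, and get_out is only an approximation of the selection logic, not the SDK's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchIn:
    """Hypothetical stand-in for one requested entry in a MossReq."""
    obj_name: str
    bucket: str


@dataclass
class SketchOut:
    """Hypothetical stand-in for MossOut."""
    obj_name: str
    bucket: str
    size: int


@dataclass
class SketchReq:
    """Hypothetical stand-in for MossReq."""
    entries: List[SketchIn] = field(default_factory=list)


@dataclass
class SketchResp:
    """Hypothetical stand-in for MossResp."""
    out: List[SketchOut] = field(default_factory=list)


def get_out(index: int, content_length: int, req: SketchReq,
            resp: Optional[SketchResp] = None) -> SketchOut:
    if resp is not None:
        # Multipart mode: the server sent metadata, so use it as-is.
        return resp.out[index]
    # Streaming mode: only content was streamed, so rebuild the metadata
    # from the request entry and the number of bytes actually read.
    entry = req.entries[index]
    return SketchOut(obj_name=entry.obj_name, bucket=entry.bucket,
                     size=content_length)
```

In this sketch, content_length only matters when no response metadata is available, which mirrors the parameter description above.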

aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor._handle_extraction_error(
    filename: str,
    error: Exception,
    moss_req: aistore.sdk.batch.types.MossReq,
    archive_type: str
) -> None

Handle individual file extraction errors.

If cont_on_err is enabled, logs the error and allows continuation. Otherwise, raises a RuntimeError.

Parameters:

filename

str

Name of the file that failed to extract

error

Exception

The exception that occurred

moss_req

MossReq

Original batch request

archive_type

str

Type of archive (e.g., 'tar', 'zip')

Raises:

RuntimeError: If cont_on_err is False
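
The log-or-raise behavior can be sketched as follows. This is an assumption-laden illustration: the hypothetical handle_extraction_error function takes cont_on_err directly as a boolean instead of reading it from a MossReq, and the message wording is invented for the example.

```python
import logging

logger = logging.getLogger(__name__)


def handle_extraction_error(filename: str, error: Exception,
                            cont_on_err: bool, archive_type: str) -> None:
    # Hypothetical message format; the SDK's wording may differ.
    message = f"Failed to extract {filename} from {archive_type} archive: {error}"
    if cont_on_err:
        # Continue-on-error: record the failure and let the caller move on.
        logger.error(message)
        return
    # Fail-fast: surface the failure, keeping the original exception as the cause.
    raise RuntimeError(message) from error
```

Chaining with `from error` keeps the original traceback available to callers that catch the RuntimeError.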

aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor.extract(
    response: requests.Response,
    data_stream: typing.Union[io.BytesIO, typing.Any],
    moss_req: aistore.sdk.batch.types.MossReq,
    moss_resp: typing.Optional[aistore.sdk.batch.types.MossResp] = None
) -> typing.Generator[typing.Tuple[aistore.sdk.batch.types.MossOut, bytes], None, None]

abstract

Extract from archive stream and yield individual file contents.

Sequentially streams the archive to avoid memory-intensive buffering.

Parameters:

response

Response

HTTP response object that holds the connection for the stream

data_stream

Union[BytesIO, Any]

Archive data stream or bytes

moss_req

MossReq

Request that fetched the archive

moss_resp

Optional[MossResp], defaults to None

Response metadata (None if streaming mode)

Raises:

RuntimeError: If stream extraction fails
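
As one way to picture sequential extraction, the sketch below reads a tar archive with the standard library's forward-only stream mode. It is not the SDK's extractor: iter_tar_members and build_tar are hypothetical helpers, and the sketch yields (name, bytes) pairs instead of (MossOut, bytes) to stay self-contained.

```python
import io
import tarfile
from typing import BinaryIO, Dict, Generator, Tuple


def iter_tar_members(data_stream: BinaryIO) -> Generator[Tuple[str, bytes], None, None]:
    # Mode "r|*" reads the tar as a forward-only stream with transparent
    # compression detection, so the whole archive is never buffered at once.
    with tarfile.open(fileobj=data_stream, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            # Each member must be read before advancing to the next one,
            # because a stream cannot seek backwards.
            yield member.name, extracted.read()


def build_tar(files: Dict[str, bytes]) -> io.BytesIO:
    """Build a small in-memory tar archive for the usage example."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


for name, content in iter_tar_members(build_tar({"a.txt": b"hello"})):
    print(name, len(content))
```

Only one member's content is held in memory at a time, which is the property the description above refers to.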

aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor.get_supported_formats() -> typing.Tuple[str, ...]

Get formats supported by this extractor.

Returns: Tuple[str, ...]

Tuple of supported formats
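
A subclass would typically advertise its formats by overriding _supported_fmts. The sketch below assumes get_supported_formats simply returns that attribute and is callable on an instance; SketchExtractor, SketchTarExtractor, and the format strings are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class SketchExtractor(ABC):
    """Hypothetical stand-in for the abstract parent class."""

    # Empty by default, matching the documented `_supported_fmts = tuple()`.
    _supported_fmts: Tuple[str, ...] = tuple()

    @abstractmethod
    def extract(self, data_stream):
        """Subclasses provide the archive-specific extraction."""

    def get_supported_formats(self) -> Tuple[str, ...]:
        # Assumption: the getter just exposes the class attribute.
        return self._supported_fmts


class SketchTarExtractor(SketchExtractor):
    # Illustrative format names, not taken from the SDK.
    _supported_fmts = (".tar", ".tgz", ".tar.gz")

    def extract(self, data_stream):
        raise NotImplementedError("extraction is omitted in this sketch")
```

Because extract is abstract, the parent class cannot be instantiated directly; only concrete subclasses report a non-empty format tuple.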

aistore.sdk.batch.extractor.archive_stream_extractor.logger = get_logger(__name__)