aistore.sdk.batch.extractor.archive_stream_extractor

Module Contents

Classes

Name | Description
ArchiveStreamExtractor | Parent class for extracting batch archive streams from AIStore.

Data

logger

API

class aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor()
Abstract

Parent class for extracting batch archive streams from AIStore.

Integrates with MossReq/MossResp (Multi-Object Streaming Service) objects to provide proper metadata mapping.

_supported_fmts = tuple()
aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor._get_moss_out(
index: int,
content_length: int,
moss_req: aistore.sdk.batch.types.MossReq,
moss_resp: typing.Optional[aistore.sdk.batch.types.MossResp] = None
) -> aistore.sdk.batch.types.MossOut

Get MossOut for the current file being extracted.

In multipart mode (when moss_resp is provided), uses the actual response metadata. In streaming mode, the stream carries only file content (no metadata), so MossOut is inferred from the request.

Parameters:

index
int

Index of the file in the batch

content_length
int

Length of file content in bytes (used to set size in streaming mode)

moss_req
MossReq

Original batch request

moss_resp
Optional[MossResp], defaults to None

Response metadata (None for streaming mode)

Returns: MossOut

Response metadata for this file
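
The two modes described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the `Sketch*` dataclasses and their fields (`moss_in`, `out`, `obj_name`, `bucket`, `size`) are hypothetical stand-ins for the real types in aistore.sdk.batch.types, whose actual fields may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for MossIn / MossOut / MossReq / MossResp.
# Field names are assumptions made for this sketch only.
@dataclass
class SketchMossIn:
    obj_name: str
    bucket: str


@dataclass
class SketchMossOut:
    obj_name: str
    bucket: str
    size: int


@dataclass
class SketchMossReq:
    moss_in: List[SketchMossIn]


@dataclass
class SketchMossResp:
    out: List[SketchMossOut]


def get_moss_out_sketch(
    index: int,
    content_length: int,
    moss_req: SketchMossReq,
    moss_resp: Optional[SketchMossResp] = None,
) -> SketchMossOut:
    """Return per-file metadata for the file at `index`."""
    if moss_resp is not None:
        # Multipart mode: the server sent metadata, so use it as-is.
        return moss_resp.out[index]
    # Streaming mode: only content was streamed, so rebuild the metadata
    # from the matching request entry and the observed content length.
    entry = moss_req.moss_in[index]
    return SketchMossOut(
        obj_name=entry.obj_name, bucket=entry.bucket, size=content_length
    )
```

In this sketch, content_length only matters in streaming mode; in multipart mode the size reported by the response takes precedence.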

aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor._handle_extraction_error(
filename: str,
error: Exception,
moss_req: aistore.sdk.batch.types.MossReq,
archive_type: str
) -> None

Handle individual file extraction errors.

If cont_on_err is enabled, logs the error and allows continuation. Otherwise, raises a RuntimeError.

Parameters:

filename
str

Name of the file that failed to extract

error
Exception

The exception that occurred

moss_req
MossReq

Original batch request

archive_type
str

Type of archive (e.g., 'tar', 'zip')

Raises:

  • RuntimeError: If cont_on_err is False
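
The log-or-raise behavior described above might look like the sketch below. It assumes cont_on_err is a boolean attribute on the request object; `SketchMossReq` and the message wording are hypothetical and stand in for the real MossReq.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


# Hypothetical stand-in for MossReq; only the flag used here is modeled.
@dataclass
class SketchMossReq:
    cont_on_err: bool = False


def handle_extraction_error_sketch(
    filename: str,
    error: Exception,
    moss_req: SketchMossReq,
    archive_type: str,
) -> None:
    """Log and continue if cont_on_err is set, otherwise raise."""
    if moss_req.cont_on_err:
        # Tolerant mode: record the failure and let the caller move on
        # to the next file in the archive.
        logger.error(
            "Failed to extract %s from %s archive: %s",
            filename, archive_type, error,
        )
        return
    # Strict mode: abort, chaining the original exception as the cause.
    raise RuntimeError(
        f"Failed to extract {filename} from {archive_type} archive: {error}"
    ) from error
```

Chaining with `from error` keeps the original exception available on `__cause__`, which helps when debugging a failed batch.
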
aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor.extract(
response: requests.Response,
data_stream: typing.Union[io.BytesIO, typing.Any],
moss_req: aistore.sdk.batch.types.MossReq,
moss_resp: typing.Optional[aistore.sdk.batch.types.MossResp] = None
) -> typing.Generator[typing.Tuple[aistore.sdk.batch.types.MossOut, bytes], None, None]
abstract

Extract from archive stream and yield individual file contents.

Sequentially streams the archive to avoid memory-intensive buffering.

Parameters:

response
Response

HTTP response object containing connection for stream

data_stream
Union[BytesIO, Any]

Archive data stream or bytes

moss_req
MossReq

Request that fetched the archive

moss_resp
Optional[MossResp], defaults to None

Response metadata (None if streaming mode)

Raises:

  • RuntimeError: If stream extraction fails
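
To illustrate what sequential extraction without buffering can look like, the sketch below reads a tar stream with the standard library's tarfile module in non-seekable stream mode. It is not the SDK's extractor: `iter_tar_members` and `build_demo_tar` are hypothetical helpers, and the sketch yields (name, bytes) pairs rather than (MossOut, bytes).

```python
import io
import tarfile
from typing import BinaryIO, Dict, Generator, Tuple


def iter_tar_members(
    data_stream: BinaryIO,
) -> Generator[Tuple[str, bytes], None, None]:
    """Yield (name, content) for each regular file, one member at a time."""
    # "r|*" reads the archive as a forward-only stream (no seeking) and
    # detects compression transparently.
    with tarfile.open(fileobj=data_stream, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            # Only the current member's content is held in memory.
            yield member.name, extracted.read()


def build_demo_tar(files: Dict[str, bytes]) -> io.BytesIO:
    """Build a small in-memory tar archive for trying out the generator."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

Because stream mode is forward-only, each member has to be read while it is the current one; a concrete extractor would pair each yielded file with its metadata at that point.
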
aistore.sdk.batch.extractor.archive_stream_extractor.ArchiveStreamExtractor.get_supported_formats() -> typing.Tuple[str, ...]

Get formats supported by this extractor.

Returns: Tuple[str, ...]

Tuple of supported formats
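
One plausible way the _supported_fmts attribute and get_supported_formats() fit together is shown below. This is an assumed pattern, not the SDK source: the class names and the format strings are hypothetical, and the real method may be a class method or a static method instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class SketchExtractor(ABC):
    """Hypothetical base class mirroring the shape documented here."""

    # Empty by default; concrete extractors override this.
    _supported_fmts: Tuple[str, ...] = tuple()

    @abstractmethod
    def extract(self, data_stream: Any) -> Any:
        """Concrete extractors implement the actual stream handling."""

    def get_supported_formats(self) -> Tuple[str, ...]:
        return self._supported_fmts


class SketchTarExtractor(SketchExtractor):
    # Format strings are illustrative only.
    _supported_fmts = (".tar", ".tgz", ".tar.gz")

    def extract(self, data_stream: Any) -> Any:
        raise NotImplementedError("omitted in this sketch")
```

Keeping the formats in a class attribute lets each subclass declare what it handles without overriding the accessor.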

aistore.sdk.batch.extractor.archive_stream_extractor.logger = get_logger(__name__)