aistore.sdk.cluster

Module Contents

Classes

Name | Description
Cluster | A class representing a cluster bound to an AIS client.

Data

logger

API

class aistore.sdk.cluster.Cluster(
client: aistore.sdk.request_client.RequestClient
)

A class representing a cluster bound to an AIS client.

client

Client this cluster uses to make requests

aistore.sdk.cluster.Cluster._get_smap()

aistore.sdk.cluster.Cluster._get_targets()

aistore.sdk.cluster.Cluster._request_log(
node_id: str,
severity: str,
all_logs: bool = False
) -> requests.Response

Make the HTTP request to fetch a node’s log.

Parameters:

node_id
str

Daemon ID of the node.

severity
str

Log severity string value.

all_logs
bool, defaults to False

If True, fetch all rotated logs as a TAR.GZ archive.

aistore.sdk.cluster.Cluster.get_cluster_logs(
severity: aistore.sdk.enums.LogSeverity = LogSeverity.INFO,
role: aistore.sdk.enums.NodeFilter = NodeFilter.TARGET
) -> typing.Dict[str, str]

Get logs from all nodes of a given role.

Parameters:

severity
LogSeverity, defaults to LogSeverity.INFO

Log severity level (default: LogSeverity.INFO).

role
NodeFilter, defaults to NodeFilter.TARGET

Node filter (default: NodeFilter.TARGET).

Returns: Dict[str, str]

Dict[str, str]: Mapping of node ID to log content.
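As a minimal sketch of working with the returned mapping, the helper below counts, per node, the log lines that contain a given substring. The function `count_matching_lines` and the sample log text are hypothetical and not part of the SDK; in practice the dictionary would come from a call to `get_cluster_logs()` on an existing `Cluster` instance.

```python
from typing import Dict


def count_matching_lines(logs: Dict[str, str], needle: str) -> Dict[str, int]:
    """Count the lines containing `needle` in each node's log text."""
    return {
        node_id: sum(1 for line in content.splitlines() if needle in line)
        for node_id, content in logs.items()
    }


# Made-up stand-in for the Dict[str, str] returned by get_cluster_logs();
# real AIS log lines will look different.
sample_logs = {
    "t1": "INFO start\nERROR disk fault\nINFO done\n",
    "t2": "INFO start\nINFO done\n",
}
error_counts = count_matching_lines(sample_logs, "ERROR")
```

The result keeps the node-ID keys of the input, so it can be joined back against other per-node data.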

aistore.sdk.cluster.Cluster.get_info() -> aistore.sdk.types.Smap

Returns the state of the AIS cluster, including detailed information about its nodes.

Returns: Smap

aistore.sdk.types.Smap: Smap containing cluster information

Raises:

  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.ReadTimeout: Timed out waiting for a response from AIStore

aistore.sdk.cluster.Cluster.get_node_log(
node_id: str,
severity: aistore.sdk.enums.LogSeverity = LogSeverity.INFO
) -> str

Get the current log from a specific cluster node (target or proxy).

Parameters:

node_id
str

Daemon ID of the node (e.g., “hHQZBnBQ”).

severity
LogSeverity, defaults to LogSeverity.INFO

Log severity level (default: LogSeverity.INFO).

Returns: str

Current log content as text.

aistore.sdk.cluster.Cluster.get_node_log_archive(
node_id: str,
severity: aistore.sdk.enums.LogSeverity = LogSeverity.INFO
) -> bytes

Download a TAR.GZ archive of all rotated logs from a specific node.

TODO: stream the archive instead of loading into memory for large logs.

Parameters:

node_id
str

Daemon ID of the node (e.g., “hHQZBnBQ”).

severity
LogSeverity, defaults to LogSeverity.INFO

Log severity level (default: LogSeverity.INFO).

Returns: bytes

TAR.GZ archive containing all rotated log files. Loaded into memory; typically ~5MB compressed per node.
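Because the archive is returned as in-memory bytes, it can be unpacked without touching disk. The sketch below uses only the standard library (`io`, `tarfile`); `unpack_log_archive` and `_make_sample_archive` are hypothetical helpers, and the member name `node.log` is invented for the demonstration. The actual file names inside an AIS log archive are not specified here.

```python
import io
import tarfile
from typing import Dict


def unpack_log_archive(archive: bytes) -> Dict[str, str]:
    """Return {member name: decoded text} for each regular file in a TAR.GZ."""
    files: Dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            extracted = tar.extractfile(member)
            if extracted is not None:
                files[member.name] = extracted.read().decode("utf-8", errors="replace")
    return files


def _make_sample_archive() -> bytes:
    """Build a tiny TAR.GZ in memory, standing in for a real download."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"sample log line\n"
        info = tarfile.TarInfo(name="node.log")  # invented member name
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


unpacked = unpack_log_archive(_make_sample_archive())
```

With a real cluster, the bytes returned by `get_node_log_archive(node_id)` would take the place of `_make_sample_archive()`.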

aistore.sdk.cluster.Cluster.get_performance() -> typing.Dict

Retrieves the raw performance and status data from each target node in the AIStore cluster.

Returns: Dict

A dictionary where each key is the ID of a target node and each value is the raw AIS performance/status JSON returned by that node (for more information, see https://aistore.nvidia.com/docs/monitoring-metrics#target-metrics).

Raises:

  • requests.RequestException: If there’s an ambiguous exception while processing the request
  • requests.ConnectionError: If there’s a connection error with the cluster
  • requests.ConnectTimeout: If the connection to the cluster times out
  • requests.ReadTimeout: If the timeout is reached while awaiting a response from the cluster

aistore.sdk.cluster.Cluster.get_primary_url() -> str

Returns: URL of primary proxy

aistore.sdk.cluster.Cluster.get_uuid() -> str

Returns: UUID of AIStore Cluster

aistore.sdk.cluster.Cluster.is_ready() -> bool

Checks whether the cluster is ready or still setting up.

Returns: bool

True if the cluster is ready, False if it is still setting up.

aistore.sdk.cluster.Cluster.list_buckets(
provider: typing.Union[str, aistore.sdk.provider.Provider] = Provider.AIS
)

Returns a list of buckets in the AIStore cluster.

Parameters:

provider
str or Provider, defaults to Provider.AIS

Provider of the bucket (one of “ais”, “aws”, “gcp”, …). Defaults to “ais”. An empty provider returns buckets from all providers.

Returns:

List[BucketModel]: A list of buckets

Raises:

  • requests.RequestException: “There was an ambiguous exception that occurred while handling…”
  • requests.ConnectionError: Connection error
  • requests.ConnectTimeout: Timed out connecting to AIStore
  • requests.ReadTimeout: Timed out waiting for a response from AIStore

aistore.sdk.cluster.Cluster.list_etls(
stages: typing.Optional[typing.List[str]] = None
) -> typing.List[aistore.sdk.types.ETLInfo]

Lists ETLs filtered by their stages.

Parameters:

stages
List[str], defaults to None

List of stages to filter ETLs by. If None, [“running”] is used.

Returns: List[ETLInfo]

List[ETLInfo]: A list of details on ETLs matching the specified stages

aistore.sdk.cluster.Cluster.list_jobs_status(
job_kind = '',
target_id = ''
) -> typing.List[aistore.sdk.types.JobStatus]

Lists the status of jobs on the cluster.

Parameters:

job_kind
str, defaults to ''

Only show jobs of a particular type.

target_id
str, defaults to ''

Limit to jobs on a specific target node.

Returns: List[JobStatus]

List of JobStatus objects

aistore.sdk.cluster.Cluster.list_running_jobs(
job_kind = '',
target_id = ''
) -> typing.List[str]

Lists the currently running jobs on the cluster.

Parameters:

job_kind
str, defaults to ''

Only show jobs of a particular type.

target_id
str, defaults to ''

Limit to jobs on a specific target node.

Returns: List[str]

List of jobs in the format job_kind[job_id]
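Strings in the `job_kind[job_id]` format can be split back into their two parts when the kind and ID are needed separately. The helper below is a hypothetical sketch, not an SDK function, and the sample entries are made up to match the documented format.

```python
from typing import List, Tuple


def split_running_jobs(entries: List[str]) -> List[Tuple[str, str]]:
    """Split 'job_kind[job_id]' strings into (job_kind, job_id) pairs."""
    pairs = []
    for entry in entries:
        kind, sep, rest = entry.partition("[")
        if not sep or not rest.endswith("]"):
            raise ValueError(f"unexpected job entry: {entry!r}")
        pairs.append((kind, rest[:-1]))  # drop the trailing "]"
    return pairs


# Made-up entries in the documented job_kind[job_id] format.
parsed_jobs = split_running_jobs(["rebalance[g1]", "copy-bck[abc123]"])
```

Entries that do not match the expected shape raise `ValueError` rather than being silently skipped, which is a design choice of this sketch.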

aistore.sdk.cluster.logger = logging.getLogger('cluster')