nemo_automodel.components.loggers.metric_logger

Module Contents

Classes

| Name | Description |
| --- | --- |
| MetricLogger | Simple JSON Lines logger. |
| MetricLoggerDist | Rank-zero JSON Lines metric logger for distributed jobs. |
| MetricsSample | Single timestamped metrics record. |

Functions

| Name | Description |
| --- | --- |
| build_metric_logger | Build a local or distributed metric logger depending on distributed state. |
| stack_and_move_tensor_metrics_to_cpu | Convert tensor metrics in buffered samples to CPU-backed scalar or list values. |

API

class nemo_automodel.components.loggers.metric_logger.MetricLogger(
filepath: str,
flush: bool = False,
append: bool = True,
buffer_size: int = 100
)

Simple JSON Lines logger.

  • Appends one JSON object per line.
  • Thread-safe writes via an internal lock.
  • Creates parent directories as needed.
  • UTF-8 without BOM, newline per record.
_fp = open(self.filepath, mode, encoding='utf-8')
_lock = threading.Lock()
buffer: List[MetricsSample] = []
filepath = os.path.abspath(filepath)
nemo_automodel.components.loggers.metric_logger.MetricLogger.__exit__(
exc_type: type[BaseException] | None,
exc: BaseException | None,
tb: typing.Any
) -> None
nemo_automodel.components.loggers.metric_logger.MetricLogger._move_to_cpu(
buffer: typing.List[nemo_automodel.components.loggers.metric_logger.MetricsSample]
) -> typing.List[str]
nemo_automodel.components.loggers.metric_logger.MetricLogger._save(
lines: typing.List[str]
) -> None
nemo_automodel.components.loggers.metric_logger.MetricLogger.close() -> None
nemo_automodel.components.loggers.metric_logger.MetricLogger.log(
record: nemo_automodel.components.loggers.metric_logger.MetricsSample
) -> None
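
The behaviour listed for MetricLogger (one JSON object per line, lock-protected writes, parent directories created on demand, UTF-8 output) can be illustrated with a small standard-library sketch. `JsonlLogger` is a hypothetical stand-in, not the library class. Draining the buffer once it reaches `buffer_size` and again on `close()` is an assumption about how buffering might work, and plain dicts are used in place of MetricsSample.

```python
import json
import os
import tempfile
import threading
from typing import Any, Dict, List


class JsonlLogger:
    """Hypothetical stand-in for illustration; not the library's MetricLogger."""

    def __init__(self, filepath: str, flush: bool = False,
                 append: bool = True, buffer_size: int = 100) -> None:
        self.filepath = os.path.abspath(filepath)
        # Create parent directories as needed.
        os.makedirs(os.path.dirname(self.filepath), exist_ok=True)
        mode = "a" if append else "w"
        # The "utf-8" codec writes no byte-order mark.
        self._fp = open(self.filepath, mode, encoding="utf-8")
        self._lock = threading.Lock()
        self._flush = flush
        self._buffer_size = buffer_size
        self.buffer: List[Dict[str, Any]] = []

    def log(self, record: Dict[str, Any]) -> None:
        with self._lock:
            self.buffer.append(record)
            # Assumption: write out once the buffer is full.
            if len(self.buffer) >= self._buffer_size:
                self._drain()

    def _drain(self) -> None:
        # The caller must hold self._lock.
        for record in self.buffer:
            self._fp.write(json.dumps(record, ensure_ascii=False) + "\n")
        self.buffer.clear()
        if self._flush:
            self._fp.flush()

    def close(self) -> None:
        with self._lock:
            if self._fp.closed:
                return
            self._drain()  # write whatever is still buffered
            self._fp.close()

    def __enter__(self) -> "JsonlLogger":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


# Usage: log three records with a buffer of two, then read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "runs", "metrics.jsonl")
    with JsonlLogger(path, buffer_size=2) as logger:
        for step in range(3):
            logger.log({"step": step, "loss": 1.0 / (step + 1)})
    with open(path, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh]
```

Holding the lock around both the buffer append and the file write keeps records from concurrent threads from interleaving within a line.
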
class nemo_automodel.components.loggers.metric_logger.MetricLoggerDist(
filepath: str,
flush: bool = False,
append: bool = True
)

Bases: MetricLogger

Rank-zero JSON Lines metric logger for distributed jobs.

rank = dist.get_rank()
world_size = dist.get_world_size()
nemo_automodel.components.loggers.metric_logger.MetricLoggerDist.__exit__(
exc_type: type[BaseException] | None,
exc: BaseException | None,
tb: typing.Any
) -> None
nemo_automodel.components.loggers.metric_logger.MetricLoggerDist.close() -> None
nemo_automodel.components.loggers.metric_logger.MetricLoggerDist.log(
record: nemo_automodel.components.loggers.metric_logger.MetricsSample
) -> None
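
Rank-zero logging means that every process may call `log()`, but only the process with rank 0 writes to disk. The sketch below shows that gating pattern with the standard library only. `RankZeroJsonlLogger` is hypothetical, and the rank is passed in explicitly so the example runs without a process group; MetricLoggerDist reads it from `dist.get_rank()` instead. Whether the real class opens the file on non-zero ranks is not stated here, so leaving it unopened is an assumption.

```python
import json
import os
import tempfile
from typing import Any, Dict


class RankZeroJsonlLogger:
    """Hypothetical sketch of rank-zero gating; not the library's MetricLoggerDist."""

    def __init__(self, filepath: str, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size
        # Assumption: only rank 0 opens the file; other ranks hold no handle.
        self._fp = open(filepath, "a", encoding="utf-8") if rank == 0 else None

    def log(self, record: Dict[str, Any]) -> None:
        if self._fp is None:
            return  # non-zero ranks silently skip the write
        self._fp.write(json.dumps(record) + "\n")

    def close(self) -> None:
        if self._fp is not None:
            self._fp.close()
            self._fp = None


# Usage: simulate four ranks logging the same step to one file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.jsonl")
    for rank in range(4):
        logger = RankZeroJsonlLogger(path, rank=rank, world_size=4)
        logger.log({"step": 0, "rank": rank})
        logger.close()
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
```

Only the record from rank 0 reaches the file, so the output has one line per step regardless of world size.
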
class nemo_automodel.components.loggers.metric_logger.MetricsSample(
step: int,
epoch: int,
metrics: typing.Dict[str, typing.Any] = dict(),
timestamp: str | None = None
)
Dataclass

Single timestamped metrics record.

epoch: int
metrics: Dict[str, Any] = field(default_factory=dict)
step: int
timestamp: str | None = None
nemo_automodel.components.loggers.metric_logger.MetricsSample.__post_init__() -> None
nemo_automodel.components.loggers.metric_logger.MetricsSample.to_dict() -> typing.Dict[str, typing.Any]
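
A record with these four fields can be sketched as a dataclass. `Sample` is a hypothetical stand-in for MetricsSample. Filling a missing timestamp with the current UTC time in ISO 8601 form, and flattening `metrics` into the top level of `to_dict()`, are both assumptions; the actual format and layout are not documented on this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class Sample:
    """Hypothetical stand-in for illustration; not the library's MetricsSample."""

    step: int
    epoch: int
    metrics: Dict[str, Any] = field(default_factory=dict)
    timestamp: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: stamp the record at creation time if none was given.
        if self.timestamp is None:
            self.timestamp = datetime.now(timezone.utc).isoformat()

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: metric keys are merged alongside the fixed fields.
        return {
            "step": self.step,
            "epoch": self.epoch,
            "timestamp": self.timestamp,
            **self.metrics,
        }


fixed = Sample(step=1, epoch=0, metrics={"loss": 0.5},
               timestamp="2024-01-01T00:00:00+00:00")
auto_a = Sample(step=2, epoch=0)
auto_b = Sample(step=3, epoch=0)
```

`field(default_factory=dict)` gives every instance its own empty dict, so mutating `auto_a.metrics` does not affect `auto_b.metrics`.
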
nemo_automodel.components.loggers.metric_logger.build_metric_logger(
filepath: str,
flush: bool = False,
append: bool = True
) -> nemo_automodel.components.loggers.metric_logger.MetricLogger

Build a local or distributed metric logger depending on distributed state.
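
One plausible way to make that choice is to check whether a `torch.distributed` process group has been initialized. The sketch below assumes this check; the function names are hypothetical, and the condition build_metric_logger actually uses is not shown on this page. The import is guarded so the snippet also runs where PyTorch is not installed.

```python
from typing import Optional


def distributed_is_initialized() -> bool:
    """Return True only when a torch.distributed process group is active."""
    try:
        import torch.distributed as dist
    except ImportError:
        return False  # no PyTorch, so there is no process group
    return dist.is_available() and dist.is_initialized()


def choose_logger_kind(distributed: Optional[bool] = None) -> str:
    """Hypothetical helper: name the logger variant a factory would build."""
    if distributed is None:
        distributed = distributed_is_initialized()
    return "distributed" if distributed else "local"


# In a plain single-process script no process group exists.
kind = choose_logger_kind()
```

The explicit `distributed` argument is there so the selection can be exercised in tests without launching multiple processes.
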

nemo_automodel.components.loggers.metric_logger.stack_and_move_tensor_metrics_to_cpu(
metric_vector: typing.List[nemo_automodel.components.loggers.metric_logger.MetricsSample]
) -> typing.List[nemo_automodel.components.loggers.metric_logger.MetricsSample]

Convert tensor metrics in buffered samples to CPU-backed scalar or list values.
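
The conversion to scalar or list values can be illustrated with a duck-typed sketch. It handles values one at a time and does not attempt the stacking that the function name suggests, so it is a simplification rather than the library's method. `to_cpu_value` is a hypothetical helper, and `FakeTensor` is a stand-in that mimics the `detach`, `cpu`, `numel`, `item`, and `tolist` methods of a PyTorch tensor so the example runs without PyTorch.

```python
from typing import Any, List


def to_cpu_value(value: Any) -> Any:
    """Turn a tensor-like value into a Python scalar or list; pass others through."""
    if hasattr(value, "detach") and hasattr(value, "cpu"):
        value = value.detach().cpu()
        # Single-element tensors become scalars, larger ones become lists.
        return value.item() if value.numel() == 1 else value.tolist()
    return value


class FakeTensor:
    """Hypothetical stand-in exposing the tensor methods used above."""

    def __init__(self, data: List[float]) -> None:
        self._data = list(data)

    def detach(self) -> "FakeTensor":
        return self

    def cpu(self) -> "FakeTensor":
        return self

    def numel(self) -> int:
        return len(self._data)

    def item(self) -> float:
        return self._data[0]

    def tolist(self) -> List[float]:
        return list(self._data)


# Two buffered metric dicts mixing tensor-like and plain values.
buffered = [
    {"loss": FakeTensor([0.5]), "lr": 1e-4},
    {"grad_norms": FakeTensor([1.0, 2.0])},
]
converted = [{k: to_cpu_value(v) for k, v in m.items()} for m in buffered]
```

After conversion every value is a plain Python object, which is what `json.dumps` needs to serialize a record.
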