3. Specify Dashboard Configurations#

The dashboard includes two tabs: AI Training and AI Inference.

3.1. AI Training#

The AI Training tab provides three configuration guidance cards:

  • Why Precision Matters: Compare how different precision modes (for example, BF16 and FP8) affect training speed, memory use, and compute costs.

  • Why Framework Version Matters: Evaluate how framework versions influence accuracy, speed, and reproducibility.

  • Why Cluster Size Matters: Analyze how GPU cluster sizes can impact throughput and project cost.
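As a rough illustration of the precision tradeoff described in the cards above, weight memory scales with the bytes stored per parameter. The helper and the 175B parameter count below are illustrative assumptions, not values produced by the dashboard:

```python
# Back-of-envelope weight memory at different precisions.
# Illustrative only: real training memory also includes optimizer
# states, gradients, and activations.
BYTES_PER_PARAM = {"FP32": 4, "BF16": 2, "FP8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for model weights alone, in GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# A hypothetical 175B-parameter model: FP8 halves the BF16 footprint.
for p in ("FP32", "BF16", "FP8"):
    print(f"{p}: {weight_memory_gb(175e9, p):.0f} GB")
```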

_images/image_2.png

To add custom metrics, choose Explore Results. Use these metrics to evaluate your LLM’s performance during training:

3.1.1. AI Training Metrics#

Training Metrics#

  • Model (configuration example: Maxtext_GPT3_175B): Select a model architecture and size to balance training time, accuracy, and resource usage. Use pipeline parallel optimization for large models.

  • Evaluate (configuration example: By CSP/System Config): Select the performance indicator to measure training efficiency.

  • Number of GPUs (configuration example: 256): Scale GPU count based on budget, deadlines, and performance goals.

  • Cloud Service Provider / System Config (configuration example: Native GCP, Google TPU v5p): Optimize infrastructure by evaluating network latency, resource availability, and compatibility.

  • Framework Version (configuration example: 24.08.01): Select a software version to ensure compatibility and reproducibility. Validate framework-specific optimizations.

  • Precision (configuration example: BF16): Specify data precision to balance accuracy, training speed, and memory usage. Weigh performance gains against detail loss.

  • KPI Number of Tokens to Train (Trillion) (configuration example: 15): Scale token count based on model capacity and convergence behavior while avoiding overfitting.

_images/image_3.png
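The guidance on GPU count and token budget can be sanity-checked with the common 6·N·D FLOPs approximation for transformer training (N = parameters, D = training tokens). The per-GPU throughput and utilization (MFU) figures below are assumed placeholders for illustration, not measurements from the dashboard:

```python
# Estimate training time from the 6*N*D compute approximation.
# flops_per_gpu and mfu are assumed placeholder values, not
# benchmark results for any specific accelerator.
def train_days(params, tokens, num_gpus, flops_per_gpu=1e15, mfu=0.4):
    total_flops = 6 * params * tokens
    effective_flops_per_s = num_gpus * flops_per_gpu * mfu
    return total_flops / effective_flops_per_s / 86400  # seconds -> days

# Example mirroring the configuration above: 175B parameters,
# 15T tokens, 256 GPUs. Doubling the cluster roughly halves the
# time under ideal (linear) scaling.
baseline = train_days(175e9, 15e12, 256)
```

In practice scaling is sublinear, which is why the dashboard's measured cluster-size comparisons are more informative than this ideal-case arithmetic.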

3.2. AI Inference#

The AI Inference tab provides four configuration guidance cards. Each card describes a unique use case or application:

  • Text Summarization: Generate concise summaries from lengthy documents.

  • Data Generation: Create synthetic data (structured or unstructured) using prompts.

  • Chat/QA: Deploy chatbots or QA systems to process user queries.

  • Translation: Translate text from one language to another.

_images/image_4.png

To add custom metrics, choose Explore Results. Use these metrics to evaluate your LLM’s performance during inference, when the trained model generates predictions on new data:

3.2.1. AI Inference Metrics#

Inference Metrics#

  • Evaluate (configuration example: By models): Select a key performance indicator to identify inference efficiency. Prioritize based on your requirements for speed, response time, and prediction quality.

  • Model (configuration example: Llama 3.1_8B_Instruct:1.2.2): Choose a model to evaluate inference performance. Verify model versions align with your accuracy, speed, and resource constraints.

  • Precision (configuration example: BF16): Specify data precision for calculations during inference. Use FP8/BF16 to accelerate inference. Balance speed gains against potential accuracy tradeoffs.

  • Input and Output Lengths in tokens (configuration example: 512 in → 2000 out): Configure sequence lengths based on your task requirements. Longer sequences improve context but require more memory.

  • Concurrency (configuration example: 50): Adjust based on hardware capacity and response time requirements.

  • GPU Type (configuration example: NVIDIA H100 80GB HBM3): Choose the GPU type to specify your hardware accelerator. Use H100/Tensor Core GPUs for large workloads and monitor cloud costs.

_images/image_5.png
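As a hedged sketch of how the output-length setting interacts with response time, per-request latency for autoregressive decoding can be approximated as time-to-first-token plus decode time. The per-stream decode speed and TTFT below are assumed figures, not measured values:

```python
# Approximate end-to-end request latency for autoregressive decoding.
# per_user_tps (decode tokens/s per stream) and ttft_s are assumptions,
# not benchmark outputs; higher concurrency typically lowers per_user_tps.
def request_latency_s(output_tokens, per_user_tps, ttft_s=0.5):
    """Time-to-first-token plus token-by-token decode time."""
    return ttft_s + output_tokens / per_user_tps

# Example using the table's 2000 output tokens at an assumed
# 50 tokens/s per stream.
print(request_latency_s(2000, 50))  # 40.5 seconds
```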

3.3. View Results#

After selecting metrics, choose View Result to analyze your benchmarking outcomes.

3.3.1. Explore Benchmarking Results#

Analyze training results using “Total Cost to Train” and “Total Time to Train” metrics. These appear as bar charts to help you quickly identify trends and cost/time tradeoffs.

_images/image_6.png

Total costs use chip and platform list prices. To adjust pricing directly in the graphs and refine your analysis with custom comparisons, choose Edit Cost above the chart or click inside the chart area to open the Edit Cost dialog.

_images/image_7.png
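Adjusting prices in the dialog effectively rescales a linear cost model: total cost grows with GPU count, hourly price, and run duration. The rate used below is a placeholder assumption, not an actual list price:

```python
# Linear cost model behind list-price adjustments (illustrative).
def total_cost_usd(num_gpus, hours, usd_per_gpu_hour):
    return num_gpus * hours * usd_per_gpu_hour

# Hypothetical: 256 GPUs for 30 days at an assumed $2.50/GPU-hour.
cost = total_cost_usd(256, 24 * 30, 2.50)
print(f"${cost:,.0f}")  # $460,800
```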

You can evaluate inference performance across different configurations. Adjust parameters in real time to test scenarios and optimize results.

For support resources, go to the Get Help tab.