Log Collection Guide#
Understanding Log Collection Groups#
NVDebug organizes log collection into the following groups:
Redfish Logs: Out-of-band collection using Redfish APIs.
IPMI Logs: System management data using IPMI commands.
SSH Logs: Direct BMC access through SSH.
Host Logs: Operating system and hardware logs from the host.
HealthCheck: System health verification and diagnostics.
Collection Levels#
NVDebug supports different collection levels to control the scope of log collection:
(Default): All necessary collectors. Always included.
(-V): Default Log Collections + Increased Log Collection Level.
(-VV): Default Log Collections + Increased Log Collection Level + Additional Collectors that can take a very long time (potentially hours) to run.
Note
Refer to Appendix A for more information about log collection levels for each collector.
You can specify the collection level using the -V or -VV flag:
# Default collection
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>
# Increased Log Collection Level with -V
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> -V
# Additional Collectors with -VV
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> -VV
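When the collection level is chosen by a script rather than typed by hand, it can help to map a named level onto the flags shown above. The following is a minimal sketch only: the function name `nvdebug_level_flag` and the level names `default`, `increased`, and `full` are illustrative assumptions and are not part of NVDebug; only the `-V` and `-VV` flags come from this guide.

```shell
# Sketch: translate a named collection level into the -V / -VV flag.
# The function name and level names are hypothetical helpers, not NVDebug options.
nvdebug_level_flag() {
    case "${1:-default}" in
        default)   printf '%s' "" ;;      # default collection: no extra flag
        increased) printf '%s' "-V" ;;    # increased log collection level
        full)      printf '%s' "-VV" ;;   # additional long-running collectors
        *)         echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# Example (placeholders as in the commands above):
#   flag="$(nvdebug_level_flag increased)"
#   nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> $flag
```

Rejecting unknown level names up front avoids silently falling back to the default level when a script passes a typo.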
Collection Strategies#
Single Node Collection#
You can collect logs from a single system using only the CLI:
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>
Multi-Node Collection#
For multi-node collection, use the YAML configuration files:
Create a DUT configuration file based on the templates in the nvdebug archive. Additional details can be found in the configuration_guide section.
Use the --dutconfig parameter to specify the path to the DUT configuration file.
Logs are collected in parallel for improved efficiency.
nvdebug --dutconfig multi_node_config.yaml
Best Practices#
Storage Management#
Ensure sufficient disk space for the collection archive, especially on heavily used rack systems. We recommend a minimum of 2 GB.
To avoid disk space exhaustion, monitor archive sizes.
Use the appropriate compression options, for example:
# Set custom archive split threshold
nvdebug [...] --zipsplit-threshold 500
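The 2 GB recommendation above can be checked before a collection starts. The following is a minimal sketch, assuming a POSIX `df` and `awk` are available and treating 2 GB as 2 × 1024 × 1024 one-kilobyte blocks; the function and variable names are illustrative and not part of NVDebug.

```shell
# Sketch: check that the output location has at least the recommended free space.
# MIN_FREE_KB, free_kb, and has_enough_space are hypothetical names.
MIN_FREE_KB=$((2 * 1024 * 1024))   # 2 GB expressed in 1 KB blocks

# Print the available space (in KB) of the filesystem holding directory $1.
# df -P guarantees one line per filesystem; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if $1 (free KB) meets the threshold $2 (default: MIN_FREE_KB).
has_enough_space() {
    [ "$1" -ge "${2:-$MIN_FREE_KB}" ]
}

# Example:
#   if has_enough_space "$(free_kb /path/to/output)"; then
#       nvdebug [...]
#   else
#       echo "less than 2 GB free; aborting collection" >&2
#   fi
```

Keeping the threshold comparison separate from the `df` call makes it easy to raise the limit for `-VV` collections, which may produce larger archives.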
Performance Optimization#
- System Compute Resources
Ensure sufficient CPU and memory resources are available for the collection.
Consider the collection duration and the number of nodes to be collected.
The greater the number of nodes and the longer the collection duration, the more resources are required.
The number of available threads correlates with the speed at which collection is completed. NVDebug parallelizes collection across multiple threads as efficiently as possible.
- Collection Timing
Schedule during maintenance windows to avoid impacting other system operations.
Avoid peak usage periods to ensure stable collection performance.
Consider time zone differences for global deployments.
- Resource Usage
Monitor system load during collection to avoid impacting other system operations.
Balance parallel collections based on network capacity.
Use appropriate log levels based on urgency.
- Network Considerations
Ensure stable network connectivity between the collector and the collection targets.
Account for firewall rules that might block collection.
Consider bandwidth limitations that might impact collection performance.
Security Best Practices#
- Access Control
Use dedicated service accounts for collection.
Implement proper credential management for the service account.
Follow the principle of least privilege for the service account.
- Data Protection
Enable log sanitization to remove sensitive information.
Secure log storage to prevent unauthorized access.
Implement retention policies to manage log storage.
- Network Security
Use secure protocols for collection.
Implement proper firewall rules to block unauthorized access.
Monitor access attempts to the collection service.
Log Analysis Guidelines#
- Initial Assessment
Use the HTML report to review the collection summary. Refer to the tool_execution_summary section for more information.
Identify collection failures by checking the Execution Summary Report, the nvdebug_logs_<timestamp>_summary.txt file in the collection archive, or the HTML report interface.
- Error Investigation
Cross-reference timestamps with the Execution Summary Report to identify the source of the error.
Review related log entries in the nvdebug_logs_<timestamp>.zip archive.
Check hardware state changes in the nvdebug_logs_<timestamp>.zip archive.
Review the nvdebug_logs_<timestamp>_summary.txt file for a list of failed collections. The HTML report interface will also display a list of failed collections in a table.
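When several archives have been extracted side by side, the summary files can be located by name before they are reviewed. The following is a minimal sketch that relies only on the file-name pattern given above; it assumes the archive has already been extracted, makes no assumption about the directory layout or the contents of the summary file, and uses a hypothetical helper name, `find_summaries`.

```shell
# Sketch: list summary files under an extracted collection directory.
# Only the nvdebug_logs_<timestamp>_summary.txt name pattern is taken from
# this guide; the helper name and the directory argument are illustrative.
find_summaries() {
    find "${1:-.}" -type f -name 'nvdebug_logs_*_summary.txt' | sort
}

# Example: print every summary found under ./extracted
#   find_summaries ./extracted | while IFS= read -r f; do
#       echo "== $f"
#       cat "$f"
#   done
```

Searching the whole tree rather than a fixed path keeps the helper usable even if the location of the summary file differs between collections.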