Log Collection Guide#

Understanding Log Collection Groups#

NVDebug organizes log collection into the following groups:

Redfish Logs: Out-of-band collection using Redfish APIs.
IPMI Logs: System management data using IPMI commands.
SSH Logs: Direct BMC access through SSH.
Host Logs: Operating system and hardware logs from the host.
HealthCheck: System health verification and diagnostics.

Collection Levels#

NVDebug supports different collection levels to control the scope of log collection:

(Default): All necessary collectors. Always included.
(-V): Default Log Collections + Increased Log Collection Level.
(-VV): Default Log Collections + Increased Log Collection Level + Additional Collectors that can take a very long time (potentially hours) to run.

Note

Refer to Appendix A for more information about log collection levels for each collector.

You can specify the collection level using the -V or -VV flag:

# Default collection
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>

# Increased Log Collection Level with -V
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> -V

# Additional Collectors with -VV
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> -VV

Collection Strategies#

Single Node Collection#

You can collect logs from a single system using only the CLI.

nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>

Multi-Node Collection#

For multi-node collection, use a YAML configuration file:

Create a DUT configuration file based on the templates in the nvdebug archive. Additional details can be found in the configuration_guide section.
Use the --dutconfig parameter to specify the path to the DUT configuration file.
Logs are collected in parallel for improved efficiency.

nvdebug --dutconfig multi_node_config.yaml

Best Practices#

Storage Management#

Ensure sufficient disk space for the collection archive, especially on heavily used rack systems. We recommend a minimum of 2 GB.
To avoid disk space exhaustion, monitor archive sizes.

To control archive size, set an archive split threshold:

# Set custom archive split threshold
nvdebug [...] --zipsplit-threshold 500
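
A pre-flight check along the following lines could enforce the 2 GB recommendation before a collection starts. This is a minimal sketch and not part of NVDebug: the helper names available_kb and has_free_space are hypothetical, and it assumes the archive is written to the current directory.

```shell
# Hypothetical helpers; not part of NVDebug.

# Print the free space, in KiB, on the filesystem that holds directory $1.
available_kb() {
    # -P gives one POSIX-format line per filesystem; field 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when directory $1 has at least $2 KiB free.
has_free_space() {
    avail=$(available_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# 2 GB expressed in KiB (2 * 1024 * 1024).
MIN_KB=2097152

# Assumption: the archive lands in the current directory.
if has_free_space . "$MIN_KB"; then
    echo "Disk space OK: safe to start nvdebug"
else
    echo "Less than 2 GB free: clean up before running nvdebug" >&2
fi
```

Point the check at a different directory if your collections are written elsewhere.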

Performance Optimization#

System Compute Resources
- Ensure sufficient CPU and memory resources are available for the collection.
- Consider the collection duration and the number of nodes to be collected.
- The greater the number of nodes and the longer the collection duration, the more resources are required.
- The number of available threads correlates with how quickly collection completes. NVDebug parallelizes collection across multiple threads as efficiently as possible.
Collection Timing
- Schedule during maintenance windows to avoid impacting other system operations.
- Avoid peak usage periods to ensure stable collection performance.
- Consider time zone differences for global deployments.
Resource Usage
- Monitor system load during collection to avoid impacting other system operations.
- Balance parallel collections based on network capacity.
- Use appropriate log levels based on urgency.
Network Considerations
- Ensure stable network connectivity between the collector and the collection targets.
- Account for firewall rules that might block collection.
- Consider bandwidth limitations that might impact collection performance.
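
Before a long collection, a quick reachability check of the BMC service ports can surface firewall or connectivity problems early. The sketch below is not part of NVDebug. It assumes nc (netcat) is installed and supports -z and -w, and that the BMC uses the common default ports 443 (Redfish over HTTPS) and 22 (SSH); your site may remap them. IPMI over LAN typically uses UDP port 623, which a TCP connect test cannot verify, so it is left out. The function names are hypothetical.

```shell
# Hypothetical helpers; not part of NVDebug. Requires nc (netcat).

# Succeed when a TCP connection to host $1, port $2 opens within 3 seconds.
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Report reachability of the default Redfish (HTTPS) and SSH ports on a BMC.
# Assumption: the BMC listens on 443 and 22; adjust if your site remaps them.
check_bmc() {
    status=0
    for port in 443 22; do
        if port_open "$1" "$port"; then
            echo "$1:$port reachable"
        else
            echo "$1:$port unreachable"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_bmc <BMC_IP>
```

A non-zero return from check_bmc means at least one port did not answer, which is worth resolving before starting nvdebug.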

Security Best Practices#

Access Control
- Use dedicated service accounts for collection.
- Implement proper credential management for the service account.
- Follow the principle of least privilege for the service account.
Data Protection
- Enable log sanitization to remove sensitive information.
- Secure log storage to prevent unauthorized access.
- Implement retention policies to manage log storage.
Network Security
- Use secure protocols for collection.
- Implement proper firewall rules to block unauthorized access.
- Monitor access attempts to the collection service.
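
The storage and retention points above can be approximated with standard file permissions and an age-based cleanup. The following is a minimal sketch, not an NVDebug feature: the function names are hypothetical, and it assumes collection artifacts keep the nvdebug_logs_ file name prefix and sit in a directory you control.

```shell
# Hypothetical helpers; not part of NVDebug.

# Restrict every nvdebug artifact in directory $1 to owner read/write only.
lock_down_archives() {
    find "$1" -type f -name 'nvdebug_logs_*' -exec chmod 600 {} +
}

# Delete nvdebug artifacts in directory $1 last modified more than $2 days ago.
prune_old_archives() {
    find "$1" -type f -name 'nvdebug_logs_*' -mtime +"$2" -exec rm -f {} +
}

# Usage (the path and the 30-day window are examples only):
#   lock_down_archives /var/tmp/collections
#   prune_old_archives /var/tmp/collections 30
```

Pick the retention window to match your own policy; the 30 days shown here is only an illustration.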

Log Analysis Guidelines#

Initial Assessment
- Use HTML reports to review collection summary. Refer to the tool_execution_summary for more information.
- Identify collection failures by checking the Execution Summary Report, the nvdebug_logs_<timestamp>_summary.txt file in the collection archive, or the HTML report interface.
Error Investigation
- Cross-reference timestamps with the Execution Summary Report to identify the source of the error.
- Review related log entries in the nvdebug_logs_<timestamp>.zip archive.
- Check hardware state changes in the nvdebug_logs_<timestamp>.zip archive.
- Review the nvdebug_logs_<timestamp>_summary.txt file for a list of failed collections. The HTML report interface will also display a list of failed collections in a table.
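
To scan summary files from the command line, a helper like the one below may be useful once the archive has been extracted. It is a sketch under stated assumptions, not part of NVDebug: the function name show_failures is hypothetical, and it assumes failed collections are described with the word "fail" somewhere in the summary text, which this guide does not specify.

```shell
# Hypothetical helper; not part of NVDebug.

# Print lines mentioning "fail" (any case) from every summary file found under
# directory $1, each prefixed with its file name and line number.
# Assumption: failed collections are reported using the word "fail".
show_failures() {
    # /dev/null as an extra operand makes grep always print the file name.
    find "$1" -type f -name 'nvdebug_logs_*_summary.txt' \
        -exec grep -in 'fail' /dev/null {} +
}

# Usage: extract the archive first, then run
#   show_failures ./extracted_logs
```

If the summary uses different wording for failed collections, change the search pattern accordingly.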