Log Collection Guide#

Understanding Log Collection Groups#

NVDebug organizes log collection into the following groups:

  • Redfish Logs: Out-of-band collection using Redfish APIs.

  • IPMI Logs: System management data using IPMI commands.

  • SSH Logs: Direct BMC access through SSH.

  • Host Logs: Operating system and hardware logs from the host.

  • HealthCheck: System health verification and diagnostics.

Collection Levels#

NVDebug supports different collection levels to control the scope of log collection:

  • (Default): All necessary collectors. Always included.

  • (-V): Default Log Collections + Increased Log Collection Level.

  • (-VV): Default Log Collections + Increased Log Collection Level + Additional Collectors that can take a very long time (potentially hours) to run.

Note

Refer to Appendix A for more information about log collection levels for each collector.

You can specify the collection level using the -V or -VV flag:

# Default collection
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>

# Increased Log Collection Level with -V
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> -V

# Additional Collectors with -VV
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> -VV

Collection Strategies#

Single Node Collection#

You can collect logs from a single system using only the CLI.

nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>

Multi-Node Collection#

For multi-node collection, use a YAML configuration file:

  1. Create a DUT configuration file based on the templates in the nvdebug archive. Additional details can be found in the configuration_guide section.

  2. Use the --dutconfig parameter to specify the path to the DUT configuration file.

  3. Logs are collected in parallel for improved efficiency.

nvdebug --dutconfig multi_node_config.yaml

Best Practices#

Storage Management#

  • Ensure sufficient disk space for the collection archive, particularly on heavily used rack systems. We recommend a minimum of 2 GB.

  • To avoid disk space exhaustion, monitor archive sizes.

  • Use the appropriate compression options, for example:

    # Set custom archive split threshold
    nvdebug [...] --zipsplit-threshold 500
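
A pre-flight check along the following lines can enforce the 2 GB recommendation before a collection starts. This is a minimal sketch and not part of NVDebug: the helper name `has_free_space` is hypothetical, and it assumes a POSIX `df` and `awk` on the collector host.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVDebug): succeed only if the filesystem
# that holds DIR has at least MIN_GB gigabytes available.
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks; field 4 of the
    # second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example: check the current directory against the 2 GB recommendation.
if has_free_space . 2; then
    echo "enough space for the collection archive"
else
    echo "less than 2 GB free; free up space before collecting" >&2
fi
```

Point the check at whichever directory will receive the archive, and raise the threshold for -V or -VV collections if your archives tend to be larger.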
    

Performance Optimization#

  • System Compute Resources
    • Ensure sufficient CPU and memory resources are available for the collection.

    • Consider the collection duration and the number of nodes to be collected.

    • The greater the number of nodes and the longer the collection duration, the more resources are required.

    • The number of available threads correlates with the speed at which collection is completed. NVDebug parallelizes collection across multiple threads as efficiently as possible.

  • Collection Timing
    • Schedule during maintenance windows to avoid impacting other system operations.

    • Avoid peak usage periods to ensure stable collection performance.

    • Consider time zone differences for global deployments.

  • Resource Usage
    • Monitor system load during collection to avoid impacting other system operations.

    • Balance parallel collections based on network capacity.

    • Use the log collection level (default, -V, or -VV) appropriate to the urgency.

  • Network Considerations
    • Ensure stable network connectivity between the collector and the collection targets.

    • Account for firewall rules that might block collection.

    • Consider bandwidth limitations that might impact collection performance.
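
The network considerations above can be verified before a collection with a quick reachability probe. The sketch below is an assumption-laden illustration, not NVDebug functionality: the helper name `redfish_reachable` is hypothetical, it assumes `curl` is installed on the collector, and it assumes the BMC serves Redfish over HTTPS on the default port. It requests the standard Redfish service root (`/redfish/v1/`), which the Redfish specification makes available without authentication.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVDebug): succeed if the Redfish service
# root on the given BMC answers with HTTP 200 within 5 seconds.
redfish_reachable() {
    bmc_ip=$1
    # -k skips TLS certificate verification, which is an assumption that the
    # BMC uses a self-signed certificate; drop it if the certificate is trusted.
    # -w '%{http_code}' prints only the HTTP status code.
    code=$(curl -k -s -o /dev/null --max-time 5 \
        -w '%{http_code}' "https://$bmc_ip/redfish/v1/") || return 1
    [ "$code" = "200" ]
}
```

For example, `redfish_reachable <BMC_IP> || echo "check routing and firewall rules"` flags an unreachable BMC before any collectors start. This probe covers only the Redfish path; IPMI and SSH collection use different ports and would need their own checks.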

Security Best Practices#

  • Access Control
    • Use dedicated service accounts for collection.

    • Implement proper credential management for the service account.

    • Follow the principle of least privilege for the service account.

  • Data Protection
    • Enable log sanitization to remove sensitive information.

    • Secure log storage to prevent unauthorized access.

    • Implement retention policies to manage log storage.

  • Network Security
    • Use secure protocols for collection.

    • Implement proper firewall rules to block unauthorized access.

    • Monitor access attempts to the collection service.
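
As one illustration of credential management, the service account password can be kept in a file that only its owner can read, and that restriction can be checked before each run. This is a minimal sketch under stated assumptions: the helper name `is_private_file` and the file name used below are hypothetical, and the check relies on the standard `ls -l` mode string.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVDebug): succeed only if FILE grants no
# permissions to group or other.
is_private_file() {
    # Characters 5-10 of the ls -l mode string are the group and other bits.
    perms=$(ls -ld "$1" | cut -c 5-10)
    [ "$perms" = "------" ]
}
```

A run could then be gated as `is_private_file bmc_pass.txt && nvdebug -i <BMC_IP> -u <BMC_USER> -p "$(cat bmc_pass.txt)" -t <PLATFORM>`. Command-line arguments can be visible to other local users in process listings, so this reduces exposure in scripts and shell history rather than eliminating it.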

Log Analysis Guidelines#

  • Initial Assessment
    • Use the HTML report to review the collection summary. Refer to the tool_execution_summary section for more information.

    • Identify collection failures by checking the Execution Summary Report, the nvdebug_logs_<timestamp>_summary.txt file in the collection archive, or the HTML report interface.

  • Error Investigation
    • Cross-reference timestamps with the Execution Summary Report to identify the source of the error.

    • Review related log entries in the nvdebug_logs_<timestamp>.zip archive.

    • Check hardware state changes in the nvdebug_logs_<timestamp>.zip archive.

    • Review the nvdebug_logs_<timestamp>_summary.txt file for a list of failed collections. The HTML report interface also displays the failed collections in a table.
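
A first pass over the summary file can be scripted. The sketch below assumes that failed collections are marked with the word "fail" (in any letter case) in the summary text. The summary format is not described in this guide, so treat that pattern and the helper name `count_failures` as hypothetical and adjust them to what your summary file actually contains.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVDebug): print the number of lines in a
# summary file that mention a failure.
count_failures() {
    # -c prints the count of matching lines; -i ignores letter case.
    grep -ci 'fail' "$1"
}
```

Running `count_failures nvdebug_logs_<timestamp>_summary.txt` prints the number of matching lines; replace `-c` with `-n` to list the lines with their line numbers instead.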