



Created on Nov 1, 2019

Introduction


Red Hat, NVIDIA and Mellanox are collaborating to provide a high-performance platform for HPC, Artificial Intelligence and Machine Learning workloads.

This is a Reference Deployment Guide (RDG) for a Red Hat OpenShift Container Platform (RH OCP) v4.1 deployment over a bare metal user-provisioned infrastructure (UPI), designed for RDMA-accelerated Machine Learning (ML) and Deep Learning (DL) applications over a Mellanox InfiniBand fabric.

In this document we will go through the following:

  1. How to deploy RH OCP v4.1 over bare metal GPU-enabled nodes running RHEL 7.6.
  2. How to run distributed TensorFlow benchmarks with the Horovod framework over a high-performance InfiniBand fabric.

High-performance InfiniBand fabric for RH OCP is currently a Technology Preview feature.

Red Hat does not recommend using Technology Preview features for production. These features are not supported with Red Hat production service level agreements (SLAs) and might not be final or functional. These features provide early access to upcoming product features and enable customers to test functions and provide feedback during the development process.

See the Red Hat Technology Preview features support scope for more information.


Components Overview

  • NVIDIA GPU 
    NVIDIA GPUs for servers are designed for the most demanding HPC, AI and Deep Learning workloads. GPUs accelerate server computation capabilities while driving costs down. GPU-accelerated deep learning frameworks offer the flexibility to design and train custom deep neural networks.
    Every major deep learning framework, such as TensorFlow and PyTorch, is already GPU-accelerated, so data scientists and researchers can become productive instantly without needing to program the GPU.
  • Red Hat OpenShift Container Platform (RH OCP)
    Red Hat OpenShift Container Platform (RH OCP) provides developers and IT organizations with a hybrid cloud application platform for deploying both new and existing applications on secure, scalable resources with minimal configuration and management overhead. 
    Built on the Red Hat Enterprise Linux platform and Kubernetes, OCP provides a more secure and scalable multi-tenant operating system for today’s enterprise-class applications, while delivering integrated application runtimes and libraries.
  • Kubeflow
    Kubeflow is a cloud-native platform for machine learning applications which is based on Google’s internal machine learning pipelines. Find out more at kubeflow.org.

  • Horovod
    Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. The goal of Horovod is to make distributed Deep Learning fast and easy to use.

  • TensorFlow
    TensorFlow is an open source software library developed by the Google Brain team to conduct machine learning and deep neural networks research. The library performs numerical computation using data flow graphs, where the nodes in the graph represent mathematical operations and the graph edges represent the multi-dimensional data arrays (tensors) that communicate between the nodes. TensorFlow supports CUDA and cuDNN (registration required).
    This guide uses sources from the TensorFlow website for an easier installation procedure.

  • Kubernetes RDMA device plugin
    The RDMA device plugin allows a single RDMA device (HCA) on a Kubernetes Worker node to be shared among multiple Pods running on that node.
  • GPUDirect RDMA
    GPUDirect RDMA enables a direct P2P (Peer-to-Peer) path for data exchange between GPUs on the same or different hosts directly to/from Mellanox devices which utilize the RDMA protocol. This allows for a significant decrease in GPU-to-GPU communication latency and offloads the CPU completely, removing it from all GPU-to-GPU communications across the network.

    The GPUDirect RDMA technology works seamlessly with Mellanox ConnectX®-4 adapter cards (and later generations).

Solution Overview

Equipment

The following hardware specifications are used in this solution:



Logical Design

The logical design includes the following layers:

  • Two separate networking layers: 
    1. Management network
    2. High-speed InfiniBand network
  • Compute layer: 
    1. UPI Network Gateway Node
    2. OCP4 UPI Helper Node
    3. Bootstrap Node
    4. 3 x Master Nodes
    5. Worker0 Node (without GPU)
    6. 4 x Worker Nodes with NVIDIA Tesla P100 GPU cards and Mellanox ConnectX-5 adapters



GPU-based Node Logical Design

The following illustrates the GPU-based Worker node's components:


Bill of Materials

The following table specifies the hardware components used in this deployment guide:

 


This deployment guide does not cover server/hypervisor virtualization installation and virtual machine creation steps.

Server Wiring

In this GPU-based server setup, only the first port of each HCA is wired to an InfiniBand switch using EDR cables:


Network and Fabric Configuration

Network Configuration

Each GPU-based server is connected to a Mellanox QM8700 InfiniBand switch using an EDR InfiniBand copper cable. 

Below is a table detailing the server and switch names with the network configuration:


Server/Switch type   Server/Switch name   InfiniBand network      Management network
Gateway Node         clx-ocp-gwc          none                    eno0: Static (WAN); eno1: Static (UPI LAN)
OCP4 UPI node        ocp-helper           none                    eno0: Static (UPI LAN)
Master Node 1-3      master[0-2]          none                    eno0: From DHCP (reserved by UPI)
Worker Node 0        worker0              none                    eno0: From DHCP (reserved by UPI)
Worker Node 1        worker-p1            ib0: auto; ib1: auto    eno0: From DHCP (reserved by UPI)
Worker Node 2        worker-p2            ib0: auto; ib1: auto    eno0: From DHCP (reserved by UPI)
Worker Node 3        worker-p3            ib0: auto; ib1: auto    eno0: From DHCP (reserved by UPI)
Worker Node 4        worker-p4            ib0: auto; ib1: auto    eno0: From DHCP (reserved by UPI)
InfiniBand switch    swx-mld-s01          none                    mgmt0: From DHCP (reserved by UPI)
The IBx interfaces (ib0, ib1) do not require any additional configuration.


InfiniBand Fabric Network Topology  

Initial Setup for a One Switch Solution

In this deployment scenario, you can connect up to 20 servers using a single Mellanox Quantum™ HDR 200Gb/s QM8700 InfiniBand Smart Switch.

Scaled Setup for a Two-Layer Fat-Tree Topology

In this deployment scenario, you can scale up to 20 Spine switches and 40 Leaf switches (with single connectivity between Spine and Leaf switches), which supports up to 400 servers.


For a scaled setup, we recommend using Mellanox Unified Fabric Manager (UFM®).


InfiniBand Fabric Configuration

Below is a list of recommendations and prerequisites that are important for the configuration process:

  • Refer to the MLNX-OS User Manual to become familiar with switch software (located at support.mellanox.com)
  • Upgrade the switch to the latest MLNX-OS version
  • InfiniBand Subnet Manager (SM) is required to configure InfiniBand fabric properly

There are three ways to run InfiniBand Subnet Manager (SM) in your InfiniBand fabric:

  1. Start the SM on one or more managed switches. This is a very convenient and quick option which allows for an easier InfiniBand ‘plug & play'.
  2. Run the OpenSM daemon on one or more servers by executing the /etc/init.d/opensmd command. It is recommended to run the SM on a server when the fabric has 648 nodes or more (see the sketch after this list).
  3. Use Unified Fabric Management (UFM®). 
    Mellanox’s Unified Fabric Manager (UFM®) is a powerful platform for scale-out computing that eliminates the complexity of fabric management, provides deep visibility into traffic and optimizes fabric performance.
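
For reference, below is a minimal sketch of option 2 on a RHEL 7 server. It assumes MLNX_OFED, which provides the opensm package and the opensmd service, is already installed:

# systemctl start opensmd
# systemctl enable opensmd

You can then verify that an SM is active on the fabric with the sminfo utility from infiniband-diags.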

In this guide, we will use the first method: launching the InfiniBand SM on the managed switch.

Below are the configuration steps for the chosen method.

To enable the SM on one of the managed switches, do the following:

Login to the switch and enter the following configuration commands (swx-mld-s01 is our switch name):

IB switch configuration
Mellanox MLNX-OS Switch Management

switch login: admin
Password: 
 
swx-mld-s01 [standalone: master] > enable 
swx-mld-s01 [standalone: master] # configure terminal
swx-mld-s01 [standalone: master] (config) # ib smnode swx-mld-s01 enable 
swx-mld-s01 [standalone: master] (config) # ib smnode swx-mld-s01 sm-priority 0

swx-mld-s01 [standalone: master] (config) # ib sm virt enable
swx-mld-s01 [standalone: master] (config) # write memory
swx-mld-s01 [standalone: master] (config) # reload
 

After the switch reboots, check the switch configuration. It should look like the following:

Switch config example
Mellanox MLNX-OS Switch Management

switch login: admin
Password: 

swx-mld-s01 [standalone: master] > enable 
swx-mld-s01 [standalone: master] # configure terminal
swx-mld-s01 [standalone: master] (config) # show running-config 
##
## Running database "initial"
## Generated at 2019/03/19 17:58:53 +0200
## Hostname: swx-mld-s01
##

##
## Running-config temporary prefix mode setting
##
no cli default prefix-modes enable

##
## Subnet Manager configuration
##
   ib sm virt enable
   
##
## Other IPv6 configuration
##
no ipv6 enable
   
##
## AAA remote server configuration
##
# ldap bind-password ********
# radius-server key ********
# tacacs-server key ********
   
##
## Network management configuration
##
# web proxy auth basic password ********
   clock timezone Asia Middle_East Jerusalem
no ntp server 192.114.62.250 disable
   ntp server 192.114.62.250 keyID 0
no ntp server 192.114.62.250 trusted-enable
   ntp server 192.114.62.250 version 4
   
##
## X.509 certificates configuration
##
#
# Certificate name system-self-signed, ID 0cd5b6a0da88a0e68b8f3b49408b361afc73289d
# (public-cert config omitted since private-key config is hidden)

   
##
## IB nodename to GUID mapping
##
   ib smnode swx-mld-s01 create
   ib smnode swx-mld-s01 enable
   ib smnode swx-mld-s01 sm-priority 0
##
## Persistent prefix mode setting
##
cli default prefix-modes enable
 

Deployment Steps

Gateway Node Configuration

For the Gateway Node, we will use a virtual machine with two NICs and CentOS 7 as the operating system.

Steps for configuring CentOS 7 as a NAT Router:

  1. Configure the NICs as follows:
    1. ens224 - public with DHCP
    2. ens192 - static UPI Lan

      # cat /etc/sysconfig/network-scripts/ifcfg-ens192
      TYPE=Ethernet
      BOOTPROTO=static
      NAME=ens192
      DEVICE=ens192
      ONBOOT=yes
      IPADDR=192.168.7.1
      NETMASK=255.255.255.0
      DNS1=192.168.7.254
  2. Enable IP forwarding:

    # sysctl -w net.ipv4.ip_forward=1
    # echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/ip_forward.conf
    # sysctl -p


  3. Enable NAT:

    # firewall-cmd --permanent --direct --passthrough ipv4 -t nat -I POSTROUTING -o ens224 -j MASQUERADE -s 192.168.7.0/24
    # firewall-cmd --change-interface=ens224 --zone=external --permanent
    # firewall-cmd --change-interface=ens192 --zone=internal --permanent
    # firewall-cmd --set-default-zone=internal 
    # firewall-cmd --complete-reload
  4. Check the configuration:

    # firewall-cmd --get-active-zones
    internal
    interfaces: ens192
    external
    interfaces: ens224
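
Optionally, verify NAT from a host on the UPI LAN (a quick check; it assumes the host uses 192.168.7.1 as its default gateway):

# ping -c 3 8.8.8.8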

OCP4 UPI Helper Node Configuration

OCP4 UPI Helper Node deployment requires a separate network with internet access and includes the following components:

  • DNS server
  • 2x Load balancer
  • Web server
  • DHCP server
  • PXE server
  • TFTP server
  • NFSv4 server
  • Bastion Host

The steps for OCP4 UPI Helper Node configuration are as follows:

  1. Installing the operating system
  2. Checking the prerequisites
  3. Preparing the UPI
  4. Creating the Ignition Configs

Installing the OCP4 Helper Node Operating System

The CentOS 7 operating system with the EPEL repository is recommended by the UPI Helper Node installation guide.

For our setup, we need RHEL 7.6 as the OS for the OCP4 UPI Helper Node. This will enable us to add bare metal GPU-based nodes and scale out the OpenShift cluster.

For RHEL 7.6, you will need to enable the following repos: rhel-7-server-rpms, rhel-7-server-extras-rpms, rhel-7-server-ansible-2.7-rpms and rhel-7-server-ose-4.1-rpms. For more information, please refer to the OpenShift User Guide.

Checking the OCP4 UPI Prerequisites

Install the additional packages, then clone the GitHub repository https://github.com/christianh814/ocp4-upi-helpernode:

# yum -y install ansible git
# git clone https://github.com/christianh814/ocp4-upi-helpernode
# cd ocp4-upi-helpernode

Preparing the UPI

After finishing the preparation steps, your working directory will be ocp4-upi-helpernode.

Copy the vars.yaml file from the docs/examples folder and modify it to match your network configuration.

Below is an example of the vars.yaml file:

vars.yaml
---
disk: sda
helper:
  name: "ocp-helper"
  ipaddr: "192.168.7.254"
  networkifacename: "ens192"
dns:
  domain: "ocp.labs.mlnx"
  clusterid: "ocp4"
  forwarder1: "8.8.8.8"
  forwarder2: "8.8.4.4"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.10"
  poolend: "192.168.7.30"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.20"
  macaddr: "00:0c:29:cc:87:b6"
masters:
  - name: "master0"
    ipaddr: "192.168.7.21"
    macaddr: "00:0c:29:82:0f:6c"
  - name: "master1"
    ipaddr: "192.168.7.22"
    macaddr: "00:0c:29:f0:f5:11"
  - name: "master2"
    ipaddr: "192.168.7.23"
    macaddr: "00:0c:29:19:75:42"
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "00:0c:29:c9:b9:c6"
  - name: "worker1"
    ipaddr: "192.168.7.12"
    macaddr: "ac:1f:6b:25:1f:f0"
  - name: "worker2"
    ipaddr: "192.168.7.13"
    macaddr: "ac:1f:6b:25:85:ec"
  - name: "worker3"
    ipaddr: "192.168.7.14"
    macaddr: "ac:1f:6b:25:20:12"
  - name: "worker4"
    ipaddr: "192.168.7.15"
    macaddr: "ac:1f:6b:25:1f:dc"

Review and set the desired OCP installation components in vars/main.yml. For example:

vars/main.yml
---
staticips: false
force_ocp_download: true
ocp_bios: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/4.1.0/rhcos-4.1.0-x86_64-metal-bios.raw.gz"
ocp_initramfs: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/4.1.0/rhcos-4.1.0-x86_64-installer-initramfs.img"
ocp_install_kernel: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/4.1.0/rhcos-4.1.0-x86_64-installer-kernel"
ocp_client: "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.1/openshift-client-linux-4.1.20.tar.gz"
ocp_installer: "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.1/openshift-install-linux-4.1.20.tar.gz"

Run the Ansible playbook to set up your OCP4 UPI Helper Node:

# ansible-playbook -e @vars.yaml tasks/main.yml

For OCP4 UPI Helper Node verification, run the /usr/local/bin/helpernodecheck command with one of the following parameters: {dns-masters|dns-workers|dns-etcd|install-info|haproxy|services|nfs-info}. For example:

[root@ocp-helper ocp4-upi-helpernode]# /usr/local/bin/helpernodecheck dns-workers
======================
DNS Config for Workers
======================

; Create entries for the worker hosts
worker0         IN      A       192.168.7.11
worker1         IN      A       192.168.7.12
worker2         IN      A       192.168.7.13
worker3         IN      A       192.168.7.14
worker4         IN      A       192.168.7.15

======================
DNS Lookup for Workers
======================

worker0.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.11
Reverse: worker0.ocp4.ocp.labs.mlnx.

worker1.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.12
Reverse: worker1.ocp4.ocp.labs.mlnx.

worker2.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.13
Reverse: worker2.ocp4.ocp.labs.mlnx.

worker3.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.14
Reverse: worker3.ocp4.ocp.labs.mlnx.

worker4.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.15
Reverse: worker4.ocp4.ocp.labs.mlnx.

Creating the Ignition Configuration File

The install-config.yaml configuration file is required to generate the Ignition configuration files for the RH OCP installation.

To create the Ignition configuration file, we will first create an installation folder:

mkdir ~/ocp4
cd ~/ocp4

For the complete configuration of install-config.yaml, we will need two additional parameters: pullSecret and sshKey.

pullSecret can be obtained from cloud.redhat.com:

  • Login with your Red Hat account
  • Click on “Bare Metal”
  • Click on “Download Pull Secret” or “Copy Pull Secret”

sshKey is your public SSH key (e.g. ~/.ssh/id_rsa.pub).
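
If the Helper Node does not yet have an SSH key pair, you can generate one with standard OpenSSH tooling:

# ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa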

Below is an example of the install-config.yaml file:

install-config.yaml
apiVersion: v1
baseDomain: ocp.labs.mlnx
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 1
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":....}}}'
sshKey: 'ssh-rsa AAAA... root@ocp-helper'

Generate the ignition configs by running the following command:

# openshift-install create ignition-configs
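
The installer consumes install-config.yaml from the current directory. After a successful run, the working directory should contain files similar to the following:

# ls ~/ocp4
auth  bootstrap.ign  master.ign  metadata.json  worker.ign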

Now copy the Ignition files to the web server's ignition directory:

# cd ~/ocp4/
# cp *.ign /var/www/html/ignition/
# restorecon -vR /var/www/html/

The OCP4 UPI Helper node is now ready for the RH OCP installation process.

RH OCP Deployment 

Creating the OpenShift Container Platform Cluster

In the following steps we will install the OCP bootstrap, OCP management and OCP monitoring components over RHEL CoreOS-based virtual machines.

Before starting the installation process, make sure the ssh-agent is configured on your OCP4 UPI Helper Node. You can follow this guide for a step-by-step configuration process.
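
A minimal ssh-agent setup looks like the following (assuming the SSH key generated earlier):

# eval "$(ssh-agent -s)"
# ssh-add ~/.ssh/id_rsa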

Boot the virtual machines we prepared using a PXE boot in the following order:

  1. Bootstrap
  2. Masters
  3. Workers

For additional information about installing RH OCP 4.1 over bare metal please refer to the OCP 4.1 installation guide.

To monitor the installation process use the following commands:

  • bootstrap stage:  openshift-install wait-for bootstrap-complete --log-level debug

    # openshift-install wait-for bootstrap-complete --log-level debug
    DEBUG OpenShift Installer v4.1.20-201910102034-dirty
    DEBUG Built from commit e4708ece20e3f03947e9f5f460f1d5cbcd401249
    INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.ocp.labs.mlnx:6443...
    INFO API v1.13.4+520769a up
    INFO Waiting up to 30m0s for bootstrapping to complete...
    DEBUG Bootstrap status: complete
    INFO It is now safe to remove the bootstrap resources
  • Remove the bootstrap resources from the load-balancer configuration /etc/haproxy/haproxy.cfg and restart the haproxy service:

    /etc/haproxy/haproxy.cfg
    #---------------------------------------------------------------------
    # Example configuration for a possible web application.  See the
    # full configuration options online.
    #
    #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
    #
    #---------------------------------------------------------------------
    
    #---------------------------------------------------------------------
    # Global settings
    #---------------------------------------------------------------------
    global
        # to have these messages end up in /var/log/haproxy.log you will
        # need to:
        #
        # 1) configure syslog to accept network log events.  This is done
        #    by adding the '-r' option to the SYSLOGD_OPTIONS in
        #    /etc/sysconfig/syslog
        #
        # 2) configure local2 events to go to the /var/log/haproxy.log
        #   file. A line like the following can be added to
        #   /etc/sysconfig/syslog
        #
        #    local2.*                       /var/log/haproxy.log
        #
        log         127.0.0.1 local2
    
        chroot      /var/lib/haproxy
        pidfile     /var/run/haproxy.pid
        maxconn     4000
        user        haproxy
        group       haproxy
        daemon
    
        # turn on stats unix socket
        stats socket /var/lib/haproxy/stats
    
    #---------------------------------------------------------------------
    # common defaults that all the 'listen' and 'backend' sections will
    # use if not designated in their block
    #---------------------------------------------------------------------
    defaults
        mode                    http
        log                     global
        option                  httplog
        option                  dontlognull
        option http-server-close
        option forwardfor       except 127.0.0.0/8
        option                  redispatch
        retries                 3
        timeout http-request    10s
        timeout queue           1m
        timeout connect         10s
        timeout client          1m
        timeout server          1m
        timeout http-keep-alive 10s
        timeout check           10s
        maxconn                 3000
    
    #---------------------------------------------------------------------
    
    listen stats
        bind :9000
        mode http
        stats enable
        stats uri /
        monitor-uri /healthz
    
    
    frontend openshift-api-server
        bind *:6443
        default_backend openshift-api-server
        mode tcp
        option tcplog
    
    backend openshift-api-server
        balance source
        mode tcp
    #    server bootstrap 192.168.7.20:6443 check # commented out after bootstrap completed
        server master0 192.168.7.21:6443 check
        server master1 192.168.7.22:6443 check
        server master2 192.168.7.23:6443 check
    
    frontend machine-config-server
        bind *:22623
        default_backend machine-config-server
        mode tcp
        option tcplog
    
    backend machine-config-server
        balance source
        mode tcp
    #    server bootstrap 192.168.7.20:22623 check # commented out after bootstrap completed
        server master0 192.168.7.21:22623 check
        server master1 192.168.7.22:22623 check
        server master2 192.168.7.23:22623 check
    
    frontend ingress-http
        bind *:80
        default_backend ingress-http
        mode tcp
        option tcplog
    
    backend ingress-http
        balance source
        mode tcp
        server worker0-http-router0 192.168.7.11:80 check
        server worker1-http-router1 192.168.7.12:80 check
        server worker2-http-router2 192.168.7.13:80 check
        server worker3-http-router3 192.168.7.14:80 check
        server worker4-http-router4 192.168.7.15:80 check
    
    frontend ingress-https
        bind *:443
        default_backend ingress-https
        mode tcp
        option tcplog
    
    backend ingress-https
        balance source
        mode tcp
        server worker0-https-router0 192.168.7.11:443 check
        server worker1-https-router1 192.168.7.12:443 check
        server worker2-https-router2 192.168.7.13:443 check
        server worker3-https-router3 192.168.7.14:443 check
        server worker4-https-router4 192.168.7.15:443 check
    # service haproxy restart



  • Finalize the cluster installation: openshift-install wait-for install-complete --log-level debug

    # openshift-install wait-for install-complete --log-level debug
    DEBUG OpenShift Installer v4.1.20-201910102034-dirty 
    DEBUG Built from commit e4708ece20e3f03947e9f5f460f1d5cbcd401249 
    INFO Waiting up to 30m0s for the cluster at https://api.ocp4.ocp.labs.mlnx:6443 to initialize... 
    DEBUG Cluster is initialized                       
    INFO Waiting up to 10m0s for the openshift-console route to be created... 
    DEBUG Route found in openshift-console namespace: console 
    DEBUG Route found in openshift-console namespace: downloads 
    DEBUG OpenShift console route is created           
    INFO Install complete!                            
    INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/install/auth/kubeconfig' 
    INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.ocp.labs.mlnx 
    INFO Login to the console with user: kubeadmin, password: *****-*****-*****-***** 

OpenShift Cluster Scale-up using RHEL Compute Machine

Install the packages required to run the cluster scale-up playbook, including openshift-ansible, on the OCP4 UPI Helper Node:

# yum install openshift-ansible openshift-clients jq

The next steps are relevant for the Kubernetes Worker Nodes: worker1, worker2, worker3 and worker4.

Preparing a GPU-based RHEL compute node

The Red Hat Enterprise Linux (RHEL) compute, or worker, nodes in your OpenShift Container Platform environment must meet the hardware specifications and system-level requirements. The RHEL 7.6 "Minimal" installation option is required as the base OS.

Only RHEL 7.6 is supported in OpenShift Container Platform 4.1. You must not upgrade your compute machines to RHEL 8.

Enable only the repositories required by OpenShift Container Platform 4.1:

# subscription-manager repos \
    --enable="rhel-7-server-rpms" \
    --enable="rhel-7-server-extras-rpms" \
    --enable="rhel-7-server-ose-4.1-rpms"

Stop and disable the firewall on the host:

# systemctl disable firewalld.service
# systemctl stop firewalld.service

Install any additional packages that are required and lock the kernel version:

# yum -y install yum-plugin-versionlock 
# yum versionlock kernel-3.10.0-1062.1.2.el7
# yum -y update kernel
# yum -y install perl gtk2 atk cairo tcl gcc-gfortran tcsh tk pciutils lsof
# reboot

The installation of the NVIDIA GPU driver is validated only for kernel-3.10.0-1062.1.2.el7.

Disable the nouveau kernel module:

# echo 'blacklist nouveau' > /etc/modprobe.d/blacklist-nouveau.conf
# echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf
# dracut --force
# reboot


After rebooting, make sure that the nouveau module is not listed:

# lsmod | grep nouveau

Installing Mellanox OFED

There are two methods to install Mellanox OFED for the above specified kernel version.

  1. Download the Mellanox OFED v4.7-1.0.0.1 installation package from the Mellanox website and run the mlnx_add_kernel_support.sh script to add support for your kernel. Refer to the User Guide for instructions.
  2. Alternatively, you can download a pre-configured Mellanox OFED image from here and copy it to the root folder of your compute node. This image comes with built-in support for kernel-3.10.0-1062.1.2.el7.

Installation steps:

After obtaining the image, run:

# mkdir /mnt/iso
# mount -o loop /root/MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64-ext.iso /mnt/iso
# /mnt/iso/mlnxofedinstall --force
# reboot
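
After the reboot, you can verify the installed OFED version and check that the HCA ports are up, using the standard MLNX_OFED and infiniband-diags tools:

# ofed_info -s
# ibstat | grep -E 'State|Rate'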

Install the SELinux policy module with the InfiniBand patch.
Extract infiniband.* from the attached archive (infiniband.zip) and copy it to each compute node, then execute from the local folder:

# semodule -i infiniband.pp
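
You can confirm that the policy module was loaded:

# semodule -l | grep infiniband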

Now the compute node is ready to join the OpenShift cluster.

Adding GPU-based RHEL Compute Nodes to the OpenShift Cluster

For additional information about adding RHEL compute nodes to the OpenShift cluster please refer to Adding a RHEL compute machine section in the OCP installation guide.

To scale-up an OpenShift cluster with RHEL compute nodes:

  • Use ssh-copy-id to install the SSH keys from the OCP4 UPI Helper Node on the compute nodes as authorized keys for passwordless authentication
  • Extract the "pull secret" for your OpenShift cluster
  • Create an Ansible inventory file named hosts that defines your compute nodes and the required variables
  • Run the Ansible playbook for the scale-up cluster with RHEL compute nodes
  • Approve the CSRs for your RHEL compute nodes

Below is an example of our hosts file for OpenShift cluster scale-up:

[all:vars]
ansible_user=root
#ansible_become=True

openshift_kubeconfig_path="~/.kube/config"
openshift_pull_secret_path="~/pull-secret.txt"

[workers]
worker0.ocp4.ocp.labs.mlnx

[new_workers]
worker1.ocp4.ocp.labs.mlnx
worker2.ocp4.ocp.labs.mlnx
worker3.ocp4.ocp.labs.mlnx
worker4.ocp4.ocp.labs.mlnx

Run the scale-up playbook:

# cd /usr/share/ansible/openshift-ansible
# ansible-playbook -i ~/hosts playbooks/scaleup.yml

Scale-up playbook execution output:

Playbook execution output
[root@ocp-helper openshift-ansible]# ansible-playbook -i ~/hosts playbooks/scaleup.yml

PLAY [Pre-scaleup checks] ******************************************************************************************************************************************************************************************************************

TASK [fail] ********************************************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:31:43 +0200 (0:00:00.068)       0:00:00.068 ******* 
skipping: [localhost]

PLAY [install nodes] ***********************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:31:43 +0200 (0:00:00.039)       0:00:00.107 ******* 
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]

TASK [openshift_node : include_tasks] ******************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:31:45 +0200 (0:00:02.000)       0:00:02.107 ******* 
included: /usr/share/ansible/openshift-ansible/playbooks/roles/openshift_node/tasks/install.yml for worker1.ocp4.ocp.labs.mlnx, worker2.ocp4.ocp.labs.mlnx, worker3.ocp4.ocp.labs.mlnx, worker4.ocp4.ocp.labs.mlnx

TASK [openshift_node : Install openshift support packages] *********************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:31:45 +0200 (0:00:00.642)       0:00:02.750 ******* 
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Install openshift packages] *****************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:34:47 +0200 (0:03:01.994)       0:03:04.744 ******* 
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Enable the CRI-O service] *******************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:18 +0200 (0:00:30.724)       0:03:35.469 ******* 
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]

TASK [openshift_node : include_tasks] ******************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:19 +0200 (0:00:00.820)       0:03:36.289 ******* 
included: /usr/share/ansible/openshift-ansible/playbooks/roles/openshift_node/tasks/config.yml for worker1.ocp4.ocp.labs.mlnx, worker2.ocp4.ocp.labs.mlnx, worker3.ocp4.ocp.labs.mlnx, worker4.ocp4.ocp.labs.mlnx

TASK [openshift_node : Disable swap] *******************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:19 +0200 (0:00:00.461)       0:03:36.751 ******* 
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : sysctl] *************************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:20 +0200 (0:00:00.437)       0:03:37.188 ******* 
 [WARNING]: The value 1 (type int) in a string field was converted to u'1' (type string). If this does not look like what you expect, quote the entire value to ensure it does not change.

ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Disable firewalld service] ******************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:20 +0200 (0:00:00.476)       0:03:37.665 ******* 
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Setting sebool container_manage_cgroup] *****************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:21 +0200 (0:00:00.447)       0:03:38.112 ******* 
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]

TASK [openshift_node : create temp directory] **********************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:21 +0200 (0:00:00.601)       0:03:38.714 ******* 
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Wait for bootstrap endpoint to show up] *****************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:22 +0200 (0:00:00.432)       0:03:39.147 ******* 
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Fetch bootstrap ignition file locally] ******************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:23 +0200 (0:00:00.931)       0:03:40.078 ******* 
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Copy pull secret in the directory] **********************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:23 +0200 (0:00:00.668)       0:03:40.747 ******* 
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Get release image] **************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:24 +0200 (0:00:00.986)       0:03:41.733 ******* 
changed: [worker1.ocp4.ocp.labs.mlnx -> localhost]
changed: [worker3.ocp4.ocp.labs.mlnx -> localhost]
changed: [worker4.ocp4.ocp.labs.mlnx -> localhost]
changed: [worker2.ocp4.ocp.labs.mlnx -> localhost]

TASK [openshift_node : Set openshift_release_image fact] ***********************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:25 +0200 (0:00:01.111)       0:03:42.845 ******* 
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Pull release image] *************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:26 +0200 (0:00:00.246)       0:03:43.091 ******* 
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker1.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Get machine controller daemon image from release image] *************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:53 +0200 (0:00:27.708)       0:04:10.799 ******* 
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Pull MCD image] *****************************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:35:56 +0200 (0:00:02.139)       0:04:12.939 ******* 
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Apply ignition manifest] ********************************************************************************************************************************************************************************************
Tuesday 29 October 2019  16:36:05 +0200 (0:00:09.036)       0:04:21.975 ******* 
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Reboot the host and wait for it to come back] ***********************************************************************************************************************************************************************
Tuesday 29 October 2019  16:36:06 +0200 (0:00:01.052)       0:04:23.028 ******* 
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]

PLAY RECAP *********************************************************************************************************************************************************************************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
worker1.ocp4.ocp.labs.mlnx : ok=21   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
worker2.ocp4.ocp.labs.mlnx : ok=21   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
worker3.ocp4.ocp.labs.mlnx : ok=21   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
worker4.ocp4.ocp.labs.mlnx : ok=21   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Tuesday 29 October 2019  16:38:36 +0200 (0:02:30.434)       0:06:53.462 ******* 
=============================================================================== 
openshift_node : Install openshift support packages ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 181.99s
openshift_node : Reboot the host and wait for it to come back --------------------------------------------------------------------------------------------------------------------------------------------------------------------- 150.43s
openshift_node : Install openshift packages ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 30.72s
openshift_node : Pull release image ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 27.71s
openshift_node : Pull MCD image ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 9.04s
openshift_node : Get machine controller daemon image from release image ------------------------------------------------------------------------------------------------------------------------------------------------------------- 2.14s
Gathering Facts --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2.00s
openshift_node : Get release image -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.11s
openshift_node : Apply ignition manifest -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.05s
openshift_node : Copy pull secret in the directory ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.99s
openshift_node : Wait for bootstrap endpoint to show up ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.93s
openshift_node : Enable the CRI-O service ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.82s
openshift_node : Fetch bootstrap ignition file locally ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.67s
openshift_node : include_tasks ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.64s
openshift_node : Setting sebool container_manage_cgroup ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.60s
openshift_node : sysctl ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.48s
openshift_node : include_tasks ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.46s
openshift_node : Disable firewalld service ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.45s
openshift_node : Disable swap ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.44s
openshift_node : create temp directory ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.43s

After executing the Scale-up playbook, approve all pending certificate signing requests (CSRs) that were generated for each machine that you added:

# oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
# oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-2pn2r   44s     system:node:worker2.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-7fd8p   6m31s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-djzv6   6m30s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-h985k   6m21s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-j6rdh   6m32s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-lhvqq   46s     system:node:worker3.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-m52kp   49s     system:node:worker1.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-x47hg   55s     system:node:worker4.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-x8cgl   6m21s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued


Confirm that the cluster recognizes the machines:

# oc get nodes -o wide
NAME                         STATUS     ROLES    AGE   VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION                CONTAINER-RUNTIME
master0.ocp4.ocp.labs.mlnx   Ready      master    1d   v1.13.4+a80aad556   192.168.7.21   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
master1.ocp4.ocp.labs.mlnx   Ready      master    1d   v1.13.4+a80aad556   192.168.7.22   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
master2.ocp4.ocp.labs.mlnx   Ready      master    1d   v1.13.4+a80aad556   192.168.7.23   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
worker0.ocp4.ocp.labs.mlnx   Ready      worker    1d   v1.13.4+a80aad556   192.168.7.11   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
worker1.ocp4.ocp.labs.mlnx   Ready      worker    1h   v1.13.4+a80aad556   192.168.7.12   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
worker2.ocp4.ocp.labs.mlnx   Ready      worker    1h   v1.13.4+a80aad556   192.168.7.13   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
worker3.ocp4.ocp.labs.mlnx   Ready      worker    1h   v1.13.4+a80aad556   192.168.7.14   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
worker4.ocp4.ocp.labs.mlnx   Ready      worker    1h   v1.13.4+a80aad556   192.168.7.15   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7

NVIDIA GPU Driver and Plugin Deployment 

The next step in our deployment is to install the NVIDIA components for the OCP.

This step must be executed on the OCP4 UPI Helper Node.

Deploying Node Feature Discovery (NFD) 

  1. Deploy NFD from GitHub in OpenShift 4.x:

    # mkdir ~/install
    # cd ~/install
    # git clone https://github.com/openshift/cluster-nfd-operator
    # PULLPOLICY=Always make -C cluster-nfd-operator deploy
  2. Verify that the GPU nodes are labelled correctly

    # oc describe nodes | grep 10de                   
    	feature.node.kubernetes.io/pci-10de.present=true
    	feature.node.kubernetes.io/pci-10de.present=true
    
    # oc describe nodes | grep kernel               
    	feature.node.kubernetes.io/kernel-version.full=3.10.0-XXXXX-x86_64
    	feature.node.kubernetes.io/kernel-version.major=3
    	feature.node.kubernetes.io/kernel-version.minor=10
    	feature.node.kubernetes.io/kernel-version.revision=0

Special Resource Operator (SRO) Deployment

Execute the following steps on the OCP4 UPI Helper Node.

  1. Deploy the SRO from GitHub:

    # cd ~/install
    # git clone https://github.com/openshift-psap/special-resource-operator
    # cd special-resource-operator
    # git checkout release-4.2  # This works for OCP 4.0, 4.1, 4.2
    # PULLPOLICY=Always make deploy
    
  2. Verify that the GPUs are enabled. You should see the extended GPU resource and miscellaneous NVIDIA feature labels: 

    # oc describe node worker1.ocp4.ocp.labs.mlnx | grep nvidia
                        nvidia.com/cuda.driver.major=418
                        nvidia.com/cuda.driver.minor=87
                        nvidia.com/cuda.driver.rev=01
                        nvidia.com/cuda.runtime.major=10
                        nvidia.com/cuda.runtime.minor=1
                        nvidia.com/gfd.timestamp=1572712654
                        nvidia.com/gpu.compute.major=6
                        nvidia.com/gpu.compute.minor=0
                        nvidia.com/gpu.family=pascal
                        nvidia.com/gpu.machine=SYS-4028GR-TR2
                        nvidia.com/gpu.memory=16280
                        nvidia.com/gpu.product=Tesla-P100-PCIE-16GB
     nvidia.com/gpu:  4
    

    If the SRO deployment hangs on the NVIDIA driver verification step, restart the CRI-O service on each GPU node with the following commands:

    # systemctl restart crio

    # systemctl status crio

    A successful installation of the SRO looks like the following:

    # oc get pod -n openshift-sro -o wide
    NAME                                         READY   STATUS      RESTARTS   AGE    IP             NODE                         NOMINATED NODE   READINESS GATES
    cuda-vector-add                              0/1     Completed   0            1h   10.254.6.13    worker2.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-dcgm-exporter-8w25q                   2/2     Running     0            1h   192.168.7.15   worker4.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-dcgm-exporter-k7nkr                   2/2     Running     0            1h   192.168.7.14   worker3.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-dcgm-exporter-pxb2b                   2/2     Running     0            1h   192.168.7.12   worker1.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-dcgm-exporter-t5xtf                   2/2     Running     0            1h   192.168.7.13   worker2.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-device-plugin-daemonset-52w7n         1/1     Running     0            1h   10.254.7.9     worker3.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-device-plugin-daemonset-7hpwk         1/1     Running     0            1h   10.254.5.14    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-device-plugin-daemonset-brk87         1/1     Running     0            1h   10.254.4.9     worker4.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-device-plugin-daemonset-zcsv7         1/1     Running     0            1h   10.254.6.14    worker2.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-device-plugin-validation              0/1     Completed   0            1h   10.254.7.10    worker3.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-driver-daemonset-2pmh5                1/1     Running     0            1h   10.254.4.8     worker4.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-driver-daemonset-5qzww                1/1     Running     0            1h   10.254.7.8     worker3.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-driver-daemonset-72bgb                1/1     Running     0            1h   10.254.5.11    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-driver-daemonset-qvsnj                1/1     Running     0            1h   10.254.6.9     worker2.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-driver-validation                     0/1     Completed   0            1h   10.254.5.13    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-feature-discovery-54xt5               1/1     Running     0            1h   10.254.4.10    worker4.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-feature-discovery-np6rj               1/1     Running     0            1h   10.254.5.16    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
    nvidia-feature-discovery-t5lpl               1/1     Running     0            1h   10.254.7.11    worker3.ocp4.ocp.labs.mlnx   <none>           <none>

Enable the GPUDirect Kernel Module

Start the nv_peer_memory service manually on each GPU-based node from the OCP4 UPI Helper Node.

GPUDirect is currently a Technology Preview feature.

From the Helper Node, retrieve the list of pods running the NVIDIA GPU driver:

# oc get pod -n openshift-sro -o wide | grep nvidia-driver
nvidia-driver-daemonset-2pmh5                1/1     Running     0          1h   10.254.4.8     worker4.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-5qzww                1/1     Running     0          1h   10.254.7.8     worker3.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-72bgb                1/1     Running     0          1h   10.254.5.11    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-qvsnj                1/1     Running     0          1h   10.254.6.9     worker2.ocp4.ocp.labs.mlnx   <none>           <none>

For each pod in the daemonset, execute the following commands:

# oc -n openshift-sro rsh nvidia-driver-daemonset-5qzww 
sh-4.2# bash
[root@nvidia-driver-daemonset-5qzww /]# modprobe nv_peer_mem
[root@nvidia-driver-daemonset-5qzww /]# lsmod | grep nv_peer_mem
[root@nvidia-driver-daemonset-5qzww /]# exit

This step must be executed again whenever a Worker Node is rebooted.
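
To avoid repeating the per-pod shell session by hand, the module load can be scripted from the Helper Node. Below is a minimal sketch; it assumes the nvidia-driver-daemonset pod naming shown above:

for pod in $(oc get pod -n openshift-sro -o name | grep nvidia-driver-daemonset); do
    # load the GPUDirect peer-memory module inside each driver pod
    oc -n openshift-sro exec ${pod#pod/} -- modprobe nv_peer_mem
done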


Deployment of InfiniBand and KubeFlow Kubernetes Components 

Copy the Openshift-rdma.zip to the OCP4 UPI Helper Node and extract the files. The archive contains the following files:

  • device-plugin.yaml – Daemonset for deploying the RDMA device plugin with a shared InfiniBand HCA
  • mpijob-gpud.yaml – MPI Job example
  • mpi-operator.yaml – full KubeFlow/mpi-operator installation (no need to install KubeFlow)
  • rdma-hca-node-config.yaml – ConfigMap configuration file for the RDMA device plugin
  1. Install the RDMA device plugin:

    # oc apply -f rdma-hca-node-config.yaml
    # oc apply -f device-plugin.yaml
  2. Install the KubeFlow MPI-operator:

    # oc apply -f mpi-operator.yaml
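
Once the device plugin is running, a Pod requests the shared HCA through its resource limits. Below is a hypothetical excerpt; the rdma/hca resource name is an assumption and must match the resource defined in rdma-hca-node-config.yaml:

resources:
  limits:
    rdma/hca: 1        # shared InfiniBand HCA exposed by the RDMA device plugin (assumed name)
    nvidia.com/gpu: 1  # GPU exposed by the NVIDIA device plugin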


Application Deployment and Configuration

An application deployment example is provided in the mpijob-gpud.yaml file. This example shows how to run a distributed TensorFlow benchmark with the Horovod framework, using the KubeFlow MPI-Operator over a high-performance InfiniBand fabric.

Below are the environment variable settings used in the mpijob-gpud.yaml file to run the TensorFlow benchmark (see the YAML excerpt after this list):

  • TCP mode
    NCCL_IB_DISABLE=1
    NCCL_NET_GDR_LEVEL=0
  • Without GPUDirect
    NCCL_IB_DISABLE=0
    NCCL_NET_GDR_LEVEL=0
  • With GPUDirect
    NCCL_IB_DISABLE=0
    NCCL_NET_GDR_LEVEL=1
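
For example, the GPUDirect case maps to container environment entries like the following (a sketch; the exact placement inside mpijob-gpud.yaml depends on the MPIJob worker template):

env:
  - name: NCCL_IB_DISABLE      # 0 = use InfiniBand transport for NCCL
    value: "0"
  - name: NCCL_NET_GDR_LEVEL   # 1 = enable GPUDirect RDMA
    value: "1"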

Deploy the application by running the following command on the Helper Node:

# oc apply -f mpijob-gpud.yaml

Performance Testing

Below are the logs for the distributed TensorFlow benchmark tests with KubeFlow/mpi-operator.

With the hardware used in our POC environment (4 servers, each with 4 x P100 PCIe GPUs), using GPUDirect (GDR) adds a 6.26% performance boost.

Higher gains are expected with more servers and more powerful GPUs.
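
The benchmark output below was collected from the launcher pod. It can be followed with a command along these lines (the launcher pod name suffix is generated by the operator, so the grep pattern is an assumption based on the job name):

# oc logs -f $(oc get pods -o name | grep tensorflow-benchmarks-launcher)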

Below is the log of a distributed TensorFlow benchmark test with KubeFlow/mpi-operator in TCP mode:

+ POD_NAME=tensorflow-benchmarks-worker-1
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-1 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 2 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-3
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-3 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 4 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-0
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-0 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 1 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-2
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-2 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 3 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
2019-10-31 09:33:41.078094: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078419: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078447: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078269: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078241: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078241: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078247: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078278: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078279: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078698: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078744: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078571: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078689: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.078881: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:41.079080: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:42.823273: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x62b6630 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.823346: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.823358: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.823367: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.823376: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824362: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x50898f0 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.824414: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824427: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824436: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824445: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824714: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5a3b870 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.824749: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824760: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824769: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.824777: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825649: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x561c1d0 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.825702: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825717: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825726: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825734: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825804: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4e2b4c0 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.825865: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825877: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825885: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.825893: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.827898: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x64b12f0 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.827962: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.827975: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.827984: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.827994: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.828199: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.828199: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.829567: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.830766: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.831848: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5b190c0 executing computations on platform Host. Devices:
2019-10-31 09:33:42.831879: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.831938: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5167140 executing computations on platform Host. Devices:
2019-10-31 09:33:42.831970: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.832258: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.832352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.832397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:33:42.832746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.832718: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.832784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:33:42.832751: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56f9a10 executing computations on platform Host. Devices:
2019-10-31 09:33:42.832785: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.833106: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5068d70 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.833193: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.833205: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.833214: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.833222: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.833201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.833249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:33:42.833835: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6393e60 executing computations on platform Host. Devices:
2019-10-31 09:33:42.833866: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.833940: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x48efc30 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.833980: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.833992: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.834000: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.834008: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.834212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.834244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:33:42.835262: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x658eb50 executing computations on platform Host. Devices:
2019-10-31 09:33:42.835297: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.835954: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4f08d30 executing computations on platform Host. Devices:
2019-10-31 09:33:42.835987: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.836263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.836294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:33:42.836375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.836416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:33:42.837365: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x659ea10 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.837434: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.837447: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.837457: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.837466: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.838824: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:33:42.839060: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x47f0990 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.839099: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839112: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839122: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839131: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839143: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x52bad20 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.839181: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839196: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839205: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839213: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.839459: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:33:42.840115: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4900010 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.840155: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840166: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840175: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840183: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840820: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5b34010 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.840855: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840866: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840875: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.840884: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.842132: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x51465a0 executing computations on platform Host. Devices:
2019-10-31 09:33:42.842164: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.842428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.842458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:33:42.842500: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x49cd490 executing computations on platform Host. Devices:
2019-10-31 09:33:42.842530: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.842959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.842997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:33:42.843562: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.843520: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x54497e0 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.843583: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.843605: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.843619: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.843627: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.843992: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:33:42.844902: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:33:42.845210: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.846111: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.846665: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5398550 executing computations on platform Host. Devices:
2019-10-31 09:33:42.846694: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.847238: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x667c220 executing computations on platform Host. Devices:
2019-10-31 09:33:42.847271: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.847787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.847818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:33:42.848053: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x49dd850 executing computations on platform Host. Devices:
2019-10-31 09:33:42.848090: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.848369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.848410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:33:42.848673: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5c11850 executing computations on platform Host. Devices:
2019-10-31 09:33:42.848705: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.849019: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x48ce1a0 executing computations on platform Host. Devices:
2019-10-31 09:33:42.849050: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.849469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.849498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:33:42.849226: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4845c50 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.849265: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.849278: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.849260: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.849287: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.849297: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.849810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.849843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.849849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:33:42.849878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:33:42.850540: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x577e2a0 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.850575: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.850595: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.850605: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.850613: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.852293: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5527030 executing computations on platform Host. Devices:
2019-10-31 09:33:42.852325: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.853152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.853196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:33:42.854239: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.855184: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.857681: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x49234b0 executing computations on platform Host. Devices:
2019-10-31 09:33:42.857711: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.858149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.858191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:33:42.858353: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x585baf0 executing computations on platform Host. Devices:
2019-10-31 09:33:42.858387: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.858874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.858942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:33:42.967915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.967994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:33:42.968008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:33:42.968318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:33:42.969387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.969425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:33:42.969437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:33:42.969687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:33:42.973332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.973402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:33:42.973416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:33:42.975511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:33:42.979394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.979472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:33:42.979486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:33:42.979557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.979628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:33:42.979642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:33:42.980554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.980595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:33:42.980608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:33:42.982272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:33:42.982341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:33:42.983720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.983805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:33:42.983819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:33:42.984121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.984161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:33:42.984174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:33:42.984355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:33:42.984174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:33:42.984643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.984682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:33:42.984695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:33:42.984881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:33:42.985547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.985590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:33:42.985602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:33:42.985879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
2019-10-31 09:33:42.980779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.980842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:33:42.980855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:33:42.981085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
W1031 09:33:42.994619 140599664752384 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W1031 09:33:42.978768 139735270668032 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W1031 09:33:43.000526 139735270668032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:42.982844 139681104721664 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-31 09:33:42.982173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.982206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:33:42.982217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:33:42.982395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
W1031 09:33:42.995758 140556463261440 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:33:43.005537 139681104721664 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:33:42.984108 140153269229312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W1031 09:33:43.007307 140153269229312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
2019-10-31 09:33:42.982481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.982517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:33:42.982529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:33:42.982724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
W1031 09:33:42.991235 140519229642496 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-31 09:33:42.981610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.981648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:33:42.981659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:33:42.981865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
W1031 09:33:42.995507 139737117169408 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
2019-10-31 09:33:42.979862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
W1031 09:33:42.993835 140336927958784 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:33:42.997029 139829226764032 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:33:42.995286 140195360659200 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:33:42.994239 140271702304512 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:33:42.994754 139677737731840 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W1031 09:33:43.013096 140519229642496 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.016547 139677737731840 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.016844 140336927958784 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.018044 140195360659200 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.018192 140271702304512 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.017959 140599664752384 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.018513 139737117169408 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.018520 140556463261440 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:33:43.020489 139829226764032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
... (the identical configuration banner and "Generating training model" line printed by the other worker processes omitted) ...
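For reference, the configuration banner above maps one-to-one to tf_cnn_benchmarks flags. A minimal sketch of an equivalent invocation (flag values inferred from the banner; in this guide the script is actually launched across the worker pods by the MPI/Horovod job rather than called directly):

    mpirun -np 16 \
        python /examples/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
            --model=resnet50 \
            --batch_size=32 \
            --num_batches=100 \
            --variable_update=horovod \
            --data_format=NCHW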
2019-10-31 09:33:42.993818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.993855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:33:42.993867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:33:42.994212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
W1031 09:33:43.011168 140693077554944 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-31 09:33:42.993204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.993275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:33:42.993288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:33:42.993502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
... (the same colocate_with deprecation warning from the remaining worker processes omitted) ...
W1031 09:33:43.045008 139735270668032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
... (the remaining conv2d and max_pooling2d deprecation warnings from the other worker processes omitted) ...
W1031 09:33:45.444138 139677737731840 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
... (the same to_float deprecation warning from the remaining worker processes omitted) ...
W1031 09:33:45.606902 139677737731840 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
... (the same to_float/to_int32 deprecation warnings from the remaining worker processes omitted) ...
Initializing graph
... ("Initializing graph" is printed once by each of the 16 worker processes) ...
W1031 09:33:47.557883 139677737731840 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
... (the same Supervisor deprecation warning from the remaining worker processes omitted) ...
2019-10-31 09:33:48.216494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:33:48.216593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:48.216609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:33:48.216620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:33:48.216880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
... (similar "Adding visible gpu devices" messages for the remaining ranks omitted; each worker node exposes four Tesla P100-PCIE-16GB GPUs, devices 0-3) ...
I1031 09:33:51.565660 139677737731840 session_manager.py:491] Running local_init_op.
I1031 09:33:51.745176 139677737731840 session_manager.py:493] Done running local_init_op.
... (the corresponding local_init_op messages from the remaining worker processes omitted) ...
Running warm up
... ("Running warm up" is printed once by each of the 16 worker processes) ...
2019-10-31 09:34:20.901503: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
... (the same libcublas message from the remaining worker processes omitted) ...
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.27<0>
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
NCCL version 2.4.2+cuda10.0
... (matching NET/Socket and NCCL_IB_DISABLE messages from the remaining ranks on tensorflow-benchmarks-worker-0 through worker-3 omitted) ...
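The NCCL_IB_DISABLE messages above show that this run deliberately bypasses the InfiniBand transport: with NCCL_IB_DISABLE=1 in the environment, NCCL falls back to plain TCP sockets over eth0, making this the non-RDMA baseline. The NCCL INFO lines themselves appear because NCCL_DEBUG is set to INFO. As a minimal sketch (assuming an Open MPI-style launch; in this guide the variables come from the benchmark job definition), such variables can be propagated to every rank with mpirun's -x flag:

    # Export NCCL environment variables to all ranks (illustrative)
    mpirun -np 16 \
        -x NCCL_DEBUG=INFO \
        -x NCCL_IB_DISABLE=1 \
        python /examples/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py ...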
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO comm 0x7fcc343cd5e0 rank 0 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO comm 0x7f929c35d880 rank 6 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO comm 0x7f09183cac80 rank 9 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO comm 0x7fa1cc378dd0 rank 15 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO comm 0x7efc10383a70 rank 3 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO comm 0x7f1624349ad0 rank 14 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7ff4b835d910 rank 2 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO comm 0x7fcec83557f0 rank 1 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO comm 0x7f15b837ba20 rank 11 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO comm 0x7fcc3c33bb70 rank 13 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO comm 0x7f770837e260 rank 12 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO comm 0x7f2b985654d0 rank 4 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f08503a4be0 rank 5 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO comm 0x7f80d438fcb0 rank 7 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO comm 0x7fd4e83c3530 rank 10 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO comm 0x7fdef84c17c0 rank 8 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO CUDA Dev 3[3], Socket NIC distance :  SOC
... (the same "Could not find real path" and "Socket NIC distance :  SOC" messages from the remaining ranks omitted) ...
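The "Socket NIC distance :  SOC" messages describe the PCIe topology NCCL detected: traffic between each GPU and the eth0 NIC has to cross the CPU socket interconnect. To inspect that topology directly on a worker node, nvidia-smi can print the GPU/NIC distance matrix (a quick check, assuming nvidia-smi is available on the node):

    # Print the PCIe/NVLink topology matrix for GPUs and NICs;
    # SOC marks paths that cross the CPU socket interconnect
    nvidia-smi topo -m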
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Channel 00 :    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
... (the same NCCL_NET_GDR_LEVEL and "GPU Direct RDMA Disabled" messages from the other worker nodes omitted) ...
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 7 -> 8 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 11 -> 12 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 15 -> 0 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 3 -> 4 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 00 : 13[1] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Ring 00 : 14[2] -> 15[3] via P2P/IPC
tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Ring 00 : 6[2] -> 7[3] via P2P/IPC
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via P2P/IPC
tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 12[0] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Ring 00 : 7 -> 8 [send] via NET/Socket/0
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8[0] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Ring 00 : 10[2] -> 11[3] via P2P/IPC
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 00 : 9[1] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 00 : 3 -> 4 [send] via NET/Socket/0
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Ring 00 : 15 -> 0 [send] via NET/Socket/0
tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Ring 00 : 11 -> 12 [send] via NET/Socket/0
tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Ring 00 : 7[3] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 00 : 3[3] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Ring 00 : 15[3] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Ring 00 : 11[3] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Trees [0] 2->3->-1/-1/-1
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Trees [0] 14->15->-1/-1/-1
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 4 -> 8 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Trees [0] 10->11->-1/-1/-1
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2[2] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Ring 00 : 14[2] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 00 : 13[1] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Trees [0] 1->2->3/-1/-1
tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Ring 00 : 10[2] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 00 : 9[1] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Trees [0] 6->7->-1/-1/-1
tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Trees [0] 13->14->15/-1/-1
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 4 -> 8 [send] via NET/Socket/0
tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Ring 00 : 6[2] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Trees [0] 4->5->6/-1/-1
tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO comm 0x7f15b837ba20 rank 11 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Trees [0] 5->6->7/-1/-1
tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO comm 0x7f929c35d880 rank 6 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Trees [0] 9->10->11/-1/-1
tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO comm 0x7fd4e83c3530 rank 10 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO comm 0x7f80d438fcb0 rank 7 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Trees [0] 8->9->10/-1/-1
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO comm 0x7f09183cac80 rank 9 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 8 -> 4 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 12 -> 8 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8 -> 0 [send] via NET/Socket/0
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f08503a4be0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 0 -> 8 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8 -> 4 [send] via NET/Socket/0
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8 -> 12 [send] via NET/Socket/0
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Trees [0] 8->4->5/-1/-1
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Trees [0] 12->13->14/-1/-1
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO comm 0x7fcc3c33bb70 rank 13 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO comm 0x7fa1cc378dd0 rank 15 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO comm 0x7efc10383a70 rank 3 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Trees [0] 0->1->2/-1/-1
tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO comm 0x7fcec83557f0 rank 1 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 8 -> 0 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 0 -> 8 [send] via NET/Socket/0
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Trees [0] -1->0->1/8/-1
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Using 256 threads, Min Comp Cap 6, Trees enabled up to size 479999
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO comm 0x7fcc343cd5e0 rank 0 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7ff4b835d910 rank 2 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO comm 0x7f2b985654d0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 12 -> 8 [send] via NET/Socket/0
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO include/net.h:24 -> 2
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0)
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 8 -> 12 [receive] via NET/Socket/0
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Trees [0] 8->12->13/-1/-1
tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO comm 0x7f770837e260 rank 12 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO comm 0x7f1624349ad0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Launch mode Parallel
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Trees [0] 0->8->9/4/12
tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO comm 0x7fdef84c17c0 rank 8 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.801
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	8.004
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	8.178
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.886
1	images/sec: 17.3 +/- 0.0 (jitter = 0.0)	7.729
1	images/sec: 17.3 +/- 0.0 (jitter = 0.0)	7.780
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.869
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.788
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.565
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.802
1	images/sec: 17.3 +/- 0.0 (jitter = 0.0)	7.806
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	8.090
1	images/sec: 17.4 +/- 0.0 (jitter = 0.0)	7.888
1	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.952
1	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.684
1	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.636
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.569
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.651
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.585
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.729
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.601
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.738
10	images/sec: 17.5 +/- 0.0 (jitter = 0.2)	7.696
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.547
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.879
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	8.061
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	8.019
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.723
10	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.627
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.714
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.731
10	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.839
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.591
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.617
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.645
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.505
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.639
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.702
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.593
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.661
20	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.767
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.730
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.756
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.601
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.814
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.580
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.423
20	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.555
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.609
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.679
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.558
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.614
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.722
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.654
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.851
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.626
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.861
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.539
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.639
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.513
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.762
30	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.889
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.560
30	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.547
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.613
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.563
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.683
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.570
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.455
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.594
40	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.625
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.509
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.672
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.411
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.389
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.605
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.414
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.686
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.588
40	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.592
50	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.581
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.572
50	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.593
50	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.643
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.531
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.526
50	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.584
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.487
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.552
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.607
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.526
50	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.468
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.433
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.512
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.561
50	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.533
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.523
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.493
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.499
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.484
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.466
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.476
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.351
60	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.602
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.460
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.718
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.592
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.530
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.527
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.387
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.481
60	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.484
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.508
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.475
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.500
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.561
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.528
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.542
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.406
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.521
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.523
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.468
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.482
70	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.617
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.523
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.522
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.490
70	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.572
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.454
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.464
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.474
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.409
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.402
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.527
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.470
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.401
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.533
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.439
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.451
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.481
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.485
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.559
80	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.541
80	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.437
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.394
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.500
90	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.425
90	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.489
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.531
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.444
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.404
90	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.478
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.525
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.533
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.505
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.459
90	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.434
90	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.473
90	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.494
90	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.479
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.416
----------------------------------------------------------------
total images/sec: 280.34
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.449
----------------------------------------------------------------
total images/sec: 280.34
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.474
----------------------------------------------------------------
total images/sec: 280.36
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.545
----------------------------------------------------------------
total images/sec: 280.32
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.480
----------------------------------------------------------------
total images/sec: 280.32
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.571
----------------------------------------------------------------
total images/sec: 280.33
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.542
----------------------------------------------------------------
total images/sec: 280.32
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.644
----------------------------------------------------------------
total images/sec: 280.34
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.576
----------------------------------------------------------------
total images/sec: 280.31
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.508
----------------------------------------------------------------
total images/sec: 280.32
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.569
----------------------------------------------------------------
total images/sec: 280.34
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.466
----------------------------------------------------------------
total images/sec: 280.31
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.608
----------------------------------------------------------------
total images/sec: 280.32
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.479
----------------------------------------------------------------
total images/sec: 280.30
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.1)	7.457
----------------------------------------------------------------
total images/sec: 280.30
----------------------------------------------------------------
100	images/sec: 17.5 +/- 0.0 (jitter = 0.0)	7.603
----------------------------------------------------------------
total images/sec: 280.33
----------------------------------------------------------------


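As a quick sanity check on the figures above: each of the 16 Horovod ranks sustains roughly 17.5 images/sec, so the expected aggregate is 16 × 17.5 ≈ 280 images/sec, which matches the reported totals of ~280.3 images/sec. If the launcher output is saved to a file, the aggregate figures can be pulled out with a one-liner such as the following (the log file name is only a placeholder):

# tf-bench-no-gdr.log is a placeholder; point it at the saved launcher output
grep "total images/sec" tf-bench-no-gdr.log | sort -u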
Log of a distributed TensorFlow benchmark run with Kubeflow/mpi-operator, without GPUDirect RDMA:
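The "+"-prefixed lines at the start of this log are the launcher's shell trace: for every worker pod, the mpi-operator rsh agent (/etc/mpi/kubexec.sh) starts an Open MPI orted daemon through kubectl exec. Reconstructed from those orted arguments and the benchmark header further down, the underlying mpirun invocation looks roughly like the sketch below; mpi-operator assembles the real command line, so treat the flags and the script path as illustrative:

# Sketch only: reconstructed from the orted trace, not copied from the launcher.
# --bind-to none and --map-by slot mirror the hwloc_base_binding_policy and
# rmaps_base_mapping_policy settings visible in the orted arguments.
mpirun -np 16 \
       --hostfile /etc/mpi/hostfile \
       -mca plm_rsh_agent /etc/mpi/kubexec.sh \
       -mca pml ob1 -mca btl ^openib \
       --bind-to none --map-by slot \
       python /examples/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
              --model resnet50 --batch_size 32 --variable_update horovod --num_batches 100

Note -mca btl ^openib, which keeps Open MPI itself off the InfiniBand verbs transport, consistent with this run being the non-GPUDirect baseline (the preceding log likewise shows NCCL falling back to NET/Socket).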

+ POD_NAME=tensorflow-benchmarks-worker-0
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-0 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "943194112" -mca ess_base_vpid 1 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[2:99]pq5,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "943194112.0;tcp://10.254.7.17:57414" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "943194112.0;tcp://10.254.7.17:57414" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-2
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-2 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "943194112" -mca ess_base_vpid 3 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[2:99]pq5,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "943194112.0;tcp://10.254.7.17:57414" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "943194112.0;tcp://10.254.7.17:57414" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-1
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-1 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "943194112" -mca ess_base_vpid 2 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[2:99]pq5,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "943194112.0;tcp://10.254.7.17:57414" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "943194112.0;tcp://10.254.7.17:57414" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-3
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-3 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "943194112" -mca ess_base_vpid 4 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[2:99]pq5,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "943194112.0;tcp://10.254.7.17:57414" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "943194112.0;tcp://10.254.7.17:57414" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
2019-10-31 09:22:54.631963: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632290: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632290: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632337: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632513: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632527: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632533: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632586: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632550: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632322: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632709: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632709: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632441: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632877: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632446: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:54.632745: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:22:56.367912: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5cc7a70 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.367966: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.367978: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.367987: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.367995: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.369297: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x530d070 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.369376: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.369390: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.369400: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.369408: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373515: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5b49380 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.373552: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373563: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373973: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.373572: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373580: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373574: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4bf80a0 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.373627: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373641: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373650: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373658: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.375236: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.376547: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5da52a0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.376576: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.376964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.376995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:22:56.376762: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.377652: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.378087: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x53ea8c0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.378116: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.378480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.378519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:22:56.379111: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5c26bd0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.379146: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.379513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.379556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:22:56.380668: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4cd5910 executing computations on platform Host. Devices:
2019-10-31 09:22:56.380699: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.382276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.382317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:22:56.382339: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x61aa7c0 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.382406: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.382422: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.382431: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.382439: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.383896: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x529ddf0 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.383958: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.383970: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.383979: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.383988: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.386278: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4c07240 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.386353: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.386366: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.386375: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.386383: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.387767: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.388707: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.390507: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6288000 executing computations on platform Host. Devices:
2019-10-31 09:22:56.390536: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.391321: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x537b630 executing computations on platform Host. Devices:
2019-10-31 09:22:56.391351: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.391016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.391057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:22:56.391610: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5ef7e10 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.391685: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.391697: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.391707: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.391715: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.391836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.391866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:22:56.392037: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.393860: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4f09e80 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.393941: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.393954: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.393963: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.393971: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.394344: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x60fc6c0 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.394384: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.394396: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.394406: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.394415: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.395273: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4ce4a90 executing computations on platform Host. Devices:
2019-10-31 09:22:56.395316: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.396847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.396893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:22:56.397416: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.397783: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x588fcd0 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.397826: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.397839: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.397848: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.397857: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.398054: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.399777: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:22:56.400107: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5fd5650 executing computations on platform Host. Devices:
2019-10-31 09:22:56.400143: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.400499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.400522: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x61d9ef0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.400539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:22:56.400550: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.400889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.400930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:22:56.401425: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x602c630 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.401478: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.401491: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.401499: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.401508: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.401740: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.403194: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4fe76d0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.403226: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.403824: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4ae4ee0 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.403884: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.403897: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.403906: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.403917: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.404350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.404381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:22:56.405827: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x596d4f0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.405854: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.405909: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:22:56.406370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.406412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:22:56.406964: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4e72300 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.407025: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.407037: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.407046: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.407054: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.409017: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6109e60 executing computations on platform Host. Devices:
2019-10-31 09:22:56.409050: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.409434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.409474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:22:56.410072: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:22:56.411539: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:22:56.412921: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4bc2720 executing computations on platform Host. Devices:
2019-10-31 09:22:56.412950: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.413430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.413476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:22:56.413961: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4f4fb60 executing computations on platform Host. Devices:
2019-10-31 09:22:56.413995: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.416716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.416780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:22:56.417767: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5457b40 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.417842: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.417854: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.417863: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.417871: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.419904: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5abfd50 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.419943: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.419955: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.419964: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.419973: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.423879: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.424970: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.426821: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55353a0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.426854: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.427604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.427637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:22:56.428632: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5b9d580 executing computations on platform Host. Devices:
2019-10-31 09:22:56.428663: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.429065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0e:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.429095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:22:56.514977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.515063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:22:56.515075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:22:56.515311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:22:56.521886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.521957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:22:56.521972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:22:56.522219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:22:56.523146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.523185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:22:56.523198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:22:56.523406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:22:56.524817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.524852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:22:56.524865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:22:56.525049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:22:56.527885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.527965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:22:56.527979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:22:56.528264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:22:56.525635 139738184173312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-31 09:22:56.530555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.530624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:22:56.530637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:22:56.530876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
W1031 09:22:56.549264 139738184173312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:22:56.597523 139738184173312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
<... the configuration banner, deprecation warnings and GPU device-creation messages above are printed by each of the 16 Horovod worker processes; duplicate output truncated for brevity ...>
W1031 09:22:58.918779 139738184173312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:22:59.077747 139738184173312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
<... both warnings repeated by each of the 16 worker processes; duplicate output truncated for brevity ...>
Initializing graph
<... "Initializing graph" printed by each of the 16 worker processes; duplicate lines truncated for brevity ...>
W1031 09:23:01.022134 139738184173312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
<... warning repeated by each of the 16 worker processes; duplicate output truncated for brevity ...>
2019-10-31 09:23:01.712294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:23:01.712394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:23:01.712408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:23:01.712418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:23:01.712657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:23:01.801686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:23:01.801803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:23:01.801819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:23:01.801829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:23:01.805672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:23:01.845457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:23:01.845549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:23:01.845563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:23:01.845573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:23:01.846712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:23:01.989288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:23:01.989427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:23:01.989445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:23:01.989455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:23:01.989732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
<... equivalent device-creation messages from the remaining worker processes truncated for brevity ...>
I1031 09:23:05.001815 139738184173312 session_manager.py:491] Running local_init_op.
I1031 09:23:05.175470 139738184173312 session_manager.py:493] Done running local_init_op.
<... local_init_op messages from the remaining 15 worker processes truncated for brevity ...>
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
2019-10-31 09:23:34.466658: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:34.697844: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:34.730197: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:34.755108: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:34.909374: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:34.920007: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:34.939428: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.025650: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.061560: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.124756: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.156766: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.232465: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.394387: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.593960: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:35.635274: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:23:36.004079: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.27<0>
NCCL version 2.4.2+cuda10.0
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.25<0>
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO comm 0x7f60783f8e30 rank 0 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7f5c283d9250 rank 2 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO comm 0x7fb0543bc410 rank 1 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO comm 0x7f83f83e3580 rank 3 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO comm 0x7f6f4846aa90 rank 12 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO comm 0x7f66443edf60 rank 7 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO comm 0x7f5a4048d110 rank 8 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO comm 0x7fbe043e2de0 rank 14 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO comm 0x7fa1903dd250 rank 13 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO comm 0x7fca18409ac0 rank 15 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f16643ea9b0 rank 5 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO comm 0x7f9c6c47b310 rank 4 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO comm 0x7f0ac43ead00 rank 6 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO comm 0x7ff10c3ee8d0 rank 11 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO comm 0x7f31b83e57d0 rank 10 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO comm 0x7fb4b83eb6d0 rank 9 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Channel 00 :    0   1   3   6   4   5   7  10   8   9  11  14  12  13  15   2
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Channel 01 :    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 3 -> 6 [receive] via NET/IB/0
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 15 -> 2 [receive] via NET/IB/0
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 00 : 11 -> 14 [receive] via NET/IB/0
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 7[3] via P2P/IPC
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 00 : 1[1] -> 3[3] via P2P/IPC
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 6[2] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 00 : 13[1] -> 15[3] via P2P/IPC
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 7 -> 10 [receive] via NET/IB/0
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2[2] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 00 : 14[2] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 00 : 12[0] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 00 : 7 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Ring 00 : 15 -> 2 [send] via NET/IB/0
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 00 : 3 -> 6 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 263 mtu 5 LID 12
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 263 mtu 5 LID 16
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 263 mtu 5 LID 9
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 00 : 9[1] -> 11[3] via P2P/IPC
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 00 : 8[0] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10[2] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Ring 00 : 15[3] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 00 : 11 -> 14 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 00 : 13[1] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 263 mtu 5 LID 14
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 00 : 7[3] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 00 : 12[0] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 00 : 3[3] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 00 : 0[0] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 00 : 4[0] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 01 : 13[1] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 00 : 11[3] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 10 -> 2 [receive] via NET/IB/0
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Ring 01 : 15 -> 0 [send] via NET/IB/0
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 15 -> 0 [receive] via NET/IB/1
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 01 : 1[1] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 00 : 8[0] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 00 : 9[1] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 16
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 0[0] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 00 : 14 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 01 : 3 -> 4 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 11 -> 12 [receive] via NET/IB/1
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12[0] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 6 -> 10 [receive] via NET/IB/0
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 7 -> 8 [receive] via NET/IB/1
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 14 -> 10 [receive] via NET/IB/0
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 267 mtu 5 LID 16
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 01 : 9[1] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 0 -> 12 [send] via NET/IB/1
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 01 : 11 -> 12 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 8[0] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 3 -> 4 [receive] via NET/IB/1
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 6 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 00 : 10 -> 14 [receive] via NET/IB/0
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 14
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 12
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 9
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10 -> 2 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 267 mtu 5 LID 14
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 01 : 5[1] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 01 : 7 -> 8 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 4[0] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 267 mtu 5 LID 12
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 10 -> 6 [receive] via NET/IB/0
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 8 -> 12 [receive] via NET/IB/1
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 0 -> 12 [receive] via NET/IB/1
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 267 mtu 5 LID 8
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 2 -> 10 [receive] via NET/IB/0
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 269 mtu 5 LID 9
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12 -> 4 [send] via NET/IB/1
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 8 -> 12 [send] via NET/IB/1
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 12 -> 4 [receive] via NET/IB/1
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 270 mtu 5 LID 13
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 01 : 2[2] -> 3[3] via P2P/IPC
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10 -> 6 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 272 mtu 5 LID 10
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10 -> 14 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 272 mtu 5 LID 14
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 01 : 3[3] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 273 mtu 5 LID 14
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 01 : 1[1] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Trees [0] 1->3->-1/-1/-1 [1] 2->3->-1/-1/-1
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 01 : 2[2] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 12 -> 0 [receive] via NET/IB/1
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Trees [0] 0->1->3/-1/-1 [1] 0->1->2/-1/-1
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 01 : 6[2] -> 7[3] via P2P/IPC
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 01 : 14[2] -> 15[3] via P2P/IPC
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 01 : 10[2] -> 11[3] via P2P/IPC
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Ring 01 : 15[3] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 01 : 11[3] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 01 : 7[3] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 01 : 5[1] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 01 : 13[1] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 01 : 9[1] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Trees [0] 9->11->-1/-1/-1 [1] 10->11->-1/-1/-1
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 01 : 10[2] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Trees [0] 13->15->-1/-1/-1 [1] 14->15->-1/-1/-1
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 12 -> 8 [receive] via NET/IB/1
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Trees [0] 8->9->11/-1/-1 [1] 8->9->10/-1/-1
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Trees [0] 5->7->-1/-1/-1 [1] 6->7->-1/-1/-1
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 01 : 14[2] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Trees [0] 12->13->15/-1/-1 [1] 12->13->14/-1/-1
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 01 : 6[2] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Trees [0] 4->5->7/-1/-1 [1] 4->5->6/-1/-1
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Trees [0] 10->6->4/-1/-1 [1] 5->6->7/-1/-1
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Trees [0] -1->2->0/10/-1 [1] 1->2->3/-1/-1
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7f5c283d9250 rank 2 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO comm 0x7f83f83e3580 rank 3 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO comm 0x7fb0543bc410 rank 1 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 4 -> 12 [send] via NET/IB/1
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f16643ea9b0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO comm 0x7f66443edf60 rank 7 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO comm 0x7f0ac43ead00 rank 6 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 267 mtu 5 LID 11
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Trees [0] 6->4->5/-1/-1 [1] -1->4->5/12/-1
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO comm 0x7f9c6c47b310 rank 4 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Trees [0] 2->10->8/6/14 [1] 9->10->11/-1/-1
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO comm 0x7f31b83e57d0 rank 10 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO comm 0x7ff10c3ee8d0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO comm 0x7fb4b83eb6d0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Trees [0] 10->14->12/-1/-1 [1] 13->14->15/-1/-1
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO comm 0x7fbe043e2de0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO comm 0x7fca18409ac0 rank 15 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO comm 0x7fa1903dd250 rank 13 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0)
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 4 -> 12 [receive] via NET/IB/1
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12 -> 8 [send] via NET/IB/1
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12 -> 0 [send] via NET/IB/1
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 277 mtu 5 LID 10
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 278 mtu 5 LID 10
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Trees [0] 2->0->1/-1/-1 [1] 12->0->1/-1/-1
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Using 256 threads, Min Comp Cap 6, Trees enabled up to size 479999
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Trees [0] 10->8->9/-1/-1 [1] 12->8->9/-1/-1
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO comm 0x7f60783f8e30 rank 0 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Launch mode Parallel
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO comm 0x7f5a4048d110 rank 8 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Trees [0] 14->12->13/-1/-1 [1] 4->12->13/8/0
tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO comm 0x7f6f4846aa90 rank 12 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
1	images/sec: 186.8 +/- 0.0 (jitter = 0.0)	7.892
1	images/sec: 186.7 +/- 0.0 (jitter = 0.0)	7.650
1	images/sec: 186.5 +/- 0.0 (jitter = 0.0)	7.778
1	images/sec: 186.2 +/- 0.0 (jitter = 0.0)	7.585
1	images/sec: 186.4 +/- 0.0 (jitter = 0.0)	7.568
1	images/sec: 187.4 +/- 0.0 (jitter = 0.0)	7.965
1	images/sec: 186.5 +/- 0.0 (jitter = 0.0)	7.828
1	images/sec: 186.4 +/- 0.0 (jitter = 0.0)	8.063
1	images/sec: 186.5 +/- 0.0 (jitter = 0.0)	8.154
1	images/sec: 186.7 +/- 0.0 (jitter = 0.0)	7.896
1	images/sec: 186.1 +/- 0.0 (jitter = 0.0)	7.792
1	images/sec: 186.3 +/- 0.0 (jitter = 0.0)	7.707
1	images/sec: 186.6 +/- 0.0 (jitter = 0.0)	7.768
1	images/sec: 186.0 +/- 0.0 (jitter = 0.0)	7.985
1	images/sec: 186.6 +/- 0.0 (jitter = 0.0)	7.752
1	images/sec: 186.2 +/- 0.0 (jitter = 0.0)	7.902
10	images/sec: 185.8 +/- 0.5 (jitter = 0.7)	7.585
10	images/sec: 185.9 +/- 0.4 (jitter = 0.4)	7.623
10	images/sec: 186.0 +/- 0.4 (jitter = 0.3)	7.543
10	images/sec: 185.9 +/- 0.3 (jitter = 0.6)	7.645
10	images/sec: 185.7 +/- 0.5 (jitter = 0.8)	7.742
10	images/sec: 185.7 +/- 0.5 (jitter = 0.8)	7.718
10	images/sec: 185.7 +/- 0.5 (jitter = 0.9)	7.731
10	images/sec: 185.8 +/- 0.3 (jitter = 0.6)	7.557
10	images/sec: 185.7 +/- 0.4 (jitter = 1.0)	7.771
10	images/sec: 185.6 +/- 0.6 (jitter = 0.8)	7.869
10	images/sec: 185.8 +/- 0.4 (jitter = 0.6)	8.020
10	images/sec: 185.7 +/- 0.4 (jitter = 1.2)	7.594
10	images/sec: 185.7 +/- 0.5 (jitter = 0.3)	7.700
10	images/sec: 185.8 +/- 0.5 (jitter = 0.8)	7.648
10	images/sec: 185.8 +/- 0.3 (jitter = 0.7)	8.038
10	images/sec: 185.7 +/- 0.5 (jitter = 0.8)	7.800
20	images/sec: 186.1 +/- 0.3 (jitter = 0.5)	7.636
20	images/sec: 186.1 +/- 0.3 (jitter = 0.8)	7.773
20	images/sec: 186.1 +/- 0.2 (jitter = 0.5)	7.665
20	images/sec: 186.1 +/- 0.3 (jitter = 0.7)	7.570
20	images/sec: 186.2 +/- 0.2 (jitter = 0.4)	7.657
20	images/sec: 186.1 +/- 0.3 (jitter = 0.5)	7.419
20	images/sec: 186.1 +/- 0.3 (jitter = 0.8)	7.733
20	images/sec: 186.1 +/- 0.2 (jitter = 0.5)	7.520
20	images/sec: 186.1 +/- 0.3 (jitter = 0.7)	7.608
20	images/sec: 186.1 +/- 0.2 (jitter = 0.6)	7.645
20	images/sec: 186.1 +/- 0.2 (jitter = 0.4)	7.771
20	images/sec: 186.2 +/- 0.2 (jitter = 0.5)	7.590
20	images/sec: 186.1 +/- 0.2 (jitter = 0.5)	7.814
20	images/sec: 186.1 +/- 0.2 (jitter = 0.3)	7.683
20	images/sec: 186.1 +/- 0.3 (jitter = 0.5)	7.587
20	images/sec: 186.2 +/- 0.3 (jitter = 1.1)	7.582
30	images/sec: 186.0 +/- 0.2 (jitter = 0.6)	7.602
30	images/sec: 186.0 +/- 0.2 (jitter = 0.5)	7.587
30	images/sec: 186.0 +/- 0.2 (jitter = 0.6)	7.590
30	images/sec: 185.9 +/- 0.3 (jitter = 0.5)	7.559
30	images/sec: 185.9 +/- 0.2 (jitter = 0.6)	7.515
30	images/sec: 185.9 +/- 0.2 (jitter = 0.6)	7.636
30	images/sec: 185.9 +/- 0.2 (jitter = 0.8)	7.850
30	images/sec: 185.9 +/- 0.2 (jitter = 0.8)	7.856
30	images/sec: 185.9 +/- 0.3 (jitter = 0.8)	7.596
30	images/sec: 185.9 +/- 0.3 (jitter = 0.6)	7.750
30	images/sec: 185.9 +/- 0.3 (jitter = 0.9)	7.683
30	images/sec: 185.9 +/- 0.3 (jitter = 0.7)	7.593
30	images/sec: 185.9 +/- 0.3 (jitter = 0.8)	7.630
30	images/sec: 185.9 +/- 0.3 (jitter = 0.8)	7.518
30	images/sec: 185.9 +/- 0.3 (jitter = 0.9)	7.754
30	images/sec: 185.9 +/- 0.3 (jitter = 0.7)	7.876
40	images/sec: 185.6 +/- 0.2 (jitter = 0.7)	7.408
40	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.657
40	images/sec: 185.6 +/- 0.2 (jitter = 0.8)	7.647
40	images/sec: 185.6 +/- 0.3 (jitter = 0.6)	7.562
40	images/sec: 185.7 +/- 0.3 (jitter = 0.6)	7.656
40	images/sec: 185.6 +/- 0.3 (jitter = 0.8)	7.656
40	images/sec: 185.7 +/- 0.3 (jitter = 0.9)	7.432
40	images/sec: 185.6 +/- 0.3 (jitter = 0.8)	7.660
40	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.581
40	images/sec: 185.6 +/- 0.2 (jitter = 0.7)	7.613
40	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.540
40	images/sec: 185.6 +/- 0.3 (jitter = 0.9)	7.680
40	images/sec: 185.7 +/- 0.2 (jitter = 0.6)	7.548
40	images/sec: 185.6 +/- 0.3 (jitter = 1.1)	7.512
40	images/sec: 185.7 +/- 0.3 (jitter = 0.8)	7.367
40	images/sec: 185.7 +/- 0.3 (jitter = 0.8)	7.629
50	images/sec: 185.7 +/- 0.3 (jitter = 0.7)	7.626
50	images/sec: 185.6 +/- 0.3 (jitter = 0.8)	7.551
50	images/sec: 185.7 +/- 0.3 (jitter = 0.8)	7.625
50	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.580
50	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.506
50	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.540
50	images/sec: 185.6 +/- 0.2 (jitter = 0.8)	7.560
50	images/sec: 185.6 +/- 0.2 (jitter = 0.7)	7.497
50	images/sec: 185.7 +/- 0.2 (jitter = 1.0)	7.445
50	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.505
50	images/sec: 185.6 +/- 0.2 (jitter = 0.8)	7.550
50	images/sec: 185.6 +/- 0.3 (jitter = 0.9)	7.512
50	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.578
50	images/sec: 185.7 +/- 0.2 (jitter = 1.0)	7.489
50	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.441
50	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.583
60	images/sec: 185.8 +/- 0.2 (jitter = 0.6)	7.435
60	images/sec: 185.8 +/- 0.2 (jitter = 0.7)	7.470
60	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.473
60	images/sec: 185.8 +/- 0.2 (jitter = 0.6)	7.435
60	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.376
60	images/sec: 185.8 +/- 0.2 (jitter = 0.6)	7.497
60	images/sec: 185.8 +/- 0.2 (jitter = 0.9)	7.574
60	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.555
60	images/sec: 185.8 +/- 0.2 (jitter = 0.7)	7.465
60	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.562
60	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.473
60	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.632
60	images/sec: 185.8 +/- 0.2 (jitter = 0.6)	7.509
60	images/sec: 185.8 +/- 0.2 (jitter = 0.7)	7.473
60	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.532
60	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.550
70	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.471
70	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.461
70	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.517
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.493
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.585
70	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.462
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.520
70	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.530
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.535
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.426
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.560
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.475
70	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.464
70	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.511
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.503
70	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.524
80	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.381
80	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.453
80	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.487
80	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.471
80	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.425
80	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.466
80	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.464
80	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.436
80	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.453
80	images/sec: 185.7 +/- 0.2 (jitter = 0.6)	7.540
80	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.503
80	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.479
80	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.384
80	images/sec: 185.7 +/- 0.1 (jitter = 0.7)	7.501
80	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.512
80	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.410
90	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.518
90	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.459
90	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.506
90	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.395
90	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.444
90	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.447
90	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.428
90	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.527
90	images/sec: 185.8 +/- 0.1 (jitter = 0.8)	7.485
90	images/sec: 185.7 +/- 0.1 (jitter = 0.7)	7.475
90	images/sec: 185.8 +/- 0.2 (jitter = 0.8)	7.450
90	images/sec: 185.7 +/- 0.1 (jitter = 0.6)	7.500
90	images/sec: 185.7 +/- 0.1 (jitter = 0.7)	7.424
90	images/sec: 185.7 +/- 0.1 (jitter = 0.8)	7.404
90	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.528
90	images/sec: 185.8 +/- 0.1 (jitter = 0.8)	7.553
100	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.489
----------------------------------------------------------------
total images/sec: 2969.87
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.606
----------------------------------------------------------------
total images/sec: 2969.87
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.2 (jitter = 0.9)	7.510
----------------------------------------------------------------
total images/sec: 2969.87
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.8)	7.630
----------------------------------------------------------------
total images/sec: 2969.83
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.2 (jitter = 0.7)	7.440
----------------------------------------------------------------
total images/sec: 2969.91
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.614
----------------------------------------------------------------
total images/sec: 2970.02
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.420
----------------------------------------------------------------
total images/sec: 2969.89
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.2 (jitter = 0.8)	7.548
----------------------------------------------------------------
total images/sec: 2969.84
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.7)	7.416
----------------------------------------------------------------
total images/sec: 2969.87
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.8)	7.449
----------------------------------------------------------------
total images/sec: 2970.00
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.8)	7.457
----------------------------------------------------------------
total images/sec: 2969.87
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.8)	7.480
----------------------------------------------------------------
total images/sec: 2969.98
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.6)	7.553
----------------------------------------------------------------
total images/sec: 2969.87
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.8)	7.476
----------------------------------------------------------------
total images/sec: 2969.95
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.7)	7.432
----------------------------------------------------------------
total images/sec: 2969.89
----------------------------------------------------------------
100	images/sec: 185.7 +/- 0.1 (jitter = 0.7)	7.514
----------------------------------------------------------------
total images/sec: 2969.75
----------------------------------------------------------------
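The run above completes with GPUDirect RDMA disabled: the NCCL output reports "NCCL_NET_GDR_LEVEL set by environment to 0", and throughput settles at roughly 2,970 aggregate images/sec across the 16 GPUs (about 186 images/sec per rank). As a minimal sketch only, and not the exact launcher command generated by the mpi-operator job in this guide, the NCCL variables visible in the log could be forwarded to every rank through mpirun as follows (the benchmark flags are illustrative assumptions):

# Sketch: forwarding the NCCL environment seen in the log above to every rank.
# NCCL_IB_DISABLE=0     -> keep the InfiniBand transport enabled
# NCCL_NET_GDR_LEVEL=0  -> disable GPUDirect RDMA for this run;
#                          a higher level permits GDR for close GPU/HCA pairs
# NCCL_DEBUG=INFO       -> emit the NCCL INFO lines shown above
mpirun -np 16 \
       -x NCCL_IB_DISABLE=0 \
       -x NCCL_NET_GDR_LEVEL=0 \
       -x NCCL_DEBUG=INFO \
       python tf_cnn_benchmarks.py --variable_update=horovod --batch_size=64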


The following is the log of running the same distributed TensorFlow benchmark tests with the Kubeflow mpi-operator, this time with GPUDirect RDMA enabled:

+ POD_NAME=tensorflow-benchmarks-worker-2
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-2 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 3 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-0
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-0 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 1 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-1
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-1 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 2 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tensorflow-benchmarks-worker-3
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-3 -- /bin/sh -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 4 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
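The shell trace above is the Kubeflow mpi-operator launch path: the launcher pod starts an orted daemon inside each worker pod through /opt/kube/kubectl exec, using /etc/mpi/kubexec.sh as the plm_rsh_agent. Note the -mca pml "ob1" -mca btl "^openib" arguments: Open MPI's own openib transport is excluded, so any MPI point-to-point traffic falls back to TCP, while the tensor exchange itself is carried by NCCL over the InfiniBand fabric, as the NCCL INFO lines show.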
2019-10-31 09:28:00.800628: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.800997: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.800997: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801030: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801030: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801282: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801376: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801302: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801649: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801601: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801636: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801388: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801705: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801389: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801536: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:00.801461: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:28:02.552940: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x577afe0 executing computations on platform CUDA. Devices:
2019-10-31 09:28:02.553014: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.553027: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.553035: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.553043: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.558257: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:28:02.558331: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x571fd40 executing computations on platform CUDA. Devices:
2019-10-31 09:28:02.558380: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.558401: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.558409: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.558418: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.560581: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5858810 executing computations on platform Host. Devices:
2019-10-31 09:28:02.560613: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:28:02.560953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:28:02.560997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:02.562364: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:28:02.565507: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x57fd570 executing computations on platform Host. Devices:
2019-10-31 09:28:02.565538: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:28:02.566019: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4feb450 executing computations on platform CUDA. Devices:
2019-10-31 09:28:02.566090: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.566103: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.566112: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.566120: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.566436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:28:02.566477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:28:02.568316: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x61a8df0 executing computations on platform CUDA. Devices:
2019-10-31 09:28:02.568370: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.568382: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.568390: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.568398: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.572749: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz
2019-10-31 09:28:02.576129: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x50c8cb0 executing computations on platform Host. Devices:
2019-10-31 09:28:02.576165: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>

< ... equivalent XLA-service and device-discovery messages from the remaining 15 Horovod ranks trimmed. Every rank enumerates the four Tesla P100-PCIE-16GB GPUs on its node (PCI buses 0000:04:00.0, 0000:06:00.0, 0000:0c:00.0 and 0000:0e:00.0) but adds only the single GPU assigned to it, reporting "Adding visible gpu devices: 0" through "3" ... >
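Each of the 16 MPI processes above sees all four GPUs on its node but exposes exactly one of them to TensorFlow, which is why every rank logs a single "Adding visible gpu devices" line. The snippet below is a minimal sketch of the standard Horovod pattern that produces this behavior; it is illustrative only (tf_cnn_benchmarks does the equivalent internally), and assumes a TensorFlow 1.x environment with Horovod installed.

# Minimal sketch: pin each Horovod rank to one local GPU (TF 1.x API).
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

config = tf.ConfigProto()
# Expose only the GPU matching this process's local rank on the node (0-3),
# so TensorFlow logs exactly one "Adding visible gpu devices: N" line.
config.gpu_options.visible_device_list = str(hvd.local_rank())
config.gpu_options.allow_growth = True

with tf.Session(config=config):
    # rank() is global (0-15 across the four worker nodes);
    # local_rank() is per node (0-3).
    print('global rank %d, local rank %d -> pinned to GPU %s'
          % (hvd.rank(), hvd.local_rank(),
             config.gpu_options.visible_device_list))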
2019-10-31 09:28:02.694075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:02.694155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:28:02.694169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:28:02.694406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)

< ... matching "Device interconnect" matrices and "Created TensorFlow device" messages from the other ranks trimmed. Because each process sees only one GPU, the interconnect matrix has a single "N" entry (no peer GPU pairs to report), and each rank creates one TensorFlow GPU device with 14482 MB of the 16 GB available ... >
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
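The banner above is printed once per rank and confirms the run parameters: 16 Horovod GPU devices (4 worker nodes with 4 P100s each), a global batch of 512 images assembled from 32 images per GPU, and 100 batches of synthetic ImageNet data. A quick sanity check of those numbers, assuming the standard ImageNet-1k training-set size of 1,281,167 images:

# Sanity-check the reported banner values (assumption: ImageNet-1k has
# 1,281,167 training images, the figure tf_cnn_benchmarks uses).
per_device_batch = 32
num_ranks = 16                               # 4 worker nodes x 4 GPUs
num_batches = 100

global_batch = per_device_batch * num_ranks  # 512, as reported
images = global_batch * num_batches          # 51,200 images in total
epochs = images / 1281167.0                  # ~0.04, as reported
print(global_batch, images, round(epochs, 2))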
W1031 09:28:02.738230 139856166496000 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W1031 09:28:02.735715 140554276726528 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W1031 09:28:02.787169 140554276726528 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.

< ... the identical configuration banner ("TensorFlow: 1.13 ... Generating training model"), the same colocate_with, conv2d and max_pooling2d deprecation warnings, and the corresponding "Device interconnect" / "Created TensorFlow device" messages are printed by each of the remaining 15 Horovod ranks and are trimmed here ... >
W1031 09:28:05.200214 140554276726528 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.218427 139662100997888 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.223022 140586624427776 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.231928 140131597530880 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.257417 139908811421440 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.267662 139982112827136 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.370465 140554276726528 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
...(the same to_float and to_int32 deprecation warnings are printed once by each of the 16 worker processes; the remaining repetitions are trimmed here)...
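The to_float/to_int32 messages above are benign TensorFlow 1.x deprecation notices emitted by the tf_cnn_benchmarks code itself and do not affect the benchmark results. For reference, a minimal sketch of the replacement the warnings ask for (assuming a TensorFlow 1.x environment like the one in the benchmark container image):

```python
# Minimal sketch (TensorFlow 1.x): the replacement the warnings above ask for.
import tensorflow as tf

x = tf.constant([1, 2, 3])
y_deprecated = tf.to_float(x)          # emits the to_float deprecation warning
y_preferred = tf.cast(x, tf.float32)   # recommended replacement, same result
```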
Initializing graph
...(printed once by each of the 16 worker processes)...
W1031 09:28:07.374773 140554276726528 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
...(the same Supervisor deprecation warning is printed once by each of the 16 worker processes; the remaining repetitions are trimmed here)...
2019-10-31 09:28:08.072869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:08.072974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.072988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:28:08.072998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:28:08.073250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:28:08.081420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:28:08.081544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.081559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:28:08.081569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:28:08.081911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:28:08.086161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:08.086268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.086283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:28:08.086292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:28:08.086530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:28:08.112899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:28:08.113008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.113023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:28:08.113033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:28:08.113283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:28:08.123983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:08.124095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.124112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:28:08.124123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:28:08.124359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:28:08.165104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:28:08.165240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.165260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:28:08.165270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:28:08.165595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:28:08.194105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:28:08.194211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.194226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:28:08.194236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:28:08.194487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:28:08.198259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:28:08.198364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.198379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:28:08.198389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:28:08.198660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:28:08.208208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:28:08.208331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.208346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:28:08.208356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:28:08.208692: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
2019-10-31 09:28:08.226755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:28:08.226854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.226869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:28:08.226893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:28:08.227156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:28:08.228849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:28:08.228959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.228974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:28:08.228984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:28:08.229243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:28:08.270501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:28:08.270605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.270621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:28:08.270630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:28:08.270879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:28:08.318952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:28:08.319104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.319121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1 
2019-10-31 09:28:08.319132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N 
2019-10-31 09:28:08.319487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:28:08.328654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:08.328751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.328766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2 
2019-10-31 09:28:08.328777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N 
2019-10-31 09:28:08.329034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:28:08.365747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:28:08.365852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.365867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 09:28:08.365876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 09:28:08.366110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:28:08.678376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-10-31 09:28:08.678506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.678522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      3 
2019-10-31 09:28:08.678532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N 
2019-10-31 09:28:08.679667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
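Each of the 16 processes maps to exactly one Tesla P100 (device indices 0-3 on each of the four workers), so every process creates a single /device:GPU:0 with about 14.5 GB (14482 MB) of usable memory. A quick way to cross-check this mapping from inside a worker pod is a sketch like the following, using the TensorFlow 1.x device_lib introspection API (the pod/exec context is assumed):

```python
# Minimal sketch: list the GPUs visible to one worker process, matching the
# "Adding visible gpu devices: N" lines above (one P100 per process).
from tensorflow.python.client import device_lib

gpus = [d for d in device_lib.list_local_devices() if d.device_type == "GPU"]
for g in gpus:
    print(g.name, "->", g.physical_device_desc)
```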
I1031 09:28:11.440287 139982112827136 session_manager.py:491] Running local_init_op.
I1031 09:28:11.446637 140554276726528 session_manager.py:491] Running local_init_op.
I1031 09:28:11.481120 139853571868416 session_manager.py:491] Running local_init_op.
I1031 09:28:11.490534 139662100997888 session_manager.py:491] Running local_init_op.
I1031 09:28:11.541015 139908811421440 session_manager.py:491] Running local_init_op.
I1031 09:28:11.559617 140131597530880 session_manager.py:491] Running local_init_op.
I1031 09:28:11.591317 139799656847104 session_manager.py:491] Running local_init_op.
I1031 09:28:11.606469 139841235764992 session_manager.py:491] Running local_init_op.
I1031 09:28:11.627890 139982112827136 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.633262 140554276726528 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.646192 139665852200704 session_manager.py:491] Running local_init_op.
I1031 09:28:11.661424 140718879745792 session_manager.py:491] Running local_init_op.
I1031 09:28:11.665061 139853571868416 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.668709 139856166496000 session_manager.py:491] Running local_init_op.
I1031 09:28:11.677753 139662100997888 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.721434 140163584431872 session_manager.py:491] Running local_init_op.
I1031 09:28:11.736203 140017035462400 session_manager.py:491] Running local_init_op.
I1031 09:28:11.739211 139908811421440 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.763015 140131597530880 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.778822 140586624427776 session_manager.py:491] Running local_init_op.
I1031 09:28:11.791004 139799656847104 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.792444 139841235764992 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.840747 139665852200704 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.841266 139928895440640 session_manager.py:491] Running local_init_op.
I1031 09:28:11.859958 140718879745792 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.861997 139856166496000 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.902634 140163584431872 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.913870 140017035462400 session_manager.py:493] Done running local_init_op.
I1031 09:28:11.981170 140586624427776 session_manager.py:493] Done running local_init_op.
I1031 09:28:12.039165 139928895440640 session_manager.py:493] Done running local_init_op.
I1031 09:28:12.076092 139818715670272 session_manager.py:491] Running local_init_op.
I1031 09:28:12.256900 139818715670272 session_manager.py:493] Done running local_init_op.
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
Running warm up
2019-10-31 09:28:40.740985: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:40.810479: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:40.975794: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.009451: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.137098: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.326322: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.340050: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.355355: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.369061: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.440759: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.484446: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.532443: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.560915: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.690003: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.727085: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-31 09:28:41.974115: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.26<0>
NCCL version 2.4.2+cuda10.0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.7.18<0>
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.29<0>
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.17<0>
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.4.17<0>
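The NET/IB lines confirm that NCCL detected both ConnectX HCAs (mlx5_0 and mlx5_2) on every worker and will use them for inter-node traffic, falling back to eth0 only for out-of-band (OOB) bootstrap. This behavior is driven by the NCCL environment variables echoed in the log. A minimal sketch of those settings follows; the values are taken from the log lines above, while the os.environ mechanism is purely illustrative (in this deployment they are set on the worker pods and must be in place before NCCL initializes):

```python
# Minimal sketch: the NCCL settings echoed in the log above.
import os

os.environ["NCCL_IB_DISABLE"] = "0"     # 0 = allow NCCL to use the InfiniBand HCAs
os.environ["NCCL_NET_GDR_LEVEL"] = "1"  # use GPUDirect RDMA when the GPU/HCA PCIe
                                        # distance is below this level (log: "distance 0 < 1")
os.environ["NCCL_DEBUG"] = "INFO"       # emits the "NCCL INFO" lines seen in this output
```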
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO comm 0x7fdbf03f2980 rank 0 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO comm 0x7fd4683eab00 rank 2 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO comm 0x7f57503a6770 rank 9 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO comm 0x7f29243fcf30 rank 3 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO comm 0x7f79703ef4d0 rank 1 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO comm 0x7f04ac3dc080 rank 12 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO comm 0x7f42cc3d4910 rank 10 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO comm 0x7f31404045c0 rank 8 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO comm 0x7f3e203c5400 rank 11 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO comm 0x7f4f303a9880 rank 14 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO comm 0x7f058c3a3740 rank 13 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO comm 0x7f2e643dd590 rank 15 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO Setting affinity for GPU 1 to 0fff
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO comm 0x7f31dc403440 rank 5 nranks 16 cudaDev 1 nvmlDev 1
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO comm 0x7ffab84792e0 rank 4 nranks 16 cudaDev 0 nvmlDev 0
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO Setting affinity for GPU 3 to 0fff
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Setting affinity for GPU 2 to 0fff
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO comm 0x7f71fc4035a0 rank 6 nranks 16 cudaDev 2 nvmlDev 2
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO comm 0x7f24b43e5c10 rank 7 nranks 16 cudaDev 3 nvmlDev 3
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  PIX PHB
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  PHB PIX
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Channel 00 :    0   1   3   6   4   5   7  10   8   9  11  14  12  13  15   2
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Channel 01 :    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 00 : 11 -> 14 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Ring 00 : 15 -> 2 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 7 -> 10 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 3 -> 6 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Ring 00 : 1[1] -> 3[3] via P2P/IPC
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Ring 00 : 8[0] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 00 : 9[1] -> 11[3] via P2P/IPC
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 00 : 13[1] -> 15[3] via P2P/IPC
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 00 : 12[0] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Ring 00 : 2[2] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO Ring 00 : 5[1] -> 7[3] via P2P/IPC
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 00 : 14[2] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 10[2] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 6[2] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Ring 00 : 11 -> 14 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO Ring 00 : 15 -> 2 [send] via NET/IB/0
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 00 : 3 -> 6 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO Ring 00 : 7 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 270 mtu 5 LID 9
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 270 mtu 5 LID 12
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 270 mtu 5 LID 16
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 276 mtu 5 LID 14
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO Ring 00 : 15[3] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Ring 00 : 11[3] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 00 : 3[3] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Ring 00 : 8[0] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 00 : 13[1] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Ring 00 : 0[0] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO Ring 00 : 7[3] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 00 : 9[1] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 00 : 12[0] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Ring 00 : 4[0] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 6 -> 10 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Ring 00 : 10 -> 2 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 00 : 14 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 01 : 13[1] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Ring 01 : 1[1] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO Ring 01 : 15 -> 0 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 11 -> 12 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Ring 01 : 15 -> 0 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 01 : 3 -> 4 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 273 mtu 5 LID 12
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Ring 01 : 0[0] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 12[0] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Ring 01 : 7 -> 8 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 6 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO Ring 01 : 5[1] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Ring 01 : 11 -> 12 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO Ring 01 : 7 -> 8 [send] via NET/IB/0
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 279 mtu 5 LID 14
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 01 : 9[1] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Ring 01 : 3 -> 4 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 273 mtu 5 LID 9
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 280 mtu 5 LID 14
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Ring 01 : 8[0] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 14 -> 10 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 273 mtu 5 LID 16
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 274 mtu 5 LID 12
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Ring 01 : 4[0] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 10 -> 6 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 10 -> 2 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 274 mtu 5 LID 9
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Ring 01 : 0 -> 12 [send] via NET/IB/1
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 00 : 10 -> 14 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Ring 01 : 12 -> 4 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 8 -> 12 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Ring 00 : 2 -> 10 [send] via NET/IB/0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 0 -> 12 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Ring 01 : 8 -> 12 [send] via NET/IB/1
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 283 mtu 5 LID 10
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 272 mtu 5 LID 8
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 12 -> 4 [send] via NET/IB/1
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 270 mtu 5 LID 11
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 2 -> 10 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 276 mtu 5 LID 16
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 10 -> 6 [send] via NET/IB/0
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Ring 01 : 2[2] -> 3[3] via P2P/IPC
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 00 : 10 -> 14 [send] via NET/IB/0
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 279 mtu 5 LID 9
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 01 : 3[3] -> 2[2] via P2P/IPC
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 280 mtu 5 LID 9
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Ring 01 : 1[1] -> 0[0] via P2P/IPC
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Trees [0] 1->3->-1/-1/-1 [1] 2->3->-1/-1/-1
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Ring 01 : 2[2] -> 1[1] via P2P/IPC
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Trees [0] 0->1->3/-1/-1 [1] 0->1->2/-1/-1
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 01 : 14[2] -> 15[3] via P2P/IPC
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 01 : 6[2] -> 7[3] via P2P/IPC
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 01 : 10[2] -> 11[3] via P2P/IPC
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO Ring 01 : 15[3] -> 14[2] via P2P/IPC
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Ring 01 : 11[3] -> 10[2] via P2P/IPC
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 01 : 13[1] -> 12[0] via P2P/IPC
tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO comm 0x7f29243fcf30 rank 3 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO Ring 01 : 7[3] -> 6[2] via P2P/IPC
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO Trees [0] -1->2->0/10/-1 [1] 1->2->3/-1/-1
tensorflow-benchmarks-worker-0:60:260 [2] NCCL INFO comm 0x7fd4683eab00 rank 2 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO comm 0x7f79703ef4d0 rank 1 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Ring 01 : 12 -> 0 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 01 : 9[1] -> 8[0] via P2P/IPC
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO Ring 01 : 5[1] -> 4[0] via P2P/IPC
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO Trees [0] 13->15->-1/-1/-1 [1] 14->15->-1/-1/-1
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 01 : 14[2] -> 13[1] via P2P/IPC
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Trees [0] 9->11->-1/-1/-1 [1] 10->11->-1/-1/-1
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Ring 01 : 10[2] -> 9[1] via P2P/IPC
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Trees [0] 8->9->11/-1/-1 [1] 8->9->10/-1/-1
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO Trees [0] 5->7->-1/-1/-1 [1] 6->7->-1/-1/-1
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Trees [0] 12->13->15/-1/-1 [1] 12->13->14/-1/-1
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Trees [0] 10->14->12/-1/-1 [1] 13->14->15/-1/-1
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 01 : 6[2] -> 5[1] via P2P/IPC
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO Trees [0] 4->5->7/-1/-1 [1] 4->5->6/-1/-1
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO Trees [0] 2->10->8/6/14 [1] 9->10->11/-1/-1
tensorflow-benchmarks-worker-2:60:259 [2] NCCL INFO comm 0x7f42cc3d4910 rank 10 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO comm 0x7f3e203c5400 rank 11 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Ring 01 : 12 -> 8 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO comm 0x7f57503a6770 rank 9 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-3:61:260 [3] NCCL INFO comm 0x7f2e643dd590 rank 15 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 0[0] / HCA 1 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 4 -> 12 [receive] via NET/IB/1/GDRDMA
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 12 -> 8 [send] via NET/IB/1
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Ring 01 : 12 -> 0 [send] via NET/IB/1
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 275 mtu 5 LID 11
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 276 mtu 5 LID 11
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO comm 0x7f058c3a3740 rank 13 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO comm 0x7f4f303a9880 rank 14 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO comm 0x7f24b43e5c10 rank 7 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Trees [0] 10->6->4/-1/-1 [1] 5->6->7/-1/-1
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO comm 0x7f71fc4035a0 rank 6 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE
tensorflow-benchmarks-worker-1:58:260 [1] NCCL INFO comm 0x7f31dc403440 rank 5 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Ring 01 : 4 -> 12 [send] via NET/IB/1
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 277 mtu 5 LID 13
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO Trees [0] 6->4->5/-1/-1 [1] -1->4->5/12/-1
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO comm 0x7ffab84792e0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO Trees [0] 14->12->13/-1/-1 [1] 4->12->13/8/0
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO Trees [0] 10->8->9/-1/-1 [1] 12->8->9/-1/-1
tensorflow-benchmarks-worker-3:57:258 [0] NCCL INFO comm 0x7f04ac3dc080 rank 12 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-2:57:260 [0] NCCL INFO comm 0x7f31404045c0 rank 8 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Trees [0] 2->0->1/-1/-1 [1] 12->0->1/-1/-1
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Using 256 threads, Min Comp Cap 6, Trees enabled up to size 479999
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO comm 0x7fdbf03f2980 rank 0 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Launch mode Parallel
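At this point NCCL initialization is complete for all 16 ranks (4 workers with 4 GPUs each), with two communication rings built: intra-node hops use P2P/IPC over PCIe, while inter-node hops go over NET/IB with GPUDirect RDMA (the GDRDMA receive paths above), which is exactly what this guide sets out to demonstrate. The rank layout NCCL reports can be cross-checked from the Horovod side with a sketch like this (assuming the horovod.tensorflow module shipped in the benchmark image):

```python
# Minimal sketch: print the Horovod rank layout that should match the
# "comm ... rank N nranks 16" lines in the NCCL output above.
import horovod.tensorflow as hvd

hvd.init()
print("rank %d of %d, local rank %d on this worker"
      % (hvd.rank(), hvd.size(), hvd.local_rank()))
```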
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
Done warm up
Step	Img/sec	total_loss
1	images/sec: 198.5 +/- 0.0 (jitter = 0.0)	7.672
1	images/sec: 198.9 +/- 0.0 (jitter = 0.0)	7.889
1	images/sec: 198.4 +/- 0.0 (jitter = 0.0)	7.782
1	images/sec: 198.5 +/- 0.0 (jitter = 0.0)	7.790
1	images/sec: 196.8 +/- 0.0 (jitter = 0.0)	7.873
1	images/sec: 197.4 +/- 0.0 (jitter = 0.0)	7.645
1	images/sec: 197.3 +/- 0.0 (jitter = 0.0)	8.005
1	images/sec: 197.5 +/- 0.0 (jitter = 0.0)	7.615
1	images/sec: 196.8 +/- 0.0 (jitter = 0.0)	7.785
1	images/sec: 196.7 +/- 0.0 (jitter = 0.0)	7.579
1	images/sec: 196.9 +/- 0.0 (jitter = 0.0)	7.843
1	images/sec: 197.9 +/- 0.0 (jitter = 0.0)	7.785
1	images/sec: 196.8 +/- 0.0 (jitter = 0.0)	8.085
1	images/sec: 198.0 +/- 0.0 (jitter = 0.0)	8.152
1	images/sec: 198.1 +/- 0.0 (jitter = 0.0)	8.017
1	images/sec: 198.5 +/- 0.0 (jitter = 0.0)	7.888
10	images/sec: 197.9 +/- 0.6 (jitter = 1.1)	7.675
10	images/sec: 198.1 +/- 0.5 (jitter = 0.6)	7.622
10	images/sec: 198.1 +/- 0.5 (jitter = 0.9)	7.662
10	images/sec: 197.9 +/- 0.6 (jitter = 1.3)	7.735
10	images/sec: 198.1 +/- 0.4 (jitter = 0.6)	7.623
10	images/sec: 198.1 +/- 0.4 (jitter = 0.7)	7.702
10	images/sec: 198.1 +/- 0.4 (jitter = 0.7)	7.734
10	images/sec: 198.1 +/- 0.4 (jitter = 0.8)	8.034
10	images/sec: 198.1 +/- 0.5 (jitter = 1.0)	8.060
10	images/sec: 197.9 +/- 0.6 (jitter = 1.3)	7.607
10	images/sec: 198.0 +/- 0.4 (jitter = 0.9)	7.706
10	images/sec: 197.9 +/- 0.5 (jitter = 1.4)	7.846
10	images/sec: 197.9 +/- 0.4 (jitter = 0.9)	7.738
10	images/sec: 197.9 +/- 0.4 (jitter = 0.9)	7.844
10	images/sec: 198.1 +/- 0.4 (jitter = 0.8)	7.562
10	images/sec: 197.6 +/- 0.7 (jitter = 1.6)	7.721
20	images/sec: 197.9 +/- 0.3 (jitter = 1.1)	7.660
20	images/sec: 197.9 +/- 0.3 (jitter = 0.8)	7.641
20	images/sec: 198.0 +/- 0.3 (jitter = 0.6)	7.684
20	images/sec: 197.9 +/- 0.3 (jitter = 1.4)	7.777
20	images/sec: 197.8 +/- 0.4 (jitter = 1.5)	7.606
20	images/sec: 197.9 +/- 0.3 (jitter = 1.4)	7.548
20	images/sec: 197.9 +/- 0.4 (jitter = 1.5)	7.615
20	images/sec: 197.9 +/- 0.4 (jitter = 1.4)	7.811
20	images/sec: 197.9 +/- 0.4 (jitter = 1.1)	7.711
20	images/sec: 197.9 +/- 0.4 (jitter = 0.9)	7.582
20	images/sec: 197.9 +/- 0.3 (jitter = 0.7)	7.611
20	images/sec: 197.8 +/- 0.2 (jitter = 1.1)	7.465
20	images/sec: 197.8 +/- 0.3 (jitter = 1.3)	7.557
20	images/sec: 197.8 +/- 0.4 (jitter = 1.1)	7.632
20	images/sec: 197.8 +/- 0.3 (jitter = 1.5)	7.757
20	images/sec: 197.8 +/- 0.3 (jitter = 1.0)	7.591
30	images/sec: 197.8 +/- 0.3 (jitter = 1.0)	7.578
30	images/sec: 197.9 +/- 0.3 (jitter = 1.5)	7.533
30	images/sec: 197.8 +/- 0.3 (jitter = 1.4)	7.672
30	images/sec: 197.8 +/- 0.3 (jitter = 1.3)	7.692
30	images/sec: 197.9 +/- 0.3 (jitter = 1.4)	7.661
30	images/sec: 197.8 +/- 0.3 (jitter = 1.1)	7.545
30	images/sec: 197.8 +/- 0.3 (jitter = 1.0)	7.874
30	images/sec: 197.8 +/- 0.2 (jitter = 1.4)	7.641
30	images/sec: 197.8 +/- 0.2 (jitter = 1.1)	7.705
30	images/sec: 197.8 +/- 0.3 (jitter = 1.1)	7.622
30	images/sec: 197.9 +/- 0.2 (jitter = 0.8)	7.624
30	images/sec: 197.8 +/- 0.3 (jitter = 0.9)	7.643
30	images/sec: 197.8 +/- 0.2 (jitter = 1.1)	7.876
30	images/sec: 197.8 +/- 0.3 (jitter = 1.0)	7.605
30	images/sec: 197.8 +/- 0.3 (jitter = 1.1)	7.880
30	images/sec: 197.8 +/- 0.3 (jitter = 1.0)	7.626
40	images/sec: 197.4 +/- 0.3 (jitter = 1.4)	7.569
40	images/sec: 197.4 +/- 0.3 (jitter = 1.9)	7.676
40	images/sec: 197.3 +/- 0.3 (jitter = 1.5)	7.450
40	images/sec: 197.4 +/- 0.4 (jitter = 1.5)	7.642
40	images/sec: 197.4 +/- 0.3 (jitter = 1.4)	7.400
40	images/sec: 197.4 +/- 0.4 (jitter = 1.7)	7.366
40	images/sec: 197.4 +/- 0.4 (jitter = 1.3)	7.579
40	images/sec: 197.3 +/- 0.3 (jitter = 1.7)	7.630
40	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.679
40	images/sec: 197.4 +/- 0.3 (jitter = 1.6)	7.562
40	images/sec: 197.3 +/- 0.3 (jitter = 1.6)	7.412
40	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.684
40	images/sec: 197.4 +/- 0.3 (jitter = 1.2)	7.548
40	images/sec: 197.4 +/- 0.3 (jitter = 1.1)	7.539
40	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.652
40	images/sec: 197.4 +/- 0.3 (jitter = 1.3)	7.646
50	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.481
50	images/sec: 197.3 +/- 0.3 (jitter = 1.3)	7.574
50	images/sec: 197.4 +/- 0.3 (jitter = 1.3)	7.538
50	images/sec: 197.4 +/- 0.3 (jitter = 1.2)	7.579
50	images/sec: 197.4 +/- 0.3 (jitter = 1.3)	7.576
50	images/sec: 197.4 +/- 0.3 (jitter = 1.7)	7.537
50	images/sec: 197.4 +/- 0.3 (jitter = 1.5)	7.528
50	images/sec: 197.4 +/- 0.3 (jitter = 1.2)	7.516
50	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.566
50	images/sec: 197.3 +/- 0.3 (jitter = 1.3)	7.477
50	images/sec: 197.4 +/- 0.3 (jitter = 1.1)	7.448
50	images/sec: 197.4 +/- 0.3 (jitter = 1.2)	7.606
50	images/sec: 197.4 +/- 0.3 (jitter = 1.6)	7.614
50	images/sec: 197.3 +/- 0.3 (jitter = 1.6)	7.643
50	images/sec: 197.3 +/- 0.3 (jitter = 1.6)	7.450
50	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.605
60	images/sec: 197.5 +/- 0.3 (jitter = 1.2)	7.633
60	images/sec: 197.4 +/- 0.3 (jitter = 1.2)	7.443
60	images/sec: 197.5 +/- 0.3 (jitter = 1.2)	7.491
60	images/sec: 197.5 +/- 0.3 (jitter = 1.1)	7.517
60	images/sec: 197.5 +/- 0.3 (jitter = 1.3)	7.413
60	images/sec: 197.5 +/- 0.3 (jitter = 1.3)	7.464
60	images/sec: 197.5 +/- 0.2 (jitter = 1.3)	7.496
60	images/sec: 197.5 +/- 0.2 (jitter = 1.0)	7.521
60	images/sec: 197.5 +/- 0.3 (jitter = 1.2)	7.455
60	images/sec: 197.5 +/- 0.3 (jitter = 1.5)	7.493
60	images/sec: 197.5 +/- 0.3 (jitter = 1.2)	7.586
60	images/sec: 197.4 +/- 0.3 (jitter = 1.5)	7.541
60	images/sec: 197.5 +/- 0.3 (jitter = 1.5)	7.501
60	images/sec: 197.5 +/- 0.3 (jitter = 1.2)	7.556
60	images/sec: 197.5 +/- 0.2 (jitter = 1.1)	7.412
60	images/sec: 197.5 +/- 0.3 (jitter = 1.4)	7.350
70	images/sec: 197.3 +/- 0.3 (jitter = 1.5)	7.535
70	images/sec: 197.3 +/- 0.3 (jitter = 1.6)	7.489
70	images/sec: 197.3 +/- 0.3 (jitter = 1.3)	7.553
70	images/sec: 197.3 +/- 0.3 (jitter = 1.2)	7.506
70	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.520
70	images/sec: 197.3 +/- 0.3 (jitter = 1.5)	7.498
70	images/sec: 197.3 +/- 0.2 (jitter = 1.2)	7.594
70	images/sec: 197.3 +/- 0.2 (jitter = 1.4)	7.428
70	images/sec: 197.3 +/- 0.3 (jitter = 1.3)	7.550
70	images/sec: 197.3 +/- 0.3 (jitter = 1.3)	7.542
70	images/sec: 197.3 +/- 0.2 (jitter = 1.2)	7.498
70	images/sec: 197.3 +/- 0.3 (jitter = 1.4)	7.477
70	images/sec: 197.3 +/- 0.3 (jitter = 1.3)	7.528
70	images/sec: 197.3 +/- 0.3 (jitter = 1.5)	7.506
70	images/sec: 197.3 +/- 0.3 (jitter = 1.5)	7.552
70	images/sec: 197.3 +/- 0.3 (jitter = 1.5)	7.500
80	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.454
80	images/sec: 197.3 +/- 0.2 (jitter = 1.4)	7.504
80	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.388
80	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.528
80	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.426
80	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.422
80	images/sec: 197.4 +/- 0.2 (jitter = 1.7)	7.467
80	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.447
80	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.459
80	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.460
80	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.471
80	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.472
80	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.513
80	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.517
80	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.409
80	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.527
90	images/sec: 197.5 +/- 0.2 (jitter = 1.2)	7.434
90	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.472
90	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.498
90	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.432
90	images/sec: 197.5 +/- 0.2 (jitter = 1.4)	7.462
90	images/sec: 197.5 +/- 0.2 (jitter = 1.3)	7.451
90	images/sec: 197.5 +/- 0.2 (jitter = 1.4)	7.456
90	images/sec: 197.5 +/- 0.2 (jitter = 1.6)	7.374
90	images/sec: 197.4 +/- 0.2 (jitter = 1.6)	7.439
90	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.499
90	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.519
90	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.467
90	images/sec: 197.4 +/- 0.2 (jitter = 1.1)	7.366
90	images/sec: 197.5 +/- 0.2 (jitter = 1.2)	7.441
90	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.527
90	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.459
100	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.483
----------------------------------------------------------------
total images/sec: 3156.37
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.469
----------------------------------------------------------------
total images/sec: 3156.67
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.441
----------------------------------------------------------------
total images/sec: 3156.58
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.486
----------------------------------------------------------------
total images/sec: 3156.68
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.421
----------------------------------------------------------------
total images/sec: 3156.25
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.500
----------------------------------------------------------------
total images/sec: 3156.61
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.592
----------------------------------------------------------------
total images/sec: 3156.35
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.418
----------------------------------------------------------------
total images/sec: 3156.62
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.563
----------------------------------------------------------------
total images/sec: 3156.42
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.2)	7.506
----------------------------------------------------------------
total images/sec: 3156.61
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.5)	7.466
----------------------------------------------------------------
total images/sec: 3156.46
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.6)	7.532
----------------------------------------------------------------
total images/sec: 3156.47
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.4)	7.452
----------------------------------------------------------------
total images/sec: 3156.24
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.6)	7.613
----------------------------------------------------------------
total images/sec: 3156.26
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.470
----------------------------------------------------------------
total images/sec: 3156.22
----------------------------------------------------------------
100	images/sec: 197.4 +/- 0.2 (jitter = 1.3)	7.569
----------------------------------------------------------------
total images/sec: 3156.20
----------------------------------------------------------------
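
In steady state, each of the 16 ranks sustains roughly 197.4 images/sec, and every rank reports an aggregate of about 3,156 images/sec for the whole job. As a quick sanity check (an illustrative calculation, not part of the benchmark output), the aggregate should be close to the per-GPU rate multiplied by the rank count:

    # Illustrative sanity check: each rank's "total images/sec" summary should
    # be close to the steady-state per-GPU rate multiplied by the rank count.
    per_gpu_rate = 197.4  # images/sec per GPU, from the step output above
    nranks = 16           # 4 worker pods x 4 GPUs each, per the NCCL "nranks 16" lines

    estimated_total = per_gpu_rate * nranks
    print(f"estimated aggregate: {estimated_total:.1f} images/sec")  # ~3158.4

The close match between this estimate and the reported ~3,156 images/sec shows that per-GPU throughput is preserved almost perfectly when the job is distributed across the four GPU workers over the InfiniBand fabric.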


Appendix

- Customized Mellanox OFED image for kernel-3.10.0-1062.1.2.el7

- SELinux InfiniBand patch: infiniband.zip

- Additional OCP components: Openshift-rdma.zip
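
The customized driver image above is built against a specific kernel and will only load on nodes running exactly that kernel. A minimal pre-flight check (the helper below is illustrative and assumes the target kernel string; it is not part of this RDG's tooling) could look like:

    # Hypothetical pre-flight check: a precompiled Mellanox OFED driver image
    # loads only on the exact kernel it was built for, so verify the node's
    # running kernel before deploying the image to it.
    import platform

    EXPECTED_KERNEL = "3.10.0-1062.1.2.el7"  # kernel the customized image targets

    running = platform.release()  # e.g. "3.10.0-1062.1.2.el7.x86_64"
    if EXPECTED_KERNEL not in running:
        raise SystemExit(f"Kernel mismatch: image built for {EXPECTED_KERNEL}, node runs {running}")
    print("Running kernel matches the customized Mellanox OFED image.")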

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. Neither NVIDIA Corporation nor any of its direct or indirect subsidiaries and affiliates (collectively: “NVIDIA”) make any representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assume no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks
NVIDIA, the NVIDIA logo, and Mellanox are trademarks and/or registered trademarks of NVIDIA Corporation and/or Mellanox Technologies Ltd. in the U.S. and in other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright
© 2022 NVIDIA Corporation & affiliates. All Rights Reserved.