Introduction
Red Hat, NVIDIA and Mellanox are collaborating to provide a high-performance platform for HPC, Artificial Intelligence and Machine Learning workloads.
This is a Reference Deployment Guide (RDG) for a Red Hat OpenShift Container Platform (RH OCP) v4.1 deployment over a bare metal user-provisioned infrastructure (UPI), designed for RDMA-accelerated Machine Learning (ML) and Deep Learning (DL) applications over a Mellanox InfiniBand fabric.
In this document we will go through the following:
- How to deploy RH OCP v4.1 over bare metal GPU-enabled nodes running RHEL 7.6.
- How to run distributed TensorFlow benchmarks with the Horovod framework over a high-performance InfiniBand fabric.
High-performance InfiniBand fabric for RH OCP is currently a Technology Preview feature.
Red Hat does not recommend using Technology Preview features for production. These features are not supported with Red Hat production service level agreements (SLAs) and might not be final or functional. These features provide early access to upcoming product features and enable customers to test functions and provide feedback during the development process.
See the Red Hat Technology Preview features support scope for more information.
References
- OpenShift 4.1 Bare Metal Install Quickstart
- OCP4 UPI Helper Node Playbook
- RH OCP 4.1 - Installing a cluster on bare metal
- QM8700 Series - Mellanox Quantum™ HDR 200Gb/s InfiniBand Smart Switches
- ConnectX®-5 Single/Dual-Port Adapter supporting 100Gb/s with VPI
- MLNX-OS® InfiniBand/VPI Switch-based Operating System
- MLNX_OFED
- Uber Horovod GitHub
- MPI Operator - Kubernetes Operator for Allreduce-style Distributed Training
Components Overview
- NVIDIA GPU
NVIDIA GPUs for servers are designed for the most demanding HPC, AI and Deep Learning workloads. GPUs accelerate server computation capabilities while driving costs down. GPU-accelerated deep learning frameworks offer the flexibility to design and train custom deep neural networks.
Every major deep learning framework, such as TensorFlow and PyTorch, is already GPU-accelerated, so data scientists and researchers can become productive instantly without having to program the GPU.
- Red Hat OpenShift Container Platform (RH OCP)
Red Hat OpenShift Container Platform (RH OCP) provides developers and IT organizations with a hybrid cloud application platform for deploying both new and existing applications on secure, scalable resources with minimal configuration and management overhead.
Built on the Red Hat Enterprise Linux platform and Kubernetes, OCP provides a more secure and scalable multi-tenant operating system for today's enterprise-class applications, while delivering integrated application runtimes and libraries.
- Kubeflow
Kubeflow is a cloud-native platform for machine learning applications, based on Google's internal machine learning pipelines. Find out more at kubeflow.org.
- Horovod
Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. The goal of Horovod is to make distributed Deep Learning fast and easy to use.
- TensorFlow
TensorFlow is an open source software library developed by the Google Brain team for machine learning and deep neural network research. The library performs numerical computation using data flow graphs, where the nodes in the graph represent mathematical operations and the graph edges represent the multi-dimensional data arrays (tensors) communicated between the nodes. TensorFlow supports CUDA and cuDNN (requires registration).
This guide uses sources from the TensorFlow website for an easier installation procedure.
- Kubernetes RDMA device plugin
The RDMA device plugin allows a Kubernetes Worker node to share a single RDMA device (HCA) among multiple Pods running on that node.
- GPUDirect RDMA
GPUDirect RDMA enables a direct P2P (Peer-to-Peer) data path between GPUs on the same or different hosts directly to/from Mellanox devices using the RDMA protocol. This significantly decreases GPU-to-GPU communication latency and completely offloads the CPU, removing it from all GPU-to-GPU communications across the network. GPUDirect RDMA works seamlessly with Mellanox ConnectX®-4 adapter cards (and later generations).
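As a point of reference, below is a hedged check for GPUDirect RDMA availability on a node; the nv_peer_mem kernel module is loaded later in this guide as part of the deployment.

```
# lsmod | grep nv_peer_mem     # module should be listed when GPUDirect RDMA is active
# modprobe nv_peer_mem         # load it manually if absent (requires NVIDIA driver + MLNX_OFED)
```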
Solution Overview
Equipment
The below hardware specifications are used in this solution:
Logical Design
The logical design includes the following layers:
- Two separate networking layers:
- Management
- High-speed InfiniBand Network
- Compute layer:
- UPI Network Gateway node
- OCP4 UPI Helper Node
- Bootstrap Node
- 3 x Master Nodes
- Worker0 Node (without GPU)
- 4 x Worker Nodes with NVIDIA Tesla P100 GPU cards and Mellanox ConnectX-5 adapters
GPU-based Node Logical Design
The following illustrates the GPU-based Worker node's components:
Bill of Materials
The below table specifies the hardware components used in this deployment guide:
This deployment guide does not cover server/hypervisor virtualization installation and virtual machine creation steps.
Server Wiring
In this setup, only the first port of each GPU-based server's HCA is wired to an InfiniBand switch, using EDR cables:
Network and Fabric Configuration
Network Configuration
Each GPU-based server is connected to a Mellanox QM8700 InfiniBand switch using an EDR InfiniBand copper cable.
Below is a table detailing the server and switch names with the network configuration:
| Server/Switch | Name | InfiniBand network (IP and NICs) | Management network (IP and NICs) |
|---|---|---|---|
| Gateway Node | clx-ocp-gwc | none | eno0: Static (WAN), eno1: Static (UPI LAN) |
| OCP4 UPI node | ocp-helper | none | eno0: Static (UPI LAN) |
| Master Nodes 1-3 | master[0-2] | none | eno0: From DHCP (reserved by UPI) |
| Worker Node 0 | worker0 | none | eno0: From DHCP (reserved by UPI) |
| Worker Node 1 | worker-p1 | ib0: auto, ib1: auto | eno0: From DHCP (reserved by UPI) |
| Worker Node 2 | worker-p2 | ib0: auto, ib1: auto | eno0: From DHCP (reserved by UPI) |
| Worker Node 3 | worker-p3 | ib0: auto, ib1: auto | eno0: From DHCP (reserved by UPI) |
| Worker Node 4 | worker-p4 | ib0: auto, ib1: auto | eno0: From DHCP (reserved by UPI) |
| InfiniBand switch | swx-mld-s01 | none | mgmt0: From DHCP (reserved by UPI) |
InfiniBand Fabric Network Topology
Initial Setup for a One Switch Solution
In this deployment scenario you can connect up to 20 servers by using Mellanox Quantum™ HDR 200Gb/s QM8700 InfiniBand Smart Switch.
Scaled Setup for a Two-Layer Fat-Tree Topology
In this deployment scenario you can scale up to 20 Spine switches and 40 Leaf switches (with single connectivity between Spine and Leaf switches), supporting up to 400 servers.
InfiniBand Fabric Configuration
Below is a list of recommendations and prerequisites that are important for the configuration process:
- Refer to the MLNX-OS User Manual to become familiar with switch software (located at support.mellanox.com)
- Upgrade the switch to the latest MLNX-OS version
- InfiniBand Subnet Manager (SM) is required to configure InfiniBand fabric properly
There are three ways to run InfiniBand Subnet Manager (SM) in your InfiniBand fabric:
- Start the SM on one or more managed switches. This is a very convenient and quick operation which allows for an easier InfiniBand ‘plug & play'.
- Run the OpenSM daemon on one or more servers by executing the /etc/init.d/opensmd command (see the sketch after this list). It is recommended to run the SM on a server when the fabric has 648 nodes or more.
- Use Unified Fabric Management (UFM®).
Mellanox's Unified Fabric Manager (UFM®) is a powerful platform for scale-out computing that eliminates the complexity of fabric management, provides deep visibility into traffic, and optimizes fabric performance.
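For reference, here is a minimal sketch of the second option, assuming MLNX_OFED is already installed on the server:

```
# /etc/init.d/opensmd start    # start the OpenSM daemon on this server
# sminfo                       # query the fabric; shows the active SM and its state
```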
In this guide, we will use the method of launching the InfiniBand SM on the InfiniBand switch.
Below are the configuration steps for the chosen method.
To enable the SM on one of the managed switches, do the following:
Log in to the switch and enter the following configuration commands (swx-mld-s01 is our switch name):
```
Mellanox MLNX-OS Switch Management

switch login: admin
Password:
swx-mld-s01 [standalone: master] > enable
swx-mld-s01 [standalone: master] # configure terminal
swx-mld-s01 [standalone: master] (config) # ib smnode swx-mld-s01 enable
swx-mld-s01 [standalone: master] (config) # ib smnode swx-mld-s01 sm-priority 0
swx-mld-s01 [standalone: master] (config) # ib sm virt enable
swx-mld-s01 [standalone: master] (config) # write memory
swx-mld-s01 [standalone: master] (config) # reload
```
After the switch reboots, check the switch configuration. It should look like the following:
```
Mellanox MLNX-OS Switch Management

switch login: admin
Password:
swx-mld-s01 [standalone: master] > enable
swx-mld-s01 [standalone: master] # configure terminal
swx-mld-s01 [standalone: master] (config) # show running-config
##
## Running database "initial"
## Generated at 2019/03/19 17:58:53 +0200
## Hostname: swx-mld-s01
##

##
## Running-config temporary prefix mode setting
##
no cli default prefix-modes enable

##
## Subnet Manager configuration
##
   ib sm virt enable

##
## Other IPv6 configuration
##
no ipv6 enable

##
## AAA remote server configuration
##
# ldap bind-password ********
# radius-server key ********
# tacacs-server key ********

##
## Network management configuration
##
# web proxy auth basic password ********
   clock timezone Asia Middle_East Jerusalem
no ntp server 192.114.62.250 disable
   ntp server 192.114.62.250 keyID 0
no ntp server 192.114.62.250 trusted-enable
   ntp server 192.114.62.250 version 4

##
## X.509 certificates configuration
##
#
# Certificate name system-self-signed, ID 0cd5b6a0da88a0e68b8f3b49408b361afc73289d
# (public-cert config omitted since private-key config is hidden)

##
## IB nodename to GUID mapping
##
   ib smnode swx-mld-s01 create
   ib smnode swx-mld-s01 enable
   ib smnode swx-mld-s01 sm-priority 0

##
## Persistent prefix mode setting
##
cli default prefix-modes enable
```
Deployment Steps
Gateway Node Configuration
For the Gateway Node we will use a virtual machine with two NICs and CentOS 7 as the operating system.
Steps for configuring CentOS 7 as a NAT Router:
- Configure the NICs as follows:
  - ens224 - public, with DHCP
  - ens192 - static, UPI LAN
```
# cat /etc/sysconfig/network-scripts/ifcfg-ens192
TYPE=Ethernet
BOOTPROTO=static
NAME=ens192
DEVICE=ens192
ONBOOT=yes
IPADDR=192.168.7.1
NETMASK=255.255.255.0
DNS1=192.168.7.254
```
Enable IP forwarding:
```
# sysctl -w net.ipv4.ip_forward=1
# echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/ip_forward.conf
# sysctl -p
```
Enable NAT:
```
# firewall-cmd --permanent --direct --passthrough ipv4 -t nat -I POSTROUTING -o ens224 -j MASQUERADE -s 192.168.7.0/24
# firewall-cmd --change-interface=ens224 --zone=external --permanent
# firewall-cmd --change-interface=ens192 --zone=internal --permanent
# firewall-cmd --set-default-zone=internal
# firewall-cmd --complete-reload
```
Check the configuration:
```
# firewall-cmd --get-active-zones
internal
  interfaces: ens192
external
  interfaces: ens224
```
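As a quick hedged sanity check, you can confirm from a host on the UPI LAN that traffic is being forwarded through the gateway:

```
# ip route | grep default     # the default route should point to 192.168.7.1
# ping -c 3 8.8.8.8           # traffic should be NATed out via ens224
```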
OCP4 UPI Helper Node Configuration
OCP4 UPI Helper Node deployment requires a separate network with internet access and includes the following components:
- DNS server
- 2x Load balancer
- Web server
- DHCP server
- PXE server
- TFTP server
- NFSv4 server
- Bastion Host
The steps for OCP4 UPI Helper Node configuration are as follows:
- Installing the operating system
- Checking the prerequisites
- Preparing the UPI
- Creating the Ignition configs
Installing the OCP4 Helper Node Operating System
The UPI Helper Node installation guide recommends the CentOS 7 operating system with the EPEL repository.
For our setup, we need RHEL 7.6 as the OS for the OCP4 UPI Helper Node. This will enable us to add bare metal GPU-based nodes and scale out the OpenShift cluster.
For RHEL 7.6, you will need to enable the following repositories: rhel-7-server-rpms, rhel-7-server-extras-rpms, rhel-7-server-ansible-2.7-rpms and rhel-7-server-ose-4.1-rpms. For more information, please refer to the OpenShift User guide.
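A hedged sketch of enabling these repositories with subscription-manager, assuming the host is already registered and attached to a valid subscription:

```
# subscription-manager repos \
    --enable="rhel-7-server-rpms" \
    --enable="rhel-7-server-extras-rpms" \
    --enable="rhel-7-server-ansible-2.7-rpms" \
    --enable="rhel-7-server-ose-4.1-rpms"
```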
Checking the OCP4 UPI Prerequisites
Clone the github repository and install the additional packages from https://github.com/christianh814/ocp4-upi-helpernode.
```
# yum -y install ansible git
# git clone https://github.com/christianh814/ocp4-upi-helpernode
# cd ocp4-upi-helpernode
```
Preparing the UPI
For the preparation steps below, your working directory is ocp4-upi-helpernode.
Copy the vars.yaml file from the docs/examples folder and modify it to match your network configuration.
Below is an example of the vars.yaml file:
```
---
disk: sda
helper:
  name: "ocp-helper"
  ipaddr: "192.168.7.254"
  networkifacename: "ens192"
dns:
  domain: "ocp.labs.mlnx"
  clusterid: "ocp4"
  forwarder1: "8.8.8.8"
  forwarder2: "8.8.4.4"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.10"
  poolend: "192.168.7.30"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.20"
  macaddr: "00:0c:29:cc:87:b6"
masters:
  - name: "master0"
    ipaddr: "192.168.7.21"
    macaddr: "00:0c:29:82:0f:6c"
  - name: "master1"
    ipaddr: "192.168.7.22"
    macaddr: "00:0c:29:f0:f5:11"
  - name: "master2"
    ipaddr: "192.168.7.23"
    macaddr: "00:0c:29:19:75:42"
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "00:0c:29:c9:b9:c6"
  - name: "worker1"
    ipaddr: "192.168.7.12"
    macaddr: "ac:1f:6b:25:1f:f0"
  - name: "worker2"
    ipaddr: "192.168.7.13"
    macaddr: "ac:1f:6b:25:85:ec"
  - name: "worker3"
    ipaddr: "192.168.7.14"
    macaddr: "ac:1f:6b:25:20:12"
  - name: "worker4"
    ipaddr: "192.168.7.15"
    macaddr: "ac:1f:6b:25:1f:dc"
```
Review and set the desired OCP installation components in vars/main.yml. For example:
```
---
staticips: false
force_ocp_download: true
ocp_bios: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/4.1.0/rhcos-4.1.0-x86_64-metal-bios.raw.gz"
ocp_initramfs: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/4.1.0/rhcos-4.1.0-x86_64-installer-initramfs.img"
ocp_install_kernel: "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/4.1.0/rhcos-4.1.0-x86_64-installer-kernel"
ocp_client: "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.1/openshift-client-linux-4.1.20.tar.gz"
ocp_installer: "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.1/openshift-install-linux-4.1.20.tar.gz"
```
Run Ansible Playbook to setup your OCP UPI Helper Node:
# ansible-playbook -e @vars.yaml tasks/main.yml
For OCP UPI Helper Node verification, run the /usr/local/bin/helpernodecheck command with one of the following parameters: {dns-masters|dns-workers|dns-etcd|install-info|haproxy|services|nfs-info}. For example:
```
[root@ocp-helper ocp4-upi-helpernode]# /usr/local/bin/helpernodecheck dns-workers
======================
DNS Config for Workers
======================
; Create entries for the worker hosts
worker0  IN  A  192.168.7.11
worker1  IN  A  192.168.7.12
worker2  IN  A  192.168.7.13
worker3  IN  A  192.168.7.14
worker4  IN  A  192.168.7.15

======================
DNS Lookup for Workers
======================
worker0.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.11
Reverse: worker0.ocp4.ocp.labs.mlnx.

worker1.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.12
Reverse: worker1.ocp4.ocp.labs.mlnx.

worker2.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.13
Reverse: worker2.ocp4.ocp.labs.mlnx.

worker3.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.14
Reverse: worker3.ocp4.ocp.labs.mlnx.

worker4.ocp4.ocp.labs.mlnx
-------------------------------------------------
IP: 192.168.7.15
Reverse: worker4.ocp4.ocp.labs.mlnx.
```
Creating the Ignition Configuration File
The install-config.yaml configuration file is required for the RH OCP installation; the Ignition configs are generated from it.
To create the Ignition configuration file, we will first create an installation folder:
```
# mkdir ~/ocp4
# cd ~/ocp4
```
For the complete configuration of install-config.yaml, we will need two additional parameters: pullSecret and sshKey.
- pullSecret can be obtained from cloud.redhat.com:
  - Log in with your Red Hat account
  - Click on "Bare Metal"
  - Click on "Download Pull Secret" or "Copy Pull Secret"
- sshKey is your public SSH key (e.g. ~/.ssh/id_rsa.pub)
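If no key pair exists yet, here is a minimal sketch for generating one (the path and empty passphrase are assumptions; adjust to your environment):

```
# ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ''
# cat ~/.ssh/id_rsa.pub
```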
Below is an example of the install-config.yaml file:
```
apiVersion: v1
baseDomain: ocp.labs.mlnx
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 1
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":....}}}'
sshKey: 'ssh-rsa AAAA... root@ocp-helper'
```
Generate the ignition configs by running the following command:
# openshift-install create ignition-configs
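Note that the installer consumes install-config.yaml from the working directory, so keep a backup copy if you may need to regenerate the configs. A hedged alternative is to point the installer at the assets directory explicitly:

```
# openshift-install create ignition-configs --dir=~/ocp4
```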
Now copy the Ignition files to the web server's ignition directory:
```
# cd ~/ocp4/
# cp *.ign /var/www/html/ignition/
# restorecon -vR /var/www/html/
```
The OCP4 UPI Helper node is now ready for the RH OCP installation process.
RH OCP Deployment
Creating the OpenShift Container Platform Cluster
In the following steps we will install the OCP bootstrap, OCP management and OCP monitoring components over RHEL CoreOS-based virtual machines.
Before starting the installation process, make sure the ssh-agent is configured on your OCP4 UPI Helper Node. You can follow this guide for a step-by-step configuration process.
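A minimal ssh-agent setup sketch, assuming the private key matching the sshKey from install-config.yaml is at ~/.ssh/id_rsa:

```
# eval "$(ssh-agent -s)"
# ssh-add ~/.ssh/id_rsa
```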
Boot the virtual machines we prepared using a PXE boot in the following order:
- Bootstrap
- Masters
- Workers
For additional information about installing RH OCP 4.1 over bare metal please refer to the OCP 4.1 installation guide.
To monitor the installation process, use the following commands:
Bootstrap stage:
```
# openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.1.20-201910102034-dirty
DEBUG Built from commit e4708ece20e3f03947e9f5f460f1d5cbcd401249
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.ocp.labs.mlnx:6443...
INFO API v1.13.4+520769a up
INFO Waiting up to 30m0s for bootstrapping to complete...
DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources
```
Remove the bootstrap resources from the load-balancer configuration /etc/haproxy/haproxy.cfg and restart the haproxy service:
/etc/haproxy/haproxy.cfg:
```
#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

#---------------------------------------------------------------------

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /
    monitor-uri /healthz

frontend openshift-api-server
    bind *:6443
    default_backend openshift-api-server
    mode tcp
    option tcplog

backend openshift-api-server
    balance source
    mode tcp
#    server bootstrap 192.168.7.20:6443 check # comment out after bootstrap completes
    server master0 192.168.7.21:6443 check
    server master1 192.168.7.22:6443 check
    server master2 192.168.7.23:6443 check

frontend machine-config-server
    bind *:22623
    default_backend machine-config-server
    mode tcp
    option tcplog

backend machine-config-server
    balance source
    mode tcp
#    server bootstrap 192.168.7.20:22623 check # comment out after bootstrap completes
    server master0 192.168.7.21:22623 check
    server master1 192.168.7.22:22623 check
    server master2 192.168.7.23:22623 check

frontend ingress-http
    bind *:80
    default_backend ingress-http
    mode tcp
    option tcplog

backend ingress-http
    balance source
    mode tcp
    server worker0-http-router0 192.168.7.11:80 check
    server worker1-http-router1 192.168.7.12:80 check
    server worker2-http-router2 192.168.7.13:80 check
    server worker3-http-router3 192.168.7.14:80 check
    server worker4-http-router4 192.168.7.15:80 check

frontend ingress-https
    bind *:443
    default_backend ingress-https
    mode tcp
    option tcplog

backend ingress-https
    balance source
    mode tcp
    server worker0-https-router0 192.168.7.11:443 check
    server worker1-https-router1 192.168.7.12:443 check
    server worker2-https-router2 192.168.7.13:443 check
    server worker3-https-router3 192.168.7.14:443 check
    server worker4-https-router4 192.168.7.15:443 check
```
# service haproxy restart
Finalize the cluster installation:
```
# openshift-install wait-for install-complete --log-level debug
DEBUG OpenShift Installer v4.1.20-201910102034-dirty
DEBUG Built from commit e4708ece20e3f03947e9f5f460f1d5cbcd401249
INFO Waiting up to 30m0s for the cluster at https://api.ocp4.ocp.labs.mlnx:6443 to initialize...
DEBUG Cluster is initialized
INFO Waiting up to 10m0s for the openshift-console route to be created...
DEBUG Route found in openshift-console namespace: console
DEBUG Route found in openshift-console namespace: downloads
DEBUG OpenShift console route is created
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/install/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.ocp.labs.mlnx
INFO Login to the console with user: kubeadmin, password: *****-*****-*****-*****
```
OpenShift Cluster Scale-up using RHEL Compute Machine
To run the cluster scale-up playbook, install the required packages, including openshift-ansible, on the OCP4 UPI Helper Node:
# yum install openshift-ansible openshift-clients jq
The next steps are relevant for the Kubernetes Worker Nodes: worker1, worker2, worker3 and worker4.
Preparing a GPU-based RHEL compute node
The Red Hat Enterprise Linux (RHEL) compute or worker node in your OpenShift Container Platform environment must meet the hardware specifications and system-level requirements. RHEL 7.6 "Minimal" installation option is required as the base OS.
Only RHEL 7.6 is supported in OpenShift Container Platform 4.1. You must not upgrade your compute machines to RHEL 8.
Enable only the repositories required by OpenShift Container Platform 4.1:
```
# subscription-manager repos \
    --enable="rhel-7-server-rpms" \
    --enable="rhel-7-server-extras-rpms" \
    --enable="rhel-7-server-ose-4.1-rpms"
```
Stop and disable the firewall on the host:
```
# systemctl disable firewalld.service
# systemctl stop firewalld.service
```
Install any additional packages that are required and lock the kernel version:
```
# yum -y install yum-plugin-versionlock
# yum versionlock kernel-3.10.0-1062.1.2.el7
# yum -y update kernel
# yum -y install perl gtk2 atk cairo tcl gcc-gfortran tcsh tk pciutils lsof
# reboot
```
The installation of the NVIDIA GPU driver is validated only for kernel-3.10.0-1062.1.2.el7.
Disable the nouveau kernel module:
```
# echo 'blacklist nouveau' > /etc/modprobe.d/blacklist-nouveau.conf
# echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf
# dracut --force
# reboot
```
After rebooting, make sure that the nouveau module is not loaded:
# lsmod | grep nouveau
Installing Mellanox OFED
There are two methods to install Mellanox OFED for the kernel version specified above (a sketch of the first method follows this list):
- Download Mellanox OFED v4.7-1.0.0.1 from the Mellanox website, then run the mlnx_add_kernel_support.sh script to add support for your kernel. Refer to this User Guide for instructions.
- Alternatively, you can download a pre-configured Mellanox OFED image from here and copy it to the root folder of your compute node. This image comes with built-in support for kernel-3.10.0-1062.1.2.el7.
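For the first method, a hedged sketch of rebuilding the package for a custom kernel; the flags follow the MLNX_OFED User Manual, so verify them against your OFED version:

```
# mount -o loop /root/MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64.iso /mnt/iso
# /mnt/iso/mlnx_add_kernel_support.sh -m /mnt/iso --make-tgz
```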
Installation steps for the pre-configured image:
After obtaining the image, run:
```
# mkdir /mnt/iso
# mount -o loop /root/MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64-ext.iso /mnt/iso
# /mnt/iso/mlnxofedinstall --force
# reboot
```
Install the SELinux policy module with the InfiniBand patch:
Extract infiniband.* from the attached archive (infiniband.zip), copy it to each compute node, and execute from the local folder:
# semodule -i infiniband.pp
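A quick hedged check that the policy module was installed:

```
# semodule -l | grep infiniband
```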
The compute node is now ready to join the OpenShift cluster.
Adding GPU-based RHEL Compute Nodes to the OpenShift Cluster
For additional information about adding RHEL compute nodes to the OpenShift cluster, please refer to the Adding a RHEL compute machine section in the OCP installation guide.
To scale-up an OpenShift cluster with RHEL compute nodes:
- Use ssh-copy-id to install the OCP4 UPI Helper Node's SSH key on the compute nodes as an authorized key for passwordless authentication (see the sketch below)
- Extract the "pull secret" for your OpenShift cluster
- Create an Ansible inventory file named hosts that defines your compute nodes and the required variables
- Run the Ansible playbook to scale up the cluster with the RHEL compute nodes
- Approve the CSRs for your RHEL compute nodes
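A sketch of the first step, using the worker hostnames from this guide:

```
# for h in worker1 worker2 worker3 worker4; do ssh-copy-id root@${h}.ocp4.ocp.labs.mlnx; done
```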
Below is an example of our hosts file for OpenShift cluster scale-up:
```
[all:vars]
ansible_user=root
#ansible_become=True
openshift_kubeconfig_path="~/.kube/config"
openshift_pull_secret_path="~/pull-secret.txt"

[workers]
worker0.ocp4.ocp.labs.mlnx

[new_workers]
worker1.ocp4.ocp.labs.mlnx
worker2.ocp4.ocp.labs.mlnx
worker3.ocp4.ocp.labs.mlnx
worker4.ocp4.ocp.labs.mlnx
```
Run the scale-up playbook:
```
# cd /usr/share/ansible/openshift-ansible
# ansible-playbook -i ~/hosts playbooks/scaleup.yml
```
Scale-up playbook execution output:
```
[root@ocp-helper openshift-ansible]# ansible-playbook -i ~/hosts playbooks/scaleup.yml

PLAY [Pre-scaleup checks] ******************************************************

TASK [fail] ********************************************************************
Tuesday 29 October 2019  16:31:43 +0200 (0:00:00.068)       0:00:00.068 *******
skipping: [localhost]

PLAY [install nodes] ***********************************************************

TASK [Gathering Facts] *********************************************************
Tuesday 29 October 2019  16:31:43 +0200 (0:00:00.039)       0:00:00.107 *******
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]

TASK [openshift_node : include_tasks] ******************************************
Tuesday 29 October 2019  16:31:45 +0200 (0:00:02.000)       0:00:02.107 *******
included: /usr/share/ansible/openshift-ansible/playbooks/roles/openshift_node/tasks/install.yml for worker1.ocp4.ocp.labs.mlnx, worker2.ocp4.ocp.labs.mlnx, worker3.ocp4.ocp.labs.mlnx, worker4.ocp4.ocp.labs.mlnx

TASK [openshift_node : Install openshift support packages] *********************
Tuesday 29 October 2019  16:31:45 +0200 (0:00:00.642)       0:00:02.750 *******
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Install openshift packages] *****************************
Tuesday 29 October 2019  16:34:47 +0200 (0:03:01.994)       0:03:04.744 *******
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Enable the CRI-O service] *******************************
Tuesday 29 October 2019  16:35:18 +0200 (0:00:30.724)       0:03:35.469 *******
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]

TASK [openshift_node : include_tasks] ******************************************
Tuesday 29 October 2019  16:35:19 +0200 (0:00:00.820)       0:03:36.289 *******
included: /usr/share/ansible/openshift-ansible/playbooks/roles/openshift_node/tasks/config.yml for worker1.ocp4.ocp.labs.mlnx, worker2.ocp4.ocp.labs.mlnx, worker3.ocp4.ocp.labs.mlnx, worker4.ocp4.ocp.labs.mlnx

TASK [openshift_node : Disable swap] *******************************************
Tuesday 29 October 2019  16:35:19 +0200 (0:00:00.461)       0:03:36.751 *******
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : sysctl] *************************************************
Tuesday 29 October 2019  16:35:20 +0200 (0:00:00.437)       0:03:37.188 *******
[WARNING]: The value 1 (type int) in a string field was converted to u'1' (type string). If this does not look like what you expect, quote the entire value to ensure it does not change.
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Disable firewalld service] ******************************
Tuesday 29 October 2019  16:35:20 +0200 (0:00:00.476)       0:03:37.665 *******
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Setting sebool container_manage_cgroup] *****************
Tuesday 29 October 2019  16:35:21 +0200 (0:00:00.447)       0:03:38.112 *******
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]

TASK [openshift_node : create temp directory] **********************************
Tuesday 29 October 2019  16:35:21 +0200 (0:00:00.601)       0:03:38.714 *******
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Wait for bootstrap endpoint to show up] *****************
Tuesday 29 October 2019  16:35:22 +0200 (0:00:00.432)       0:03:39.147 *******
ok: [worker4.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker1.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Fetch bootstrap ignition file locally] ******************
Tuesday 29 October 2019  16:35:23 +0200 (0:00:00.931)       0:03:40.078 *******
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Copy pull secret in the directory] **********************
Tuesday 29 October 2019  16:35:23 +0200 (0:00:00.668)       0:03:40.747 *******
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Get release image] **************************************
Tuesday 29 October 2019  16:35:24 +0200 (0:00:00.986)       0:03:41.733 *******
changed: [worker1.ocp4.ocp.labs.mlnx -> localhost]
changed: [worker3.ocp4.ocp.labs.mlnx -> localhost]
changed: [worker4.ocp4.ocp.labs.mlnx -> localhost]
changed: [worker2.ocp4.ocp.labs.mlnx -> localhost]

TASK [openshift_node : Set openshift_release_image fact] ***********************
Tuesday 29 October 2019  16:35:25 +0200 (0:00:01.111)       0:03:42.845 *******
ok: [worker1.ocp4.ocp.labs.mlnx]
ok: [worker2.ocp4.ocp.labs.mlnx]
ok: [worker3.ocp4.ocp.labs.mlnx]
ok: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Pull release image] *************************************
Tuesday 29 October 2019  16:35:26 +0200 (0:00:00.246)       0:03:43.091 *******
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker1.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Get machine controller daemon image from release image] *
Tuesday 29 October 2019  16:35:53 +0200 (0:00:27.708)       0:04:10.799 *******
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Pull MCD image] *****************************************
Tuesday 29 October 2019  16:35:56 +0200 (0:00:02.139)       0:04:12.939 *******
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Apply ignition manifest] ********************************
Tuesday 29 October 2019  16:36:05 +0200 (0:00:09.036)       0:04:21.975 *******
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]

TASK [openshift_node : Reboot the host and wait for it to come back] ***********
Tuesday 29 October 2019  16:36:06 +0200 (0:00:01.052)       0:04:23.028 *******
changed: [worker1.ocp4.ocp.labs.mlnx]
changed: [worker2.ocp4.ocp.labs.mlnx]
changed: [worker4.ocp4.ocp.labs.mlnx]
changed: [worker3.ocp4.ocp.labs.mlnx]

PLAY RECAP *********************************************************************
localhost                  : ok=0   changed=0   unreachable=0   failed=0   skipped=1   rescued=0   ignored=0
worker1.ocp4.ocp.labs.mlnx : ok=21  changed=9   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0
worker2.ocp4.ocp.labs.mlnx : ok=21  changed=9   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0
worker3.ocp4.ocp.labs.mlnx : ok=21  changed=9   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0
worker4.ocp4.ocp.labs.mlnx : ok=21  changed=9   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0

Tuesday 29 October 2019  16:38:36 +0200 (0:02:30.434)       0:06:53.462 *******
===============================================================================
openshift_node : Install openshift support packages ------------------- 181.99s
openshift_node : Reboot the host and wait for it to come back --------- 150.43s
openshift_node : Install openshift packages ---------------------------- 30.72s
openshift_node : Pull release image ------------------------------------ 27.71s
openshift_node : Pull MCD image ----------------------------------------- 9.04s
openshift_node : Get machine controller daemon image from release image - 2.14s
Gathering Facts ---------------------------------------------------------- 2.00s
openshift_node : Get release image --------------------------------------- 1.11s
openshift_node : Apply ignition manifest --------------------------------- 1.05s
openshift_node : Copy pull secret in the directory ----------------------- 0.99s
openshift_node : Wait for bootstrap endpoint to show up ------------------ 0.93s
openshift_node : Enable the CRI-O service -------------------------------- 0.82s
openshift_node : Fetch bootstrap ignition file locally ------------------- 0.67s
openshift_node : include_tasks ------------------------------------------- 0.64s
openshift_node : Setting sebool container_manage_cgroup ------------------ 0.60s
openshift_node : sysctl -------------------------------------------------- 0.48s
openshift_node : include_tasks ------------------------------------------- 0.46s
openshift_node : Disable firewalld service ------------------------------- 0.45s
openshift_node : Disable swap -------------------------------------------- 0.44s
openshift_node : create temp directory ----------------------------------- 0.43s
```
After executing the Scale-up playbook, approve all pending certificate signing requests (CSRs) that were generated for each machine that you added:
```
# oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
# oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-2pn2r   44s     system:node:worker2.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-7fd8p   6m31s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-djzv6   6m30s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-h985k   6m21s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-j6rdh   6m32s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-lhvqq   46s     system:node:worker3.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-m52kp   49s     system:node:worker1.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-x47hg   55s     system:node:worker4.ocp4.ocp.labs.mlnx                                      Approved,Issued
csr-x8cgl   6m21s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
```
Confirm that the cluster recognizes the machines:
```
# oc get nodes -o wide
NAME                         STATUS   ROLES    AGE   VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION                CONTAINER-RUNTIME
master0.ocp4.ocp.labs.mlnx   Ready    master   1d    v1.13.4+a80aad556   192.168.7.21   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
master1.ocp4.ocp.labs.mlnx   Ready    master   1d    v1.13.4+a80aad556   192.168.7.22   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
master2.ocp4.ocp.labs.mlnx   Ready    master   1d    v1.13.4+a80aad556   192.168.7.23   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
worker0.ocp4.ocp.labs.mlnx   Ready    worker   1d    v1.13.4+a80aad556   192.168.7.11   <none>        Red Hat Enterprise Linux CoreOS 410.8.20191011.0 (Ootpa)   4.18.0-80.11.2.el8_0.x86_64   cri-o://1.13.11-0.13.dev.rhaos4.1.gitbdeb2ca.el8-dev
worker1.ocp4.ocp.labs.mlnx   Ready    worker   1h    v1.13.4+a80aad556   192.168.7.12   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
worker2.ocp4.ocp.labs.mlnx   Ready    worker   1h    v1.13.4+a80aad556   192.168.7.13   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
worker3.ocp4.ocp.labs.mlnx   Ready    worker   1h    v1.13.4+a80aad556   192.168.7.14   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
worker4.ocp4.ocp.labs.mlnx   Ready    worker   1h    v1.13.4+a80aad556   192.168.7.15   <none>        OpenShift Enterprise                                       3.10.0-1062.1.2.el7.x86_64    cri-o://1.13.11-0.11.dev.rhaos4.1.git3338d4d.el7
```
NVIDIA GPU Driver and Plugin Deployment
The next step in our deployment is to install the NVIDIA components for the OCP.
This step must be executed on the OCP4 UPI Helper Node.
Node Feature Discovery (NFD) Deployment
Deploy NFD from GitHub on OpenShift 4.x:
```
# mkdir ~/install
# cd ~/install
# git clone https://github.com/openshift/cluster-nfd-operator
# PULLPOLICY=Always make -C cluster-nfd-operator deploy
```
Verify that the GPU nodes are labeled correctly:
```
# oc describe nodes | grep 10de
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-10de.present=true

# oc describe nodes | grep kernel
feature.node.kubernetes.io/kernel-version.full=3.10.0-XXXXX-x86_64
feature.node.kubernetes.io/kernel-version.major=3
feature.node.kubernetes.io/kernel-version.minor=10
feature.node.kubernetes.io/kernel-version.revision=0
```
Special Resource Operator (SRO) Deployment
Execute on OCP4 UPI Helper Node
Deploy the SRO from GitHub:
```
# cd ~/install
# git clone https://github.com/openshift-psap/special-resource-operator
# cd special-resource-operator
# git checkout release-4.2    # This works for OCP 4.0, 4.1, 4.2
# PULLPOLICY=Always make deploy
```
Verify that the GPUs are enabled. You should see the extended GPU resource and miscellaneous NVIDIA features:
```
# oc describe node worker1.ocp4.ocp.labs.mlnx | grep nvidia
nvidia.com/cuda.driver.major=418
nvidia.com/cuda.driver.minor=87
nvidia.com/cuda.driver.rev=01
nvidia.com/cuda.runtime.major=10
nvidia.com/cuda.runtime.minor=1
nvidia.com/gfd.timestamp=1572712654
nvidia.com/gpu.compute.major=6
nvidia.com/gpu.compute.minor=0
nvidia.com/gpu.family=pascal
nvidia.com/gpu.machine=SYS-4028GR-TR2
nvidia.com/gpu.memory=16280
nvidia.com/gpu.product=Tesla-P100-PCIE-16GB
nvidia.com/gpu: 4
```
If the SRO deployment hangs on the NVIDIA driver verification step, restart the CRI-O service on each GPU node with the following commands:
```
# systemctl restart crio
# systemctl status crio
```
A successful installation of the SRO looks like the following:
```
# oc get pod -n openshift-sro -o wide
NAME                                   READY   STATUS      RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
cuda-vector-add                        0/1     Completed   0          1h    10.254.6.13    worker2.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-dcgm-exporter-8w25q             2/2     Running     0          1h    192.168.7.15   worker4.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-dcgm-exporter-k7nkr             2/2     Running     0          1h    192.168.7.14   worker3.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-dcgm-exporter-pxb2b             2/2     Running     0          1h    192.168.7.12   worker1.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-dcgm-exporter-t5xtf             2/2     Running     0          1h    192.168.7.13   worker2.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-device-plugin-daemonset-52w7n   1/1     Running     0          1h    10.254.7.9     worker3.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-device-plugin-daemonset-7hpwk   1/1     Running     0          1h    10.254.5.14    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-device-plugin-daemonset-brk87   1/1     Running     0          1h    10.254.4.9     worker4.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-device-plugin-daemonset-zcsv7   1/1     Running     0          1h    10.254.6.14    worker2.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-device-plugin-validation        0/1     Completed   0          1h    10.254.7.10    worker3.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-2pmh5          1/1     Running     0          1h    10.254.4.8     worker4.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-5qzww          1/1     Running     0          1h    10.254.7.8     worker3.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-72bgb          1/1     Running     0          1h    10.254.5.11    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-daemonset-qvsnj          1/1     Running     0          1h    10.254.6.9     worker2.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-driver-validation               0/1     Completed   0          1h    10.254.5.13    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-feature-discovery-54xt5         1/1     Running     0          1h    10.254.4.10    worker4.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-feature-discovery-np6rj         1/1     Running     0          1h    10.254.5.16    worker1.ocp4.ocp.labs.mlnx   <none>           <none>
nvidia-feature-discovery-t5lpl         1/1     Running     0          1h    10.254.7.11    worker3.ocp4.ocp.labs.mlnx   <none>           <none>
```
Enable the GPUDirect Kernel Module
Start the nv_peer_memory service manually on each GPU-based node from the OCP4 UPI Helper Node.
GPUDirect is currently a Technology Preview feature.
From the Helper Node, retrieve the list of pods running the NVIDIA GPU driver:
```
# oc get pod -n openshift-sro -o wide | grep nvidia-driver
nvidia-driver-daemonset-2pmh5   1/1   Running   0   1h   10.254.4.8    worker4.ocp4.ocp.labs.mlnx   <none>   <none>
nvidia-driver-daemonset-5qzww   1/1   Running   0   1h   10.254.7.8    worker3.ocp4.ocp.labs.mlnx   <none>   <none>
nvidia-driver-daemonset-72bgb   1/1   Running   0   1h   10.254.5.11   worker1.ocp4.ocp.labs.mlnx   <none>   <none>
nvidia-driver-daemonset-qvsnj   1/1   Running   0   1h   10.254.6.9    worker2.ocp4.ocp.labs.mlnx   <none>   <none>
```
For each pod in the daemonset, execute the following commands:
```
# oc -n openshift-sro rsh nvidia-driver-daemonset-5qzww
sh-4.2# bash
[root@nvidia-driver-daemonset-5qzww /]# modprobe nv_peer_mem
[root@nvidia-driver-daemonset-5qzww /]# lsmod | grep nv_peer_mem
[root@nvidia-driver-daemonset-5qzww /]# exit
```
This step must be repeated whenever a Worker Node is rebooted.
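Below is a hedged convenience loop that performs the step above on every driver pod at once; pod names are discovered at run time, so nothing is hard-coded:

```
for p in $(oc get pod -n openshift-sro -o name | grep nvidia-driver-daemonset); do
  oc -n openshift-sro exec "${p#pod/}" -- modprobe nv_peer_mem
done
```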
Deployment of InfiniBand and KubeFlow Kubernetes Components
Copy the Openshift-rdma.zip archive to the OCP4 UPI Helper Node and extract the files. The archive contains the following files:
- device-plugin.yaml - DaemonSet for deploying the RDMA device plugin with a shared InfiniBand HCA
- mpijob-gpud.yaml - MPI Job example
- mpi-operator.yaml - KubeFlow/mpi-operator full installation (no need to install KubeFlow)
- rdma-hca-node-config.yaml - ConfigMap configuration file for the RDMA device plugin
Install the RDMA device plugin:
```
# oc apply -f rdma-hca-node-config.yaml
# oc apply -f device-plugin.yaml
```
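Once the plugin is running, a pod consumes the shared HCA by requesting it as an extended resource. A minimal hedged sketch follows; the resource name rdma/hca and the container image are assumptions, so verify the resource name against rdma-hca-node-config.yaml before applying:

```
# Hypothetical test pod requesting the shared RDMA HCA resource:
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod
spec:
  containers:
  - name: rdma-test
    image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60   # example MOFED-enabled image
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]   # needed for RDMA memory registration
    resources:
      limits:
        rdma/hca: 1           # assumed resource name exposed by the device plugin
    command: [ "sleep", "infinity" ]
EOF
```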
KubeFlow MPI-operator installation command:
# oc apply -f mpi-operator.yaml
Application Deployment and Configuration
An application deployment example is provided in the mpijob-gpud.yaml file. It describes how to run a distributed TensorFlow benchmark with the Horovod framework using the KubeFlow MPI-Operator over a high-performance InfiniBand fabric.
Below are the environment variable settings used in the mpijob-gpud.yaml file to run the TensorFlow benchmark (a sketch of how they reach the workers follows this list):
- TCP mode:
  - NCCL_IB_DISABLE=1
  - NCCL_NET_GDR_LEVEL=0
- Without GPUDirect:
  - NCCL_IB_DISABLE=0
  - NCCL_NET_GDR_LEVEL=0
- With GPUDirect:
  - NCCL_IB_DISABLE=0
  - NCCL_NET_GDR_LEVEL=1
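These variables are typically exported to the MPI workers via mpirun's -x flag in the launcher command. A hedged illustration for the "With GPUDirect" case; the exact command lives in mpijob-gpud.yaml, and the process count and benchmark flags below are assumptions (16 = 4 nodes x 4 GPUs):

```
mpirun -np 16 \
    -x NCCL_IB_DISABLE=0 \
    -x NCCL_NET_GDR_LEVEL=1 \
    python tf_cnn_benchmarks.py --model resnet50 --batch_size 64 --variable_update horovod
```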
Deploy the application by running the below command on the Helper Node:
# oc apply -f mpijob-gpud.yaml
Performance Testing
Below are the logs for the distributed TensorFlow benchmark tests with KubeFlow/mpi-operator.
In our POC environment, which includes 4 servers with 4 x P100 PCIe GPUs each, using GPUDirect (GDR) adds a 6.26% boost in performance.
Higher gains are expected with more servers and more powerful GPUs.
Below is the log of a distributed TensorFlow benchmark test with KubeFlow/mpi-operator in TCP mode:
+ POD_NAME=tensorflow-benchmarks-worker-1 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-1 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 2 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" + POD_NAME=tensorflow-benchmarks-worker-3 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-3 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 4 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" + POD_NAME=tensorflow-benchmarks-worker-0 + + shift POD_NAME=tensorflow-benchmarks-worker-2 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-0 -- /bin/sh+ -c/opt/kube/kubectl exec tensorflow-benchmarks-worker-2 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 1 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "764542976" -mca ess_base_vpid 3 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[1:4]xnr8,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "764542976.0;tcp://10.254.5.30:36799" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "764542976.0;tcp://10.254.5.30:36799" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca 
[TensorFlow start-up output, condensed: each record below is printed by every one of the 16 ranks (4 worker pods x 4 GPUs per pod); only timestamps, thread IDs, object addresses and the assigned GPU index differ between ranks.]

2019-10-31 09:33:41.078094: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 09:33:42.823273: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x62b6630 executing computations on platform CUDA. Devices:
2019-10-31 09:33:42.823346: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.823358: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.823367: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.823376: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:33:42.828199: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:33:42.831848: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5b190c0 executing computations on platform Host. Devices:
2019-10-31 09:33:42.831879: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:33:42.832352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:33:42.832397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:33:42.973332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:33:42.973402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-10-31 09:33:42.973416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-10-31 09:33:42.975511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)

[Each rank is pinned to exactly one of its node's four GPUs, so "Found device 0", "Adding visible gpu devices" and "Created TensorFlow device" report device 0, 1, 2 or 3 (PCI bus 0000:04:00.0, 0000:06:00.0, 0000:0c:00.0 or 0000:0e:00.0) depending on the rank. Every rank then prints the benchmark configuration banner, shown once here:]

TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model

W1031 09:33:42.994619 140599664752384 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
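The one-GPU-per-rank pattern in the records above is the standard Horovod device-pinning idiom, which tf_cnn_benchmarks applies when run with Horovod variable updates: each process selects the local GPU matching its hvd.local_rank(). A minimal sketch of the idiom in TensorFlow 1.x terms (illustrative, not the benchmark's actual source):

import tensorflow as tf
import horovod.tensorflow as hvd

# Start Horovod; each of the 16 ranks gets a rank() in 0..15 and a
# local_rank() in 0..3 (4 ranks per worker pod in this run).
hvd.init()

# Pin this rank to a single local GPU. This is what makes every rank
# log "Adding visible gpu devices: N" for exactly one device above.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

with tf.Session(config=config) as sess:
    pass  # build and run the Horovod-wrapped training graph here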
[Per-rank repeats condensed: the remaining ranks emit the same start-up records, benchmark banner and colocate_with warning shown above. Each rank also prints the following deprecation warning once:]

W1031 09:33:43.000526 139735270668032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
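For reference, the banner above corresponds to a tf_cnn_benchmarks launch with Horovod variable updates. A launcher command consistent with it might look as follows; the flag names are tf_cnn_benchmarks' own, but the exact invocation (interpreter, script path, extra mpirun options) is defined by the MPIJob used for this run, so treat this as an illustrative sketch:

# 16 ranks in total: 4 worker pods x 4 GPUs, mapped by slot
# (matching -mca rmaps_base_mapping_policy "slot" in the trace above)
mpirun -np 16 \
    python /examples/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
        --model=resnet50 \
        --batch_size=32 \
        --num_batches=100 \
        --variable_update=horovod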
[Likewise printed once per rank:]

W1031 09:33:43.045008 139735270668032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W1031 09:33:45.444138 139677737731840 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:33:45.606902 139677737731840 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating: Use tf.cast instead. W1031 09:33:45.667484 139735270668032 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.668427 140519047988992 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.668923 140530158384896 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.684920 140599664752384 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.711060 140556463261440 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.714397 139681104721664 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.717874 140519229642496 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.739548 140336927958784 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.746958 139829226764032 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.773261 139737117169408 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. W1031 09:33:45.799637 140271702304512 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. 
Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph Initializing graph W1031 09:33:47.557883 139677737731840 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.560198 140693077554944 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.624082 140195360659200 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.636107 140153269229312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.643582 139625134778112 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.651612 139735270668032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.680140 140530158384896 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.724147 140599664752384 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.775033 140519229642496 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.785529 139829226764032 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. 
Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.788232 139681104721664 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.798910 139737117169408 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.816028 140336927958784 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.823199 140556463261440 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:47.911291 140271702304512 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession W1031 09:33:48.041676 140519047988992 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. 
Instructions for updating: Please switch to tf.train.MonitoredTrainingSession 2019-10-31 09:33:48.216494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:33:48.216593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.216609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1 2019-10-31 09:33:48.216620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N 2019-10-31 09:33:48.216880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0) 2019-10-31 09:33:48.242691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:33:48.242809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.242824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2 2019-10-31 09:33:48.242834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N 2019-10-31 09:33:48.243075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0) 2019-10-31 09:33:48.323609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:33:48.323757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.323776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3 2019-10-31 09:33:48.323786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N 2019-10-31 09:33:48.324072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0) 2019-10-31 09:33:48.340712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:33:48.340826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.340841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3 2019-10-31 09:33:48.340850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N 2019-10-31 09:33:48.341093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0) 2019-10-31 09:33:48.351236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:33:48.351341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.351354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3 2019-10-31 09:33:48.351364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N 2019-10-31 09:33:48.351623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device 
(/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0) 2019-10-31 09:33:48.381960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:33:48.382084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.382099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1 2019-10-31 09:33:48.382109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N 2019-10-31 09:33:48.382446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0) 2019-10-31 09:33:48.390172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:33:48.390323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.390339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-10-31 09:33:48.390349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-10-31 09:33:48.390635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0) 2019-10-31 09:33:48.427686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:33:48.427803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.427818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-10-31 09:33:48.427828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-10-31 09:33:48.428213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0) 2019-10-31 09:33:48.490283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:33:48.490391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.490407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-10-31 09:33:48.490416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-10-31 09:33:48.490670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0) 2019-10-31 09:33:48.507249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:33:48.507402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.507436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2 2019-10-31 09:33:48.507446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N 2019-10-31 09:33:48.507933: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0) 2019-10-31 09:33:48.556344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:33:48.556481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.556495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3 2019-10-31 09:33:48.556506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N 2019-10-31 09:33:48.556947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0) 2019-10-31 09:33:48.574927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:33:48.575046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.575062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2 2019-10-31 09:33:48.575072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N 2019-10-31 09:33:48.575334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0) 2019-10-31 09:33:48.576334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:33:48.576457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.576475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1 2019-10-31 09:33:48.576485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N 2019-10-31 09:33:48.577031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0) 2019-10-31 09:33:48.621957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:33:48.622073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.622088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1 2019-10-31 09:33:48.622098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N 2019-10-31 09:33:48.622460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0) 2019-10-31 09:33:48.648510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:33:48.648617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.648632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-10-31 09:33:48.648642: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-10-31 09:33:48.648908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0) 2019-10-31 09:33:48.660298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:33:48.660445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:33:48.660466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2 2019-10-31 09:33:48.660477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N 2019-10-31 09:33:48.660728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0) I1031 09:33:51.565660 139677737731840 session_manager.py:491] Running local_init_op. I1031 09:33:51.679857 140693077554944 session_manager.py:491] Running local_init_op. I1031 09:33:51.698377 140195360659200 session_manager.py:491] Running local_init_op. I1031 09:33:51.745176 139677737731840 session_manager.py:493] Done running local_init_op. I1031 09:33:51.749810 139625134778112 session_manager.py:491] Running local_init_op. I1031 09:33:51.770334 139735270668032 session_manager.py:491] Running local_init_op. I1031 09:33:51.828540 140153269229312 session_manager.py:491] Running local_init_op. I1031 09:33:51.859666 139829226764032 session_manager.py:491] Running local_init_op. I1031 09:33:51.862105 140530158384896 session_manager.py:491] Running local_init_op. I1031 09:33:51.864490 140693077554944 session_manager.py:493] Done running local_init_op. I1031 09:33:51.870752 140599664752384 session_manager.py:491] Running local_init_op. I1031 09:33:51.883411 140195360659200 session_manager.py:493] Done running local_init_op. I1031 09:33:51.935793 139625134778112 session_manager.py:493] Done running local_init_op. I1031 09:33:51.940668 140519229642496 session_manager.py:491] Running local_init_op. I1031 09:33:51.945004 139737117169408 session_manager.py:491] Running local_init_op. I1031 09:33:51.967892 139735270668032 session_manager.py:493] Done running local_init_op. I1031 09:33:51.979897 140336927958784 session_manager.py:491] Running local_init_op. I1031 09:33:52.010399 140153269229312 session_manager.py:493] Done running local_init_op. I1031 09:33:52.041698 140519047988992 session_manager.py:491] Running local_init_op. I1031 09:33:52.052779 139829226764032 session_manager.py:493] Done running local_init_op. I1031 09:33:52.060791 140530158384896 session_manager.py:493] Done running local_init_op. I1031 09:33:52.067558 140599664752384 session_manager.py:493] Done running local_init_op. I1031 09:33:52.077525 140556463261440 session_manager.py:491] Running local_init_op. I1031 09:33:52.088519 139681104721664 session_manager.py:491] Running local_init_op. I1031 09:33:52.091564 140271702304512 session_manager.py:491] Running local_init_op. I1031 09:33:52.144148 140519229642496 session_manager.py:493] Done running local_init_op. I1031 09:33:52.144645 139737117169408 session_manager.py:493] Done running local_init_op. I1031 09:33:52.170006 140336927958784 session_manager.py:493] Done running local_init_op. 
I1031 09:33:52.232106 140519047988992 session_manager.py:493] Done running local_init_op. I1031 09:33:52.265110 140556463261440 session_manager.py:493] Done running local_init_op. I1031 09:33:52.277443 140271702304512 session_manager.py:493] Done running local_init_op. I1031 09:33:52.277595 139681104721664 session_manager.py:493] Done running local_init_op. Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up 2019-10-31 09:34:20.901503: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:20.901916: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:20.985326: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.108461: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.122823: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.298203: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.334801: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.336122: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.388912: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.416389: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.620958: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.845133: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.907949: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.945960: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:21.994445: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:34:22.360291: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.19<0> tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.19<0> tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.19<0> tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.19<0> tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). 
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.27<0> tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.27<0> tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.27<0> tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.27<0> tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1. NCCL version 2.4.2+cuda10.0 tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.18<0> tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.18<0> tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.18<0> tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.18<0> tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.31<0> tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.31<0> tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.31<0> tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1. 
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.31<0> tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so). tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1. tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Setting affinity for GPU 0 to 0fff tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO comm 0x7fcc343cd5e0 rank 0 nranks 16 cudaDev 0 nvmlDev 0 tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Setting affinity for GPU 2 to 0fff tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Setting affinity for GPU 1 to 0fff tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO comm 0x7f929c35d880 rank 6 nranks 16 cudaDev 2 nvmlDev 2 tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO comm 0x7f09183cac80 rank 9 nranks 16 cudaDev 1 nvmlDev 1 tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Setting affinity for GPU 3 to 0fff tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO comm 0x7fa1cc378dd0 rank 15 nranks 16 cudaDev 3 nvmlDev 3 tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Setting affinity for GPU 3 to 0fff tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO comm 0x7efc10383a70 rank 3 nranks 16 cudaDev 3 nvmlDev 3 tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Setting affinity for GPU 2 to 0fff tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Setting affinity for GPU 1 to 0fff tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO comm 0x7f1624349ad0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Setting affinity for GPU 2 to 0fff tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7ff4b835d910 rank 2 nranks 16 cudaDev 2 nvmlDev 2 tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO comm 0x7fcec83557f0 rank 1 nranks 16 cudaDev 1 nvmlDev 1 tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Setting affinity for GPU 1 to 0fff tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Setting affinity for GPU 0 to 0fff tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Setting affinity for GPU 3 to 0fff tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO comm 0x7f15b837ba20 rank 11 nranks 16 cudaDev 3 nvmlDev 3 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Setting affinity for GPU 0 to 0fff tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO comm 0x7fcc3c33bb70 rank 13 nranks 16 cudaDev 1 nvmlDev 1 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO comm 0x7f770837e260 rank 12 nranks 16 cudaDev 0 nvmlDev 0 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Setting affinity for GPU 0 to 0fff tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO comm 0x7f2b985654d0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Setting affinity for GPU 1 to 0fff tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f08503a4be0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Setting affinity for GPU 3 to 0fff tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO comm 0x7f80d438fcb0 rank 7 nranks 16 cudaDev 3 nvmlDev 3 tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Setting affinity for GPU 2 to 0fff tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO comm 0x7fd4e83c3530 rank 10 nranks 16 cudaDev 2 nvmlDev 2 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO comm 0x7fdef84c17c0 rank 8 nranks 16 cudaDev 0 nvmlDev 0 tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Could not find real path of /sys/class/net/eth0/device 
tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO CUDA Dev 3[3], Socket NIC distance : SOC tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO CUDA Dev 2[2], Socket NIC distance : SOC tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO CUDA Dev 1[1], Socket NIC distance : SOC tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO CUDA Dev 0[0], Socket NIC distance : SOC tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO CUDA Dev 1[1], Socket NIC distance : SOC tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO CUDA Dev 0[0], Socket NIC distance : SOC tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO CUDA Dev 2[2], Socket NIC distance : SOC tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO CUDA Dev 3[3], Socket NIC distance : SOC tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO CUDA Dev 3[3], Socket NIC distance : SOC tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO CUDA Dev 2[2], Socket NIC distance : SOC tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO CUDA Dev 0[0], Socket NIC distance : SOC tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO CUDA Dev 1[1], Socket NIC distance : SOC tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO CUDA Dev 2[2], Socket NIC distance : SOC tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Could not find real path of 
/sys/class/net/eth0/device tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO CUDA Dev 0[0], Socket NIC distance : SOC tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO CUDA Dev 3[3], Socket NIC distance : SOC tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO CUDA Dev 1[1], Socket NIC distance : SOC tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Channel 00 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. 
tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 7 -> 8 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 11 -> 12 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 15 -> 0 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 3 -> 4 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 00 : 13[1] -> 14[2] via P2P/IPC tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Ring 00 : 14[2] -> 15[3] via P2P/IPC tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Ring 00 : 6[2] -> 7[3] via P2P/IPC tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 6[2] via P2P/IPC tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via P2P/IPC tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via P2P/IPC tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 12[0] -> 13[1] via P2P/IPC tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Ring 00 : 7 -> 8 [send] via NET/Socket/0 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8[0] -> 9[1] via P2P/IPC tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Ring 00 : 10[2] -> 11[3] via P2P/IPC tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 00 : 9[1] -> 10[2] via P2P/IPC tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 00 : 3 -> 4 [send] via NET/Socket/0 tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Ring 00 : 15 -> 0 [send] via NET/Socket/0 tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Ring 00 : 11 -> 12 [send] via NET/Socket/0 tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Ring 00 : 7[3] -> 6[2] via P2P/IPC tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Ring 00 : 3[3] -> 2[2] via P2P/IPC tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Ring 00 : 15[3] -> 14[2] via P2P/IPC tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Ring 00 : 11[3] -> 10[2] via P2P/IPC tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO Trees [0] 2->3->-1/-1/-1 tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO Trees [0] 14->15->-1/-1/-1 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 4 -> 8 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO Trees [0] 10->11->-1/-1/-1 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2[2] -> 1[1] via P2P/IPC tensorflow-benchmarks-worker-0:58:260 
[1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Ring 00 : 14[2] -> 13[1] via P2P/IPC tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Ring 00 : 13[1] -> 12[0] via P2P/IPC tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Trees [0] 1->2->3/-1/-1 tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Ring 00 : 10[2] -> 9[1] via P2P/IPC tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Ring 00 : 9[1] -> 8[0] via P2P/IPC tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO Trees [0] 6->7->-1/-1/-1 tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO Trees [0] 13->14->15/-1/-1 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 4 -> 8 [send] via NET/Socket/0 tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Ring 00 : 6[2] -> 5[1] via P2P/IPC tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Trees [0] 4->5->6/-1/-1 tensorflow-benchmarks-worker-2:61:259 [3] NCCL INFO comm 0x7f15b837ba20 rank 11 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO Trees [0] 5->6->7/-1/-1 tensorflow-benchmarks-worker-1:60:261 [2] NCCL INFO comm 0x7f929c35d880 rank 6 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO Trees [0] 9->10->11/-1/-1 tensorflow-benchmarks-worker-2:60:260 [2] NCCL INFO comm 0x7fd4e83c3530 rank 10 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-1:61:259 [3] NCCL INFO comm 0x7f80d438fcb0 rank 7 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO Trees [0] 8->9->10/-1/-1 tensorflow-benchmarks-worker-2:58:258 [1] NCCL INFO comm 0x7f09183cac80 rank 9 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Ring 00 : 8 -> 4 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 12 -> 8 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8 -> 0 [send] via NET/Socket/0 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f08503a4be0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 0 -> 8 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8 -> 4 [send] via NET/Socket/0 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Ring 00 : 8 -> 12 [send] via NET/Socket/0 tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO Trees [0] 8->4->5/-1/-1 
tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO Trees [0] 12->13->14/-1/-1 tensorflow-benchmarks-worker-3:58:259 [1] NCCL INFO comm 0x7fcc3c33bb70 rank 13 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-3:61:261 [3] NCCL INFO comm 0x7fa1cc378dd0 rank 15 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-0:61:258 [3] NCCL INFO comm 0x7efc10383a70 rank 3 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO Trees [0] 0->1->2/-1/-1 tensorflow-benchmarks-worker-0:58:260 [1] NCCL INFO comm 0x7fcec83557f0 rank 1 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 8 -> 0 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Ring 00 : 0 -> 8 [send] via NET/Socket/0 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Trees [0] -1->0->1/8/-1 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Using 256 threads, Min Comp Cap 6, Trees enabled up to size 479999 tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO comm 0x7fcc343cd5e0 rank 0 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7ff4b835d910 rank 2 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-1:55:260 [0] NCCL INFO comm 0x7f2b985654d0 rank 4 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 12 -> 8 [send] via NET/Socket/0 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Could not find real path of /sys/class/net/eth0/device tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO include/net.h:24 -> 2 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO NET/Socket : GPU Direct RDMA Disabled for GPU 0[0] / HCA 0 (distance 3 >= 0) tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Ring 00 : 8 -> 12 [receive] via NET/Socket/0 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO Trees [0] 8->12->13/-1/-1 tensorflow-benchmarks-worker-3:56:263 [0] NCCL INFO comm 0x7f770837e260 rank 12 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE tensorflow-benchmarks-worker-3:60:258 [2] NCCL INFO comm 0x7f1624349ad0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-0:56:259 [0] NCCL INFO Launch mode Parallel tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO Trees [0] 0->8->9/4/12 tensorflow-benchmarks-worker-2:55:261 [0] NCCL INFO comm 0x7fdef84c17c0 rank 8 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 7.801 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 8.004 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 8.178 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 7.886 1 images/sec: 17.3 +/- 0.0 (jitter = 0.0) 7.729 1 images/sec: 17.3 +/- 0.0 (jitter = 0.0) 7.780 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 7.869 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 7.788 1 images/sec: 17.4 +/- 0.0 (jitter = 0.0) 7.565 1 
images/sec: 17.4 +/- 0.0 (jitter = 0.0)  7.802
10   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.569
20   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.591
30   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.609
40   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.613
50   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.581
60   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.523
70   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.508
80   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.454
90   images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.394
100  images/sec: 17.5 +/- 0.0 (jitter = 0.1)  7.416
----------------------------------------------------------------
total images/sec: 280.34
----------------------------------------------------------------
[... matching progress lines and totals from the other 15 ranks trimmed; per-rank totals range from 280.30 to 280.36 ...]
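Each rank settles at a steady ~17.5 images/sec, giving an aggregate of ~280.3 images/sec across the 16 Tesla P100 GPUs. The aggregate figure can be pulled straight out of the launcher pod's log; a minimal sketch, assuming a launcher pod named tensorflow-benchmarks-launcher-99pq5 (a hypothetical name here; substitute the launcher pod generated for your job):

# Hypothetical pod name - replace with the launcher pod created in your namespace.
oc logs tensorflow-benchmarks-launcher-99pq5 | grep "total images/sec"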
Log of running the distributed TensorFlow benchmark tests with KubeFlow/mpi-operator, without GPUDirect (excerpt):
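(Note in the orted launch commands below that the launcher passes -mca pml "ob1" -mca btl "^openib" to each worker. The leading "^" excludes Open MPI's openib byte-transfer layer, so MPI traffic in this run falls back to TCP rather than going over InfiniBand RDMA with GPUDirect.)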
+ POD_NAME=tensorflow-benchmarks-worker-0
+ shift
+ /opt/kube/kubectl exec tensorflow-benchmarks-worker-0 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "943194112" -mca ess_base_vpid 1 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-[2:99]pq5,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "943194112.0;tcp://10.254.7.17:57414" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "943194112.0;tcp://10.254.7.17:57414" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated"
[... equivalent orted launch commands for tensorflow-benchmarks-worker-1, -2 and -3 (ess_base_vpid 2-4) trimmed ...]
2019-10-31 09:22:54.631963: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[... the same cpu_feature_guard message from each of the 16 ranks trimmed ...]
2019-10-31 09:22:56.367912: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5cc7a70 executing computations on platform CUDA. Devices:
2019-10-31 09:22:56.367966: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.367978: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.367987: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.367995: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:22:56.373973: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:22:56.376547: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5da52a0 executing computations on platform Host. Devices:
2019-10-31 09:22:56.376576: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:22:56.376964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:22:56.376995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:22:56.514977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:22:56.515063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      1
2019-10-31 09:22:56.515075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N
2019-10-31 09:22:56.515311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
[... matching XLA and device-initialization messages from the other 15 ranks (one visible GPU per rank, pci bus ids 0000:04:00.0 through 0000:0e:00.0) trimmed ...]
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
[... the benchmark configuration banner above is repeated by each of the 16 ranks ...]
W1031 09:22:56.525635 139738184173312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[... equivalent colocate_with and conv2d deprecation warnings from the remaining ranks trimmed ...]
W1031 09:22:56.592101 140458139940608 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
W1031 09:22:56.597523 139738184173312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W1031 09:22:58.918779 139738184173312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:22:59.077747 139738184173312 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
[... the max_pooling2d, to_float and to_int32 deprecation warnings repeat once per rank; repeats trimmed ...]
Initializing graph
[... "Initializing graph" is printed once per rank (16 times); repeats trimmed ...]
W1031 09:23:01.022134 139738184173312 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
[... the same Supervisor.__init__ warning is emitted by every rank; repeats trimmed ...]
2019-10-31 09:23:01.712294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:23:01.712394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:23:01.712408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1
2019-10-31 09:23:01.712418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N
2019-10-31 09:23:01.712657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
[... matching "Adding visible gpu devices" / device-creation messages for GPUs 0, 2 and 3 on every worker trimmed ...]
I1031 09:23:05.001815 139738184173312 session_manager.py:491] Running local_init_op.
I1031 09:23:05.175470 139738184173312 session_manager.py:493] Done running local_init_op.
[... Running/Done local_init_op is logged once per rank; repeats trimmed ...]
Running warm up
[... "Running warm up" is printed once per rank (16 times); repeats trimmed ...]
2019-10-31 09:23:34.466658: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
[... the libcublas.so.10.0 load message repeats once per rank; repeats trimmed ...]
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.5.27<0>
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.7.16<0>
tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO NET/Socket : Using [0]eth0:10.254.4.16<0>
tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.25<0>
[... the NET/Socket, NET/Plugin and NCCL_IB_DISABLE messages repeat for every GPU process on every worker; repeats trimmed ...]
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.5.27<0>
NCCL version 2.4.2+cuda10.0
[... every rank reports NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB with its own OOB eth0 address; repeats trimmed ...]
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO comm 0x7f60783f8e30 rank 0 nranks 16 cudaDev 0 nvmlDev 0
[... affinity and comm messages for ranks 1-15 trimmed ...]
tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance : PHB PIX
tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance : PIX PHB
[... each rank reports the NIC distance (PIX/PHB) between its GPU and the two IB HCAs; repeats trimmed ...]
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Channel 00 : 0 1 3 6 4 5 7 10 8 9 11 14 12 13 15 2
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Channel 01 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0.
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0)
[... GPU Direct RDMA is reported Disabled for every GPU/HCA pair, since NCCL_NET_GDR_LEVEL is set to 0; repeats trimmed ...]
tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 3 -> 6 [receive] via NET/IB/0
tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 15 -> 2 [receive] via NET/IB/0
tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 7[3] via P2P/IPC
tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
[send] via NET/IB/0 tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 00 : 13[1] -> 12[0] via P2P/IPC tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 263 mtu 5 LID 14 tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 00 : 7[3] -> 5[1] via P2P/IPC tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 00 : 12[0] -> 14[2] via P2P/IPC tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 00 : 3[3] -> 1[1] via P2P/IPC tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 00 : 0[0] -> 2[2] via P2P/IPC tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0) tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 00 : 4[0] -> 6[2] via P2P/IPC tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 01 : 13[1] -> 14[2] via P2P/IPC tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 00 : 11[3] -> 9[1] via P2P/IPC tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 10 -> 2 [receive] via NET/IB/0 tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Ring 01 : 15 -> 0 [send] via NET/IB/0 tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 15 -> 0 [receive] via NET/IB/1 tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 01 : 1[1] -> 2[2] via P2P/IPC tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 00 : 8[0] -> 10[2] via P2P/IPC tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 00 : 9[1] -> 8[0] via P2P/IPC tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 16 tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 0[0] -> 1[1] via P2P/IPC tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 00 : 14 -> 10 [send] via NET/IB/0 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0) tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 01 : 3 -> 4 [send] via NET/IB/0 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 11 -> 12 [receive] via NET/IB/1 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12[0] -> 13[1] via P2P/IPC tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 6 -> 10 [receive] via NET/IB/0 tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 0. 
tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0) tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 7 -> 8 [receive] via NET/IB/1 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 14 -> 10 [receive] via NET/IB/0 tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 267 mtu 5 LID 16 tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 01 : 9[1] -> 10[2] via P2P/IPC tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 0 -> 12 [send] via NET/IB/1 tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 01 : 11 -> 12 [send] via NET/IB/0 tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 8[0] -> 9[1] via P2P/IPC tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0) tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 3 -> 4 [receive] via NET/IB/1 tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 6 -> 10 [send] via NET/IB/0 tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 00 : 10 -> 14 [receive] via NET/IB/0 tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 14 tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 12 tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 266 mtu 5 LID 9 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10 -> 2 [send] via NET/IB/0 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 267 mtu 5 LID 14 tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 01 : 5[1] -> 6[2] via P2P/IPC tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0) tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 01 : 7 -> 8 [send] via NET/IB/0 tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 4[0] -> 5[1] via P2P/IPC tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 267 mtu 5 LID 12 tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 00 : 10 -> 6 [receive] via NET/IB/0 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 8 -> 12 [receive] via NET/IB/1 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 00 : 2 -> 10 [send] via NET/IB/0 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 0 -> 12 [receive] via NET/IB/1 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 0 >= 0) tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 267 mtu 5 LID 8 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 2 -> 10 [receive] via NET/IB/0 tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 269 mtu 5 LID 9 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12 -> 4 [send] via NET/IB/1 tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 8 -> 12 [send] via 
NET/IB/1 tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 12 -> 4 [receive] via NET/IB/1 tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 270 mtu 5 LID 13 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 01 : 2[2] -> 3[3] via P2P/IPC tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10 -> 6 [send] via NET/IB/0 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 272 mtu 5 LID 10 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 00 : 10 -> 14 [send] via NET/IB/0 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 272 mtu 5 LID 14 tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Ring 01 : 3[3] -> 2[2] via P2P/IPC tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 273 mtu 5 LID 14 tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Ring 01 : 1[1] -> 0[0] via P2P/IPC tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO Trees [0] 1->3->-1/-1/-1 [1] 2->3->-1/-1/-1 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Ring 01 : 2[2] -> 1[1] via P2P/IPC tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Ring 01 : 12 -> 0 [receive] via NET/IB/1 tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO Trees [0] 0->1->3/-1/-1 [1] 0->1->2/-1/-1 tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 01 : 6[2] -> 7[3] via P2P/IPC tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 01 : 14[2] -> 15[3] via P2P/IPC tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 01 : 10[2] -> 11[3] via P2P/IPC tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Ring 01 : 15[3] -> 14[2] via P2P/IPC tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Ring 01 : 11[3] -> 10[2] via P2P/IPC tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Ring 01 : 7[3] -> 6[2] via P2P/IPC tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Ring 01 : 5[1] -> 4[0] via P2P/IPC tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Ring 01 : 13[1] -> 12[0] via P2P/IPC tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Ring 01 : 9[1] -> 8[0] via P2P/IPC tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO Trees [0] 9->11->-1/-1/-1 [1] 10->11->-1/-1/-1 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Ring 01 : 10[2] -> 9[1] via P2P/IPC tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO Trees [0] 13->15->-1/-1/-1 [1] 14->15->-1/-1/-1 tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Ring 01 : 12 -> 8 [receive] via NET/IB/1 tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO Trees [0] 8->9->11/-1/-1 [1] 8->9->10/-1/-1 tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO Trees [0] 5->7->-1/-1/-1 [1] 6->7->-1/-1/-1 tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Ring 01 : 14[2] -> 13[1] via P2P/IPC tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO Trees [0] 12->13->15/-1/-1 [1] 12->13->14/-1/-1 tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Ring 01 : 6[2] -> 5[1] via P2P/IPC tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO Trees [0] 4->5->7/-1/-1 [1] 4->5->6/-1/-1 tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO Trees [0] 10->6->4/-1/-1 [1] 5->6->7/-1/-1 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO Trees [0] -1->2->0/10/-1 [1] 1->2->3/-1/-1 tensorflow-benchmarks-worker-0:60:261 [2] NCCL INFO comm 0x7f5c283d9250 rank 2 nranks 16 
cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-0:61:259 [3] NCCL INFO comm 0x7f83f83e3580 rank 3 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-0:59:260 [1] NCCL INFO comm 0x7fb0543bc410 rank 1 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Ring 01 : 4 -> 12 [send] via NET/IB/1 tensorflow-benchmarks-worker-1:58:258 [1] NCCL INFO comm 0x7f16643ea9b0 rank 5 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-1:61:260 [3] NCCL INFO comm 0x7f66443edf60 rank 7 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-1:60:259 [2] NCCL INFO comm 0x7f0ac43ead00 rank 6 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 267 mtu 5 LID 11 tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO Trees [0] 6->4->5/-1/-1 [1] -1->4->5/12/-1 tensorflow-benchmarks-worker-1:57:267 [0] NCCL INFO comm 0x7f9c6c47b310 rank 4 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO Trees [0] 2->10->8/6/14 [1] 9->10->11/-1/-1 tensorflow-benchmarks-worker-2:60:258 [2] NCCL INFO comm 0x7f31b83e57d0 rank 10 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-2:61:260 [3] NCCL INFO comm 0x7ff10c3ee8d0 rank 11 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-2:59:261 [1] NCCL INFO comm 0x7fb4b83eb6d0 rank 9 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO Trees [0] 10->14->12/-1/-1 [1] 13->14->15/-1/-1 tensorflow-benchmarks-worker-3:60:259 [2] NCCL INFO comm 0x7fbe043e2de0 rank 14 nranks 16 cudaDev 2 nvmlDev 2 - Init COMPLETE tensorflow-benchmarks-worker-3:61:258 [3] NCCL INFO comm 0x7fca18409ac0 rank 15 nranks 16 cudaDev 3 nvmlDev 3 - Init COMPLETE tensorflow-benchmarks-worker-3:58:260 [1] NCCL INFO comm 0x7fa1903dd250 rank 13 nranks 16 cudaDev 1 nvmlDev 1 - Init COMPLETE tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 0[0] / HCA 1 (distance 0 >= 0) tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 4 -> 12 [receive] via NET/IB/1 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12 -> 8 [send] via NET/IB/1 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Ring 01 : 12 -> 0 [send] via NET/IB/1 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 277 mtu 5 LID 10 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO NET/IB: Dev 1 Port 1 qpn 278 mtu 5 LID 10 tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Trees [0] 2->0->1/-1/-1 [1] 12->0->1/-1/-1 tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Using 256 threads, Min Comp Cap 6, Trees enabled up to size 479999 tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO Trees [0] 10->8->9/-1/-1 [1] 12->8->9/-1/-1 tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO comm 0x7f60783f8e30 rank 0 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE tensorflow-benchmarks-worker-0:58:258 [0] NCCL INFO Launch mode Parallel tensorflow-benchmarks-worker-2:57:259 [0] NCCL INFO comm 0x7f5a4048d110 rank 8 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO Trees [0] 14->12->13/-1/-1 [1] 4->12->13/8/0 tensorflow-benchmarks-worker-3:57:261 [0] NCCL INFO comm 0x7f6f4846aa90 rank 12 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step 
Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss Done warm up Step Img/sec total_loss 1 images/sec: 186.8 +/- 0.0 (jitter = 0.0) 7.892 1 images/sec: 186.7 +/- 0.0 (jitter = 0.0) 7.650 1 images/sec: 186.5 +/- 0.0 (jitter = 0.0) 7.778 1 images/sec: 186.2 +/- 0.0 (jitter = 0.0) 7.585 1 images/sec: 186.4 +/- 0.0 (jitter = 0.0) 7.568 1 images/sec: 187.4 +/- 0.0 (jitter = 0.0) 7.965 1 images/sec: 186.5 +/- 0.0 (jitter = 0.0) 7.828 1 images/sec: 186.4 +/- 0.0 (jitter = 0.0) 8.063 1 images/sec: 186.5 +/- 0.0 (jitter = 0.0) 8.154 1 images/sec: 186.7 +/- 0.0 (jitter = 0.0) 7.896 1 images/sec: 186.1 +/- 0.0 (jitter = 0.0) 7.792 1 images/sec: 186.3 +/- 0.0 (jitter = 0.0) 7.707 1 images/sec: 186.6 +/- 0.0 (jitter = 0.0) 7.768 1 images/sec: 186.0 +/- 0.0 (jitter = 0.0) 7.985 1 images/sec: 186.6 +/- 0.0 (jitter = 0.0) 7.752 1 images/sec: 186.2 +/- 0.0 (jitter = 0.0) 7.902 10 images/sec: 185.8 +/- 0.5 (jitter = 0.7) 7.585 10 images/sec: 185.9 +/- 0.4 (jitter = 0.4) 7.623 10 images/sec: 186.0 +/- 0.4 (jitter = 0.3) 7.543 10 images/sec: 185.9 +/- 0.3 (jitter = 0.6) 7.645 10 images/sec: 185.7 +/- 0.5 (jitter = 0.8) 7.742 10 images/sec: 185.7 +/- 0.5 (jitter = 0.8) 7.718 10 images/sec: 185.7 +/- 0.5 (jitter = 0.9) 7.731 10 images/sec: 185.8 +/- 0.3 (jitter = 0.6) 7.557 10 images/sec: 185.7 +/- 0.4 (jitter = 1.0) 7.771 10 images/sec: 185.6 +/- 0.6 (jitter = 0.8) 7.869 10 images/sec: 185.8 +/- 0.4 (jitter = 0.6) 8.020 10 images/sec: 185.7 +/- 0.4 (jitter = 1.2) 7.594 10 images/sec: 185.7 +/- 0.5 (jitter = 0.3) 7.700 10 images/sec: 185.8 +/- 0.5 (jitter = 0.8) 7.648 10 images/sec: 185.8 +/- 0.3 (jitter = 0.7) 8.038 10 images/sec: 185.7 +/- 0.5 (jitter = 0.8) 7.800 20 images/sec: 186.1 +/- 0.3 (jitter = 0.5) 7.636 20 images/sec: 186.1 +/- 0.3 (jitter = 0.8) 7.773 20 images/sec: 186.1 +/- 0.2 (jitter = 0.5) 7.665 20 images/sec: 186.1 +/- 0.3 (jitter = 0.7) 7.570 20 images/sec: 186.2 +/- 0.2 (jitter = 0.4) 7.657 20 images/sec: 186.1 +/- 0.3 (jitter = 0.5) 7.419 20 images/sec: 186.1 +/- 0.3 (jitter = 0.8) 7.733 20 images/sec: 186.1 +/- 0.2 (jitter = 0.5) 7.520 20 images/sec: 186.1 +/- 0.3 (jitter = 0.7) 7.608 20 images/sec: 186.1 +/- 0.2 (jitter = 0.6) 7.645 20 images/sec: 186.1 +/- 0.2 (jitter = 0.4) 7.771 20 images/sec: 186.2 +/- 0.2 (jitter = 0.5) 7.590 20 images/sec: 186.1 +/- 0.2 (jitter = 0.5) 7.814 20 images/sec: 186.1 +/- 0.2 (jitter = 0.3) 7.683 20 images/sec: 186.1 +/- 0.3 (jitter = 0.5) 7.587 20 images/sec: 186.2 +/- 0.3 (jitter = 1.1) 7.582 30 images/sec: 186.0 +/- 0.2 (jitter = 0.6) 7.602 30 images/sec: 186.0 +/- 0.2 (jitter = 0.5) 7.587 30 images/sec: 186.0 +/- 0.2 (jitter = 0.6) 7.590 30 images/sec: 185.9 +/- 0.3 (jitter = 0.5) 7.559 30 images/sec: 185.9 +/- 0.2 (jitter = 0.6) 7.515 30 images/sec: 185.9 +/- 0.2 (jitter = 0.6) 7.636 30 images/sec: 185.9 +/- 0.2 (jitter = 0.8) 7.850 30 images/sec: 185.9 +/- 0.2 (jitter = 0.8) 7.856 30 images/sec: 185.9 +/- 0.3 (jitter = 0.8) 7.596 30 images/sec: 185.9 +/- 0.3 (jitter = 0.6) 7.750 30 images/sec: 185.9 +/- 0.3 (jitter = 0.9) 7.683 30 images/sec: 185.9 +/- 0.3 (jitter = 0.7) 7.593 30 images/sec: 185.9 +/- 0.3 (jitter = 0.8) 7.630 30 images/sec: 185.9 
+/- 0.3 (jitter = 0.8) 7.518 30 images/sec: 185.9 +/- 0.3 (jitter = 0.9) 7.754 30 images/sec: 185.9 +/- 0.3 (jitter = 0.7) 7.876 40 images/sec: 185.6 +/- 0.2 (jitter = 0.7) 7.408 40 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.657 40 images/sec: 185.6 +/- 0.2 (jitter = 0.8) 7.647 40 images/sec: 185.6 +/- 0.3 (jitter = 0.6) 7.562 40 images/sec: 185.7 +/- 0.3 (jitter = 0.6) 7.656 40 images/sec: 185.6 +/- 0.3 (jitter = 0.8) 7.656 40 images/sec: 185.7 +/- 0.3 (jitter = 0.9) 7.432 40 images/sec: 185.6 +/- 0.3 (jitter = 0.8) 7.660 40 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.581 40 images/sec: 185.6 +/- 0.2 (jitter = 0.7) 7.613 40 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.540 40 images/sec: 185.6 +/- 0.3 (jitter = 0.9) 7.680 40 images/sec: 185.7 +/- 0.2 (jitter = 0.6) 7.548 40 images/sec: 185.6 +/- 0.3 (jitter = 1.1) 7.512 40 images/sec: 185.7 +/- 0.3 (jitter = 0.8) 7.367 40 images/sec: 185.7 +/- 0.3 (jitter = 0.8) 7.629 50 images/sec: 185.7 +/- 0.3 (jitter = 0.7) 7.626 50 images/sec: 185.6 +/- 0.3 (jitter = 0.8) 7.551 50 images/sec: 185.7 +/- 0.3 (jitter = 0.8) 7.625 50 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.580 50 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.506 50 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.540 50 images/sec: 185.6 +/- 0.2 (jitter = 0.8) 7.560 50 images/sec: 185.6 +/- 0.2 (jitter = 0.7) 7.497 50 images/sec: 185.7 +/- 0.2 (jitter = 1.0) 7.445 50 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.505 50 images/sec: 185.6 +/- 0.2 (jitter = 0.8) 7.550 50 images/sec: 185.6 +/- 0.3 (jitter = 0.9) 7.512 50 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.578 50 images/sec: 185.7 +/- 0.2 (jitter = 1.0) 7.489 50 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.441 50 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.583 60 images/sec: 185.8 +/- 0.2 (jitter = 0.6) 7.435 60 images/sec: 185.8 +/- 0.2 (jitter = 0.7) 7.470 60 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.473 60 images/sec: 185.8 +/- 0.2 (jitter = 0.6) 7.435 60 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.376 60 images/sec: 185.8 +/- 0.2 (jitter = 0.6) 7.497 60 images/sec: 185.8 +/- 0.2 (jitter = 0.9) 7.574 60 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.555 60 images/sec: 185.8 +/- 0.2 (jitter = 0.7) 7.465 60 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.562 60 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.473 60 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.632 60 images/sec: 185.8 +/- 0.2 (jitter = 0.6) 7.509 60 images/sec: 185.8 +/- 0.2 (jitter = 0.7) 7.473 60 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.532 60 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.550 70 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.471 70 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.461 70 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.517 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.493 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.585 70 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.462 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.520 70 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.530 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.535 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.426 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.560 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.475 70 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.464 70 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.511 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.503 70 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.524 80 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.381 80 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.453 80 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.487 80 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.471 80 images/sec: 185.7 +/- 
0.2 (jitter = 0.9) 7.425 80 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.466 80 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.464 80 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.436 80 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.453 80 images/sec: 185.7 +/- 0.2 (jitter = 0.6) 7.540 80 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.503 80 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.479 80 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.384 80 images/sec: 185.7 +/- 0.1 (jitter = 0.7) 7.501 80 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.512 80 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.410 90 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.518 90 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.459 90 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.506 90 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.395 90 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.444 90 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.447 90 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.428 90 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.527 90 images/sec: 185.8 +/- 0.1 (jitter = 0.8) 7.485 90 images/sec: 185.7 +/- 0.1 (jitter = 0.7) 7.475 90 images/sec: 185.8 +/- 0.2 (jitter = 0.8) 7.450 90 images/sec: 185.7 +/- 0.1 (jitter = 0.6) 7.500 90 images/sec: 185.7 +/- 0.1 (jitter = 0.7) 7.424 90 images/sec: 185.7 +/- 0.1 (jitter = 0.8) 7.404 90 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.528 90 images/sec: 185.8 +/- 0.1 (jitter = 0.8) 7.553 100 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.489 ---------------------------------------------------------------- total images/sec: 2969.87 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.606 ---------------------------------------------------------------- total images/sec: 2969.87 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.2 (jitter = 0.9) 7.510 ---------------------------------------------------------------- total images/sec: 2969.87 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.8) 7.630 ---------------------------------------------------------------- total images/sec: 2969.83 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.2 (jitter = 0.7) 7.440 ---------------------------------------------------------------- total images/sec: 2969.91 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.614 ---------------------------------------------------------------- total images/sec: 2970.02 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.420 ---------------------------------------------------------------- total images/sec: 2969.89 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.2 (jitter = 0.8) 7.548 ---------------------------------------------------------------- total images/sec: 2969.84 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.7) 7.416 ---------------------------------------------------------------- total images/sec: 2969.87 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.8) 7.449 ---------------------------------------------------------------- total images/sec: 2970.00 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.8) 7.457 
---------------------------------------------------------------- total images/sec: 2969.87 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.8) 7.480 ---------------------------------------------------------------- total images/sec: 2969.98 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.6) 7.553 ---------------------------------------------------------------- total images/sec: 2969.87 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.8) 7.476 ---------------------------------------------------------------- total images/sec: 2969.95 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.7) 7.432 ---------------------------------------------------------------- total images/sec: 2969.89 ---------------------------------------------------------------- 100 images/sec: 185.7 +/- 0.1 (jitter = 0.7) 7.514 ---------------------------------------------------------------- total images/sec: 2969.75 ----------------------------------------------------------------
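In the run above, NCCL explicitly disabled GPUDirect RDMA (NCCL_NET_GDR_LEVEL was set to 0, hence the "GPU Direct RDMA Disabled" lines), so inter-node traffic was staged through host memory over the InfiniBand HCAs. Each of the 16 ranks sustained about 185.7 images/sec, consistent with the reported aggregate of roughly 2,970 images/sec (16 x ~185.7). For reference, below is a minimal sketch of where these NCCL variables would be set in the MPIJob that launches the benchmark. It follows the kubeflow/mpi-operator v1alpha2 API layout; the image name, mpirun arguments and replica counts are illustrative placeholders, not the exact manifest used for this test.

apiVersion: kubeflow.org/v1alpha2
kind: MPIJob
metadata:
  name: tensorflow-benchmarks
spec:
  slotsPerWorker: 4                    # 4 x P100 GPUs per worker node
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: tensorflow-benchmarks
            image: <benchmark-image>   # placeholder
            command: ["mpirun"]
            args:
            - "-np"
            - "16"                     # 4 workers x 4 GPUs = 16 ranks
            - "-x"
            - "NCCL_DEBUG=INFO"        # emits the NCCL INFO lines shown above
            - "-x"
            - "NCCL_NET_GDR_LEVEL=0"   # 0 disables GPUDirect RDMA; a higher value (or unsetting it) allows it
            - "python"
            - "tf_cnn_benchmarks.py"
            - "--model=resnet50"
            - "--batch_size=32"
            - "--num_batches=100"
            - "--variable_update=horovod"
    Worker:
      replicas: 4
      template:
        spec:
          containers:
          - name: tensorflow-benchmarks
            image: <benchmark-image>   # placeholder
            resources:
              limits:
                nvidia.com/gpu: 4

Toggling NCCL_NET_GDR_LEVEL and redeploying the MPIJob is what distinguishes this run from the GPUDirect-enabled run logged below.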
Log of running the distributed TensorFlow benchmark tests with the Kubeflow mpi-operator, this time with GPUDirect RDMA enabled:
+ POD_NAME=tensorflow-benchmarks-worker-2 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-2 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 3 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" + POD_NAME=tensorflow-benchmarks-worker-0 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-0 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 1 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" + POD_NAME=tensorflow-benchmarks-worker-1 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-1 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 2 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" + POD_NAME=tensorflow-benchmarks-worker-3 + shift + /opt/kube/kubectl exec tensorflow-benchmarks-worker-3 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "1488060416" -mca ess_base_vpid 4 -mca ess_base_num_procs "5" -mca orte_node_regex "tensorflow-benchmarks-launcher-zc[2:68]w,tensorflow-benchmarks-worker-[1:0-3]@0(5)" -mca orte_hnp_uri "1488060416.0;tcp://10.254.5.28:54316" -mca pml "ob1" -mca btl "^openib" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "1488060416.0;tcp://10.254.5.28:54316" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile 
"/etc/mpi/hostfile" -mca hwloc_base_binding_policy "none" -mca rmaps_base_mapping_policy "slot" -mca pmix "^s1,s2,cray,isolated" 2019-10-31 09:28:00.800628: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.800997: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.800997: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801030: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801030: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801282: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801376: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801302: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801649: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801601: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801636: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801388: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801705: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801389: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801536: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:00.801461: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-10-31 09:28:02.552940: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x577afe0 executing computations on platform CUDA. 
Devices: 2019-10-31 09:28:02.553014: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.553027: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.553035: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.553043: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.558257: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz 2019-10-31 09:28:02.558331: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x571fd40 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.558380: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.558401: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.558409: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.558418: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.560581: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5858810 executing computations on platform Host. Devices: 2019-10-31 09:28:02.560613: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.560953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0c:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.560997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:28:02.562364: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz 2019-10-31 09:28:02.565507: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x57fd570 executing computations on platform Host. Devices: 2019-10-31 09:28:02.565538: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.566019: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4feb450 executing computations on platform CUDA. 
Devices: 2019-10-31 09:28:02.566090: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.566103: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.566112: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.566120: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.566436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:06:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.566477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:28:02.568316: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x61a8df0 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.568370: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.568382: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.568390: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.568398: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569013: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x63f9d80 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.569074: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569088: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569097: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569105: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569386: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x61242b0 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.569435: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569448: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569457: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.569465: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.572749: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz 2019-10-31 09:28:02.573019: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x599ee80 executing computations on platform CUDA. 
Devices: 2019-10-31 09:28:02.573059: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.573071: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.573080: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.573088: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.575052: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.576129: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x50c8cb0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.576165: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.576414: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5c5e400 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.576478: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.576456: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.576467: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.576482: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.576490: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.576516: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.576594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:04:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.576640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:28:02.577872: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.578533: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x580b140 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.578627: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.578640: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.578649: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.578657: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.579048: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6286630 executing computations on platform Host. 
Devices: 2019-10-31 09:28:02.579134: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.579233: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6201b00 executing computations on platform Host. Devices: 2019-10-31 09:28:02.579274: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.579465: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x64d75b0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.579501: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.579561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:06:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.579598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:28:02.579699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0c:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.579745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:28:02.579795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:04:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.579824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:28:02.580795: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5a7c6d0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.580832: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.581184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0e:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.581222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:28:02.582929: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199930000 Hz 2019-10-31 09:28:02.584555: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x504b780 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.584604: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.584619: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.584627: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.584635: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.586058: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5d3bc70 executing computations on platform Host. 
Devices: 2019-10-31 09:28:02.586090: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.586869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0e:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.586912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:28:02.587132: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.588489: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x53cfb70 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.588534: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.588547: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.588556: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.588565: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.589801: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.589983: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x58e89b0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.590017: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.591058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:06:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.591099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-10-31 09:28:02.592938: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5128fc0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.592970: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.593754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0e:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.593793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:28:02.597195: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.600271: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5524250 executing computations on platform CUDA. 
Devices: 2019-10-31 09:28:02.600348: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.600359: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.600368: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.600376: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.600478: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x54ad3d0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.600511: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.601636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0c:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.601681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2 2019-10-31 09:28:02.604752: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.608007: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5601aa0 executing computations on platform Host. Devices: 2019-10-31 09:28:02.608058: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.608527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:0e:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.608557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:28:02.609073: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x58ca930 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.609136: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.609150: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.609159: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.609168: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.610637: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4b03cf0 executing computations on platform CUDA. 
Devices: 2019-10-31 09:28:02.610699: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.610712: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.610721: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.610729: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.613057: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4ff4940 executing computations on platform CUDA. Devices: 2019-10-31 09:28:02.613092: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.613104: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.613114: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.613123: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0 2019-10-31 09:28:02.614904: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.616644: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.617349: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz 2019-10-31 09:28:02.617666: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x59a8170 executing computations on platform Host. Devices: 2019-10-31 09:28:02.617696: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.618133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285 pciBusID: 0000:04:00.0 totalMemory: 15.90GiB freeMemory: 14.90GiB 2019-10-31 09:28:02.618177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:28:02.619870: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x50d2180 executing computations on platform Host. Devices: 2019-10-31 09:28:02.619901: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-10-31 09:28:02.620285: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4be1520 executing computations on platform Host. 
Devices:
2019-10-31 09:28:02.620314: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:28:02.620599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0c:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:28:02.620630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:02.620790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:04:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:28:02.620819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 09:28:02.621544: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6531630 executing computations on platform CUDA. Devices:
2019-10-31 09:28:02.621602: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.621615: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.621625: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.621634: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-10-31 09:28:02.629231: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2019-10-31 09:28:02.632316: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x660ee60 executing computations on platform Host. Devices:
2019-10-31 09:28:02.632347: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 09:28:02.633159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:06:00.0
totalMemory: 15.90GiB freeMemory: 14.90GiB
2019-10-31 09:28:02.633189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-10-31 09:28:02.694075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:02.694155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2
2019-10-31 09:28:02.694169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N
2019-10-31 09:28:02.694406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
[... the same "Device interconnect" / "Created TensorFlow device" sequence is printed by every rank; one example per physical GPU is kept below and the remaining repeats are omitted ...]
2019-10-31 09:28:02.725697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:06:00.0, compute capability: 6.0)
2019-10-31 09:28:02.726137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2019-10-31 09:28:02.726303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0)
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  512 global
             32 per device
Num batches: 100
Num epochs:  0.04
Devices:     ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   horovod
==========
Generating training model
W1031 09:28:02.738230 139856166496000 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
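Each Horovod rank prints the configuration banner above: the benchmark trains ResNet-50 on synthetic ImageNet data under TensorFlow 1.13, with the sgd optimizer, NCHW data layout and Horovod variable management, running 100 batches across 16 Horovod GPU devices. The global batch size of 512 therefore corresponds to 512 / 16 = 32 images per GPU per step, matching the "32 per device" line in the banner.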
2019-10-31 09:28:02.738561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:02.738615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2
2019-10-31 09:28:02.738628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N
2019-10-31 09:28:02.738826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
W1031 09:28:02.754183 140163584431872 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
[... each of the remaining ranks prints the same configuration banner ("TensorFlow: 1.13 ... Generating training model"), the same colocate_with and conv2d deprecation warnings, and its own "Device interconnect" / "Created TensorFlow device" messages; these per-rank repeats are omitted ...]
W1031 09:28:02.787169 140554276726528 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W1031 09:28:05.200214 140554276726528 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W1031 09:28:05.370465 140554276726528 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
[... the same max_pooling2d, to_float and to_int32 warnings are printed once per rank; repeats omitted ...]
Initializing graph
[... "Initializing graph" is printed by each of the 16 ranks ...]
W1031 09:28:07.374773 140554276726528 deprecation.py:323] From /examples/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
[... the same Supervisor.__init__ warning is printed by the remaining ranks ...]
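These deprecation warnings come from the TensorFlow 1.13 benchmark scripts themselves and can be ignored for this run. If you adapt the scripts to a newer API, the replacement suggested by the warnings is tf.cast, e.g. tf.cast(x, tf.float32) instead of tf.to_float(x) and tf.cast(x, tf.int32) instead of tf.to_int32(x), with tf.train.MonitoredTrainingSession replacing the deprecated Supervisor.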
2019-10-31 09:28:08.072869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-10-31 09:28:08.072974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 09:28:08.072988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      2
2019-10-31 09:28:08.072998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N
2019-10-31 09:28:08.073250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0)
2019-10-31 09:28:08.081420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
[... each rank re-registers its single visible GPU when the training session is created; the equivalent "Adding visible gpu devices" / "Created TensorFlow device" messages for the other ranks (devices 0-3 on each worker) are omitted ...]
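Note that every rank reports its GPU as /job:localhost/replica:0/task:0/device:GPU:0 even though the physical PCI bus IDs differ (0000:04:00.0, 0000:06:00.0, 0000:0c:00.0, 0000:0e:00.0). Each MPI process is pinned to a single GPU selected by its Horovod local rank, so every process sees exactly one visible device and addresses it as GPU:0.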
tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:28:08.328766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2 2019-10-31 09:28:08.328777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N 2019-10-31 09:28:08.329034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0c:00.0, compute capability: 6.0) 2019-10-31 09:28:08.365747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-10-31 09:28:08.365852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:28:08.365867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-10-31 09:28:08.365876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-10-31 09:28:08.366110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0) 2019-10-31 09:28:08.678376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-10-31 09:28:08.678506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-31 09:28:08.678522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3 2019-10-31 09:28:08.678532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N 2019-10-31 09:28:08.679667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14482 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0e:00.0, compute capability: 6.0) I1031 09:28:11.440287 139982112827136 session_manager.py:491] Running local_init_op. I1031 09:28:11.446637 140554276726528 session_manager.py:491] Running local_init_op. I1031 09:28:11.481120 139853571868416 session_manager.py:491] Running local_init_op. I1031 09:28:11.490534 139662100997888 session_manager.py:491] Running local_init_op. I1031 09:28:11.541015 139908811421440 session_manager.py:491] Running local_init_op. I1031 09:28:11.559617 140131597530880 session_manager.py:491] Running local_init_op. I1031 09:28:11.591317 139799656847104 session_manager.py:491] Running local_init_op. I1031 09:28:11.606469 139841235764992 session_manager.py:491] Running local_init_op. I1031 09:28:11.627890 139982112827136 session_manager.py:493] Done running local_init_op. I1031 09:28:11.633262 140554276726528 session_manager.py:493] Done running local_init_op. I1031 09:28:11.646192 139665852200704 session_manager.py:491] Running local_init_op. I1031 09:28:11.661424 140718879745792 session_manager.py:491] Running local_init_op. I1031 09:28:11.665061 139853571868416 session_manager.py:493] Done running local_init_op. I1031 09:28:11.668709 139856166496000 session_manager.py:491] Running local_init_op. I1031 09:28:11.677753 139662100997888 session_manager.py:493] Done running local_init_op. I1031 09:28:11.721434 140163584431872 session_manager.py:491] Running local_init_op. I1031 09:28:11.736203 140017035462400 session_manager.py:491] Running local_init_op. 
I1031 09:28:11.739211 139908811421440 session_manager.py:493] Done running local_init_op. I1031 09:28:11.763015 140131597530880 session_manager.py:493] Done running local_init_op. I1031 09:28:11.778822 140586624427776 session_manager.py:491] Running local_init_op. I1031 09:28:11.791004 139799656847104 session_manager.py:493] Done running local_init_op. I1031 09:28:11.792444 139841235764992 session_manager.py:493] Done running local_init_op. I1031 09:28:11.840747 139665852200704 session_manager.py:493] Done running local_init_op. I1031 09:28:11.841266 139928895440640 session_manager.py:491] Running local_init_op. I1031 09:28:11.859958 140718879745792 session_manager.py:493] Done running local_init_op. I1031 09:28:11.861997 139856166496000 session_manager.py:493] Done running local_init_op. I1031 09:28:11.902634 140163584431872 session_manager.py:493] Done running local_init_op. I1031 09:28:11.913870 140017035462400 session_manager.py:493] Done running local_init_op. I1031 09:28:11.981170 140586624427776 session_manager.py:493] Done running local_init_op. I1031 09:28:12.039165 139928895440640 session_manager.py:493] Done running local_init_op. I1031 09:28:12.076092 139818715670272 session_manager.py:491] Running local_init_op. I1031 09:28:12.256900 139818715670272 session_manager.py:493] Done running local_init_op. Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up Running warm up 2019-10-31 09:28:40.740985: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:40.810479: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:40.975794: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.009451: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.137098: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.326322: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.340050: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.355355: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.369061: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.440759: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.484446: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.532443: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.560915: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.690003: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-10-31 09:28:41.727085: I 
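Each of the 16 Horovod ranks (4 worker pods with 4 GPUs each) creates a single TensorFlow device, pinned to one of its node's four P100 cards. To double-check the GPU inventory of a worker pod you can query it directly; a minimal sketch, assuming the benchmark image ships the nvidia-smi utility:

# List index, model and PCI bus ID of every GPU visible in worker pod 0.
# The bus IDs should match the ones reported in the TensorFlow log above.
oc exec tensorflow-benchmarks-worker-0 -- nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv

During the warm-up, NCCL initializes its network transports: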
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/Socket : Using [0]eth0:10.254.6.26<0>
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
<... the same three messages appear for every rank on tensorflow-benchmarks-worker-0 through -3 (pod IPs 10.254.6.26, 10.254.4.17, 10.254.5.29 and 10.254.7.18) ...>
NCCL version 2.4.2+cuda10.0
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB ; OOB eth0:10.254.6.26<0>
<... the NET/IB message appears for every rank: both mlx5 HCAs are selected for InfiniBand transport, while eth0 is used only for out-of-band bootstrap ...>
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Setting affinity for GPU 0 to 0fff
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO comm 0x7fdbf03f2980 rank 0 nranks 16 cudaDev 0 nvmlDev 0
<... affinity and communicator messages for ranks 1-15 omitted ...>
tensorflow-benchmarks-worker-1:57:261 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance : PHB PIX
tensorflow-benchmarks-worker-1:61:258 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance : PIX PHB
<... IB NIC distance messages for the remaining GPUs omitted; every GPU has one HCA at PIX (same PCIe switch) distance ...>
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Channel 00 : 0 1 3 6 4 5 7 10 8 9 11 14 12 13 15 2
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Channel 01 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 1.
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 2[2] / HCA 0 (distance 0 < 1), read 0
tensorflow-benchmarks-worker-3:60:261 [2] NCCL INFO Ring 00 : 11 -> 14 [receive] via NET/IB/0/GDRDMA
tensorflow-benchmarks-worker-2:61:261 [3] NCCL INFO Ring 00 : 11 -> 14 [send] via NET/IB/0
tensorflow-benchmarks-worker-0:58:259 [1] NCCL INFO Ring 00 : 1[1] -> 3[3] via P2P/IPC
<... remaining ring-setup messages omitted: intra-node hops run over P2P/IPC, inter-node hops over NET/IB with GPUDirect RDMA (GDRDMA) ...>
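These are the messages that matter for this RDG: with NCCL_IB_DISABLE=0 and NCCL_NET_GDR_LEVEL=1 set in the job environment, NCCL selects both mlx5 HCAs and enables GPUDirect RDMA, so every inter-node ring hop runs over NET/IB with GDRDMA rather than falling back to TCP sockets. A quick way to re-check this on a live run is to filter the aggregated job output for the NCCL transport lines; a minimal sketch, assuming the MPI Operator keeps its default naming so the launcher pod name starts with tensorflow-benchmarks-launcher:

# Filter the aggregated job output for NCCL transport messages.
# "NET/IB" and "GDRDMA" confirm the InfiniBand + GPUDirect RDMA path;
# only "NET/Socket" lines would indicate a TCP fallback.
LAUNCHER=$(oc get pods -o name | grep tensorflow-benchmarks-launcher)
oc logs $LAUNCHER | grep -E 'NET/IB|GDRDMA|NET/Socket'

NCCL then reports Init COMPLETE for all 16 ranks and the measured run begins: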
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Using 256 threads, Min Comp Cap 6, Trees enabled up to size 479999
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO comm 0x7fdbf03f2980 rank 0 nranks 16 cudaDev 0 nvmlDev 0 - Init COMPLETE
<... Init COMPLETE messages for ranks 1-15 omitted ...>
tensorflow-benchmarks-worker-0:57:261 [0] NCCL INFO Launch mode Parallel
Done warm up
Step    Img/sec total_loss
<... "Done warm up" and the step header are printed once per rank ...>
1       images/sec: 198.5 +/- 0.0 (jitter = 0.0)        7.672
10      images/sec: 197.9 +/- 0.6 (jitter = 1.1)        7.675
20      images/sec: 197.9 +/- 0.3 (jitter = 1.1)        7.660
30      images/sec: 197.8 +/- 0.3 (jitter = 1.0)        7.578
<... interleaved per-rank output for steps 40-90 omitted; every rank holds steady at roughly 197-198 images/sec ...>
100     images/sec: 197.4 +/- 0.2 (jitter = 1.4)        7.483
----------------------------------------------------------------
total images/sec: 3156.37
----------------------------------------------------------------
<... the totals printed by the remaining 15 ranks all fall between 3156.20 and 3156.68 images/sec ...>
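The "total images/sec" figure printed by each rank is the aggregate throughput of the whole 16-GPU job, not of a single GPU: 16 ranks at roughly 197.4 images/sec each should add up to about 3158 images/sec, which matches the reported ~3156 images/sec once per-step jitter is taken into account. A one-line sanity check:

# Expected aggregate: 16 ranks x ~197.4 images/sec per rank
awk 'BEGIN { printf "%.1f images/sec\n", 16 * 197.4 }'   # -> 3158.4 images/sec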
Appendix
- Customized Mellanox OFED image for kernel-3.10.0-1062.1.2.el7
- SELinux InfiniBand patch - infiniband.zip
- Additional OCP components - Openshift-rdma.zip