Jetson TX2: framerate comparison between YOLOv4, YOLOv4-tiny and YOLOv3-tiny


YOLO is an efficient and fast object detection system. A new version, YOLOv4, was released recently. How well does it run on the NVIDIA Jetson TX2? Time to check!

Benchmark setup

First, print the name, version and other details of the current machine and its operating system:

$ uname -a

Linux antmicro-tx2-baseboard 4.9.140-tegra #2 SMP PREEMPT Tue May 19 16:58:27 CEST 2020 aarch64 aarch64 aarch64 GNU/Linux

Check the Linux for Tegra version (we have R32.3.1):

$ cat /etc/nv_tegra_release

# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019
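
If the release needs to be recorded alongside benchmark results, the line above can be turned into the short "R32.3.1" form programmatically. The following is a minimal sketch that assumes the file keeps the format shown here; the helper name `l4t_version` is hypothetical.

```python
import re

def l4t_version(release_line: str) -> str:
    """Return a string such as 'R32.3.1' from the first line of /etc/nv_tegra_release."""
    match = re.search(
        r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)", release_line
    )
    if match is None:
        raise ValueError("unrecognised nv_tegra_release format")
    major, revision = match.groups()
    return f"R{major}.{revision}"

# Sample line copied from the output above.
sample = (
    "# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, "
    "EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019"
)
print(l4t_version(sample))
```

On the board itself, the first line of `/etc/nv_tegra_release` would be read from the file instead of the `sample` string.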

Set the highest performance mode with sudo nvpmodel -m 0, then check the frequencies:

$ sudo nvpmodel -q --verbose

NVPM VERB: Config file: /etc/nvpmodel.conf
NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: Current mode: NV Power Mode: MAXN
0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_1: PATH /sys/devices/system/cpu/cpu1/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_2: PATH /sys/devices/system/cpu/cpu2/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_3: PATH /sys/devices/system/cpu/cpu3/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_4: PATH /sys/devices/system/cpu/cpu4/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_5: PATH /sys/devices/system/cpu/cpu5/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_A57: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq: REAL_VAL: 345600 CONF_VAL: 0
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM CPU_DENVER: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq: REAL_VAL: 345600 CONF_VAL: 0
NVPM VERB: PARAM CPU_DENVER: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU_POWER_CONTROL_ENABLE: ARG GPU_PWR_CNTL_EN: PATH /sys/devices/gpu.0/power/control: REAL_VAL: auto CONF_VAL: on
NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq: REAL_VAL: 114750000 CONF_VAL: 0
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU_POWER_CONTROL_DISABLE: ARG GPU_PWR_CNTL_DIS: PATH /sys/devices/gpu.0/power/control: REAL_VAL: auto CONF_VAL: auto
NVPM VERB: PARAM EMC: ARG MAX_FREQ: PATH /sys/kernel/nvpmodel_emc_cap/emc_iso_cap: REAL_VAL: 0 CONF_VAL: 0
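
The verbose output is easier to check when the REAL_VAL fields are extracted and converted to MHz. The sketch below assumes the line layout shown above; `parse_nvpmodel` is a hypothetical helper, not part of nvpmodel. The cpufreq sysfs files report kHz, while the GPU devfreq files report Hz.

```python
import re

# Matches lines such as:
# NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/...: REAL_VAL: 1300500000 CONF_VAL: 2147483647
LINE_RE = re.compile(
    r"PARAM (?P<param>\w+): ARG (?P<arg>\w+): PATH (?P<path>\S+): "
    r"REAL_VAL: (?P<real>\S+) CONF_VAL: (?P<conf>\S+)"
)

def parse_nvpmodel(text):
    """Map (param, arg) to the REAL_VAL string of every PARAM line."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            values[(match["param"], match["arg"])] = match["real"]
    return values

# Two lines copied from the output above.
sample = """\
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
"""

values = parse_nvpmodel(sample)
cpu_max_mhz = int(values[("CPU_A57", "MAX_FREQ")]) / 1_000      # kHz -> MHz
gpu_max_mhz = int(values[("GPU", "MAX_FREQ")]) / 1_000_000      # Hz -> MHz
print(f"CPU max: {cpu_max_mhz} MHz, GPU max: {gpu_max_mhz} MHz")
```

For this board the values come out at about 2035 MHz for the CPU clusters and about 1300 MHz for the GPU.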

Check the CUDA version (we have 10.0):

$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326

Display the camera parameters (we have 1920x1080 resolution for /dev/video0):

$ gst-device-monitor-1.0 Video/Source

Device found:

	name  : vi-output, ov5640 32-003c
	class : Video/Source
	caps  : video/x-raw, format=(string)UYVY, width=(int)1920, height=(int)1080, framerate=(fraction)30/1;
	properties:
		udev-probed = true
		device.bus_path = platform-15700000.vi
		sysfs.path = /sys/devices/13e10000.host1x/15700000.vi/video4linux/video0
		device.subsystem = video4linux
		device.product.name = "vi-output\,\ ov5640\ 32-003c"
		device.capabilities = :capture:
		device.api = v4l2
		device.path = /dev/video0
		v4l2.device.driver = tegra-video
		v4l2.device.card = "vi-output\,\ ov5640\ 32-003c"
		v4l2.device.bus_info = platform:15700000.vi:0
		v4l2.device.version = 264588 (0x0004098c)
		v4l2.device.capabilities = 2216689665 (0x84200001)
		v4l2.device.device_caps = 69206017 (0x04200001)
	gst-launch-1.0 v4l2src ! ...

Build from sources

We use the AlexeyAB/darknet implementation, which is currently the official implementation of YOLOv4. We recommend reading all the information on that page.

Download the sources:

git clone https://github.com/AlexeyAB/darknet.git
cd darknet/

Edit the Makefile:

GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
...
ARCH= -gencode arch=compute_62,code=[sm_62,compute_62]

Build the sources:
```
make
```

Download configs and pretrained weights

Model	Size	BFLOPS	mAP@0.5	Config	Weights
YOLOv4	512	91.1	64.9%	link	gdrive
YOLOv4	416	60.1	62.8%	(as above)	(as above)
YOLOv4	320	35.5	60.0%	(as above)	(as above)
EfficientNetB0-Yolov3	416	3.7	45.5%	link	gdrive
YOLOv3-tiny-prn	416	3.5	33.1%	link	gdrive
YOLOv4-tiny	416	6.9	40.2%	link	link

To download the configs, use wget "<file_url>". To download from Google Drive in the console, install gdown with sudo pip install gdown and run: gdown https://drive.google.com/uc?id=<gdrive_file_id>.
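
The same download can be scripted from Python, since gdown also exposes a `download` function. This is only a sketch: `gdrive_url` and `download_weights` are hypothetical helper names, and the file ID and output name must be supplied by the reader.

```python
def gdrive_url(file_id: str) -> str:
    """Build the Google Drive URL form used in the gdown command above."""
    return f"https://drive.google.com/uc?id={file_id}"

def download_weights(file_id: str, output: str):
    """Download a Google Drive file with gdown and return the saved path."""
    # Imported lazily so gdrive_url() works even when gdown is not installed.
    import gdown  # third-party package: pip install gdown
    return gdown.download(gdrive_url(file_id), output, quiet=False)

# Shows the URL that would be requested; no network access happens here.
print(gdrive_url("<gdrive_file_id>"))
```

Calling `download_weights("<gdrive_file_id>", "model.weights")` with a real file ID would fetch the file; the output name here is just a placeholder.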

Modify config files

We set batch=1 and subdivisions=1 in all files. The width and height were changed depending on the test, e.g. 512x512, 416x416 or 320x320. You can use any square resolution whose side is a multiple of 32.

[net]
batch=1
subdivisions=1
# Training
#batch=64
#subdivisions=8
width=416
height=416
channels=3
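
Editing several config files by hand for each resolution gets tedious, so the change can be scripted. The sketch below is one possible approach, not part of darknet: `patch_net_cfg` is a hypothetical helper that rewrites only the active (uncommented) keys in the [net] section and rejects sizes that are not a multiple of 32.

```python
import re

def patch_net_cfg(cfg_text: str, size: int) -> str:
    """Set batch=1, subdivisions=1 and a square input size in the [net] section."""
    if size <= 0 or size % 32 != 0:
        raise ValueError("size must be a positive multiple of 32")
    overrides = {"batch": 1, "subdivisions": 1, "width": size, "height": size}
    section = None
    patched = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped  # remember which section we are in
        # Commented lines such as '#batch=64' do not match and are kept as-is.
        match = re.match(r"\s*(\w+)\s*=", line)
        if section == "[net]" and match and match.group(1) in overrides:
            key = match.group(1)
            line = f"{key}={overrides[key]}"
        patched.append(line)
    return "\n".join(patched)

# Illustrative input, not a complete darknet config.
sample = "[net]\n# Training\n#batch=64\nbatch=64\nsubdivisions=8\nwidth=608\nheight=608\nchannels=3"
print(patch_net_cfg(sample, 416))
```

Running it once per test resolution (512, 416, 320) and writing each result to its own file keeps the original config untouched.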

Benchmark

The following command was used in the benchmark:

./darknet detector demo cfg/coco.data <config_file> <weights_file> -benchmark -c <camera_id>
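
When several models are benchmarked in a row, it can help to generate the command for each config and weights pair. The sketch below only builds and prints the command lines; the helper name `benchmark_argv` and the file names in `runs` are assumptions for illustration.

```python
import shlex

def benchmark_argv(config_file, weights_file, camera_id=0, data_file="cfg/coco.data"):
    """Build the argument list for the darknet benchmark command shown above."""
    return [
        "./darknet", "detector", "demo",
        data_file, config_file, weights_file,
        "-benchmark", "-c", str(camera_id),
    ]

# Hypothetical file names; substitute the configs and weights you downloaded.
runs = [
    ("cfg/yolov4.cfg", "yolov4.weights"),
    ("cfg/yolov4-tiny.cfg", "yolov4-tiny.weights"),
]

for config_file, weights_file in runs:
    print(shlex.join(benchmark_argv(config_file, weights_file)))
```

Each list can be passed to `subprocess.run(argv, check=True)` from inside the darknet directory to actually start a run.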

YOLO detects the graphics card:

 CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1  
 OpenCV version: 4.1.1
 Demo
 0 : compute_capability = 620, cudnn_half = 0, GPU: NVIDIA Tegra X2

and the video stream:

 Video stream: 1920 x 1080

Results

Model	Size	BFLOPS	mAP@0.5	AVG_FPS
YOLOv4	512	91.1	64.9%	4.3
YOLOv4	416	60.1	62.8%	5.4
YOLOv4	320	35.5	60.0%	7.6
EfficientNetB0-Yolov3	416	3.7	45.5%	0.7
YOLOv3-tiny-prn	416	3.5	33.1%	57.0
YOLOv4-tiny	416	6.9	40.2%	42.0

Not cool. The results are not satisfactory. Would it work much better with fewer classes? Could tuning the parameters improve performance? Why is YOLO with the EfficientNet backbone so slow (0.7 FPS at only 3.7 BFLOPS)? We currently have no idea.
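
One way to put numbers on the EfficientNet question is to convert the results table into per-frame latency and effective throughput (BFLOPS per frame multiplied by frames per second). This is a rough sketch using only the figures from the table; it says nothing about the cause.

```python
# (model, input size, BFLOPS per frame, average FPS), copied from the results table.
results = [
    ("YOLOv4", 512, 91.1, 4.3),
    ("YOLOv4", 416, 60.1, 5.4),
    ("YOLOv4", 320, 35.5, 7.6),
    ("EfficientNetB0-Yolov3", 416, 3.7, 0.7),
    ("YOLOv3-tiny-prn", 416, 3.5, 57.0),
    ("YOLOv4-tiny", 416, 6.9, 42.0),
]

def summarize(rows):
    """Return (model, size, latency in ms per frame, effective GFLOP/s) per row."""
    summary = []
    for model, size, bflops, fps in rows:
        latency_ms = 1000.0 / fps        # time spent on one frame
        gflops_per_s = bflops * fps      # work actually delivered per second
        summary.append((model, size, latency_ms, gflops_per_s))
    return summary

for model, size, latency_ms, gflops_per_s in summarize(results):
    print(f"{model:<24}{size:>4}  {latency_ms:8.1f} ms/frame  {gflops_per_s:7.1f} GFLOP/s")
```

The EfficientNet variant delivers only about 2.6 GFLOP/s, while every other model lands between roughly 200 and 390 GFLOP/s, so its low framerate is not explained by its operation count.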


Bartosz Ptak