Jetson TX2: framerate comparison between YOLOv4, YOLOv4-tiny and YOLOv3-tiny
YOLO is a fast and efficient object detection system. A new version, YOLOv4, has recently appeared. How does it perform on the NVIDIA Jetson TX2? Time to check!
Benchmark setup
- print the name, version and other details of the current machine and the operating system running on it:
$ uname -a
Linux antmicro-tx2-baseboard 4.9.140-tegra #2 SMP PREEMPT Tue May 19 16:58:27 CEST 2020 aarch64 aarch64 aarch64 GNU/Linux
- check Linux for Tegra version (we have R32.3.1)
$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019
- set the highest performance settings with
sudo nvpmodel -m 0
and check the frequencies:
$ sudo nvpmodel -q --verbose
NVPM VERB: Config file: /etc/nvpmodel.conf
NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: Current mode: NV Power Mode: MAXN
0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_1: PATH /sys/devices/system/cpu/cpu1/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_2: PATH /sys/devices/system/cpu/cpu2/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_3: PATH /sys/devices/system/cpu/cpu3/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_4: PATH /sys/devices/system/cpu/cpu4/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_5: PATH /sys/devices/system/cpu/cpu5/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_A57: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq: REAL_VAL: 345600 CONF_VAL: 0
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM CPU_DENVER: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq: REAL_VAL: 345600 CONF_VAL: 0
NVPM VERB: PARAM CPU_DENVER: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU_POWER_CONTROL_ENABLE: ARG GPU_PWR_CNTL_EN: PATH /sys/devices/gpu.0/power/control: REAL_VAL: auto CONF_VAL: on
NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq: REAL_VAL: 114750000 CONF_VAL: 0
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU_POWER_CONTROL_DISABLE: ARG GPU_PWR_CNTL_DIS: PATH /sys/devices/gpu.0/power/control: REAL_VAL: auto CONF_VAL: auto
NVPM VERB: PARAM EMC: ARG MAX_FREQ: PATH /sys/kernel/nvpmodel_emc_cap/emc_iso_cap: REAL_VAL: 0 CONF_VAL: 0
- check CUDA version (we have 10.0):
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326
- display camera parameters (we have 1920x1080 resolution for /dev/video0):
$ gst-device-monitor-1.0 Video/Source
Device found:
name : vi-output, ov5640 32-003c
class : Video/Source
caps : video/x-raw, format=(string)UYVY, width=(int)1920, height=(int)1080, framerate=(fraction)30/1;
properties:
udev-probed = true
device.bus_path = platform-15700000.vi
sysfs.path = /sys/devices/13e10000.host1x/15700000.vi/video4linux/video0
device.subsystem = video4linux
device.product.name = "vi-output\,\ ov5640\ 32-003c"
device.capabilities = :capture:
device.api = v4l2
device.path = /dev/video0
v4l2.device.driver = tegra-video
v4l2.device.card = "vi-output\,\ ov5640\ 32-003c"
v4l2.device.bus_info = platform:15700000.vi:0
v4l2.device.version = 264588 (0x0004098c)
v4l2.device.capabilities = 2216689665 (0x84200001)
v4l2.device.device_caps = 69206017 (0x04200001)
gst-launch-1.0 v4l2src ! ...
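The frequency check from the nvpmodel step can also be done programmatically. Below is a minimal sketch that extracts the REAL_VAL fields from the verbose output shown earlier. The function name parse_nvpmodel and the regular expression are our own illustration, not part of any NVIDIA tool, and the pattern assumes the line layout seen in this particular output.

```python
import re

# Matches the PARAM lines printed by `nvpmodel -q --verbose`, for example:
# NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/...: REAL_VAL: 1300500000 CONF_VAL: 2147483647
# Assumption: the line layout is the one shown in the output above.
PARAM_RE = re.compile(
    r"PARAM (?P<param>\w+): ARG (?P<arg>\w+): PATH (?P<path>\S+): "
    r"REAL_VAL: (?P<real>\S+) CONF_VAL: (?P<conf>\S+)"
)


def parse_nvpmodel(text):
    """Return {(param, arg): REAL_VAL string} for every PARAM line in text."""
    values = {}
    for line in text.splitlines():
        match = PARAM_RE.search(line)
        if match:
            values[(match.group("param"), match.group("arg"))] = match.group("real")
    return values


# Three lines copied from the output above.
SAMPLE = """\
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM CPU_DENVER: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
"""

values = parse_nvpmodel(SAMPLE)
# cpufreq files report kHz and devfreq files report Hz; convert both to MHz.
cpu_mhz = int(values[("CPU_A57", "MAX_FREQ")]) / 1000
gpu_mhz = int(values[("GPU", "MAX_FREQ")]) / 1_000_000
print(f"A57 max: {cpu_mhz} MHz, GPU max: {gpu_mhz} MHz")
```

On the real board, the text would come from running the command itself, e.g. via subprocess.run with capture_output=True.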
Build from sources
We use the implementation from AlexeyAB/darknet, which is currently the official implementation of YOLOv4. We recommend reading all the information on that page.
- download sources
git clone https://github.com/AlexeyAB/darknet.git
cd darknet/
- edit Makefile
GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
...
ARCH= -gencode arch=compute_62,code=[sm_62,compute_62]
- build sources
make
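If you rebuild often, the Makefile edit can be scripted instead of done by hand. The sketch below rewrites only the simple NAME=value switches (GPU, CUDNN, CUDNN_HALF, OPENCV). The helper name set_makefile_flags and the sample Makefile text are hypothetical, and the ARCH line is deliberately left for manual editing.

```python
import re


def set_makefile_flags(makefile_text, flags):
    """Rewrite uncommented `NAME=value` lines whose NAME is a key of flags."""
    out = []
    for line in makefile_text.splitlines():
        match = re.match(r"^([A-Z_]+)=(.*)$", line)
        if match and match.group(1) in flags:
            line = f"{match.group(1)}={flags[match.group(1)]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpt; the real Makefile is longer and its defaults may differ.
SAMPLE = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\nAVX=0\n# ARCH= ...\n"

patched = set_makefile_flags(
    SAMPLE, {"GPU": 1, "CUDNN": 1, "CUDNN_HALF": 0, "OPENCV": 1}
)
print(patched)
```

Commented lines never match the pattern, so a commented-out ARCH entry stays untouched.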
Download configs and pretrained weights
Model | Size | BFLOPS | mAP@0.5 | Config | Weights |
---|---|---|---|---|---|
YOLOv4 | 512 | 91.1 | 64.9% | link | gdrive |
YOLOv4 | 416 | 60.1 | 62.8% | (as above) | (as above) |
YOLOv4 | 320 | 35.5 | 60.0% | (as above) | (as above) |
EfficientNetB0-Yolov3 | 416 | 3.7 | 45.5% | link | gdrive |
YOLOv3-tiny-prn | 416 | 3.5 | 33.1% | link | gdrive |
YOLOv4-tiny | 416 | 6.9 | 40.2% | link | link |
To download the configs, use wget "<file_url>". To download from Google Drive in the console, install gdown with sudo pip install gdown and then run gdown https://drive.google.com/uc?id=<gdrive_file_id>.
Modify config files
We set batch=1 and subdivisions=1 in all files. The width x height resolution was changed depending on the test, e.g. 512x512, 416x416 or 320x320. You can use any square resolution whose side is a multiple of 32.
[net]
batch=1
subdivisions=1
# Training
#batch=64
#subdivisions=8
width=416
height=416
channels=3
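Since the same four settings change for every test, the edit can be automated. The following is a minimal sketch, assuming the config uses plain key=value lines inside a [net] section as in the excerpt above. The function name set_net_options is our own, and the multiple-of-32 check simply mirrors the rule stated in this post.

```python
def set_net_options(cfg_text, size, batch=1, subdivisions=1):
    """Set batch, subdivisions, width and height in the [net] section."""
    if size <= 0 or size % 32 != 0:
        raise ValueError(f"size must be a positive multiple of 32, got {size}")
    wanted = {
        "batch": batch,
        "subdivisions": subdivisions,
        "width": size,
        "height": size,
    }
    section = None
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]  # entering a new section
        elif section == "net" and "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in wanted:
                line = f"{key}={wanted[key]}"
        out.append(line)
    return "\n".join(out) + "\n"


# The excerpt above plus one hypothetical layer to show other sections are untouched.
SAMPLE_CFG = """\
[net]
batch=1
subdivisions=1
# Training
#batch=64
#subdivisions=8
width=416
height=416
channels=3

[convolutional]
batch_normalize=1
"""

patched = set_net_options(SAMPLE_CFG, 320)
print(patched)
```

Commented-out training values are left alone, so the file can still be switched back by hand.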
Benchmark
- the following command was used in the benchmark:
./darknet detector demo cfg/coco.data <config_file> <weights_file> -benchmark -c <camera_id>
- YOLO detects the graphics card:
CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1
OpenCV version: 4.1.1
Demo
0 : compute_capability = 620, cudnn_half = 0, GPU: NVIDIA Tegra X2
- and the video stream:
Video stream: 1920 x 1080
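To repeat the benchmark command above for several models, the argument list can be built in a small helper. This is only a sketch: the helper name benchmark_command and the config and weights file names are hypothetical placeholders, while the darknet arguments are exactly the ones used in the command above.

```python
import shlex


def benchmark_command(config_file, weights_file, camera_id=0,
                      data_file="cfg/coco.data"):
    """Build the argument list for one `darknet detector demo` benchmark run."""
    return [
        "./darknet", "detector", "demo",
        data_file, config_file, weights_file,
        "-benchmark", "-c", str(camera_id),
    ]


# Hypothetical file names; substitute the configs and weights you downloaded.
RUNS = [
    ("cfg/yolov4.cfg", "yolov4.weights"),
    ("cfg/yolov4-tiny.cfg", "yolov4-tiny.weights"),
]

for cfg, weights in RUNS:
    args = benchmark_command(cfg, weights)
    # Print a shell-safe version of the command; to execute it instead,
    # pass `args` to subprocess.run(args, check=True) from the darknet directory.
    print(" ".join(shlex.quote(a) for a in args))
```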
Results
Model | Size | BFLOPS | mAP@0.5 | AVG_FPS |
---|---|---|---|---|
YOLOv4 | 512 | 91.1 | 64.9% | 4.3 |
YOLOv4 | 416 | 60.1 | 62.8% | 5.4 |
YOLOv4 | 320 | 35.5 | 60.0% | 7.6 |
EfficientNetB0-Yolov3 | 416 | 3.7 | 45.5% | 0.7 |
YOLOv3-tiny-prn | 416 | 3.5 | 33.1% | 57.0 |
YOLOv4-tiny | 416 | 6.9 | 40.2% | 42.0 |
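Not cool. The results are not satisfactory: full YOLOv4 manages only 4.3 to 7.6 FPS, and only the tiny models exceed the camera's 30 FPS. Would it work much better with fewer classes? Would tuning the parameters speed things up? And why is YOLO with the EfficientNet backbone so slow, at 0.7 FPS despite needing only 3.7 BFLOPS? We currently have no idea.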
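One way to make the EfficientNet anomaly visible is to multiply each model's BFLOPS by its measured AVG_FPS, which gives a rough figure for the billions of operations per second the board actually sustained. The sketch below does that with the numbers from the results table. It assumes BFLOPS is the cost of one forward pass per frame, and the function name is our own.

```python
# (model, input size, BFLOPS per frame, measured AVG_FPS), from the results table.
RESULTS = [
    ("YOLOv4", 512, 91.1, 4.3),
    ("YOLOv4", 416, 60.1, 5.4),
    ("YOLOv4", 320, 35.5, 7.6),
    ("EfficientNetB0-Yolov3", 416, 3.7, 0.7),
    ("YOLOv3-tiny-prn", 416, 3.5, 57.0),
    ("YOLOv4-tiny", 416, 6.9, 42.0),
]


def sustained_bflops_per_second(bflops_per_frame, fps):
    """Billions of operations per second implied by the measured frame rate."""
    return bflops_per_frame * fps


# Sort from lowest to highest sustained throughput.
rows = sorted(
    ((name, size, sustained_bflops_per_second(bflops, fps))
     for name, size, bflops, fps in RESULTS),
    key=lambda row: row[2],
)

for name, size, throughput in rows:
    print(f"{name:<22} {size:>4} {throughput:8.1f}")
```

On these numbers the EfficientNet model sustains roughly 2.6 billion operations per second, while every other model lands between about 200 and 390. That hints that its bottleneck is something other than raw arithmetic, although this benchmark alone cannot say what.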