In section 1, we downloaded the DNNDK tools and copied them over to the ZCU102 board. A portion of these tools also needs to be installed on the host x86 machine for quantizing and compiling the model.
The tools needed are contained under the host_x86 tools directory.
Please copy the host_x86 folder and its subdirectories to the host machine.
Next, cd into the host_x86 directory and install the host tools:
sudo ./install.sh ZCU102
NOTE: The target for this tutorial is the ZCU102, but it should be possible to target other boards as well by changing the target shown above when installing the tools and also modifying the dnnc command to target the correct DPU. As a quick reference, the ZCU102 and ZCU104 use a 4096FA DPU.
Please refer to the DNNDK User Guide for more details on the DNNDK tools.
If you would like to quantize and deploy the model, continue on to 4.0 part 2. Otherwise, if you would first like to test the quantized and floating point models and compare the mIOU between the two, jump down to 4.0 part 3.
I have included an example workspace in Segment/DNNDK to show how the DNNDK tools may be invoked as well as the necessary modifications to the prototxt files for both quantization/compilation and testing the float and quantized model mIOUs. Change directory to the DNNDK directory before proceeding to the next step.
Within the DNNDK directory, there is a subdirectory for each model. Inside each model directory are several files:
"float.prototxt" is used for quantizing/compiling the models for deployment on the target hardware
"float_test.prototxt" is used for testing the float and quantized models to report the mIOU against the cityscapes validation dataset
"float.caffemodel" is the pre-trained caffemodel.
"quantize_and_compile.sh" is a script that is used to perform both quantization and compilation (decent_q and dnnc) for deployment on the target hardware
"test_float_and_quantized.sh" is a script that will test both the floating point and quantized models and report out the mIOU for each
There is also a subdirectory for decent, as a local copy of decent_q is provided rather than the publicly distributed decent_q; this local copy provides the capability to test both the floating point and quantized models.
The "float.prototxt" files should be mostly identical to your "train_val.prototxt" except for the following:
The input layer has changed from "ImageSegData" type to "ImageData"
Paths have been specified to the calibration data in a relative fashion so that they point to the correct locations if the directory structure is left intact.
Note that by default the prototxt files are set to generate a 512x256 input size model, which is intended for use with the xxx_video applications (e.g. fpn_video). If you wish to run the evaluation in hardware on cityscapes validation images rather than on the recorded video (e.g. fpn_eval), note that those applications use 1024x512, so you will need to modify these input layers accordingly (the float_test.prototxt files have the input set to 1024x512 if you wish to use this as an example).
line 11: source: "../data/cityscapes/calibration.txt"
line 12: root_folder: "../data/cityscapes/calibration_images/"
Important note for the ENet float.prototxt: the "UpsamplingBilinear2d_x" layers have been changed to "DeephiResize" type because decent doesn't support bilinear upsampling with the deconvolution layer.
You can use these prototxt files directly if the differences mentioned above are the only deltas between your train_val.prototxt file and float.prototxt. Otherwise, if you are deploying the encoder model only or a modified version, you will need to update your train_val.prototxt to accommodate the differences mentioned above, rename that file to "float.prototxt", and place it in the correct directory.
The data listed in the calibration.txt file calls out the following 1000 images:
You will need to copy these images, or potentially create soft links, from the dataset directories listed above to the Segment/DNNDK/data/cityscapes/calibration_images directory. You can use other calibration images if desired; however, the provided calibration.txt file uses the images listed above.
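One way to populate the calibration_images directory is with soft links driven by calibration.txt. The snippet below is a self-contained sketch that runs in a temporary sandbox; the image name and dataset layout are stand-ins, not the real calibration list. In the actual workspace, point CITYSCAPES_ROOT at your dataset and read from the provided Segment/DNNDK/data/cityscapes/calibration.txt instead.

```shell
# Sandbox demo: link the images named in calibration.txt into
# calibration_images. All paths below are placeholders for the demo.
set -e
demo=$(mktemp -d)
CITYSCAPES_ROOT="$demo/dataset"            # would be your cityscapes location
CALIB_DIR="$demo/calibration_images"       # would be Segment/DNNDK/data/cityscapes/calibration_images
mkdir -p "$CITYSCAPES_ROOT" "$CALIB_DIR"
touch "$CITYSCAPES_ROOT/aachen_000000_000019_leftImg8bit.png"   # stand-in image
echo "aachen_000000_000019_leftImg8bit.png" > "$demo/calibration.txt"
# Create one soft link per image listed in calibration.txt
while read -r img; do
  ln -sf "$CITYSCAPES_ROOT/$img" "$CALIB_DIR/$img"
done < "$demo/calibration.txt"
```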
Next, copy your latest trained model from Caffe into the Segment/DNNDK/model_subdirectory_name directory (or reuse the already populated float.caffemodel) and rename it "float.caffemodel". This model should be located wherever the snapshot was saved during the training step.
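The copy-and-rename step can be sketched as below. The snapshot filename and the FPN target directory are assumptions for the demo; substitute your own snapshot prefix/iteration and model subdirectory. The demo runs in a temporary sandbox so it is self-contained.

```shell
# Sandbox demo: copy a training snapshot into the model directory under the
# name float.caffemodel. The paths and snapshot name are placeholders.
set -e
demo=$(mktemp -d)
mkdir -p "$demo/snapshots" "$demo/Segment/DNNDK/FPN"
touch "$demo/snapshots/pretrained_iter_20000.caffemodel"   # stand-in snapshot
cp "$demo/snapshots/pretrained_iter_20000.caffemodel" \
   "$demo/Segment/DNNDK/FPN/float.caffemodel"
```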
Next run the quantization tools using the following command:
./quantize_and_compile.sh
If you open the script, you will see the following contents, which indicate several things. First, make sure the GPUID environment variable is set correctly for your machine. If you have only one GPU, this should be '0'; otherwise, change it to the index of the GPU you want to use for quantization.
Second, a Segment/DNNDK/decent/setup_decent_q.sh script is called which checks your NVIDIA environment and selects the correct local decent_q executable for quantization. The reason for this is that at the time this tutorial was authored, the public version of decent did not yet have the capability to perform testing on floating point models, so this version of decent_q has been provided with the tutorial to enable mIOU testing of both the floating point and quantized models.
Next, you can see that decent_q_segment quantize is called with various arguments including calibration iterations, GPUID, paths to the input and output models, and a tee to dump the output to a text file in the decent_output directory.
For reference, I have included an ENet decent log file and an ESPNet decent log file that show the output of my console after running the decent command. You should see something similar after running the command on your machine.
Finally, the dnnc command is called which compiles the floating point model and produces a file called "dpu_segmentation_0.elf" under the dnnc_output directory.
For reference, I have included an ENet dnnc log file and an ESPNet dnnc log file that show the output of my console after the dnnc command is run. You should see something similar after running the command on your machine.
#!/usr/bin/env bash
export GPUID=0
net=segmentation
source ../decent/setup_decent_q.sh
#working directory
work_dir=$(pwd)
#decent_q output directory (also the input to dnnc)
model_dir=decent_output
#output directory
output_dir=dnnc_output
echo "quantizing network: $(pwd)/float.prototxt"
./../decent/decent_q_segment quantize \
-model $(pwd)/float.prototxt \
-weights $(pwd)/float.caffemodel \
-gpu $GPUID \
-calib_iter 1000 \
-output_dir ${model_dir} 2>&1 | tee ${model_dir}/decent_log.txt
echo "Compiling network: ${net}"
dnnc --prototxt=${model_dir}/deploy.prototxt \
--caffemodel=${model_dir}/deploy.caffemodel \
--output_dir=${output_dir} \
--net_name=${net} --dpu=4096FA \
--cpu_arch=arm64 2>&1 | tee ${output_dir}/dnnc_log.txt
At this point, an elf file should have been created in the dnnc_output directory, which can be used in the final step: running the models on the ZCU102. If desired, you can also proceed to Part 3 of 4.0, which covers testing the floating point and quantized models.
As mentioned in the previous section, files have been provided under the Segment/DNNDK/model_subdirectory_name filepath which enable you to rapidly test the mIOU of both the floating point and quantized models on the cityscapes validation dataset. To perform this testing, follow these steps:
1. Open the Segment/DNNDK/data/val_img_seg_nomap.txt file with a text editor.
2. Notice that this file contains paths to the cityscapes validation dataset as they are stored on my local machine. The left column has a path to the input image, and the right column has a path to the labels. Modify the root directory portion of both paths to point to the location of the cityscapes dataset on your machine.
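As a sketch of that path edit, a sed one-liner can rewrite the root in both columns at once. The OLD_ROOT/NEW_ROOT values and the sample line below are placeholders, not the tutorial's actual paths, and the demo operates on a throwaway copy of the file:

```shell
# Sandbox demo: rewrite the dataset root in both columns of
# val_img_seg_nomap.txt. The roots and filenames are placeholders.
set -e
demo=$(mktemp -d)
cat > "$demo/val_img_seg_nomap.txt" <<'EOF'
/old/root/leftImg8bit/val/frankfurt_000000_000294_leftImg8bit.png /old/root/gtFine/val/frankfurt_000000_000294_gtFine_labelIds.png
EOF
OLD_ROOT=/old/root
NEW_ROOT=/data/cityscapes
# '|' is used as the sed delimiter so the slashes in the paths need no escaping
sed -i "s|$OLD_ROOT|$NEW_ROOT|g" "$demo/val_img_seg_nomap.txt"
cat "$demo/val_img_seg_nomap.txt"
```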
3. Open the float_test.prototxt file that corresponds to the model of interest. Notice that there are several differences between this file and the float.prototxt that was used for deployment. The reason is that the DeephiResize layer causes problems in the current version of decent which prevent dnnc from compiling the model (it causes the input layer to be renamed to "resize_down", which makes dnnc fail), so two separate files are used: one for testing and one for deployment.
The new additions to this model are to support the auto_test and test decent_q commands:
The input size of the model has been changed from 512x256 to 1024x512, because the larger input size produces better mIOU results. It would be possible to use other sizes, such as the native input size of the cityscapes dataset (2048x1024), but testing the models would take longer, and the Unet-full model will not work in that case because of limitations in the Caffe distribution used within the decent_q tool. Additionally, the models were trained with an input crop size of 512, so it is not necessarily expected that using the larger size will produce better results.
An additional input layer "ImageSegData" has been added which has a path to the val_img_seg_nomap.txt file. This is how the labels and input images are supplied for the testing procedure.
A layer after this called "resize_down" has been added to scale the input image to the desired input size for the model (in this case 1024x512).
4. Open one of the test_float_and_quantized.sh scripts. The contents of the script are shown below. You will only need to edit the GPUID to specify the correct GPU index for your tests. Note that the log files for both the floating point and quantized results will be captured under the test_results subdirectory.
export GPUID=0
export WKDIR=`pwd`
cd ../decent
source setup_decent_q.sh
cd $WKDIR
./../decent/decent_q_segment test -model float_test.prototxt -weights float.caffemodel -test_iter 500 -gpu $GPUID 2>&1 | tee test_results/float_model_test.txt
#working directory
work_dir=$(pwd)
#path of float model
model_dir=${work_dir}
#output directory
output_dir=${work_dir}/decent_output
./../decent/decent_q_segment quantize \
-model ${model_dir}/float_test.prototxt \
-weights ${model_dir}/float.caffemodel \
-gpu $GPUID \
-calib_iter 1000 \
-test_iter 500 \
-auto_test \
-output_dir ${output_dir} 2>&1 | tee test_results/quantized_model_test.txt
5. Execute the script by running the following command. This will take some time, which varies depending on the available GPU hardware as well as which model is being run. I have included example test results from a previous run under the associated model directories, such as Segment/DNNDK/FPN/test_results. Note that the previous run results I have included do not necessarily represent the best performing snapshot; they are just an example of the output of running the test script.
./test_float_and_quantized.sh
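Once both logs exist under test_results, a quick grep gives a side-by-side view of the two scores. The sample log lines below are fabricated for this self-contained demo; the real decent_q output format may differ, so adjust the pattern to match what your logs actually contain:

```shell
# Sandbox demo: compare the float vs. quantized results pulled from the two
# log files. The log contents and mIOU values here are made-up placeholders.
set -e
demo=$(mktemp -d)
echo "Test Results: mIOU = 0.6700" > "$demo/float_model_test.txt"
echo "Test Results: mIOU = 0.6621" > "$demo/quantized_model_test.txt"
# -H prefixes each match with its filename so the two runs are labeled
grep -H "mIOU" "$demo"/*_test.txt
```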
At this point, the quantized and floating point models have been fully verified on the host, and you are ready to proceed to deploying the models to the target hardware. If you skipped the section on pre-trained models, you may be wondering how they scored; jump back up to 3.1.0 About the Pre-Trained Models to see the results.
Jon Cory is located near Detroit, Michigan and serves as an Automotive focused Machine Learning Specialist Field Applications Engineer (FAE) for AMD. Jon’s key roles include introducing AMD ML solutions, training customers on the ML tool flow, and assisting with deployment and optimization of ML algorithms in AMD devices. Previously, Jon spent two years as an Embedded Vision Specialist (FAE) with a focus on handcrafted feature algorithms, Vivado HLS, and ML algorithms, and six years prior as an AMD Generalist FAE in Michigan/Northern Ohio. Jon is happily married for four years to Monica Cory and enjoys a wide variety of interests including music, running, traveling, and losing at online video games.