Human motion analysis from UAV video

Asanka G. Perera (School of Engineering, University of South Australia, Mawson Lakes, Australia)
Yee Wei Law (School of Engineering, University of South Australia, Mawson Lakes, Australia)
Ali Al-Naji (School of Engineering, University of South Australia, Mawson Lakes, Australia) (Electrical Engineering Technical College, Middle Technical University, Baghdad, Iraq)
Javaan Chahl (School of Engineering, University of South Australia, Mawson Lakes, Australia) (Joint and Operations Analysis Division, Defence Science and Technology Group Melbourne, Fishermans Bend, Australia)

International Journal of Intelligent Unmanned Systems

ISSN: 2049-6427

Publication date: 16 April 2018



The purpose of this paper is to present a preliminary solution to the problem of estimating human pose and trajectory, in near real time, from an aerial robot equipped with a monocular camera.


The distinguishing feature of the solution is a dynamic classifier selection architecture. Each video frame is first corrected for perspective using a projective transformation. A silhouette is then extracted and represented as a Histogram of Oriented Gradients (HOG) descriptor, which is classified by a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. The dynamic classifier consists of a Support Vector Machine (SVM) classifier, C64, that recognizes all 64 classes, and 64 SVM classifiers that recognize four classes each. These four classes are chosen based on the temporal relationship between them, as dictated by the gait sequence.
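The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the 64 classes are split into 8 poses by 8 viewpoints, the four candidate classes are "same class, next pose, next pose at either adjacent viewpoint", and the SVMs are replaced by a lookup of precomputed per-class scores. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed split of the 64 classes (not stated in the abstract).
N_POSES = 8   # assumed number of poses in one gait cycle
N_VIEWS = 8   # assumed number of viewpoints around the subject

ClassId = Tuple[int, int]        # (pose index, viewpoint index)
Scores = Dict[ClassId, float]    # stand-in for SVM decision scores per class


def candidate_classes(prev: ClassId) -> List[ClassId]:
    """Return the four classes allowed after `prev` (assumed transition rule)."""
    pose, view = prev
    nxt = (pose + 1) % N_POSES
    return [
        (pose, view),                   # pose unchanged between frames
        (nxt, view),                    # next pose, walking straight
        (nxt, (view - 1) % N_VIEWS),    # next pose, turning one way
        (nxt, (view + 1) % N_VIEWS),    # next pose, turning the other way
    ]


def best_class(scores: Scores, allowed: Sequence[ClassId]) -> ClassId:
    """Pick the highest-scoring class among the allowed ones."""
    return max(allowed, key=lambda c: scores.get(c, float("-inf")))


def track(frames: Sequence[Scores]) -> List[ClassId]:
    """Classify each frame, restricting the search after the first frame."""
    all_classes = [(p, v) for p in range(N_POSES) for v in range(N_VIEWS)]
    estimates: List[ClassId] = []
    for scores in frames:
        if not estimates:
            allowed = all_classes                       # role of C64
        else:
            allowed = candidate_classes(estimates[-1])  # role of a 4-class SVM
        estimates.append(best_class(scores, allowed))
    return estimates
```

In this sketch, a spuriously high score for a distant class cannot be selected once tracking has started, because only the four candidates of the previous estimate are considered.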


The solution provides three main advantages. First, classification is efficient because of dynamic selection (4-class rather than 64-class classification). Second, classification errors are confined to neighbors of the true viewpoint: a wrongly estimated viewpoint is at most adjacent to the true viewpoint, which enables fast recovery from incorrect estimations. Third, the robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes.


Experiments conducted on both fronto-parallel videos and aerial videos confirm that the solution can achieve accurate pose and trajectory estimation for these different kinds of videos. For example, on the “walking on an 8-shaped path” data set (1,652 frames), the solution achieves estimation accuracies of 85 percent for viewpoints and 98.14 percent for poses.



Perera, A., Law, Y., Al-Naji, A. and Chahl, J. (2018), "Human motion analysis from UAV video", International Journal of Intelligent Unmanned Systems, Vol. 6 No. 2, pp. 69-92.




Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
