Human Action Recognition

  • Usman Khalid
  • Nov 30, 2017
  • 2 min read

Human action recognition has recently attracted a lot of attention from researchers owing to its many applications in security surveillance, human behavior analysis, and video content analysis. The idea behind action recognition is to make cameras intelligent enough to detect what is going on in their surroundings. Conventionally, security personnel are on duty 24 hours a day just to monitor for any anomalous activity. With human action recognition, their job becomes much easier, because the camera itself can flag anomalous human activity. Action recognition can also make your life easier if you want to search for a particular scene in a whole movie. Let's say you want to find the robot dance scene in the movie Real Steel (which is, by the way, a very cool scene): you would just type "robot dance in Real Steel", and action recognition would do all the work for you and return the scene you asked for. Unfortunately, no website currently provides this facility, but you will surely find such websites in the near future.

A lot of researchers are using deep learning for this action recognition task. The most popular and state-of-the-art approach is the two-stream method, which is based on two deep convolutional neural networks and is known as Temporal Segment Networks (TSN). In this method, one network, called the Spatial ConvNet, is trained on RGB images, and the second, called the Temporal ConvNet, is trained on optical flow. The Spatial ConvNet mostly extracts the context information in the video, while the Temporal ConvNet captures the motion information of the human. Once both networks are trained, the scores of each network for a given video are fused to get the final result.
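To make the final fusion step concrete, here is a minimal NumPy sketch of late score fusion between the two streams. The toy class scores, the softmax step, and the stream weights are illustrative assumptions on my part, not the exact configuration used in the TSN paper:

```python
import numpy as np

def softmax(logits):
    """Turn raw per-class network scores into probabilities."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_stream(spatial_logits, temporal_logits,
                    w_spatial=1.0, w_temporal=1.5):
    """Weighted average of the two streams' class probabilities.
    The weights here are illustrative; in practice the temporal
    stream is often weighted somewhat higher than the spatial one."""
    fused = (w_spatial * softmax(spatial_logits)
             + w_temporal * softmax(temporal_logits))
    return fused / (w_spatial + w_temporal)

# Toy example with 3 action classes: the temporal stream is more
# confident about class 1, and fusion follows it.
spatial = np.array([2.0, 1.0, 0.1])
temporal = np.array([0.5, 2.5, 0.2])
print(fuse_two_stream(spatial, temporal).argmax())  # → 1
```

The same averaging idea extends to any number of streams; only the weights and the normalization change.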

I took this Temporal Segment Networks architecture and retrained it on the UCF101 dataset. UCF101 is the most popular dataset for action recognition tasks, and researchers use it for benchmarking. I then added a visualization tool to display the network's results live. The video was recorded during the live evaluation of the UCF101 dataset, using 4 GPUs to evaluate the videos in parallel.
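For context, the "temporal segment" idea that gives TSN its name can be sketched as sparse snippet sampling: split the video into a few equal segments and draw one random frame index from each, so the network sees the whole video at training time without processing every frame. The function name and the segment count below are my own illustrative choices:

```python
import numpy as np

def sample_snippets(num_frames, num_segments=3, rng=None):
    """TSN-style sparse sampling: divide the video's frames into
    equal segments and pick one random frame index per segment.
    Assumes num_frames >= num_segments."""
    if rng is None:
        rng = np.random.default_rng()
    seg_len = num_frames // num_segments
    # One index drawn uniformly from each segment's range.
    return [int(i * seg_len + rng.integers(seg_len))
            for i in range(num_segments)]

# A 90-frame clip yields one index from each 30-frame segment.
print(sample_snippets(90, num_segments=3))
```

At test time, TSN-style pipelines typically sample snippets at fixed positions instead of randomly and average the per-snippet scores.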



Technical University Dortmund,

Germany

© 2023 by Usman Khalid
