Earlier this year, Google hosted a competition on Kaggle for YouTube video classification. Google provided 7 million videos totaling 450,000 hours of footage, to be classified into 4,716 categories. The third-placed team, a group of researchers from Tsinghua University and Baidu, has recently published its approach.
Using a 7-layer deep LSTM architecture, the team achieved a score of 82.75% on Global Average Precision (GAP), the metric used in the competition.
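For readers unfamiliar with the metric, the following is a minimal sketch of how Global Average Precision can be computed. It assumes the usual definition for this competition: the top 20 predictions per video are pooled across all videos, ranked by confidence, and scored as a single average precision. The function name and data layout are illustrative and are not taken from the team's code or from the official evaluation script.

```python
from typing import List, Set, Tuple


def global_average_precision(
    predictions: List[List[Tuple[int, float]]],
    labels: List[Set[int]],
    top_k: int = 20,  # assumption: 20 predictions per video are scored
) -> float:
    """Compute GAP over a set of videos.

    predictions[v] holds (class_id, confidence) pairs for video v.
    labels[v] holds the set of true class ids for video v.
    """
    pooled = []  # (confidence, is_correct) pairs from all videos
    for preds, truth in zip(predictions, labels):
        # Keep only the top_k most confident predictions of this video.
        top = sorted(preds, key=lambda p: p[1], reverse=True)[:top_k]
        pooled.extend((conf, cls in truth) for cls, conf in top)

    # Assumption: recall is measured against all ground-truth labels.
    total_positives = sum(len(truth) for truth in labels)
    if total_positives == 0:
        return 0.0

    # Rank the pooled predictions globally by confidence.
    pooled.sort(key=lambda item: item[0], reverse=True)

    hits = 0
    gap = 0.0
    for rank, (_, correct) in enumerate(pooled, start=1):
        if correct:
            hits += 1
            # Precision at this rank times the recall gained here.
            gap += (hits / rank) / total_positives
    return gap


if __name__ == "__main__":
    example_predictions = [
        [(1, 0.9), (2, 0.8)],
        [(3, 0.7), (1, 0.1)],
    ]
    example_labels = [{1}, {1}]
    print(global_average_precision(example_predictions, example_labels))
```

Because the predictions of all videos are ranked together, a confident wrong prediction on one video lowers the precision credited to every correct prediction ranked below it, including those for other videos.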
The architecture of the temporal residual CNN used is as follows:
Source: Tsinghua University, 2017