Extracting several kinds of visual representations from videos.
The following frame-level (*_features) and video-level (*_globals) visual representations are supported:
- cnn_features, cnn_globals, cnn_sem_globals
- c3d_features, c3d_globals, i3d_features, i3d_globals
- eco_features, eco_globals, eco_sem_features, eco_sem_globals
- tsm_features, tsm_globals, tsm_sem_features, tsm_sem_globals

Note: *_sem_* representations are based on the classification level (probability distribution) of the respective models.
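To make the naming concrete, here is a minimal sketch of how the three kinds of representation could relate to each other. It is not this package's code: the helper names `softmax` and `mean_pool`, the toy numbers, and in particular the use of mean pooling to get a video-level vector are all assumptions made for illustration. The package may aggregate frames differently.

```python
import math

def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(frames):
    """Average per-frame vectors into one video-level vector (assumed aggregation)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

# Hypothetical frame-level features: 3 sampled frames, 4 dimensions each.
frame_features = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
# One vector for the whole video, analogous to a *_globals representation.
video_global = mean_pool(frame_features)

# Hypothetical per-frame classifier scores over 3 classes.
frame_logits = [
    [2.0, 0.5, -1.0],
    [1.5, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Per-frame class probabilities, analogous to *_sem_features.
sem_features = [softmax(scores) for scores in frame_logits]
# Video-level class probabilities, analogous to *_sem_globals.
sem_global = mean_pool(sem_features)
```

In this sketch the frame-level output keeps one vector per sampled frame, while the video-level output is a single vector of the same width. The semantic variants have one entry per class instead of one entry per feature dimension.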
This package has been tested for extracting visual representations from videos of the following video-caption datasets: