@inproceedings{KarpathyCVPR14, title = {Large-scale Video Classification with Convolutional Neural Networks}, author = {Andrej Karpathy and George Toderici and Sanketh Shetty and Thomas Leung and Rahul Sukthankar and Li Fei-Fei}, year = {2014}, booktitle = {CVPR} }
The Sports-1M dataset is licensed under Creative Commons 3.0 and contains 1,133,158 video URLs which have been annotated automatically with 487 Sports labels using the YouTube Topics API. To download the dataset, check out our Github Repository, or simply use:
$ git clone https://github.com/gtoderici/sports-1m-dataset.git
Then see the attached README for details.
Here is a visualization of some thumbnails for every one of the 487 classes (7MB html page).
Finer details about all videos as JSON (53MB zip). Example entry:
{ "stitle": "Improving Sprint Start Technique", "label487": [ 205 ], "thumbnail": "https://i1.ytimg.com/vi/Drdm1WsRQwA/hqdefault.jpg", "width": 640, "duration": 86, "height": 360, "id": "Drdm1WsRQwA", "source487": "train" },
A common question is how one can manage data of this scale. We'd like to note that the JSON information we release contains durations of all videos, so it is possible to filter to only videos below some threshold of duration. Another idea is to sample frames/segments from the videos right away and not store the full original files, or even further resize them right away to 227x277 in spatial resolution, for example. A large portion of the dataset (90%+) can thus add up to at most a few TB.
Example per-frame classification results overlayed on top of a video. Also available as a direct download (35mb).
Spatio-temporal features learned by the Slow Fusion network on the first layer. Compare to [Le et al. '11] unsupervised spatio-temporal features.
The full confusion matrix on the Sports-1M, and a few diagonal crops.