The number of digital videos being created is increasing exponentially; for example, YouTube users now upload 100 hours of video every minute. Much of this growth is driven by the tremendous popularity of smartphones and ubiquitous Internet access, so videos generated by amateur users have become a major new source of content. There is therefore an immediate need for robust algorithms that can automatically analyze and retrieve these videos. At the same time, many computer vision problems are data-driven, and representative, realistic datasets are necessary for developing robust algorithms. To this end, we present a highly unconstrained dataset of sports videos, called Sport Videos in the Wild (SVW).
SVW comprises 4200 videos captured solely with smartphones by users of Coach’s Eye, a leading smartphone app for sports training developed by TechSmith Corporation. SVW includes 30 sport genres and 44 different actions. Because of the imperfect technique of amateur players and the non-professional filming by amateur users, SVW is very challenging for automated analysis.
Potential applications of SVW include: genre categorization, action recognition, action detection, and spatio-temporal alignment.
Labelling
- Each video is annotated with its sport genre. In addition, for 40% of the videos, the time span of each action and a bounding box marking the spatial extent of the action at its start and end frames are also provided.
- Unlike existing datasets, SVW contains multiple actions from the same sport genre, which makes recognizing actions from scene appearance alone infeasible.
Annotated action categories ([343, 359, Forearm], [380, 400, Set], [438, 454, Spike]) within a video from the Volleyball genre category.
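The annotation structure described above (one genre label per video, plus, for a subset of videos, action time spans with start- and end-frame bounding boxes) can be modeled as a small record type. The sketch below is a hypothetical schema, not the format of the released SVW annotation files: the class and field names, the `(x, y, width, height)` box convention, and the reading of `[343, 359, Forearm]` as an inclusive `[start frame, end frame, action]` triple are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# ASSUMPTION: boxes are (x, y, width, height) in pixels; the real SVW
# annotation files may use a different convention.
Box = Tuple[int, int, int, int]


@dataclass
class ActionSpan:
    """One annotated action: assumed inclusive frame range plus its label."""
    start_frame: int
    end_frame: int
    label: str
    start_box: Optional[Box] = None  # spatial extent at start_frame
    end_box: Optional[Box] = None    # spatial extent at end_frame

    def num_frames(self) -> int:
        # Inclusive range, so both endpoints are counted.
        return self.end_frame - self.start_frame + 1


@dataclass
class VideoAnnotation:
    """Hypothetical per-video record: every video has a genre; only some have actions."""
    video_id: str
    genre: str
    actions: List[ActionSpan] = field(default_factory=list)

    def action_at(self, frame: int) -> Optional[str]:
        # Return the label of the action covering this frame, if any.
        for action in self.actions:
            if action.start_frame <= frame <= action.end_frame:
                return action.label
        return None


# The Volleyball example from the caption, with a hypothetical video ID.
video = VideoAnnotation(
    video_id="example_volleyball",
    genre="Volleyball",
    actions=[
        ActionSpan(343, 359, "Forearm"),
        ActionSpan(380, 400, "Set"),
        ActionSpan(438, 454, "Spike"),
    ],
)

print(video.action_at(390))                      # Set
print(video.action_at(420))                      # None (between actions)
print([a.num_frames() for a in video.actions])   # [17, 21, 17]
```

Keeping `actions` as an optional, possibly empty list mirrors the fact that only 40% of the videos carry action-level labels, while every video still has a genre for genre categorization.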
Comparison with existing datasets
Dataset | Purpose | # Categories | Clips per category | Avg. length (s) | Unconstrained actions | Unconstrained capturing | Camera vibration | Orientation | Sources |
KTH | AR | 6 | 100 | NA | No | No | No | Landscape | Staged |
Weizmann | AR | 9 | 9 | NA | No | No | No | Landscape | Staged |
IXMAS | AR | 11 | 30 | NA | No | No | No | Landscape | Staged |
UCF Sports | AR | 9 | 14+ | NA | Yes | No | No | Landscape | Broadcast TV |
Olympic | AR | 16 | 50 | NA | Yes | No | No | Landscape | YouTube |
Hollywood2 | AR, SU | A: 12, S: 10 | 61+, 62+ | NA | Yes | No | No | Landscape | Movies |
UCF50 | AR | 50 | 100+ | NA | Yes | No | Slight | Landscape | YouTube |
HMDB | AR | 51 | 101+ | NA | Yes | No | Slight | Landscape | Movies & Internet |
UCF101 | AR | 101 | 100+ | 7.2 | Yes | No | Slight | Landscape | YouTube |
THUMOS | AR/AD | 101 | 100+ | NA | Yes | No | Slight | Landscape | YouTube |
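SVW | AR/AD, GC | A: 44, G: 30 | 50+, 110+ | 15.1 | Yes | Yes | Yes | Landscape & Portrait | Smartphone & Tablet |

AR: action recognition; AD: action detection; GC: genre categorization; SU: scene understanding; A: actions; G: genres; S: scenes; NA: not available.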
Statistics
Evaluation protocol
- Genre categorization accuracy is the performance metric, defined as the fraction of test videos whose genre is correctly classified. Three splits, each using 70% of the videos for training and 30% for testing, are provided for this purpose.
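As a rough illustration of this protocol, the sketch below draws three random 70%/30% train/test splits and computes genre categorization accuracy as the fraction of test videos whose predicted genre matches the ground truth. The function names `make_splits` and `genre_accuracy`, the seed, and the video IDs are hypothetical; the random splits only mimic the split sizes and are not the official SVW splits, which should be used for any reported results.

```python
import random
from typing import Dict, List, Sequence, Tuple


def make_splits(video_ids: Sequence[str], n_splits: int = 3,
                train_frac: float = 0.7,
                seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Draw n_splits random train/test partitions (illustrative, NOT the official splits)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        ids = list(video_ids)
        rng.shuffle(ids)
        cut = int(round(train_frac * len(ids)))
        splits.append((ids[:cut], ids[cut:]))  # (train, test)
    return splits


def genre_accuracy(predicted: Dict[str, str], truth: Dict[str, str],
                   test_ids: Sequence[str]) -> float:
    """Fraction of test videos whose predicted genre equals the ground-truth genre."""
    correct = sum(1 for vid in test_ids if predicted.get(vid) == truth[vid])
    return correct / len(test_ids)


# With 4200 videos, each split has 2940 training and 1260 test videos.
all_ids = [f"video_{i}" for i in range(4200)]  # hypothetical IDs
for train, test in make_splits(all_ids):
    print(len(train), len(test))  # 2940 1260

# Toy accuracy check: 3 of 4 test videos are classified correctly.
truth = {"v1": "Volleyball", "v2": "Tennis", "v3": "Golf", "v4": "Soccer"}
pred = {"v1": "Volleyball", "v2": "Tennis", "v3": "Golf", "v4": "Hockey"}
print(genre_accuracy(pred, truth, ["v1", "v2", "v3", "v4"]))  # 0.75
```

A video with no prediction counts as misclassified here, because `predicted.get` returns `None`, which never equals a ground-truth genre.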
For questions regarding this dataset please contact Morteza Safdarnejad (safdarne [at] egr.msu.edu).