The number of digital videos being created is increasing exponentially; for example, YouTube has reached an upload rate of 100 hours of video per minute. A great deal of this growth is due to the tremendous popularity of smartphones and ubiquitous Internet access. As a result, videos generated by amateur users are the new trend in content generation, and there is an immediate need for robust algorithms to automatically analyze and retrieve these videos. At the same time, many computer vision problems are data-driven, so representative and realistic datasets are necessary for developing robust algorithms. Therefore, we present a highly unconstrained dataset of sports videos, called Sports Videos in the Wild (SVW).

SVW comprises 4200 videos captured solely with smartphones by users of the Coach’s Eye smartphone app, a leading app for sports training developed by TechSmith Corporation. SVW includes 30 categories of sports and 44 different actions. Because of the imperfect practice of amateur players and unprofessional capturing by amateur users, SVW is very challenging for automated analysis.

Potential applications of SVW include: genre categorization, action recognition, action detection, and spatio-temporal alignment.

Sample Frames


  • Each video is annotated with the sport genre. In addition, for 40% of the videos, the time span of each action and a bounding box showing the spatial extent of the action at its start and end frames are also specified.


  • In SVW, unlike existing datasets, there are multiple actions from the same sport genre, making appearance-based recognition infeasible.

Volleyball Labels
Annotated action categories ([343, 359, Forearm], [380, 400, Set], [438, 454, Spike]) within a video from the Volleyball genre category.
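
As an illustration, the annotation triples in the caption above could be held in a small record type. This is only a minimal sketch: it assumes the two numbers in each triple are the start and end frame indices of the action (both inclusive), and the `ActionSpan` class and its field names are hypothetical, not part of the SVW label files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSpan:
    """One annotated action (hypothetical record type, not the SVW file format)."""
    start: int  # assumed: index of the first frame of the action
    end: int    # assumed: index of the last frame of the action
    label: str  # action category, e.g. "Spike"

    def num_frames(self) -> int:
        # Inclusive span: both endpoint frames are counted.
        return self.end - self.start + 1


# The three volleyball annotations quoted in the caption above.
volleyball = [
    ActionSpan(343, 359, "Forearm"),
    ActionSpan(380, 400, "Set"),
    ActionSpan(438, 454, "Spike"),
]

# Length of each action in frames, keyed by action label.
frame_counts = {span.label: span.num_frames() for span in volleyball}
print(frame_counts)
```

Keeping the span and the label together in one immutable record makes it easy to sort, filter, or check that the actions within a video do not overlap.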

Comparison with existing datasets

Dataset | Purpose | Categ. # | Clip # | Avg. |  |  |  | Orientation | Sources
KTH | AR | 6 | 100 | NA | No | No | No | Landscape | Staged
Weizmann | AR | 9 | 9 | NA | No | No | No | Landscape | Staged
IXMAS | AR | 11 | 30 | NA | No | No | No | Landscape | Staged
UCF Sports | AR | 9 | 14+ | NA | Yes | No | No | Landscape | Broadcast TV
Olympic | AR | 16 | 50 | NA | Yes | No | No | Landscape | YouTube
Hollywood2 | AR | A: 12, S: 10 |  | NA | Yes | No | No | Landscape | Movies
UCF50 | AR | 50 | 100+ | NA | Yes | No | Slight | Landscape | YouTube
HMDB | AR | 51 | 101+ | NA | Yes | No | Slight | Landscape | Movies & Internet
UCF101 | AR | 101 | 100+ | 7.2 | Yes | No | Slight | Landscape | YouTube
THUMOS | AR/AD | 101 | 100+ | NA | Yes | No | Slight | Landscape | YouTube
SVW |  | G: 30 |  | 15.1 | Yes | Yes | Yes | Landscape & Portrait | Smartphone & Tablet


Video length

Camera Orientation


Evaluation protocol

  • Genre categorization accuracy is used as the performance metric. It is defined as the fraction of test videos whose genres are correctly classified. Three splits of 70% training and 30% testing are generated for this purpose.
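
The metric described above can be sketched in a few lines. This is a hedged illustration rather than the official evaluation code: the function name `genre_accuracy` and the toy labels are made up, and averaging the accuracy over the three splits is an assumption, since the text only states that three splits are generated.

```python
from statistics import mean
from typing import Sequence


def genre_accuracy(true_genres: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of test videos whose genre is predicted correctly."""
    if len(true_genres) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not true_genres:
        raise ValueError("need at least one test video")
    correct = sum(t == p for t, p in zip(true_genres, predicted))
    return correct / len(true_genres)


# Toy (made-up) ground truth and predictions for three test splits.
# The real splits and genre names come with the dataset download.
splits = [
    (["golf", "tennis", "diving", "golf"], ["golf", "tennis", "golf", "golf"]),
    (["golf", "tennis", "diving", "tennis"], ["golf", "tennis", "diving", "tennis"]),
    (["diving", "tennis", "golf", "golf"], ["golf", "tennis", "diving", "golf"]),
]

per_split = [genre_accuracy(truth, pred) for truth, pred in splits]
mean_accuracy = mean(per_split)  # assumed aggregation over the three splits
print(per_split, mean_accuracy)
```

Reporting the per-split values alongside the mean shows how much the result depends on the particular 70/30 partition.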

For questions regarding this dataset please contact Morteza Safdarnejad (safdarne [at]

SVW Download

  • SVW videos and labels can be downloaded from here.


If you use the SVW dataset, please cite this paper in your publications:


  • Sports Videos in the Wild (SVW): A Video Dataset for Sports Analysis
    Seyed Morteza Safdarnejad, Xiaoming Liu, Lalita Udpa, Brooks Andrus, John Wood, Dean Craven
    Proc. International Conference on Automatic Face and Gesture Recognition (FG 2015), Ljubljana, Slovenia, May 2015 (Acceptance rate 84/221 = 38%)
  • @inproceedings{ sports-videos-in-the-wild-svw-a-video-dataset-for-sports-analysis,
      author = { Seyed Morteza Safdarnejad and Xiaoming Liu and Lalita Udpa and Brooks Andrus and John Wood and Dean Craven },
      title = { Sports Videos in the Wild (SVW): A Video Dataset for Sports Analysis },
      booktitle = { Proc. International Conference on Automatic Face and Gesture Recognition },
      address = { Ljubljana, Slovenia },
      month = { May },
      year = { 2015 },