% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_fifa.R
\docType{data}
\name{fifa}
\alias{fifa}
\title{FIFA 20 preprocessed data}
\format{
A data frame with 5000 rows and 42 columns; player names (\code{short_name}) are stored as row names.
}
\source{
The \code{players_20.csv} dataset was downloaded from Kaggle and went through a few transformations.
The complete dataset was obtained from
\url{https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset#players_20.csv} on January 1, 2020.
}
\usage{
data(fifa)
}
\description{
The \code{fifa} dataset is a preprocessed version of \code{players_20.csv}, which is
part of the "FIFA 20 complete player dataset" on Kaggle.
}
\details{
It contains the 5000 players with the highest \code{overall} rating and 43 variables (42 columns plus row names). These are:
\itemize{
\item \code{short_name} (stored as row names)
\item \code{nationality} of the player (not used in modeling)
\item \code{overall}, \code{potential}, \code{value_eur}, \code{wage_eur} (4 candidate target variables)
\item age, height, weight, attacking skills, defending skills, goalkeeping skills (37 variables)
}
It is advised to leave only one target variable for modeling.
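
As a minimal sketch of that advice, the helper below keeps one target and drops the
other three together with \code{nationality}. The name \code{keep_one_target} and its
arguments are hypothetical and are not part of the package.
\preformatted{
# Hypothetical helper: keep a single target column, drop the other targets
# and any extra columns that should not enter the model.
keep_one_target <- function(data, target,
                            targets = c("overall", "potential",
                                        "value_eur", "wage_eur"),
                            also_drop = "nationality") {
  stopifnot(is.element(target, targets))
  drop_cols <- c(setdiff(targets, target), also_drop)
  # setdiff() keeps the original column order of `data`
  data[, setdiff(colnames(data), drop_cols), drop = FALSE]
}

# Possible usage, assuming the package that ships `fifa` is loaded:
# data(fifa)
# fifa_value <- keep_one_target(fifa, "value_eur")
}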

Source: \url{https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset}

All transformations:
\enumerate{
\item take 43 columns: \code{c(3, 5, 7:9, 11:14, 45:78)} (1-based R indexing)
\item take rows with \code{value_eur > 0}
\item convert \code{short_name} to ASCII
\item remove rows with duplicated \code{short_name} (keep first)
\item sort rows by \code{overall} in descending order and keep the top \code{5000}
\item use the \code{short_name} column as row names
\item convert \code{nationality} to a factor
\item reorder columns
}
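The steps above can be sketched in base R as follows. This is only an illustration,
not the script used to build the dataset: the helper name \code{preprocess_players},
its arguments, and the use of \code{iconv} with \code{"ASCII//TRANSLIT"} are assumptions
(transliteration results can differ between platforms), and the final column reordering
is left out because the target order is not stated here.
\preformatted{
# Hypothetical helper mirroring the listed steps; not part of the package.
preprocess_players <- function(players,
                               cols = c(3, 5, 7:9, 11:14, 45:78),
                               n_top = 5000) {
  out <- players[, cols]                       # take the selected columns
  out <- out[out$value_eur > 0, ]              # keep rows with value_eur > 0
  out$short_name <- iconv(out$short_name,      # convert names to ASCII
                          from = "UTF-8", to = "ASCII//TRANSLIT")
  out <- out[!duplicated(out$short_name), ]    # drop duplicated names, keep first
  out <- out[order(out$overall, decreasing = TRUE), ]
  out <- head(out, n_top)                      # top n_top players by overall
  rownames(out) <- out$short_name              # names become row names
  out$short_name <- NULL
  out$nationality <- factor(out$nationality)   # nationality as factor
  out
}

# Possible usage on the raw file (the local path is an assumption):
# raw <- read.csv("players_20.csv", encoding = "UTF-8",
#                 stringsAsFactors = FALSE)
# fifa_sketch <- preprocess_players(raw)
}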
}
\keyword{fifa}