pysoccer.algorithms.playerank.models package
Submodules
pysoccer.algorithms.playerank.models.Clusterer module
-
class pysoccer.algorithms.playerank.models.Clusterer.Clusterer(*args: Any, **kwargs: Any)
Bases: sklearn.base., sklearn.base.
Performance clustering
Attributes:
- cluster_centers_ : array, [n_clusters, n_features]
Coordinates of cluster centers
- n_clusters_ : int
Number of clusters found by the algorithm
- labels_
Labels of each point
- k_range : tuple
Minimum and maximum number of clusters to try
- verbose : boolean
Whether or not to show details of the execution
- random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- sample_size
None
- kmeans : scikit-learn KMeans object
None
- Parameters
k_range – tuple (pair). The minimum and the maximum $k$ to try when choosing the best value of $k$ (the one with the highest silhouette score).
border_threshold – float. The threshold used to select borderline points: the maximum silhouette value a point may have to be considered borderline.
verbose – boolean. Verbosity mode. Default: False.
random_state – int, RandomState instance or None, optional. Default: None. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
sample_size – int. The number of samples (rows) used when computing the silhouette score (the function silhouette_score is computationally expensive and raises a MemoryError when the number of samples is too high). Default: 10000.
max_rows – int. The maximum number of samples (rows) considered for the clustering task (the function silhouette_samples is computationally expensive and raises a MemoryError when the input matrix has too many rows). Default: 40000.
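The k_range and sample_size parameters describe a model-selection loop: fit one clustering per candidate $k$ and keep the $k$ with the highest silhouette score. The following is a minimal sketch of that idea, assuming plain scikit-learn KMeans and silhouette_score rather than the Clusterer's own code. The helper best_k_by_silhouette and the toy data are hypothetical and not part of pysoccer.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_range=(2, 6), sample_size=None, random_state=42):
    """Hypothetical helper: return (best_k, scores) for k in k_range (inclusive)."""
    k_min, k_max = k_range
    scores = {}
    for k in range(k_min, k_max + 1):
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = kmeans.fit_predict(X)
        # sample_size limits how many rows enter the pairwise-distance
        # computation, which is the expensive part of the silhouette score.
        scores[k] = silhouette_score(
            X, labels, sample_size=sample_size, random_state=random_state
        )
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: three compact, well-separated groups of points.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
best_k, scores = best_k_by_silhouette(X, k_range=(2, 6))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On real performance data the silhouette curve is usually much flatter than on this toy set, so it is worth inspecting the score of every candidate $k$ and not only the winner.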
-
fit(player_ids, match_ids, dataframe, y=None, kind='single', filename='clusters')
Compute performance clustering.
- Parameters
X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster.
kind – str. 'single': single cluster; 'multi': multi cluster.
y – ignored
-
get_clusters_matrix(kind='single')
-
predict(X, y=None)
Predict the closest cluster each sample in X belongs to. In the vector quantization literature, cluster_centers_ is called the code book, and each value returned by predict is the index of the closest code in the code book.
- Parameters
X – {array-like, sparse matrix}, shape = [n_samples, n_features] New data to predict.
- Returns
multi_labels: array, shape [n_samples,] Index of the cluster each sample belongs to.
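The "closest code in the code book" idea can be illustrated with a few lines of NumPy. This is a sketch of nearest-center assignment only, assuming Euclidean distance. The function closest_center and the example arrays are hypothetical, and the sketch does not reproduce any multi-cluster handling the Clusterer may apply.

```python
import numpy as np


def closest_center(X, cluster_centers):
    """Return, for each row of X, the index of the nearest row of cluster_centers."""
    # Broadcast to shape (n_samples, n_clusters, n_features), then reduce
    # over the feature axis to get squared Euclidean distances.
    diffs = X[:, np.newaxis, :] - cluster_centers[np.newaxis, :, :]
    squared_distances = np.sum(diffs ** 2, axis=2)
    return np.argmin(squared_distances, axis=1)


code_book = np.array([[0.0, 0.0], [10.0, 10.0]])          # two cluster centers
X_new = np.array([[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]])  # three new samples
labels = closest_center(X_new, code_book)
print(labels)  # [0 1 0]
```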
-
pysoccer.algorithms.playerank.models.Clusterer.scalable_silhouette_samples(X, labels, metric='euclidean', n_jobs=1, **kwds)
Compute the Silhouette Coefficient for each sample.
The Silhouette Coefficient is a measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is $(b - a) / \max(a, b)$.
This function returns the Silhouette Coefficient for each sample. The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters.
- Parameters
X – array [n_samples_a, n_features] Feature array.
labels – array, shape = [n_samples] label values for each sample
metric – string, or callable The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise.pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.
**kwds – optional keyword parameters Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
- Returns
silhouette : array, shape = [n_samples] Silhouette Coefficient for each sample.
References
Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7. http://en.wikipedia.org/wiki/Silhouette_(clustering)
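As a worked example of the formula $(b - a) / \max(a, b)$, the sketch below computes the coefficient by hand for a four-point toy dataset and compares it with scikit-learn's silhouette_samples. It assumes Euclidean distance and clusters with at least two members. The helper silhouette_of_sample is hypothetical and is not the implementation of scalable_silhouette_samples.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def silhouette_of_sample(X, labels, i):
    """Silhouette Coefficient of sample i (assumes its cluster has >= 2 members)."""
    distances = np.linalg.norm(X - X[i], axis=1)

    # a: mean distance to the other members of the sample's own cluster.
    same_cluster = labels == labels[i]
    same_cluster[i] = False
    a = distances[same_cluster].mean()

    # b: smallest mean distance to the members of any other cluster.
    b = min(
        distances[labels == other].mean()
        for other in np.unique(labels)
        if other != labels[i]
    )
    return (b - a) / max(a, b)


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])

by_hand = np.array([silhouette_of_sample(X, labels, i) for i in range(len(X))])
reference = silhouette_samples(X, labels)
print(np.round(by_hand, 4))
print(np.allclose(by_hand, reference))
```

For the first point, a = 1 (the distance to its only cluster mate) and b is the mean of 5 and $\sqrt{26}$, which gives a coefficient of roughly 0.802.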
-
pysoccer.algorithms.playerank.models.Clusterer.scalable_silhouette_score(X, labels, metric='euclidean', sample_size=None, random_state=None, n_jobs=1, **kwds)
Compute the mean Silhouette Coefficient of all samples.
The Silhouette Coefficient is computed using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is $(b - a) / \max(a, b)$. To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of.
This function returns the mean Silhouette Coefficient over all samples. To obtain the values for each sample, it uses silhouette_samples. The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.
- Parameters
X – array [n_samples_a, n_features] the Feature array.
labels – array, shape = [n_samples] label values for each sample
metric – string, or callable The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise.pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.
sample_size – int or None The size of the sample to use when computing the Silhouette Coefficient. If sample_size is None, no sampling is used.
random_state – integer or numpy.RandomState, optional. The generator used to draw the random subset of samples when sample_size is not None. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
**kwds – optional keyword parameters Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.
- Returns
silhouette: float the Mean Silhouette Coefficient for all samples.
References
Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7. http://en.wikipedia.org/wiki/Silhouette_(clustering)
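The effect of sample_size can be seen with scikit-learn's own silhouette_score, which accepts the same sample_size and random_state arguments. The sketch below uses that function as a stand-in, on the assumption that scalable_silhouette_score behaves analogously, and the toy data is hypothetical.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Two well-separated groups. The generating labels are used directly, so no
# clustering step is needed for this illustration.
X, labels = make_blobs(
    n_samples=2000,
    centers=[[0, 0], [8, 8]],
    cluster_std=1.0,
    random_state=0,
)

# Mean silhouette over all 2000 samples (needs every pairwise distance).
full_score = silhouette_score(X, labels)

# Mean silhouette over a random subset of 500 samples: cheaper to compute,
# and reproducible because random_state is fixed.
sampled_score = silhouette_score(X, labels, sample_size=500, random_state=0)

print(round(full_score, 3), round(sampled_score, 3))
```

The subsampled value is an estimate of the full score, so the two numbers are close but not identical, and a different random_state gives a slightly different estimate.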
pysoccer.algorithms.playerank.models.Rater module
-
class pysoccer.algorithms.playerank.models.Rater.Rater(alpha_goal=0.0)
Bases: object
Performance rating
Attributes:
- ratings_ : numpy array
the ratings of the performances
- Parameters
alpha_goal – float. Importance of goals in the evaluation of a performance, in the range [0, 1]. Default: 0.0.
-
get_rating(weighted_sum, goals)
-
predict(dataframe, goal_feature, score_feature, filename='ratings')
Compute the rating of each performance in the dataframe.
- Parameters
dataframe – dataframe of playerank scores
goal_feature – name of the dataframe column containing the goals scored
score_feature – name of the dataframe column containing the playerank score
- Returns
ratings_: numpy array
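One way to read alpha_goal as an "importance of the goal in the range [0, 1]" is as the mixing weight of a convex combination between the playerank score and the goals scored. The sketch below illustrates that reading only. The combination formula, the function rate and the sample values are assumptions made for illustration and are not confirmed by this reference.

```python
import numpy as np


def rate(weighted_sum, goals, alpha_goal=0.0):
    """Hypothetical rating: blend the score and the goals with weight alpha_goal."""
    if not 0.0 <= alpha_goal <= 1.0:
        raise ValueError("alpha_goal must be in the range [0, 1]")
    # alpha_goal = 0 ignores goals entirely; alpha_goal = 1 uses goals only.
    return (1.0 - alpha_goal) * weighted_sum + alpha_goal * goals


scores = np.array([0.2, 0.5, 0.1])  # playerank scores of three performances
goals = np.array([0, 1, 2])         # goals scored in the same performances

no_goal_weight = rate(scores, goals, alpha_goal=0.0)
some_goal_weight = rate(scores, goals, alpha_goal=0.25)
print(no_goal_weight, some_goal_weight)
```

With alpha_goal=0.0, the documented default, this reading leaves the playerank score unchanged.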
pysoccer.algorithms.playerank.models.Weighter module
-
class pysoccer.algorithms.playerank.models.Weighter.Weighter(*args: Any, **kwargs: Any)
Bases: sklearn.base.
Automatic weighting of performance features
Attributes:
- feature_names_ : array, [n_features]
Names of the features
- label_type_ : str
The label type associated to the game outcome. Options: w-dl (victory vs draw or defeat), wd-l (victory or draw vs defeat), w-d-l (victory, draw, defeat)
- clf_ : LinearSVC object
The object of the trained classifier
- weights_ : array, [n_features]
Weights of the features computed by the classifier
- random_state_ : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- Parameters
label_type – str. The label type associated to the game outcome. Options: w-dl (victory vs draw or defeat), wd-l (victory or draw vs defeat), w-d-l (victory, draw, defeat).
random_state – int, RandomState instance or None, optional. Default: None. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
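The three label_type options group match outcomes into classes in different ways. The sketch below spells out those groupings. The one-letter outcome codes, the integer class values and the helper encode_outcomes are hypothetical choices made for illustration, and the Weighter may encode its target differently.

```python
# Hypothetical class encodings: outcomes that share a value form one class.
LABEL_MAPS = {
    "w-dl": {"w": 1, "d": -1, "l": -1},   # victory vs draw or defeat
    "wd-l": {"w": 1, "d": 1, "l": -1},    # victory or draw vs defeat
    "w-d-l": {"w": 1, "d": 0, "l": -1},   # victory, draw, defeat
}


def encode_outcomes(outcomes, label_type="w-dl"):
    """Map a sequence of 'w'/'d'/'l' outcomes to class labels."""
    if label_type not in LABEL_MAPS:
        raise ValueError(f"unknown label_type: {label_type!r}")
    mapping = LABEL_MAPS[label_type]
    return [mapping[outcome] for outcome in outcomes]


outcomes = ["w", "d", "l", "w"]
print(encode_outcomes(outcomes, "w-dl"))   # [1, -1, -1, 1]
print(encode_outcomes(outcomes, "wd-l"))   # [1, 1, -1, 1]
print(encode_outcomes(outcomes, "w-d-l"))  # [1, 0, -1, 1]
```

The first two options give a binary classification problem, while w-d-l gives three classes.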
-
fit(dataframe, target, scaled=False, var_threshold=0.001, filename='weights.json')
Compute weights of features.
- Parameters
dataframe – pandas DataFrame. A dataframe containing the feature values and the target values.
target – str. The name of the target variable in the dataframe.
scaled – boolean. True if the feature values must be normalized, False otherwise (optional).
filename – str. The name of the JSON file in which the feature weights are saved. Default: “weights.json”.
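Deriving feature weights from a linear classifier can be sketched with scikit-learn directly: train a LinearSVC on the feature columns and read one coefficient per feature from coef_. This is an illustration under stated assumptions, not the Weighter's code. In particular, var_threshold is assumed to be a variance cutoff applied with VarianceThreshold, scaled is assumed to mean standardization, and the function feature_weights, the column names and the toy data are hypothetical.

```python
import json

import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def feature_weights(dataframe, target, scaled=False, var_threshold=0.001,
                    random_state=42):
    """Hypothetical helper: map each retained feature name to a LinearSVC weight."""
    feature_names = [column for column in dataframe.columns if column != target]
    X = dataframe[feature_names].to_numpy(dtype=float)
    y = dataframe[target].to_numpy()

    # Assumption: features whose variance is below the threshold are dropped.
    selector = VarianceThreshold(threshold=var_threshold)
    X = selector.fit_transform(X)
    kept = [name for name, keep in zip(feature_names, selector.get_support()) if keep]

    if scaled:
        X = StandardScaler().fit_transform(X)

    clf = LinearSVC(random_state=random_state, max_iter=10000)
    clf.fit(X, y)
    # For a binary target, coef_ has shape (1, n_features).
    return {name: float(weight) for name, weight in zip(kept, clf.coef_[0])}


rng = np.random.default_rng(0)
frame = pd.DataFrame({
    "shots": rng.normal(size=200),
    "passes": rng.normal(size=200),
    "constant": np.ones(200),  # zero variance, so it is filtered out
})
frame["outcome"] = (frame["shots"] > 0).astype(int)  # outcome depends on shots only

weights = feature_weights(frame, target="outcome", scaled=True)
print(json.dumps(weights, indent=2))
```

Because the toy outcome depends only on shots, that feature receives by far the largest weight, and the constant column never reaches the classifier.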
-
get_feature_names()
-
get_weights()