pysoccer.algorithms.playerank.models package

Submodules

pysoccer.algorithms.playerank.models.Clusterer module

class pysoccer.algorithms.playerank.models.Clusterer.Clusterer(*args: Any, **kwargs: Any)

Bases: sklearn.base., sklearn.base.

Performance clustering

Attributes:

cluster_centers_ (array, [n_clusters, n_features]): coordinates of cluster centers
n_clusters_ (int): number of clusters found by the algorithm
labels_: labels of each point
k_range (tuple): minimum and maximum number of clusters to try
verbose (boolean): whether or not to show details of the execution
random_state (int, RandomState instance or None, optional, default: None): if int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
sample_size: None
kmeans (scikit-learn KMeans object): None

Parameters

k_range – tuple (pair): the minimum and the maximum $k$ to try when choosing the best value of $k$ (the one with the best silhouette score).
border_threshold – float: the threshold used to select borderline points. It is the maximum silhouette value a point can have and still count as borderline.
verbose – boolean: verbosity mode. default: False
random_state – int, RandomState instance or None, optional, default: None. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
sample_size – int: the number of samples (rows) used when computing the silhouette score (the function silhouette_score is computationally expensive and raises a MemoryError when the number of samples is too high). default: 10000
max_rows – int: the maximum number of samples (rows) considered for the clustering task (the function silhouette_samples is computationally expensive and raises a MemoryError when the input matrix has too many rows). default: 40000
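
The selection of $k$ described by k_range and sample_size can be sketched with plain scikit-learn. This is a minimal illustration, not the Clusterer implementation: the helper name best_k_by_silhouette is hypothetical, k_range is assumed to be inclusive at both ends, and the toy data is made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_range=(2, 5), sample_size=None, random_state=42):
    """Return (best_k, scores), where scores maps each k to its silhouette score.

    Hypothetical helper: it only illustrates the "try every k, keep the best
    silhouette" idea and is not part of pysoccer.
    """
    k_min, k_max = k_range
    scores = {}
    for k in range(k_min, k_max + 1):  # assumption: both bounds are included
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        # sample_size limits how many rows enter the (expensive) silhouette
        # computation; None means that all rows are used.
        scores[k] = silhouette_score(X, labels, sample_size=sample_size,
                                     random_state=random_state)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: three well-separated blobs of 50 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(X, k_range=(2, 5), sample_size=100)
print(best_k, scores)
```

Lowering sample_size trades accuracy of the score for memory and time, which is the motivation given for the sample_size and max_rows defaults.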

fit(player_ids, match_ids, dataframe, y=None, kind='single', filename='clusters')

Compute performance clustering.

Parameters

X – array-like or sparse matrix, shape=(n_samples, n_features): training instances to cluster.
kind – str: 'single' for single cluster, 'multi' for multi cluster. default: 'single'
y – ignored

get_clusters_matrix(kind='single')

predict(X, y=None)

Predict the closest cluster each sample in X belongs to. In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters: X – {array-like, sparse matrix}, shape = [n_samples, n_features]: new data to predict.
Returns: multi_labels – array, shape [n_samples,]: index of the cluster each sample belongs to.
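
The code book lookup described for predict amounts to a nearest-center search. A minimal sketch in plain Python follows; closest_code and the sample values are hypothetical and only illustrate the idea, not the method's actual code.

```python
import math


def closest_code(sample, code_book):
    """Return the index of the code (cluster center) nearest to the sample."""
    distances = [math.dist(sample, code) for code in code_book]
    return distances.index(min(distances))


# Hypothetical code book with three cluster centers in 2-D.
code_book = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
samples = [(1.0, 0.5), (6.0, 4.0), (9.0, 1.0)]

labels = [closest_code(s, code_book) for s in samples]
print(labels)  # [0, 1, 2]
```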

pysoccer.algorithms.playerank.models.Clusterer.scalable_silhouette_samples(X, labels, metric='euclidean', n_jobs=1, **kwds)

Compute the Silhouette Coefficient for each sample. The Silhouette Coefficient is a measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is $(b - a) / \max(a, b)$. This function returns the Silhouette Coefficient for each sample. The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters.

Parameters

X – array [n_samples_a, n_features]: feature array.
labels – array, shape = [n_samples]: label values for each sample.
metric – string or callable: the metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise.pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.
**kwds – optional keyword parameters: any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns

silhouette – array, shape = [n_samples]: Silhouette Coefficient for each sample.

References: Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7. http://en.wikipedia.org/wiki/Silhouette_(clustering)
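
The per-sample definition can be checked by hand on a tiny data set. The sketch below is a naive pure-Python version with Euclidean distance only; silhouette_samples_naive is a hypothetical name, and the code makes no attempt to reproduce the memory-saving strategy of scalable_silhouette_samples. It assumes at least two clusters and at least two distinct points.

```python
import math
from collections import defaultdict


def silhouette_samples_naive(X, labels):
    """Silhouette Coefficient (b - a) / max(a, b) for every sample."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention borrowed from scikit-learn: a sample that is alone
            # in its cluster gets a coefficient of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(X[i], X[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


# Two tight 1-D clusters far apart from each other.
X = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette_samples_naive(X, labels)
print([round(s, 3) for s in scores])  # [0.905, 0.895, 0.895, 0.905]
```

For the first point, a = 1 and b = (10 + 11) / 2 = 10.5, so the coefficient is 9.5 / 10.5 ≈ 0.905.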

pysoccer.algorithms.playerank.models.Clusterer.scalable_silhouette_score(X, labels, metric='euclidean', sample_size=None, random_state=None, n_jobs=1, **kwds)

Compute the mean Silhouette Coefficient of all samples. The Silhouette Coefficient is computed using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is $(b - a) / \max(a, b)$. To clarify, b is the mean distance between a sample and the nearest cluster that the sample is not a part of.

This function returns the mean Silhouette Coefficient over all samples. To obtain the values for each sample, it uses silhouette_samples. The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.

Parameters

X – array [n_samples_a, n_features]: the feature array.
labels – array, shape = [n_samples]: label values for each sample.
metric – string or callable: the metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise.pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.
sample_size – int or None: the size of the sample to use when computing the Silhouette Coefficient. If sample_size is None, no sampling is used.
random_state – integer or numpy.RandomState, optional: the generator used to randomly select the subset of samples when sample_size is not None. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
**kwds – optional keyword parameters: any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns

silhouette – float: the mean Silhouette Coefficient over all samples.

References: Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7. http://en.wikipedia.org/wiki/Silhouette_(clustering)
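
Written out, the returned score is the average of the per-sample coefficients. The notation $s(i)$, $a(i)$, $b(i)$ and $\bar{s}$ is introduced here for illustration only, and the numbers in the worked line are made up.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i)

% Worked example for one sample with a(i) = 1 and b(i) = 10.5:
s(i) = \frac{10.5 - 1}{\max\{1,\, 10.5\}} = \frac{9.5}{10.5} \approx 0.905
```

When sample_size is given, $n$ is the size of the random subset rather than the number of rows in X.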

pysoccer.algorithms.playerank.models.Rater module

class pysoccer.algorithms.playerank.models.Rater.Rater(alpha_goal=0.0)

Bases: object

Performance rating

Attributes:

ratings_ (numpy array): the ratings of the performances

Parameters: alpha_goal – float: importance of the goal in the evaluation of performance, in the range [0, 1]. default: 0.0

get_rating(weighted_sum, goals)

predict(dataframe, goal_feature, score_feature, filename='ratings')

Compute the rating of each performance in the dataframe.

Parameters

dataframe – dataframe of playerank scores
goal_feature – name of the dataframe column holding the goals scored
score_feature – name of the dataframe column holding the playerank score

Returns

ratings_: numpy array
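
One plausible reading of alpha_goal is a convex combination of the playerank score and the goals scored. The sketch below assumes exactly that; blend_rating is a hypothetical function, and this page does not confirm that get_rating uses this formula or how the result is scaled afterwards.

```python
def blend_rating(weighted_sum, goals, alpha_goal=0.0):
    """Blend a playerank score with goals scored.

    Assumption (not confirmed by the documentation): the rating is
    (1 - alpha_goal) * weighted_sum + alpha_goal * goals.
    """
    if not 0.0 <= alpha_goal <= 1.0:
        raise ValueError("alpha_goal must be in the range [0, 1]")
    return (1.0 - alpha_goal) * weighted_sum + alpha_goal * goals


# With alpha_goal=0.0 (the documented default) goals are ignored entirely.
print(blend_rating(0.5, goals=2, alpha_goal=0.0))  # 0.5
# With alpha_goal=1.0 only the goals count.
print(blend_rating(0.5, goals=2, alpha_goal=1.0))  # 2.0
```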

pysoccer.algorithms.playerank.models.Weighter module

class pysoccer.algorithms.playerank.models.Weighter.Weighter(*args: Any, **kwargs: Any)

Bases: sklearn.base.

Automatic weighting of performance features

Attributes:

feature_names_ (array, [n_features]): names of the features
label_type_ (str): the label type associated to the game outcome. Options: w-dl (victory vs draw or defeat), wd-l (victory or draw vs defeat), w-d-l (victory, draw, defeat)
clf_ (LinearSVC object): the trained classifier
weights_ (array, [n_features]): weights of the features computed by the classifier
random_state_ (int, RandomState instance or None, optional, default: None): if int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Parameters

label_type – str: the label type associated to the game outcome. Options: w-dl (victory vs draw or defeat), wd-l (victory or draw vs defeat), w-d-l (victory, draw, defeat)
random_state – int, RandomState instance or None, optional, default: None. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
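
The three label_type options differ only in how a match outcome is turned into a class label. The sketch below shows one way to encode them; encode_outcome, the 'w'/'d'/'l' outcome codes and the numeric labels 1, 0 and -1 are assumptions made for illustration, not the values used inside Weighter.

```python
def encode_outcome(outcome, label_type="w-dl"):
    """Map a match outcome ('w', 'd' or 'l') to a class label.

    The numeric labels are illustrative assumptions.
    """
    if outcome not in ("w", "d", "l"):
        raise ValueError(f"unknown outcome: {outcome!r}")
    if label_type == "w-dl":    # victory vs draw or defeat
        return 1 if outcome == "w" else -1
    if label_type == "wd-l":    # victory or draw vs defeat
        return 1 if outcome in ("w", "d") else -1
    if label_type == "w-d-l":   # victory, draw, defeat as three classes
        return {"w": 1, "d": 0, "l": -1}[outcome]
    raise ValueError(f"unknown label_type: {label_type!r}")


outcomes = ["w", "d", "l"]
for label_type in ("w-dl", "wd-l", "w-d-l"):
    print(label_type, [encode_outcome(o, label_type) for o in outcomes])
# w-dl [1, -1, -1]
# wd-l [1, 1, -1]
# w-d-l [1, 0, -1]
```

The two binary options place the draw on opposite sides of the decision boundary, so they can yield different feature weights from the same data.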

fit(dataframe, target, scaled=False, var_threshold=0.001, filename='weights.json')

Compute weights of features.

Parameters

dataframe – pandas DataFrame: a dataframe containing the feature values and the target values
target – str: the name of the target variable in the dataframe
scaled – boolean: True if the feature values must be normalized, False otherwise (optional). default: False
filename – str: the name of the JSON file in which the feature weights are saved. default: “weights.json”

get_feature_names()

get_weights()

Module contents