API Reference for RobustRankAggregPy
Functions for performing rank aggregation
- robustrankaggregpy.aggregate_ranks.aggregate_ranks(rank_lists: list[list[Hashable]] | None = None, rank_matrix: DataFrame | None = None, ranked_elements: int | None = None, method: Literal['rra', 'min', 'geom-mean', 'mean', 'median', 'stuart'] = 'rra', full: bool = False, exact: bool = False, top_cutoff: FloatMatrix1D | ArrayLike | None = None) -> Series
Aggregate ranked lists
- Parameters:
rank_lists (list of list of Hashable, optional) – The ranked lists to aggregate. Each list should be ordered from lowest rank to highest (i.e. from rank 1 to n).
rank_matrix (pd.DataFrame, optional) – The rankings in matrix format (like the output of create_rank_matrix). If not provided, rank_lists is converted into this form automatically.
ranked_elements (int, optional) – The number of ranked elements. By default, it is calculated as the number of unique elements in the input rankings.
method ({'rra', 'min', 'geom-mean', 'mean', 'median', 'stuart'}, default='rra') – The method to use for aggregating the ranks
full (bool, default=False) – Whether the given rankings are complete
exact (bool, default=False) – Whether the exact p-value should be calculated based on the rho scores
top_cutoff (FloatMatrix1D or ArrayLike, optional) – The cutoff values used to limit the number of elements in each input list, given as the proportion of the full ranking that each list provides. For example, if 1000 elements are being ranked, and the first list is limited to 100, the second to 200, and the third to 900, this should be [0.1, 0.2, 0.9].
- Returns:
aggregated_ranks – The aggregated ranks. The index holds the elements in rank order, and the values are the associated scores or p-values.
- Return type:
pd.Series
Note
The top_cutoff values are proportions of the full ranking, not element counts.
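As a minimal sketch of how the top_cutoff proportions could be derived, the snippet below divides each list's length limit by the total number of ranked elements. The names total_elements and list_limits are illustrative and not part of the library; the commented call at the end only mirrors the documented signature.

```python
# Hypothetical inputs: 1000 ranked elements, with the three lists
# truncated to their top 100, 200, and 900 entries respectively.
total_elements = 1000
list_limits = [100, 200, 900]

# top_cutoff is the proportion of the full ranking that each list provides.
top_cutoff = [limit / total_elements for limit in list_limits]
print(top_cutoff)

# The proportions would then be passed along with the lists, e.g.:
# aggregate_ranks(rank_lists=rank_lists, ranked_elements=total_elements,
#                 top_cutoff=top_cutoff)
```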
- robustrankaggregpy.aggregate_ranks.beta_scores(rank_vector: ndarray[Tuple[int], dtype[float32 | float64]]) -> ndarray[Tuple[int], dtype[float32 | float64]]
Calculate beta scores for a normalized rank vector
- Parameters:
rank_vector (FloatMatrix1D) – Vector of normalized ranks (aka rank ratios), should be values in the range [0,1]
- Returns:
scores – p-values calculated using the beta distribution
- Return type:
FloatMatrix1D
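To illustrate what beta-distribution p-values for a normalized rank vector can look like, here is a minimal standard-library sketch. It assumes the usual order-statistic construction, where the k-th smallest of n normalized ranks is compared against a Beta(k, n - k + 1) distribution; whether beta_scores follows exactly this construction is an assumption, and beta_scores_sketch is a hypothetical name.

```python
from math import comb


def beta_scores_sketch(rank_vector):
    """Order-statistic p-values for a vector of normalized ranks in [0, 1]."""
    ranks = sorted(rank_vector)
    n = len(ranks)
    scores = []
    for k, x in enumerate(ranks, start=1):
        # P(k-th smallest of n uniform draws <= x), i.e. the Beta(k, n - k + 1)
        # CDF at x, written as a binomial tail sum so only the stdlib is needed.
        p = sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))
        scores.append(p)
    return scores


print(beta_scores_sketch([0.5, 0.1]))
```

For the two ranks 0.1 and 0.5, the first score is the chance that the smaller of two uniform draws is at most 0.1, and the second is the chance that both draws are at most 0.5.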
- robustrankaggregpy.aggregate_ranks.correct_beta_pvalues(pvalues: ndarray[Tuple[int], dtype[float32 | float64]], k: int) -> float
- robustrankaggregpy.aggregate_ranks.correct_beta_pvalues_exact(pvalues: ndarray[Tuple[int], dtype[float32 | float64]], k: int) -> float
- robustrankaggregpy.aggregate_ranks.create_rank_matrix(rank_lists: list[list[Hashable]], ranked_elements: int | list[int] | None = None, full: bool = False) -> DataFrame
Create a rank matrix by converting a list of rank lists into a dataframe
- Parameters:
rank_lists (list of lists of Hashable) – The rank lists to create the rank matrix from. Each list should be in order starting from rank 1 (the first element)
ranked_elements (int or list of int, optional) – The total number of elements being ranked. If not provided, the number of unique elements found across all rank lists is used. If a single int, it is used as the total number of elements for all the lists. If a list, it should be the same length as rank_lists and gives the total number of elements for each list separately.
full (bool, default=False) – Whether the given ranks are complete
- Returns:
rank_matrix – The rank matrix describing the ranks across the lists, with a column for each input list, and a row for each unique element found in any list
- Return type:
pd.DataFrame
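The shape of such a rank matrix can be sketched in plain Python. This is an illustration only, under two assumptions not confirmed by the text above: the matrix holds ranks scaled by the total number of elements, and elements missing from a partial list receive the maximum scaled rank of 1.0. The function rank_matrix_sketch is hypothetical, returns a dict rather than a DataFrame, and handles only a single int for ranked_elements.

```python
def rank_matrix_sketch(rank_lists, ranked_elements=None):
    """Return {element: [normalized rank in each list]} for partial rankings."""
    # Unique elements across all lists, in order of first appearance.
    elements = list(dict.fromkeys(e for lst in rank_lists for e in lst))
    total = ranked_elements if ranked_elements is not None else len(elements)
    matrix = {}
    for element in elements:
        row = []
        for lst in rank_lists:
            if element in lst:
                # Position in the list (rank 1 first), scaled into (0, 1].
                row.append((lst.index(element) + 1) / total)
            else:
                # Assumption: unranked elements get the maximum scaled rank.
                row.append(1.0)
        matrix[element] = row
    return matrix


lists = [["a", "b", "c"], ["b", "a"], ["c", "d"]]
for element, row in rank_matrix_sketch(lists).items():
    print(element, row)
```

Each key corresponds to a row of the matrix and each position in its list corresponds to a column, one per input list.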
- robustrankaggregpy.aggregate_ranks.q_stuart(row: ndarray[Tuple[int], dtype[float32 | float64]]) -> float
Calculate the Q-statistic for a single row of a matrix
- Parameters:
row (1-D numpy NDArray) – The row to calculate the Q-statistic for
- Returns:
q – The Q-statistic for the row
- Return type:
float
- robustrankaggregpy.aggregate_ranks.rank_matrix_from_df(data: DataFrame, total_elems: int | list[int] | Series | None = None, full: bool = False, ascending=False, rank_method: Literal['average', 'min', 'max', 'first', 'dense'] = 'max', **kwargs)
Create a rank matrix by ranking data in a DataFrame
- Parameters:
data (pd.DataFrame) – The data to create a rank matrix from. The columns will be treated as the different rank lists.
total_elems (int or list of int or pd.Series, optional) – The total number of elements being ranked. If not provided, the number of rows in the dataframe is used. If a single int, it is used as the total number of elements for all the lists. If a list, it should be the same length as the number of columns and gives the total number of elements for each column separately. If a Series, it is used directly to divide the columns to scale the ranks.
full (bool, default=False) – Whether the ranks are complete. If True, the maximally ranked element in each column is treated as the maximum possible rank, and missing data is given a scaled rank of NaN. If False, all missing values are treated as having the maximum possible rank.
ascending (bool, default=False) – Whether the values should be ranked in ascending order. If True, smaller data values will be given lower ranks.
rank_method ({'average', 'min', 'max', 'first', 'dense'}, default='max') – Method to use for ranking ties; see the pandas rank function for details
kwargs – Keyword arguments passed to the pandas rank function
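The effect of the defaults ascending=False and rank_method='max' on a single column can be sketched without pandas. The helper scaled_max_ranks below is hypothetical and only mimics that tie handling on a plain list; it does not handle missing values and is not the library's implementation.

```python
def scaled_max_ranks(values, total_elems=None):
    """Descending ranks with ties given the largest rank, scaled by total_elems."""
    total = total_elems if total_elems is not None else len(values)
    # With descending order and 'max' tie handling, a value's rank equals the
    # number of entries that are greater than or equal to it.
    return [sum(other >= v for other in values) / total for v in values]


scores = [0.9, 0.5, 0.5, 0.1]
print(scaled_max_ranks(scores))
```

The largest value gets rank 1, the two tied values both get rank 3, and the smallest gets rank 4, each divided by the four elements being ranked.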
- robustrankaggregpy.aggregate_ranks.rho_scores(r: ndarray[Tuple[int], dtype[float32 | float64]], top_cutoff: ndarray[Tuple[int], dtype[float32 | float64]] | None = None, exact: bool = False) -> float
Calculate the rho scores for a row of the rank matrix
- Parameters:
r (FloatMatrix1D) – Normalized ranks to calculate the rho scores for (must be in range [0,1])
top_cutoff (FloatMatrix1D, optional) – Cutoff values used to limit the number of elements in the input lists
exact (bool, default=False) – Whether to calculate exact p-values. This is computationally expensive and unstable, and provides little benefit in most cases.
- Returns:
rho_score – The rho score of the normalized rank vector
- Return type:
float
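A rho score for one row can be sketched as the smallest order-statistic p-value of the normalized ranks, corrected for the number of lists. The sketch below assumes the non-exact case without top_cutoff, and assumes the correction is a Bonferroni-style multiplication capped at 1; neither detail is confirmed by the reference above, and rho_score_sketch is a hypothetical name.

```python
from math import comb


def rho_score_sketch(r):
    """Smallest order-statistic p-value of r, corrected by the length of r."""
    ranks = sorted(r)
    n = len(ranks)
    # P(k-th smallest of n uniform draws <= x) for each sorted rank x.
    betas = [
        sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))
        for k, x in enumerate(ranks, start=1)
    ]
    # Assumption: the non-exact correction multiplies the minimum by the
    # number of lists and caps the result at 1.
    return min(min(betas) * n, 1.0)


print(rho_score_sketch([0.1, 0.5]))    # element ranked near the top in one list
print(rho_score_sketch([0.9, 0.95]))   # element ranked near the bottom in both
```

Elements that sit near the top of several lists get small scores, while elements ranked poorly everywhere are capped at 1.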
- robustrankaggregpy.aggregate_ranks.stuart(rank_matrix: ndarray[Tuple[int, int], dtype[float32 | float64]])
Compute the Stuart ranks for each row in a 2-D numpy array
- Parameters:
rank_matrix (2-D numpy NDArray) – The rank matrix to compute the Stuart-Aerts ranks for
- Returns:
ranks – The ranks of the rows in the rank_matrix
- Return type:
1-D numpy NDArray
- robustrankaggregpy.aggregate_ranks.sum_stuart(v: ndarray[Tuple[int], dtype[float32 | float64]], r: float) -> float
Helper function for Stuart-Aerts method
- Parameters:
v (1-D numpy array of floats) – The array to compute the Stuart-Aerts sum for
r (float) – The rank ratio to compute the Stuart-Aerts sum for
- Returns:
sum – The Stuart-Aerts sum of v with rank ratio r
- Return type:
float
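The relationship between the helper sum and the Q-statistic can be sketched with the recursion commonly used for the Stuart-Aerts method. This is an assumption about how sum_stuart and q_stuart fit together, not the library's code; the _sketch names are hypothetical, and sorting is done inside the Q function here for convenience.

```python
from math import factorial


def sum_stuart_sketch(v, r):
    """Alternating sum over reversed v, weighted by r**i / i!."""
    k = len(v)
    return sum(
        (-1) ** (i + 1) * v[k - i] * r**i / factorial(i)
        for i in range(1, k + 1)
    )


def q_stuart_sketch(row):
    """Q-statistic for one row of normalized ranks."""
    ranks = sorted(row)
    n = len(ranks)
    v = [1.0]
    for k in range(1, n + 1):
        # Work from the largest rank ratio down to the smallest.
        v.append(sum_stuart_sketch(v, ranks[n - k]))
    return factorial(n) * v[n]


print(q_stuart_sketch([0.5, 0.2]))
```

For two sorted rank ratios a <= b, this recursion reduces to 2ab - a^2, the probability that the order statistics of two uniform draws fall at or below a and b respectively.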
- robustrankaggregpy.aggregate_ranks.threshold_beta_score(scores: ndarray[Tuple[int], dtype[float32 | float64]], k: ndarray[Tuple[int], dtype[int32 | int64]] | None = None, n: int | None = None, sigma: ndarray[Tuple[int], dtype[float32 | float64]] | None = None)
Threshold the beta scores. Used when the lists being aggregated contain only the top sigma portion of the rankings.
- Parameters:
scores (FloatMatrix1D) – Beta scores to threshold
k (IntMatrix1D, optional)
n (int, optional)
sigma (FloatMatrix1D, optional) – The thresholds
- Returns:
The thresholded beta scores
- Return type:
FloatMatrix1D