API Reference

API Reference for RobustRankAggregPy

Functions for performing rank aggregation

robustrankaggregpy.aggregate_ranks.aggregate_ranks(rank_lists: list[list[Hashable]] | None = None, rank_matrix: DataFrame | None = None, ranked_elements: int | None = None, method: Literal['rra', 'min', 'geom-mean', 'mean', 'median', 'stuart'] = 'rra', full: bool = False, exact: bool = False, top_cutoff: ndarray[Tuple[int], dtype[float32 | float64]] | ArrayLike | None = None) -> Series

Aggregate ranked lists

Parameters:
  • rank_lists (list of list of Hashable, optional) – The ranked lists to aggregate. Each list should be ordered from lowest rank to highest (i.e. from rank 1 to rank n).

  • rank_matrix (pd.DataFrame, optional) – The rankings in matrix format (such as the output of create_rank_matrix). If not provided, rank_lists is converted into this form automatically.

  • ranked_elements (int, optional) – The number of ranked elements. By default, this is calculated as the number of unique elements in the input rankings.

  • method (Literal 'rra' or 'min' or 'geom-mean' or 'mean' or 'median' or 'stuart', default='rra') – The method to use for aggregating the ranks

  • full (bool, default=False) – Whether the given rankings are complete (full) rather than partial or truncated lists

  • exact (bool, default=False) – Whether the exact p-value should be calculated based on the rho scores

  • top_cutoff (FloatMatrix1D or ArrayLike, optional) – The cutoff values used to limit the number of elements in each of the input lists. Each value should be the proportion of the full ranking that the corresponding list provides. For example, if 1000 elements are being ranked, but the first list is limited to 100, the second to 200, and the third to 900, this should be [0.1, 0.2, 0.9].

Returns:

aggregated_ranks – The aggregated ranks, the index is the elements in rank order, and the values are the associated score or p-value

Return type:

pd.Series

Note

The top_cutoff parameter is the proportion of each ranking that is provided, not the number of elements in each list.
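The proportion convention for top_cutoff can be sketched as follows. The sizes come from the parameter description above. The item names and the run_aggregation helper are made up for illustration, and the call itself uses only the keyword arguments listed in the signature. This is a minimal sketch that has not been checked against the package.

```python
# Hypothetical sizes taken from the parameter description above.
total_elements = 1000
list_lengths = [100, 200, 900]

# top_cutoff holds proportions of the full ranking, not element counts.
top_cutoff = [length / total_elements for length in list_lengths]

# Toy truncated rankings with matching lengths (item names are made up).
rank_lists = [[f"item{i}" for i in range(length)] for length in list_lengths]


def run_aggregation():
    """Call aggregate_ranks with the documented keyword arguments."""
    # Imported here so the lines above run even without the package installed.
    from robustrankaggregpy.aggregate_ranks import aggregate_ranks

    return aggregate_ranks(
        rank_lists=rank_lists,
        ranked_elements=total_elements,
        top_cutoff=top_cutoff,
    )
```

Calling run_aggregation() should return the pd.Series described under "Returns" when the package is installed.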

robustrankaggregpy.aggregate_ranks.beta_scores(rank_vector: ndarray[Tuple[int], dtype[float32 | float64]]) -> ndarray[Tuple[int], dtype[float32 | float64]]

Calculate beta scores for a normalized rank vector

Parameters:

rank_vector (FloatMatrix1D) – Vector of normalized ranks (aka rank ratios), should be values in the range [0,1]

Returns:

scores – p-values calculated using the beta distribution

Return type:

FloatMatrix1D
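To make "p-values calculated using the beta distribution" concrete, here is a standard-library sketch. It assumes the usual robust rank aggregation definition: the k-th smallest of n normalized ranks is compared against a Beta(k, n - k + 1) distribution. The function name beta_scores_sketch is hypothetical, and the package's own implementation may differ in details such as NaN handling.

```python
from math import comb


def beta_scores_sketch(rank_vector):
    """P-value of each order statistic of the normalized ranks under a uniform null."""
    x = sorted(rank_vector)
    n = len(x)
    scores = []
    for k, value in enumerate(x, start=1):
        # P(Beta(k, n - k + 1) <= value) equals the probability that at least
        # k of n independent uniform draws fall at or below value.
        tail = sum(
            comb(n, j) * value**j * (1 - value) ** (n - j) for j in range(k, n + 1)
        )
        scores.append(tail)
    return scores
```

The binomial-tail form avoids a SciPy dependency. With SciPy available, the same values come from the beta cumulative distribution function.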

robustrankaggregpy.aggregate_ranks.correct_beta_pvalues(pvalues: ndarray[Tuple[int], dtype[float32 | float64]], k: int) -> float

robustrankaggregpy.aggregate_ranks.correct_beta_pvalues_exact(pvalues: ndarray[Tuple[int], dtype[float32 | float64]], k: int) -> float

robustrankaggregpy.aggregate_ranks.create_rank_matrix(rank_lists: list[list[Hashable]], ranked_elements: int | list[int] | None = None, full: bool = False) -> DataFrame

Create a rank matrix by converting a list of rank lists into a dataframe

Parameters:
  • rank_lists (list of lists of Hashable) – The rank lists to create the rank matrix from. Each list should be in order starting from rank 1 (the first element)

  • ranked_elements (int or list of int, optional) – The total number of elements being ranked. If not provided, the number of unique elements found across all rank lists is used. If a single int, that value is used as the total number of elements for every list. If a list, it should be the same length as rank_lists and gives the total number of elements for each list individually.

  • full (bool, default=False) – Whether the given ranks are complete

Returns:

rank_matrix – The rank matrix describing the ranks across the lists, with a column for each input list, and a row for each unique element found in any list

Return type:

pd.DataFrame
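The layout described above (one row per unique element, one column per input list) can be sketched without pandas. Several details here are assumptions rather than documented behaviour: that ranks are normalized by dividing by the total number of elements, and that missing entries become 1.0 when full is False and NaN when full is True. The missing-entry rule is borrowed from the rank_matrix_from_df description. The name rank_matrix_sketch is hypothetical, and only a single int is supported for ranked_elements.

```python
import math


def rank_matrix_sketch(rank_lists, ranked_elements=None, full=False):
    """Map each element to its normalized rank in every list (one entry per list)."""
    # Collect unique elements in order of first appearance.
    elements = []
    for ranks in rank_lists:
        for item in ranks:
            if item not in elements:
                elements.append(item)
    if ranked_elements is None:
        ranked_elements = len(elements)
    # Assumed handling of elements absent from a list.
    missing = math.nan if full else 1.0
    matrix = {item: [missing] * len(rank_lists) for item in elements}
    for col, ranks in enumerate(rank_lists):
        for position, item in enumerate(ranks, start=1):
            matrix[item][col] = position / ranked_elements
    return matrix
```

For example, rank_matrix_sketch([["a", "b", "c"], ["b", "a"]]) gives "c" the maximum normalized rank in the second column because it does not appear in that list.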

robustrankaggregpy.aggregate_ranks.q_stuart(row: ndarray[Tuple[int], dtype[float32 | float64]]) -> float

Calculate the Q-statistic for a single row of a matrix

Parameters:

row (1-D numpy NDArray) – The row to calculate the Q-statistic for

Returns:

q – The Q-statistic for the row

Return type:

float
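For readers unfamiliar with the Q-statistic, the following sketch shows the recursive formulation commonly used for the Stuart-Aerts method. It is an assumption that this package follows the same recursion. The names q_stuart_sketch and sum_stuart_sketch are hypothetical, and NaN handling is omitted.

```python
from math import factorial


def sum_stuart_sketch(v, r):
    """Alternating sum over the reversed partial results v for rank ratio r."""
    k = len(v)
    total = 0.0
    for i in range(1, k + 1):
        sign = (-1) ** (i + 1)
        # v[k - i] walks v from its last entry back to its first.
        total += sign * v[k - i] * r**i / factorial(i)
    return total


def q_stuart_sketch(row):
    """Probability that n sorted uniform draws all fall below the sorted rank ratios."""
    r = sorted(row)
    n = len(r)
    v = [1.0]
    for k in range(1, n + 1):
        # Work from the largest rank ratio down to the smallest.
        v.append(sum_stuart_sketch(v[:k], r[n - k]))
    return factorial(n) * v[n]
```

As a sanity check, for two rank ratios a <= b the result is 2ab - a^2, which is the probability that the smaller of two uniform draws is at most a and the larger is at most b.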

robustrankaggregpy.aggregate_ranks.rank_matrix_from_df(data: DataFrame, total_elems: int | list[int] | Series | None = None, full: bool = False, ascending=False, rank_method: Literal['average', 'min', 'max', 'first', 'dense'] = 'max', **kwargs)

Create a rank matrix by ranking data in a DataFrame

Parameters:
  • data (pd.DataFrame) – The data to create a rank matrix from. The columns will be treated as the different rank lists.

  • total_elems (int or list of int or pd.Series, optional) – The total number of elements being ranked. If not provided, the number of rows in the dataframe is used. If a single int, that value is used as the total number of elements for every list. If a list, it should be the same length as the number of columns and gives the total number of elements for each column individually. If a Series, it is used directly to divide the columns when scaling the ranks.

  • full (bool, default=False) – Whether the ranks are complete. If True, the maximally ranked element in each column is treated as the maximum possible rank, and missing data will be given a scaled rank of NaN. If False, all missing values will be treated as having the maximum possible rank.

  • ascending (bool, default=False) – Whether the values should be ranked in ascending order. If True, smaller data values will be given lower ranks.

  • rank_method ({'average', 'min', 'max', 'first', 'dense'}, default='max') – Method to use for ranking ties, see the pandas rank function for details

  • kwargs – Keyword arguments passed to the pandas rank function
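The defaults above (ascending=False, rank_method='max') and the two behaviours of full can be illustrated for a single column in plain Python. This is a sketch of the documented semantics, not the package's code. The name scaled_ranks_sketch is hypothetical, None stands in for missing data, and total_elems is ignored when full is True, which is an assumption.

```python
import math


def scaled_ranks_sketch(values, total_elems=None, full=False):
    """Rank one column in descending order with 'max' ties, then scale the ranks."""
    present = [v for v in values if v is not None]
    # Descending rank with ties given the largest rank in the tie group:
    # the rank of v is the number of present values greater than or equal to v.
    ranks = [
        None if v is None else sum(1 for other in present if other >= v)
        for v in values
    ]
    if full:
        # The maximally ranked element defines the maximum possible rank.
        total = max(rank for rank in ranks if rank is not None)
    else:
        total = total_elems if total_elems is not None else len(values)
    missing = math.nan if full else 1.0
    return [missing if rank is None else rank / total for rank in ranks]
```

With values [10, 8, 8, None], the two tied entries share rank 3 of 4, and the missing entry is scaled to 1.0 unless full is True.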

robustrankaggregpy.aggregate_ranks.rho_scores(r: ndarray[Tuple[int], dtype[float32 | float64]], top_cutoff: ndarray[Tuple[int], dtype[float32 | float64]] | None = None, exact: bool = False) -> float

Calculate the rho scores for a row of the rank matrix

Parameters:
  • r (FloatMatrix1D) – Normalized ranks to calculate the rho scores for (must be in range [0,1])

  • top_cutoff (FloatMatrix1D, optional) – Cutoff values used to limit the number of elements in the input lists

  • exact (bool, default=False) – Whether to calculate exact p-values. This is computationally expensive and unstable, and provides little benefit in most cases.

Returns:

rho_score – The rho score of the normalized rank vector

Return type:

float
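A sketch of the rho score for the simplest case (no top_cutoff, exact=False) follows. It assumes the standard robust rank aggregation definition: the minimum beta score, multiplied by the number of ranks as a Bonferroni-style correction and capped at 1. The package may compute it differently, and the name rho_score_sketch is hypothetical.

```python
from math import comb


def rho_score_sketch(r):
    """Minimum beta score of the normalized ranks, corrected for n comparisons."""
    x = sorted(r)
    n = len(x)
    # Beta score of the k-th smallest rank, written as a binomial tail.
    betas = [
        sum(comb(n, j) * v**j * (1 - v) ** (n - j) for j in range(k, n + 1))
        for k, v in enumerate(x, start=1)
    ]
    # Multiply by n to account for taking the minimum over n scores, cap at 1.
    return min(min(betas) * n, 1.0)
```

Small rho scores indicate an element ranked consistently better than expected by chance. Rows with uniformly poor ranks saturate at 1.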

robustrankaggregpy.aggregate_ranks.stuart(rank_matrix: ndarray[Tuple[int, int], dtype[float32 | float64]])

Compute the Stuart ranks for each row in a 2-D numpy array

Parameters:

rank_matrix (2-D numpy NDArray) – The rank matrix to compute the Stuart-Aerts ranks for

Returns:

ranks – The ranks of the rows in the rank_matrix

Return type:

1-D numpy NDArray

robustrankaggregpy.aggregate_ranks.sum_stuart(v: ndarray[Tuple[int], dtype[float32 | float64]], r: float) -> float

Helper function for Stuart-Aerts method

Parameters:
  • v (1-D numpy array of floats) – The array to compute the Stuart-Aerts sum for

  • r (float) – The rank ratio to compute the Stuart-Aerts sum for

Returns:

sum – The Stuart-Aerts sum of v with rank ratio r

Return type:

float

robustrankaggregpy.aggregate_ranks.threshold_beta_score(scores: ndarray[Tuple[int], dtype[float32 | float64]], k: ndarray[Tuple[int], dtype[int32 | int64]] | None = None, n: int | None = None, sigma: ndarray[Tuple[int], dtype[float32 | float64]] | None = None)

Threshold the beta scores. Used when the lists being aggregated contain only the top sigma portion of the rankings

Parameters:
  • scores (FloatMatrix1D) – Beta scores to threshold

  • k (IntMatrix1D, optional)

  • n (int, optional)

  • sigma (FloatMatrix1D, optional) – The thresholds

Returns:

The thresholded beta scores

Return type:

FloatMatrix1D