from ragas_experimental.llm import ragas_llm
from ragas_experimental.metric import RankingMetric  # assumed to be exported here, alongside MetricResult
from openai import OpenAI

llm = ragas_llm(provider="openai", model="gpt-4o", client=OpenAI())

my_ranking_metric = RankingMetric(
    name='response_ranking',
    llm=llm,  # Your language model instance
    prompt="Rank the following responses:\n{candidates}",
    num_ranks=3,
)

# To score a single input (ranking candidate responses)
result = my_ranking_metric.score(candidates=[
    "short answer.",
    "a bit more detailed.",
    "the longest and most detailed answer."
], n=3)
print(result)         # Might output something like: [1, 0, 2]
print(result.reason)  # Provides the reasoning behind the ranking
[2, 1, 0]
Ensemble ranking based on multiple evaluations.
The ranking is based on the length and detail of each response. 'the longest and most detailed answer.' is the most comprehensive, followed by 'a bit more detailed.', and 'short answer.' is the briefest.
The ranking is based on the length and detail of each response. The response 'the longest and most detailed answer.' is ranked highest (2) because it is the most detailed, followed by 'a bit more detailed.' (1), and finally 'short answer.' (0) as it is the least detailed.
The responses are ranked based on the level of detail and length. 'short answer.' is the least detailed, 'a bit more detailed.' provides more information, and 'the longest and most detailed answer.' offers the most comprehensive explanation.
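Because score was called with n=3, the metric runs three independent evaluations and merges them into the single ensemble ranking printed above; the three reasons are the per-run explanations. The snippet below is a minimal, illustrative sketch of how such per-run rankings could be merged. It is not the library's actual aggregation code; aggregate_rankings and the sample runs are hypothetical.

from collections import defaultdict

def aggregate_rankings(runs):
    # Each run lists candidate indices from best to worst; sum each
    # candidate's position across runs and order candidates by that
    # total (lower total = better overall). Illustrative only.
    totals = defaultdict(int)
    for run in runs:
        for position, candidate in enumerate(run):
            totals[candidate] += position
    return sorted(totals, key=totals.get)

# Three hypothetical per-run rankings for the candidates above
runs = [[2, 1, 0], [2, 1, 0], [2, 0, 1]]
print(aggregate_rankings(runs))  # [2, 1, 0]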
Custom ranking metric
from ragas_experimental.metric import MetricResult, ranking_metric  # ranking_metric assumed to be exported alongside MetricResult

@ranking_metric(
    llm=llm,  # Your language model instance
    prompt="Rank the following responses:\n{candidates}",
    name='new_ranking_metric',
    num_ranks=3
)
def my_ranking_metric(llm, prompt, **kwargs):
    # Your custom logic that calls the LLM and returns a MetricResult
    # holding the ranking and the reason behind it. For example, process
    # the prompt (formatted with candidates) and produce a ranking.
    ranking = [1, 0, 2]  # Dummy ranking: second candidate is best, then first, then third.
    reason = "Ranked based on response clarity and detail."
    return MetricResult(result=ranking, reason=reason)

# Using the decorator-based ranking metric:
result = my_ranking_metric.score(candidates=[
    "Response A: short answer.",
    "Response B: a bit more detailed.",
    "Response C: the longest and most detailed answer."
])
print(result)         # E.g., [1, 0, 2]
print(result.reason)  # E.g., "Ranked based on response clarity and detail."
[1, 0, 2]
Ranked based on response clarity and detail.
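In a real custom metric, the decorated function body would call the model instead of returning a hard-coded ranking. Below is a minimal sketch that uses the plain OpenAI client directly; the rank_with_llm helper, the JSON reply format, and the fallback ranking are illustrative assumptions rather than part of the ragas_experimental API.

import json
from openai import OpenAI
from ragas_experimental.metric import MetricResult

client = OpenAI()

def rank_with_llm(prompt):
    # Hypothetical helper: ask the model for a JSON object like
    # {"ranking": [1, 0, 2], "reason": "..."} and wrap it in a MetricResult.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": 'Reply only with JSON: {"ranking": [...], "reason": "..."}'},
            {"role": "user", "content": prompt},
        ],
    )
    try:
        payload = json.loads(response.choices[0].message.content)
        ranking, reason = payload["ranking"], payload["reason"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Fall back to an identity ranking if the reply cannot be parsed.
        ranking, reason = [0, 1, 2], "Could not parse model output."
    return MetricResult(result=ranking, reason=reason)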