nemo_curator.stages.deduplication.semantic.ranking
Module Contents
Classes
API
Flexible ranking strategy that lets users choose the metadata columns to rank by and the sort order for each.
Ranking can therefore be extended to any metadata column, with user-specified sorting criteria.
Create a metadata-based ranking strategy.
Parameters:
List of metadata column names to sort by (in priority order)
Boolean or list of booleans indicating sort order for each column
Random seed for reproducible results
Returns: RankingStrategy
RankingStrategy instance configured for metadata-based ranking
Create a random ranking strategy.
Parameters:
Random seed for reproducible results
Returns: RankingStrategy
RankingStrategy instance configured for random ranking
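The role of the random seed can be sketched with the standard library alone. The helper below is hypothetical (its name, parameter name, and default seed are assumptions, not the library's API); it only shows that a fixed seed produces the same shuffled order on every run.

```python
import random
from typing import Any


def rank_randomly(rows: list[Any], random_seed: int = 42) -> list[Any]:
    """Hypothetical helper: return rows in a seeded, reproducible random order."""
    rng = random.Random(random_seed)  # private generator; global state is untouched
    ranked = list(rows)               # copy so the caller's list is not modified
    rng.shuffle(ranked)
    return ranked


items = ["doc-1", "doc-2", "doc-3", "doc-4", "doc-5"]
first = rank_randomly(items, random_seed=7)
second = rank_randomly(items, random_seed=7)
# first and second are identical because both runs used the same seed.
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the result independent of any other code that draws random numbers.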
Rank cluster based on the specified strategy.
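As a rough picture of how one object could carry the chosen strategy and apply it to a cluster, the sketch below dispatches between a metadata sort and a seeded shuffle. Everything here is an assumption for illustration: the class name `SketchRankingStrategy`, the constructor names `from_metadata` and `from_random_seed`, the method name `rank_cluster`, and the list-of-dicts cluster are not taken from the library.

```python
import random
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchRankingStrategy:
    """Hypothetical stand-in for a ranking strategy; not the library class."""

    strategy: str = "sort"  # "sort" or "random"
    metadata_cols: list[str] = field(default_factory=list)
    ascending: list[bool] = field(default_factory=list)
    random_seed: int = 42

    @classmethod
    def from_metadata(cls, metadata_cols, ascending=True):
        # Broadcast a single boolean to one flag per column.
        if isinstance(ascending, bool):
            flags = [ascending] * len(metadata_cols)
        else:
            flags = list(ascending)
        return cls(strategy="sort", metadata_cols=list(metadata_cols), ascending=flags)

    @classmethod
    def from_random_seed(cls, random_seed=42):
        return cls(strategy="random", random_seed=random_seed)

    def rank_cluster(self, cluster: list[dict[str, Any]]) -> list[dict[str, Any]]:
        ranked = list(cluster)  # leave the input untouched
        if self.strategy == "random":
            random.Random(self.random_seed).shuffle(ranked)
        elif self.strategy == "sort":
            # Stable sorts applied from lowest to highest priority column.
            for col, asc in reversed(list(zip(self.metadata_cols, self.ascending))):
                ranked.sort(key=lambda row, c=col: row[c], reverse=not asc)
        else:
            raise ValueError(f"unknown strategy: {self.strategy!r}")
        return ranked


cluster = [{"id": 1, "dist": 0.3}, {"id": 2, "dist": 0.1}, {"id": 3, "dist": 0.2}]
by_dist = SketchRankingStrategy.from_metadata(["dist"]).rank_cluster(cluster)
shuffled = SketchRankingStrategy.from_random_seed(0).rank_cluster(cluster)
```

Keeping the configuration in the object and the data in the method argument means one configured strategy can be reused across many clusters.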