Steps to generate CSVs#
Note
If you get value errors saying that the ASR language model is not allowed to vary inside a
raw CSV file, run fix_varying_language_model.py before prepare_raw_data.py.
Prerequisites: python, bash, numpy, pandas, PyYAML.
The following steps generate the performance tables, given that the directory structure of --input_dir is correct:
cd docs/tabbed_tables_scripts/performance
python prepare_raw_data.py \
--input_dir "${PATH_TO_RESULTS_DIR}" \
--output_dir "${PATH_TO_FINAL_CSV_TABLES}" \
--tmp_dir_with_parsed_performance_data __tmp_dir_with_parsed_performance_data \
--keep_intermediate_parsed_data
python fill_performance_pages.py --metadata "${PATH_TO_FINAL_CSV_TABLES}/metadata.yaml"
Of the above parameters, only --input_dir is required.
Currently, for the above command to work, --input_dir must have the following structure:
.
├── AWS
│ ├── g4dn.16xlarge
│ │ ├── ASR
│ │ │ └── All_CPU_Cores
│ │ │ ├── results_citrinet-1024_en-US_flashlight_offline.csv
│ │ │ ├── ...
│ │ │ └── results_conformer_en-US_flashlight_streaming.csv
│ │ ├── NLP
│ │ │ └── results.csv
│ │ └── TTS
│ │ ├── results_file_faspitch_hifigan.csv
│ │ ...
│ ├── ...
│ └── p4d.24xlarge
│ ├── ASR
│ ├── NLP
│ └── TTS
├── GCP
│ ├── a2-highgpu-1g_[a100_12vcpu]
│ ├── ...
│ └── n1-highmem-8_[v100_8vcpu]
└── on_prem
├── A10
│ ├── ASR
│ │ ├── 16_Cores
│ │ │ ├── results_conformer_en-US_flashlight_offline.csv
│ │ │ ├── results_conformer_en-US_flashlight_streaming-throughput.csv
│ │ │ └── results_conformer_en-US_flashlight_streaming.csv
│ │ ├── 32_Cores
│ │ ├── 64_Cores
│ │ └── All_CPU_Cores
│ │ ├── results_citrinet-1024_de-DE_flashlight_streaming.csv
│ │ ├── ...
│ │ └── results_quartznet_en-US_os2s_streaming.csv
│ ├── NLP
│ │ └── results.csv
│ └── TTS
│ └── results_file_faspitch_hifigan.csv
├── ...
└── V100
Please note that results on embedded devices are provided in a different format. Before putting the results into
the above directory tree, you need to convert the .csv files from the Embedded format to a format similar to
that of the other .csv files. Use the following command:
python embedded_format_to_usual_format.py \
--input_dir "${PATH_TO_DIR_IN_EMBEDDED_FORMAT}" \
--output_dir "${PATH_TO_DIR_IN_USUAL_FORMAT}"
The above directory tree is composed of the results of 4 PBRs:

- PBR for On-Prem A100, A30, A10, V100, T4
- PBR for On-Prem L4
- PBR for AWS and GCP
- PBR for Embedded results (JA0, JAX, JAX-NX GPUs in the On-Prem directory)
Each directory level has to correspond to a specific parameter. In the example, the first level represents cloud
service: AWS, GCP, on_prem (no cloud). The second level in AWS and GCP directories is a cloud instance
(e.g. g4dn.16xlarge). In the on_prem directory the second level is the type of GPU (e.g. A10).
In the AWS and GCP directories the GPU type level is missing, and in the on_prem directory the cloud instance level is
missing. You can find an explanation of missing levels in the “How to deal with missing levels” section of this README.
NOTE: You can omit directories on a level if the whole level is kept. For example, your directory structure can be without
the AWS and GCP directories, but in that case it must contain the on_prem directory so that the cloud level remains. Thus, when new PBR results come, you DON'T have to search for previous PBR results to update the table.
The ASR directories are special because they always contain an additional directory level describing the number
of used CPU cores. No “number of cores” directory level is expected for NLP and TTS tasks.
It is OK for a terminal directory with .csv files to contain an extra directory; only a warning will be shown.
It is also fine if there are extra files on any level of the results tree.
If the --output_dir parameter of the prepare_raw_data.py script is not set, then the resulting CSV tables are saved in
../../perf_data/perfomance_tables.
The --tmp_dir_with_parsed_performance_data directory contains intermediate parsed results. You can use its content for
debugging. If you do not provide --tmp_dir_with_parsed_performance_data, the directory is created by the script
automatically and removed before the script exits.
The script fill_performance_pages.py updates the markdown files docs/source/asr/asr-performance-table.md,
docs/source/asr/asr-cpu-effect-performance-table.md, and docs/source/tts/tts-performance-table.md with new CSV files created by prepare_raw_data.py.
High level explanation of table preprocessing scripts#
prepare_raw_data.py#
prepare_raw_data.py parses the CSV files in the --input_dir directory tree and saves the final performance tables in
--output_dir together with a metadata.yaml file. An item in metadata.yaml contains a path to a final
performance CSV file and a metadata section with several fields describing how the performance data in the table
was collected.
Table preparation is performed step by step and intermediate results are saved into
--tmp_dir_with_parsed_performance_data. If you provide the option --keep_intermediate_parsed_data, then
the intermediate results are not removed after preprocessing finishes.
Table preparation is governed by several constants defined inside config.py:
- `SupportedTasks` - an enum which defines which tasks are supported by the system;
- `TASK_NAMES` - the names of the tasks in the `--input_dir` directory tree;
- `Level` - an enum which defines supported levels in the directory tree;
- `LEVELS` - a list of levels which are expected in the input directory tree, from top to bottom. This level structure is acquired after filler directories are added (see `FILLER_DIRECTORIES_SCHEME`) and the directory tree is split into “all CPU cores” and “varying number of CPU cores” trees;
- `ALL_CPU_CORES_PATTERN` - a compiled pattern which must match “all CPU cores” directory names for ASR tasks;
- `VARYING_CORES_PATTERN` - a compiled pattern which must match “varying number of CPU cores” directory names for ASR tasks;
- `RESULTS_FILE_PATTERN` - a compiled pattern which must match raw results CSV files;
- `FILLER_DIRECTORIES_SCHEME` - allows processing an “asymmetric” `--input_dir` directory tree (when 1 or more levels are missing in a branch of the tree);
- `PREPROCESSING_FUNCS_BY_TASK` - how to preprocess a table. Sometimes a raw table needs cleanup or other small fixes;
- `LEVEL_TO_EXPERIMENT_DESCRIPTION` - how to preprocess the name of a directory to extract an experiment description parameter from it. For example, “number of CPU cores” level directories match the pattern `'^[0-9]+_Cores$'`. It is more convenient to work with an integer than with a string containing the excess suffix `_Cores`;
- `Experiment{ASR,TTS,NLP}` - enums of parameters of ASR, TTS, and NLP experiments which are collected from raw CSV files. There are also parameters which are encoded in directory names (see `Level` and `LEVELS`). Fields of these enums are collected after `PREPROCESSING_FUNCS_BY_TASK` were applied;
- `AdditionalInfo{ASR,TTS,NLP}` - enums with additional info about experiments which is generated during final CSV preparation. For example, `source_file` contains the raw CSV table;
- `{ASR,TTS,NLP}ExperimentReplacementParameters`, `ReplacementParametersType`, `REPLACEMENT_EXPERIMENT_DESCRIPTION_FIELDS` - show how to check that results from a new PBR contain the same experiments as the already present CSVs. `{ASR,TTS,NLP}ExperimentReplacementParameters` are similar to `ExperimentDescription` but contain fewer parameters. If an experiment from a new PBR has the same “replacement parameters”, then all old CSVs with such “replacement parameters” are removed. This is done to avoid situations like “Streaming” mode measured on version 2.8.0 and “Streaming-Throughput” measured on version 2.9.0. “Replacement parameters” also ensure that only 1 CSV file is suitable for a table tab;
- `ASR_MODE_VALUES` - a list of names of ASR modes;
- `EXPERIMENT_DESCRIPTION_COLS_IN_PREPROCESSED_TABLES` - a dictionary with the column names from which `Experiment{ASR,TTS,NLP}` fields are collected. It also verifies the types of values taken from the columns;
- `EXPERIMENT_INFO_USED_FOR_FILE_NAME_CREATION` - contains names of `ExperimentDescription{ASR,TTS,NLP}` fields and names of levels which are used in final CSV file names. There have to be enough parameters for all names to be unique;
- `EXPERIMENT_DESCRIPTION_FIELDS_WHICH_ARE_ALLOWED_TO_VARY_INSIDE_PREPROCESSED_TABLE` - contains sets of `ExperimentDescription{ASR,TTS,NLP}` field names which are allowed to vary inside a preprocessed CSV file. A file in which an experiment description field varies is split into 2 or more separate files;
- `LATENCY_COLUMNS_BY_TASK` - compiled regex patterns which match latency column names;
- `THROUGHPUT_COLUMNS_BY_TASK` - names of throughput columns;
- `NUM_PARALLEL_COLUMN_NAMES_BY_TASK` - names of “# of streams” columns;
- `MAX_NOT_SUSPICIOUS_OUTLIER_QUOTIENT` - used for detecting unusually bad measurements, which are counted and shown in a warning. `MAX_NOT_SUSPICIOUS_OUTLIER_QUOTIENT` is the maximum quotient of a measured value and the best value acquired under the same conditions (currently, 3 trials are made for all sets of parameters). For throughput the quotient is the largest value divided by the checked value, and for latency the quotient is the checked value divided by the smallest value;
- `MERGE_STEPS` - defines which tables will be merged (the second table is appended to the first table);
- `FINAL_CSV_SCHEMES` - describes which columns will be added into a final CSV file and defines headers.
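The outlier check can be sketched in a few lines of Python (the function name and signature here are illustrative, not the actual internals of prepare_raw_data.py):

```python
def outlier_quotients(values, metric):
    """Quotient of each measurement against the best measurement taken
    under the same conditions.

    For throughput the quotient is the largest value divided by the
    checked value; for latency it is the checked value divided by the
    smallest value.  Quotients above MAX_NOT_SUSPICIOUS_OUTLIER_QUOTIENT
    would mark suspicious measurements.
    """
    if metric == 'throughput':
        best = max(values)
        return [best / v for v in values]
    if metric == 'latency':
        best = min(values)
        return [v / best for v in values]
    raise ValueError(f'unknown metric: {metric!r}')
```

For example, for three throughput trials `[100, 95, 50]` the quotients are `[1.0, ~1.05, 2.0]`, so with a threshold of, say, 1.5 the last trial would be counted as suspicious.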
fill_performance_pages.py#
The fill_performance_pages.py script arranges CSV files prepared by prepare_raw_data.py into tabbed tables.
A tabbed table structure is inferred from a tab level list (e.g., ASR_TABBED_TABLE_LEVELS for the ASR task) and the available
CSV files. If a level is missing in a branch of a tabbed table (all values of the metadata field are None),
then the level is omitted in this branch (e.g., in version 23.01 there is no GPU type level for AWS and GCP).
Customization options:
- `TAB_LEVEL_SORT_FUNCTIONS` - can be used for sorting tabs (in docs/source/tabbed_tables_scripts/tabbed_tables.py);
- `METADATA_VALUES_PRETTY` - can be used for setting tab names (in docs/source/tabbed_tables_scripts/tabbed_tables.py);
- `METADATA_KEYS_PRETTY` - for customizing metadata field names inside the innermost tab. Currently, it is used only for ASR.
Adding a new parameter (level) to an input directory structure#
If you need to insert a level into the input directory structure, you will need to make several changes
in the scripts config.py and fill_performance_pages.py. Let's consider a case where, before the insertion, your
directory structure is the following:
├── A100
│ ├── ASR
│ │ ├── 12_Cores
│ │ ...
│ │ └── All_CPU_Cores
│ ├── NLP
│ └── TTS
├── T4
│ ├── ASR
│ │ ├── 16_Cores
│ │ ...
│ │ └── All_CPU_Cores
│ ├── NLP
│ └── TTS
└── V100
├── ASR
│ ├── 12_Cores
│ ├── ...
│ └── All_CPU_Cores
├── NLP
└── TTS
and you need to add results on AWS. This means that there will be a new level, cloud, with two directories: AWS and
on_prem. The directory structure after the insertion is
├── aws
│ ├── A100
│ │ ├── ASR
│ │ │ ├── 12_Cores
│ │ │ ├── ...
│ │ │ └── All_CPU_Cores
│ │ ├── NLP
│ │ └── TTS
│ ├── T4
│ │ ├── ASR
│ │ │ ├── 16_Cores
│ │ │ ├── ...
│ │ │ └── All_CPU_Cores
│ │ ├── NLP
│ │ └── TTS
│ └── V100
│ ├── ASR
│ │ ├── 12_Cores
│ │ ├── ...
│ │ └── All_CPU_Cores
│ ├── NLP
│ └── TTS
└── on_prem
├── A100
│ ├── ASR
│ │ ├── 12_Cores
│ │ ├── ...
│ │ └── All_CPU_Cores
│ ├── NLP
│ └── TTS
├── T4
│ ├── ASR
│ │ ├── 16_Cores
│ │ ├── ...
│ │ └── All_CPU_Cores
│ ├── NLP
│ └── TTS
└── V100
├── ASR
│ ├── 12_Cores
│ ├── ...
│ └── All_CPU_Cores
├── NLP
└── TTS
For adding the cloud level to your tabbed tables you will need:
Inside config.py#
Add `cloud` to the `Level` enum:
class Level(Enum):
...
cloud = auto()
Add the new level to `LEVELS`:
LEVELS = (Level.cloud, Level.gpu_type, Level.task)
It is possible that one of the “branches” of a directory tree lacks some levels (see the current required structure of
--input_dir). In that case, consult the section “How to deal with missing levels” of this README.

(optional) If you need to preprocess a level directory name before using it as an experiment description field, add a preprocessing method to the `LEVEL_TO_EXPERIMENT_DESCRIPTION` dictionary constant.
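For example, a hypothetical preprocessing method for the “number of CPU cores” level could convert a directory name such as '12_Cores' into the integer 12 (the function name is illustrative, not the one used in config.py):

```python
import re

def num_cpu_from_dir_name(dir_name: str) -> int:
    """Extract the number of CPU cores from a directory name
    matching the pattern '^[0-9]+_Cores$'."""
    match = re.match(r'^([0-9]+)_Cores$', dir_name)
    if match is None:
        raise ValueError(f'unexpected directory name: {dir_name!r}')
    return int(match.group(1))
```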
Inside fill_performance_pages.py#
Add `Level.cloud.name` to `ASR_TABBED_TABLE_LEVELS` and `TTS_TABBED_TABLE_LEVELS`. The position of `Level.cloud.name` in `ASR_TABBED_TABLE_LEVELS` and `TTS_TABBED_TABLE_LEVELS` determines the position of the cloud tab level in the resulting tabbed tables.
ASR_TABBED_TABLE_LEVELS = [
Level.cloud.name, # cloud name is the topmost level of the ASR table
Level.gpu_type.name,
ExperimentASR.asr_acoustic_model.name,
ExperimentASR.language.name,
ExperimentASR.asr_mode.name
]
TTS_TABBED_TABLE_LEVELS = [
Level.gpu_type.name,
ExperimentTTS.model.name,
Level.cloud.name, # cloud name is the lowest level of the TTS table
]
Inside docs/source/tabbed_tables_scripts/tabbed_tables.py#
(optional) You can sort tabs by specifying a sorting key in the `TAB_SORT_FUNCTIONS` constant.

(optional) You can prettify names of cloud instances in the table by adding mappings to `PRETTIFYING_TABS`:
PRETTIFYING_TABS = {
...
'cloud': {'aws': 'AWS', 'on_prem': 'on Prem'},
...
}
How to add a new task#
Raw CSV files for different tasks vary, so quite a few things need to be added to the config.py,
fill_performance_pages.py, and docs/source/tabbed_tables_scripts/tabbed_tables.py scripts to process a
new task.
Below is a description of how to add a new task to the system. For convenience, let's call this task NMT.
Inside config.py#
Add a new task to the `SupportedTasks` enum:
class SupportedTasks(Enum):
...
nmt = auto()
Add the name of the directory which stores results for the new task to the `TASK_NAMES` constant. In the directory tree passed to `prepare_raw_data.py`, all NMT results have to be in `NMT` directories.
TASK_NAMES = {..., SupportedTasks.nmt: 'NMT'}
Add enums `ExperimentNMT` and `AdditionalInfoNMT`. `ExperimentNMT` contains fields which allow distinguishing between different NMT experiments. Values for those fields will be taken from CSV files after `PREPROCESSING_FUNCS_BY_TASK` are applied. `AdditionalInfoNMT` fields are auxiliary info. `AdditionalInfoNMT` has to contain a `source_file` field (for storing the path to the raw CSV file in which the data was originally located before preprocessing). If you need to add other fields to additional info, you will have to write a separate function for each of them, as was done for `max_effective_number_of_streams` in the `AdditionalInfoASR` enum.

Add `ExperimentNMT` and `AdditionalInfoNMT` correspondingly to `ExperimentDescription` and `AdditionalInfo`:
ExperimentDescription = namedtuple(
"ExperimentDescription",
unite_enum_members([Level, ExperimentASR, ExperimentTTS, ExperimentNLP, ExperimentNMT]),
defaults=[None] * len(
set().union(
*[enum_.__members__ for enum_ in [Level, ExperimentASR, ExperimentTTS, ExperimentNLP, ExperimentNMT]]
)
),
)
AdditionalInfo = namedtuple(
"AdditionalInfo",
unite_enum_members([AdditionalInfoASR, AdditionalInfoTTS, AdditionalInfoNLP, AdditionalInfoNMT]),
defaults=[None] * len(
set().union(
*[
enum_.__members__
for enum_ in [AdditionalInfoASR, AdditionalInfoTTS, AdditionalInfoNLP, AdditionalInfoNMT]
]
)
),
)
Add NMT preprocessing functions to `PREPROCESSING_FUNCS_BY_TASK`. It is recommended to add the `remove_spaces_near_commas` function to the list.

Specify types and column names for experiment description fields in the `EXPERIMENT_DESCRIPTION_COLS_IN_PREPROCESSED_TABLES` constant. For every experiment description field, there has to be a dictionary with 2 keys: `ColSpec.name` and `ColSpec.types`. The `ColSpec.name` value is the name of the column from which an experiment description field will be taken, and the `ColSpec.types` value is a tuple of types which are allowed in this column. For more details, please look up the comment above the `EXPERIMENT_DESCRIPTION_COLS_IN_PREPROCESSED_TABLES` declaration.

Specify which `ExperimentDescription` fields will be used in the file names of resulting CSV files in `EXPERIMENT_INFO_USED_FOR_FILE_NAME_CREATION`. There have to be enough field names to distinguish between any resulting tables. Elements of `LEVEL_NAMES` are required in `EXPERIMENT_INFO_USED_FOR_FILE_NAME_CREATION`.

Add a `SupportedTasks.nmt` item to `EXPERIMENT_DESCRIPTION_FIELDS_WHICH_ARE_ALLOWED_TO_VARY_INSIDE_PREPROCESSED_TABLE`. Sometimes 1 raw CSV file contains results for several experiments. If an experiment description column in a raw CSV can vary, then you need to add the corresponding field name to `EXPERIMENT_DESCRIPTION_FIELDS_WHICH_ARE_ALLOWED_TO_VARY_INSIDE_PREPROCESSED_TABLE`. Please note that splitting of CSV tables is performed after preprocessing.

Add a `SupportedTasks.nmt` item to the `ADDITIONAL_INFO_BY_TASK` dictionary.

Add an `NMTExperimentReplacementParameters` namedtuple. Add this namedtuple to `ReplacementParametersType` and `REPLACEMENT_EXPERIMENT_DESCRIPTION_FIELDS`.

If the new task performance tables have latency columns, add a regex for latency columns to the `LATENCY_COLUMNS_BY_TASK` dictionary.

If the new task performance tables have a throughput column, add the name of this column to the `THROUGHPUT_COLUMNS_BY_TASK` dictionary.

Performance result tables are expected to have a “# of streams” column. Please add the name of this column to the `NUM_PARALLEL_COLUMN_NAMES_BY_TASK` dictionary.

Define the format of the output tables for the NMT task in the `FINAL_CSV_SCHEMES` dictionary. Please see the present descriptions and the comment above `FINAL_CSV_SCHEMES` for more details.
Inside fill_performance_pages.py#
Declare the NMT tabbed table level order. For this you will need to import `ExperimentNMT` and define a constant which lists levels from top to bottom, e.g.
NMT_TABBED_TABLE_LEVELS = [Level.gpu_type.name, ...]
Any Level and ExperimentNMT element can be in NMT_TABBED_TABLE_LEVELS.
(optional) If some metadata fields are used inside innermost tabs, then you may set pretty names for these
metadata fields in the METADATA_KEYS_PRETTY constant.

Call the function create_tabbed_table() inside main():
create_tabbed_table(
args.metadata,
Path(__file__).parent / '../../nmt/nmt-performance-table.md',
NMT_TABBED_TABLE_LEVELS,
SupportedTasks.nmt,
nmt_tab_creator,
{Level.task.name: TASK_NAMES[SupportedTasks.nmt]},
)
Write a function `nmt_tab_creator()` which is responsible for rendering the innermost tabs in the NMT table.
def nmt_tab_creator(level_values: Dict[str, Any], metadata_file: Path, metadata: List[MetadataItemType]) -> str:
    """A function for creation of an NMT tab. This function should be passed in the ``inner_most_tab_creator``
    parameter of the ``create_nested_tabs()`` function."""
relevant_items = extract_relevant_items_from_metadata(metadata, level_values, metadata_file)
raise_error_if_more_than_1_item(relevant_items, level_values, metadata_file)
return f'''.. csv-table::
:header-rows: 2
:file: {to_unix_path_str(build_path_to_performance_table(metadata_file.parent / relevant_items[0]['path']))}
'''
Inside docs/source/tabbed_tables_scripts/tabbed_tables.py#
(optional) Prettify tab names for NMT tasks in `PRETTIFYING_TABS`.

(optional) You may sort tabs in a tab level. For this, provide a sorting key in the `TAB_SORT_FUNCTIONS` constant.
How to deal with missing levels#
Look at the following directory tree
.
├── AWS
│ ├── g4dn.16xlarge
│ │ ├── ASR
│ │ │ └── All_CPU_Cores
│ │ │ ├── results_citrinet-1024_en-US_flashlight_offline.csv
│ │ │ ├── ...
│ │ │ └── results_conformer_en-US_flashlight_streaming.csv
│ │ ├── NLP
│ │ │ └── results.csv
│ │ └── TTS
│ │ ├── results_file_faspitch_hifigan.csv
│ │ ...
│ ├── ...
│ └── p4d.24xlarge
│ ├── ASR
│ ├── NLP
│ └── TTS
├── GCP
│ ├── a2-highgpu-1g_[a100_12vcpu]
│ ├── ...
│ └── n1-highmem-8_[v100_8vcpu]
└── on_prem
├── A10
│ ├── ASR
│ │ ├── 16_Cores
│ │ │ ├── results_conformer_en-US_flashlight_offline.csv
│ │ │ ├── results_conformer_en-US_flashlight_streaming-throughput.csv
│ │ │ └── results_conformer_en-US_flashlight_streaming.csv
│ │ ├── 32_Cores
│ │ ├── 64_Cores
│ │ └── All_CPU_Cores
│ │ ├── results_citrinet-1024_de-DE_flashlight_streaming.csv
│ │ ├── ...
│ │ └── results_quartznet_en-US_os2s_streaming.csv
│ ├── NLP
│ │ └── results.csv
│ └── TTS
│ └── results_file_faspitch_hifigan.csv
├── ...
└── V100
The on_prem directory contains GPU type directories, whereas AWS and GCP contain cloud instance
directories. If you intend to pass such a directory tree to the prepare_raw_data.py script, you need to specify
which levels are missing in the FILLER_DIRECTORIES_SCHEME constant in config.py.
A FILLER_DIRECTORIES_SCHEME for the above case is
FILLER_DIRECTORIES_SCHEME = {
FillerScheme.all_present_directories: {FillerScheme.all_present_directories: FillerScheme.filler},
'on_prem': FillerScheme.filler,
}
The FILLER_DIRECTORIES_SCHEME constant shows where to insert “filler” directories so that the level structure becomes
identical in all branches of the input tree. A key of a dictionary inside FILLER_DIRECTORIES_SCHEME can be:

- FillerScheme.filler,
- FillerScheme.all_present_directories,
- a name of a directory from the processed directory tree.
If a value in a dictionary from FILLER_DIRECTORIES_SCHEME is not a nested dictionary, then this value has to
be FillerScheme.filler.
If a key in a dictionary is FillerScheme.filler, there can be no other keys in the same dictionary. In the next
example, two “filler” directories are inserted in the 'AWS' branch.
FILLER_DIRECTORIES_SCHEME = {
FillerScheme.all_present_directories: {FillerScheme.all_present_directories: FillerScheme.filler},
'AWS': {FillerScheme.filler: {FillerScheme.all_present_directories: FillerScheme.filler}},
'on_prem': FillerScheme.filler,
}
If a key k of a dictionary D is FillerScheme.all_present_directories, then the value corresponding to
k applies to all directories which are not among the other keys of D.
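To make the scheme semantics concrete, here is a sketch of how filler insertion could work on a tree represented as nested dictionaries (an illustration of the idea, not the actual implementation in prepare_raw_data.py; the filler name is arbitrary):

```python
from enum import Enum, auto

class FillerScheme(Enum):
    filler = auto()
    all_present_directories = auto()

def insert_fillers(tree, scheme, filler_name='__filler__'):
    """Insert filler directories into ``tree`` (a nested dict of
    directory names) according to ``scheme`` so that all branches
    end up with the same number of levels."""
    if FillerScheme.filler in scheme:
        # A filler key: wrap the whole subtree in one filler directory.
        return {filler_name: insert_fillers(tree, scheme[FillerScheme.filler], filler_name)}
    result = {}
    for name, subtree in tree.items():
        sub_scheme = scheme.get(name, scheme.get(FillerScheme.all_present_directories))
        if sub_scheme is FillerScheme.filler:
            # A filler value: insert one filler level below this directory.
            result[name] = {filler_name: subtree}
        elif sub_scheme is None:
            result[name] = subtree
        else:
            result[name] = insert_fillers(subtree, sub_scheme, filler_name)
    return result
```

With the first FILLER_DIRECTORIES_SCHEME above, this sketch inserts a filler GPU type level under each cloud instance in AWS and GCP, and a filler cloud instance level directly under on_prem, so every branch gets the same level structure.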
How to preprocess tables#
You can fix raw CSV files before metadata is extracted from them. For this you need to create a function which:

- takes a path to a raw CSV file as input;
- makes the necessary changes to the CSV file and then saves it back to the original location.

Then you need to add the function to the PREPROCESSING_FUNCS_BY_TASK constant in the config.py script. The functions
in PREPROCESSING_FUNCS_BY_TASK are applied to CSV files in the order in which they are listed.
How to edit headers of final tables#
The FINAL_CSV_SCHEMES constant from config.py describes which columns are added to the final CSV files.
There are different headers for ASR CSVs depending on the mode (streaming or offline) and the number of language models.
The header type is selected by the HeaderDescriptionKeys.table_type_callback callback. When a callback is used,
the header schemes are stored in the HeaderDescriptionKeys.headers item.
If all final CSVs belonging to a task have the same headers, then FINAL_CSV_SCHEMES[<task>] is a header scheme.
Headers can have 1 or 2 levels.
A 1-level header is a dictionary whose keys are columns added to the final CSV and whose values are the names of these columns in the final CSV. A key of a 1-level header can also be a compiled regex matching column names. If a key is a regex, then the corresponding value must be a callable which takes the old column name as input and returns a column name for the final CSV.
A 2-level header is a dictionary whose keys are the top-level column names of the final CSV and whose values are 1-level headers.
You can make a column optional. If a corresponding column in a raw CSV is missing, then an optional
column is not added to the final CSV. To make this work, you need to replace a usual key with a tuple of 2 elements:
the first element is the key and the second is ColSpec.only_if_present_in_table. For example, see “speaker
diarization” columns in FINAL_CSV_SCHEMES. If in a 2 level header the top column is optional, its subcolumns cannot
be optional (all subcolumns become optional by default).
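The resolution of a 1-level header scheme against a raw table can be sketched as follows (the ONLY_IF_PRESENT marker and the function are illustrative stand-ins for ColSpec.only_if_present_in_table and the actual logic in prepare_raw_data.py):

```python
import re

ONLY_IF_PRESENT = object()  # stand-in for ColSpec.only_if_present_in_table

def resolve_one_level_header(scheme, raw_columns):
    """Map raw column names to final column names.

    Keys of ``scheme`` may be plain column names, ``(name, ONLY_IF_PRESENT)``
    tuples for optional columns, or compiled regexes whose values are
    callables producing the final name from the matched raw name.
    """
    result = {}
    for key, value in scheme.items():
        optional = False
        if isinstance(key, tuple) and key[1] is ONLY_IF_PRESENT:
            key, optional = key[0], True
        if isinstance(key, re.Pattern):
            for col in raw_columns:
                if key.match(col):
                    result[col] = value(col)
        elif key in raw_columns:
            result[key] = value
        elif not optional:
            raise KeyError(f'required column {key!r} is missing')
    return result
```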
How to prettify a tab name in a final CSV header#
By default, metadata values serve as tab names. However, this is often inconvenient. For example, values of the
model metadata field for the TTS task can be 'fastpitch-hifigan' and 'tacotron-waveglow'. To improve tab
names, you can use PRETTIFYING_TABS in the docs/source/tabbed_tables_scripts/tabbed_tables.py script. In
PRETTIFYING_TABS, you can provide a dictionary which maps metadata values to tab names.
You can also provide a function taking a metadata value as input and returning a tab name. For the
num_cpu metadata field, it looks the following way:
def prettify_number_of_cores(n: int) -> str:
return f'{n} cores'
PRETTIFYING_TABS = {
SupportedTasks.asr: {
Level.num_cpu.name: prettify_number_of_cores,
...
},
...
}
How to sort tabs in tabbed tables#
For sorting tabs on a tab level, you need to set a sorting key in TAB_SORT_FUNCTIONS in the
docs/source/tabbed_tables_scripts/tabbed_tables.py script. The sorting
is performed by the Python built-in function sorted() applied to tuples
(<metadata_value>, <list_of_metadata_items_which_belong_to_the_tab>). The function from
TAB_SORT_FUNCTIONS is passed to sorted() as the key parameter. If a sorting function is missing,
then the key parameter of sorted() defaults to lambda x: x[0].
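For instance, a sorting key that orders tabs by metadata value and puts tabs with a None value last might look like this (the function name is illustrative):

```python
def tab_sort_key(tab):
    """Key for sorting (<metadata_value>, <metadata_items>) tuples:
    order tabs by metadata value, placing None values last."""
    metadata_value, _items = tab
    if metadata_value is None:
        return (1, 0)
    return (0, metadata_value)
```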