nemo_curator.stages.text.utils.text_utils
Module Contents
Functions
Data
API
Returns a string including all comments.
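As a rough illustration of how comments can be gathered into one string, the sketch below uses the standard-library `tokenize` module. It assumes the input is Python source, and the name `collect_comments` is hypothetical; it is not taken from this module and may not match its actual implementation.

```python
import io
import tokenize


def collect_comments(source: str) -> str:
    """Hypothetical helper: join every comment found in Python source."""
    comments = []
    # generate_tokens expects a readline callable that yields str lines.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # tok.string holds the comment text, including the leading '#'.
            comments.append(tok.string)
    return "\n".join(comments)


example = "x = 1  # first\n# second\ny = 2\n"
print(collect_comments(example))
```

Using the tokenizer rather than a regular expression avoids mistaking a `#` inside a string literal for a comment.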
Parse Python source code from file or string and print docstrings.
For Chinese and Japanese text, we use external libraries to split the text because words in these languages are not separated by spaces. For all other languages, such as English, we assume words are separated by spaces.
Returns: A function which can be used to parse the words of a string into a list.
Parameters:
language
An ISO 639-1 language code. For example, “en” for English, “zh” for Chinese, and “ja” for Japanese.
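The dispatch described above can be sketched as a small factory that returns a splitting function for a given language code. This is only an illustration under stated assumptions: the name `make_word_splitter` and the `segmenters` argument are hypothetical, and the external Chinese and Japanese libraries are represented by caller-supplied callables because this page does not name them.

```python
from typing import Callable, Dict, List, Optional

Splitter = Callable[[str], List[str]]


def make_word_splitter(
    language: str,
    segmenters: Optional[Dict[str, Splitter]] = None,
) -> Splitter:
    """Hypothetical factory: return a function that splits text into words."""
    segmenters = segmenters or {}
    code = language.lower()
    if code in ("zh", "ja"):
        # These languages need a dedicated segmenter (assumed to be
        # provided by the caller in this sketch).
        if code not in segmenters:
            raise ValueError(f"no segmenter supplied for language {code!r}")
        return segmenters[code]

    def split_on_whitespace(text: str) -> List[str]:
        # All other languages: assume words are separated by spaces.
        return text.split()

    return split_on_whitespace


split_en = make_word_splitter("en")
print(split_en("hello  world"))
```

Returning a function rather than splitting directly lets the caller resolve the language once and reuse the splitter across many documents.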
Parse Python source code and yield a tuple of ast node instance, name, and docstring for each function/method, class and module.
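A generator of this shape can be sketched with the standard-library `ast` module. The name `iter_docstrings` is hypothetical, and the traversal order and handling of missing docstrings are assumptions that may differ from this module's behaviour.

```python
import ast
from typing import Iterator, Optional, Tuple

DOC_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def iter_docstrings(
    source: str,
) -> Iterator[Tuple[ast.AST, Optional[str], Optional[str]]]:
    """Hypothetical helper: yield (node, name, docstring) for each definition."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, DOC_NODES):
            # Module nodes have no name attribute, so fall back to None.
            name = getattr(node, "name", None)
            # get_docstring returns None when the node has no docstring.
            yield node, name, ast.get_docstring(node)


sample = '"""Module doc."""\n\ndef f():\n    """Function doc."""\n    return 1\n'
for node, name, doc in iter_docstrings(sample):
    print(type(node).__name__, name, doc)
```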