Learn how to extend NeMo Curator by adding custom code to a new or existing stage.
The NeMo Curator container includes a robust set of default pipelines with commonly used stages. If they do not meet your requirements, extend them with your own modules.
Before you begin adding custom code, make sure that you have:
Create a custom_code directory anywhere on your system to organize your custom pipeline code.
Inside custom_code, create a new folder for your environment, for example new_stage/.
Create a new file in that folder, for example my_file.py. This file must define a class (for example, MyClass) that your stage or pipeline code can import.
Import the class in your stage or pipeline code to use it.
Save the files.
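The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the layout NeMo Curator requires: only the names custom_code, new_stage, my_file.py, and MyClass come from the steps, while the body of MyClass, the empty __init__.py, and the helper functions create_custom_code and load_my_class are hypothetical. The sketch writes the files into a temporary directory so it can run anywhere; in practice you would create them by hand in your own custom_code directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical class body. Only the names custom_code, new_stage, my_file.py,
# and MyClass come from the steps above.
MY_FILE_SOURCE = '''\
class MyClass:
    """Hypothetical helper that normalizes whitespace in a text record."""

    def __init__(self, lowercase: bool = False) -> None:
        self.lowercase = lowercase

    def apply(self, text: str) -> str:
        cleaned = " ".join(text.split())
        return cleaned.lower() if self.lowercase else cleaned
'''


def create_custom_code(root: Path) -> Path:
    """Create custom_code/new_stage/my_file.py under root; return custom_code."""
    stage_dir = root / "custom_code" / "new_stage"
    stage_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: an empty __init__.py makes new_stage a regular package.
    (stage_dir / "__init__.py").write_text("")
    (stage_dir / "my_file.py").write_text(MY_FILE_SOURCE)
    return root / "custom_code"


def load_my_class(custom_code_dir: Path):
    """Put custom_code on sys.path and import MyClass from new_stage.my_file."""
    path = str(custom_code_dir)
    if path not in sys.path:
        sys.path.insert(0, path)
    # Files were created after interpreter start, so refresh the finder caches.
    importlib.invalidate_caches()
    module = importlib.import_module("new_stage.my_file")
    return module.MyClass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        MyClass = load_my_class(create_custom_code(Path(tmp)))
        print(MyClass(lowercase=True).apply("  Hello   NeMo  Curator "))
```

Adding custom_code to sys.path is one way to make the class importable; installing the directory as a package is an alternative if you prefer not to modify the path at runtime.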
Create or edit a stage to use your code, then assemble a pipeline and run it in Python.
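As a rough illustration of how a stage can wrap custom code, the sketch below uses plain Python only. It deliberately does not use the NeMo Curator stage or pipeline classes, whose names and signatures are not given here: Stage, MyStage, UppercaseStage, and run_pipeline are all hypothetical stand-ins, and MyClass is redefined inline so the example runs on its own. Consult the NeMo Curator API reference for the real base classes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class MyClass:
    """Stand-in for the class defined in my_file.py (hypothetical behavior)."""

    def apply(self, text: str) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join(text.split())


class Stage(Protocol):
    """Hypothetical stage interface; NOT the NeMo Curator base class."""

    def process(self, record: str) -> str: ...


@dataclass
class MyStage:
    """Hypothetical stage that delegates its work to the custom class."""

    helper: MyClass

    def process(self, record: str) -> str:
        return self.helper.apply(record)


class UppercaseStage:
    """Second hypothetical stage, included to show stages running in order."""

    def process(self, record: str) -> str:
        return record.upper()


def run_pipeline(stages: List[Stage], records: Iterable[str]) -> List[str]:
    """Pass every record through each stage in the order given."""
    results = []
    for record in records:
        for stage in stages:
            record = stage.process(record)
        results.append(record)
    return results


if __name__ == "__main__":
    pipeline = [MyStage(helper=MyClass()), UppercaseStage()]
    print(run_pipeline(pipeline, ["  custom   code ", "new\tstage"]))
```

Keeping the custom logic in MyClass and the stage as a thin wrapper means the same class can be reused by a new stage or swapped into an existing one without changing the pipeline assembly.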
To containerize your pipeline, use a Dockerfile that copies your code and installs its dependencies, then build and run the image with your preferred tooling. Where possible, align the packages you install with the optional extras defined in pyproject.toml.
Now that you have created custom code, you can create a custom stage that uses your code.