Adding Custom Code#
Learn how to extend NeMo Curator by adding custom code to a new or existing stage.
The NeMo Curator container includes a robust set of default pipelines with commonly used stages. If they do not meet your requirements, extend them with your own modules.
Before You Start#
Before you begin adding custom code, make sure that you have:
Reviewed the pipeline concepts and diagrams.
A working NeMo Curator development environment.
Optionally prepared a container image that includes your dependencies.
Optionally created a custom environment to support your new custom code.
How to Add Custom Code#
Define New Functionality#
Create a
custom_code
directory anywhere on your system to organize your custom pipeline code.Create a new folder for your environment, for example:
new_stage/
.Create a new file, for example
my_file.py
. This file must define a class (MyClass
) made available for import.# your code here
Import the class in your stage or pipeline code to use it.
from my_code.my_file import MyClass ...
Save the files.
Use your code in a pipeline#
Create or edit a stage to use your code, then assemble a pipeline and run it in Python:
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.base import ProcessingStage
from nemo_curator.tasks.video import VideoTask
from nemo_curator.stages.video.io.video_reader import VideoReader
from nemo_curator.stages.video.io.clip_writer import ClipWriterStage
from my_code.my_file import MyClass
class MyStage(ProcessingStage[VideoTask, VideoTask]):
def process(self, task: VideoTask) -> VideoTask | list[VideoTask]:
helper = MyClass()
# use helper with task.data (Video/Clips)
return task
pipeline = (
Pipeline(name="my-pipeline")
.add_stage(VideoReader(input_video_path="/path/to/videos", video_limit=10))
.add_stage(MyStage())
.add_stage(
ClipWriterStage(
output_path="/path/to/output",
input_path="/path/to/videos",
upload_clips=True,
dry_run=False,
generate_embeddings=False,
generate_previews=False,
generate_captions=False,
embedding_algorithm="cosmos-embed1",
caption_models=["qwen"],
enhanced_caption_models=["qwen_lm"],
)
)
)
pipeline.run()
To containerize, use a Dockerfile to copy your code and install dependencies, then build and run with your preferred tooling. Prefer aligning packages with optional extras in pyproject.toml
.
Next Steps#
Now that you have created custom code, you can create a custom stage that uses your code.