Adding Custom Stages
Learn how to customize NeMo Curator by adding new pipeline stages.
NeMo Curator ships a set of pipelines with default stages, but those stages might not cover every requirement you have. This tutorial shows how to write a new pipeline stage and integrate it into a pipeline.
Before You Start
Before you begin adding a new pipeline stage, make sure that you have:
- Reviewed the pipeline concepts and diagrams.
- Downloaded the NeMo Curator container.
- Reviewed the container environments available.
- Optionally created custom code that defines your new requirements.
- Optionally created a custom environment to support your new custom code.
- Optionally created a custom model.
How to Add a Custom Pipeline Stage
1. Define the Stage Class
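As a minimal sketch of this step, the block below defines a custom stage by subclassing a base class. The `ProcessingStage` class here is a hypothetical stand-in written for this example, not the library's own base class, and `ClipBrightnessStage`, `threshold`, and `name` are illustrative names only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class ProcessingStage(ABC):
    """Hypothetical stand-in for a stage base class (assumption, not the library API)."""

    def setup(self) -> None:
        """Called once per worker before any data is processed."""

    @abstractmethod
    def process(self, task: Any) -> Any:
        """Transform one task and return it."""


@dataclass
class ClipBrightnessStage(ProcessingStage):
    """Hypothetical custom stage that will score clips by brightness."""

    threshold: float = 0.5          # configuration supplied at construction time
    name: str = "clip_brightness"   # label used to identify the stage

    def process(self, task: Any) -> Any:
        # Placeholder: the real logic is added when the core methods are implemented.
        return task


stage = ClipBrightnessStage(threshold=0.3)
```

Keeping configuration in constructor fields makes the stage easy to reuse with different settings in different pipelines.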
2. Specify Resource Requirements
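One way to picture this step is a small, immutable resource request attached to each stage. The `Resources` class and its `cpus` and `gpus` fields below are assumptions made for this sketch; check the library's own resource type for the real field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request; the field names are assumptions."""

    cpus: float = 1.0
    gpus: float = 0.0

    def __post_init__(self) -> None:
        # Reject nonsensical requests early, before any worker is scheduled.
        if self.cpus < 0 or self.gpus < 0:
            raise ValueError("resource requests must be non-negative")


class ClipBrightnessStage:
    """Hypothetical CPU-only stage."""

    resources = Resources(cpus=2.0, gpus=0.0)


class ClipEmbeddingStage:
    """Hypothetical GPU stage that asks for half a device per worker."""

    resources = Resources(cpus=1.0, gpus=0.5)
```

Declaring requirements on the stage, rather than in the pipeline script, keeps the stage self-describing wherever it is reused.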
3. Implement Core Methods
Required methods for every stage:
Setup Method
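A setup method typically holds expensive, one-time initialization such as loading a model. The sketch below assumes a `setup()` method with no arguments and a hypothetical `load_brightness_model` helper; the real method signature may differ.

```python
from typing import Callable, Optional, Sequence


def load_brightness_model() -> Callable[[Sequence[float]], float]:
    """Hypothetical loader; a real stage would load model weights here."""
    return lambda pixels: sum(pixels) / len(pixels) if pixels else 0.0


class ClipBrightnessStage:
    def __init__(self) -> None:
        # Keep the constructor cheap so the stage object is easy to send to workers.
        self._model: Optional[Callable[[Sequence[float]], float]] = None

    def setup(self) -> None:
        # Runs once per worker; heavy initialization belongs here, not in __init__.
        self._model = load_brightness_model()


stage = ClipBrightnessStage()
stage.setup()
```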
Process Data Method
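The process method receives a unit of work, transforms it, and returns it. The sketch below uses simplified, hypothetical `Clip` and `VideoTask` classes defined inline, with a mean-pixel score standing in for real model inference.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    """Hypothetical clip record; real clips carry encoded video data."""

    pixels: List[float]
    brightness: Optional[float] = None


@dataclass
class VideoTask:
    """Hypothetical task that groups the clips of one video."""

    clips: List[Clip] = field(default_factory=list)


class ClipBrightnessStage:
    def setup(self) -> None:
        # Stand-in for a loaded model: mean pixel value, or 0.0 for an empty clip.
        self._score = lambda px: sum(px) / len(px) if px else 0.0

    def process(self, task: VideoTask) -> VideoTask:
        # Annotate every clip in place, then hand the task to the next stage.
        for clip in task.clips:
            clip.brightness = self._score(clip.pixels)
        return task
```

Returning the same task object, with new fields filled in, lets downstream stages read the result without any extra plumbing.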
4. Update Data Model
Modify the pipeline’s data model to include your stage’s outputs:
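For instance, if clips are modeled as a dataclass, adding your stage's output can be as small as one new optional field. The `Clip` class below is a simplified, hypothetical stand-in, and `brightness` is an illustrative field name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Clip:
    """Hypothetical, simplified clip data model."""

    uuid: str
    span: Tuple[float, float]  # start and end time in seconds

    # New field for the custom stage. The None default means existing code
    # that builds Clip objects without this value keeps working.
    brightness: Optional[float] = None


clip = Clip(uuid="clip-0", span=(0.0, 4.0))
clip.brightness = 0.42  # filled in later by the custom stage
```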
5. Modify Pipeline Output Handling
Update the ClipWriterStage to handle your stage’s output:
- Create a writer method:
- Add to the main process:
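The two changes above can be sketched together. The `ClipWriterStage` below is a simplified stand-in written for this example, not the library's class; the `_write_clip_brightness` method name, the `brightness` field, and the JSON output layout are all assumptions.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Clip:
    """Hypothetical clip record carrying the custom stage's output."""

    uuid: str
    brightness: Optional[float] = None


class ClipWriterStage:
    """Simplified stand-in for the pipeline's writer stage."""

    def __init__(self, output_dir: Path) -> None:
        self.output_dir = Path(output_dir)

    def _write_clip_brightness(self, clip: Clip) -> Optional[Path]:
        # Writer method: skip clips the custom stage did not annotate.
        if clip.brightness is None:
            return None
        dest = self.output_dir / "brightness" / f"{clip.uuid}.json"
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(json.dumps({"brightness": clip.brightness}))
        return dest

    def process(self, clips: List[Clip]) -> List[Clip]:
        for clip in clips:
            # Existing writers for other outputs would run here.
            self._write_clip_brightness(clip)  # new call in the main process
        return clips
```

Giving each output its own subdirectory keeps the new files separate from whatever the writer already produces.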
Integration Steps
1. Build and Run a Pipeline in Python
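To illustrate the general shape of this step, the sketch below chains two stages in a tiny in-process pipeline. The `Pipeline` class, its `add_stage` and `run` methods, and both stages are hypothetical stand-ins defined here; the library's own pipeline and executor classes do considerably more, so treat this only as a picture of the control flow.

```python
from typing import Any, Dict, List


class Pipeline:
    """Hypothetical minimal pipeline: runs stages in order on each task."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stages: List[Any] = []

    def add_stage(self, stage: Any) -> "Pipeline":
        self.stages.append(stage)
        return self  # returning self allows chained calls

    def run(self, tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        for stage in self.stages:
            stage.setup()  # one-time initialization before any data flows
        results = []
        for task in tasks:
            for stage in self.stages:
                task = stage.process(task)
            results.append(task)
        return results


class BrightnessStage:
    def setup(self) -> None:
        pass

    def process(self, task: Dict[str, Any]) -> Dict[str, Any]:
        px = task["pixels"]
        task["brightness"] = sum(px) / len(px)
        return task


class DarkLabelStage:
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def setup(self) -> None:
        pass

    def process(self, task: Dict[str, Any]) -> Dict[str, Any]:
        # Depends on the field written by BrightnessStage, so order matters.
        task["is_dark"] = task["brightness"] < self.threshold
        return task


pipeline = Pipeline("custom_video_pipeline")
pipeline.add_stage(BrightnessStage()).add_stage(DarkLabelStage(threshold=0.5))
results = pipeline.run([{"pixels": [0.1, 0.3]}, {"pixels": [0.8, 1.0]}])
```

The custom stage is added before any stage that reads its output, which mirrors how a new stage must be placed ahead of the writer in a real pipeline.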
2. Refer to Examples
For end-to-end usage, review and adapt the example:
examples/video/video_split_clip_example.py
3. (Optional) Containerize Your Changes
If you need a container image, write a Dockerfile that extends your base image and adds your code and its dependencies. Then build and run the image with your preferred container tooling.