LLM-Based Columns#
LLM-based columns use large language models to generate contextual content such as natural language text, code, and structured data; they can also evaluate the quality of content generated by other columns. All LLM-based columns support multi-modal context, allowing you to incorporate images into the generation process using vision-capable models.
Before You Start#
Setup#
Before getting started, ensure you have the Data Designer client and configuration builder set up:
```python
import os

from nemo_microservices.data_designer.essentials import (
    DataDesignerConfigBuilder,
    LLMCodeColumnConfig,
    LLMJudgeColumnConfig,
    LLMStructuredColumnConfig,
    LLMTextColumnConfig,
    NeMoDataDesignerClient,
    Score,
    ValidationColumnConfig,
)

data_designer_client = NeMoDataDesignerClient(
    base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"]
)

config_builder = DataDesignerConfigBuilder(model_configs="path/to/your/model_configs.yaml")
```
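The model aliases referenced throughout this page ("text", "code", "structured", "judge") must be defined in your model configs. As a rough sketch of what such a definition could look like inline (the `ModelConfig` and `InferenceParameters` helpers, the list-valued `model_configs` argument, and the model name are assumptions; check your installed package for the exact API):

```python
# Sketch only: assumes ModelConfig/InferenceParameters are exported from the
# essentials module and that DataDesignerConfigBuilder also accepts a list of
# model configs in place of a YAML path.
from nemo_microservices.data_designer.essentials import (
    InferenceParameters,
    ModelConfig,
)

model_configs = [
    ModelConfig(
        alias="text",  # referenced below via model_alias="text"
        model="meta/llama-3.3-70b-instruct",  # illustrative model name
        inference_parameters=InferenceParameters(
            temperature=0.6,
            top_p=0.95,
            max_tokens=1024,
        ),
    ),
    # ...additional configs for the "code", "structured", and "judge" aliases
]

config_builder = DataDesignerConfigBuilder(model_configs=model_configs)
```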
Note: For detailed information on using images with LLM columns, refer to the Multi-Modal Context guide.
LLM Text Generation#
Generates natural language text based on prompts and context from other columns.
Using the typed API:

```python
config_builder.add_column(
    LLMTextColumnConfig(
        name="product_description",
        model_alias="text",
        prompt="Generate a detailed description for a {{product_category}} product named {{product_name}}.",
    )
)
```

Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="product_description",
    column_type="llm-text",
    model_alias="text",
    prompt="Generate a detailed description for a {{product_category}} product named {{product_name}}.",
)
```
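The `{{...}}` placeholders are template references to other columns in the same configuration; each row's values are substituted into the prompt at generation time. As an illustration, the `product_name` referenced above could itself come from an upstream LLM text column (`product_category` is assumed to be supplied by yet another column, such as a sampler):

```python
# Illustrative upstream column: generates the product_name value that the
# product_description prompt above interpolates.
config_builder.add_column(
    LLMTextColumnConfig(
        name="product_name",
        model_alias="text",
        prompt="Invent a short, catchy name for a {{product_category}} product.",
    )
)
```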
LLM Code Generation#
Generates code in various programming languages.
Using the typed API:

```python
config_builder.add_column(
    LLMCodeColumnConfig(
        name="python_function",
        model_alias="code",
        prompt="Write a Python function that {{function_description}}.",
        code_lang="python",
    )
)
```

Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="python_function",
    column_type="llm-code",
    model_alias="code",
    prompt="Write a Python function that {{function_description}}.",
    output_format="python",
)
```
LLM Structured Generation#
Generates structured data that conforms to predefined schemas using Pydantic models or JSON schemas.
Using the typed API:

```python
from pydantic import BaseModel

class UserProfile(BaseModel):
    bio: str
    skills: list[str]
    experience_years: int

config_builder.add_column(
    LLMStructuredColumnConfig(
        name="user_profile",
        model_alias="structured",
        prompt="Generate a user profile for {{user_type}} in {{industry}}.",
        output_format=UserProfile,
    )
)
```

Or, equivalently, using the simplified API with the same `UserProfile` model:

```python
config_builder.add_column(
    name="user_profile",
    column_type="llm-structured",
    model_alias="structured",
    prompt="Generate a user profile for {{user_type}} in {{industry}}.",
    output_format=UserProfile,
)
```
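Since schemas can also be expressed as JSON schemas, a schema-dict variant of the same column might look like the following (a sketch; it assumes `output_format` accepts a JSON-schema dictionary in your version):

```python
# The same UserProfile shape expressed as raw JSON Schema (sketch).
user_profile_schema = {
    "type": "object",
    "properties": {
        "bio": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "experience_years": {"type": "integer"},
    },
    "required": ["bio", "skills", "experience_years"],
}

config_builder.add_column(
    LLMStructuredColumnConfig(
        name="user_profile",
        model_alias="structured",
        prompt="Generate a user profile for {{user_type}} in {{industry}}.",
        output_format=user_profile_schema,
    )
)
```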
For more details on structured outputs, see the Structured Outputs guide.
LLM Judge#
Evaluates and scores the quality of generated content using large language models.
Using the typed API:

```python
config_builder.add_column(
    LLMJudgeColumnConfig(
        name="response_quality",
        model_alias="judge",
        prompt="Evaluate the quality of this response: {{response_text}}",
        scores=[
            Score(
                name="accuracy",
                description="Is the response factually correct?",
                options={
                    "4": "Completely accurate with no errors",
                    "3": "Mostly accurate with minor errors",
                    "2": "Somewhat accurate with some errors",
                    "1": "Mostly inaccurate with major errors",
                    "0": "Completely inaccurate",
                },
            ),
            Score(
                name="clarity",
                description="Is the response clear and easy to understand?",
                options={
                    "4": "Extremely clear and well-structured",
                    "3": "Clear with good structure",
                    "2": "Reasonably clear with adequate structure",
                    "1": "Somewhat unclear with poor structure",
                    "0": "Very unclear and confusing",
                },
            ),
            Score(
                name="completeness",
                description="Does the response fully address the question?",
                options={
                    "4": "Completely addresses all aspects",
                    "3": "Addresses most aspects thoroughly",
                    "2": "Addresses some aspects adequately",
                    "1": "Addresses few aspects superficially",
                    "0": "Does not address the question",
                },
            ),
        ],
    )
)
```
Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="response_quality",
    column_type="llm-judge",
    model_alias="judge",
    prompt="Evaluate the quality of this response: {{response_text}}",
    scores=[
        {
            "name": "accuracy",
            "description": "Is the response factually correct?",
            "options": {
                "4": "Completely accurate with no errors",
                "3": "Mostly accurate with minor errors",
                "2": "Somewhat accurate with some errors",
                "1": "Mostly inaccurate with major errors",
                "0": "Completely inaccurate",
            },
        },
        {
            "name": "clarity",
            "description": "Is the response clear and easy to understand?",
            "options": {
                "4": "Extremely clear and well-structured",
                "3": "Clear with good structure",
                "2": "Reasonably clear with adequate structure",
                "1": "Somewhat unclear with poor structure",
                "0": "Very unclear and confusing",
            },
        },
        {
            "name": "completeness",
            "description": "Does the response fully address the question?",
            "options": {
                "4": "Completely addresses all aspects",
                "3": "Addresses most aspects thoroughly",
                "2": "Addresses some aspects adequately",
                "1": "Addresses few aspects superficially",
                "0": "Does not address the question",
            },
        },
    ],
)
```
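Each `Score` maps option keys to descriptions, and the judge model picks one option per score for every row. Rubrics are not limited to the 0-4 scales shown above; for instance, a minimal pass/fail rubric could look like this (illustrative):

```python
# Illustrative two-option rubric; pass it inside scores=[...] as above.
binary_score = Score(
    name="follows_instructions",
    description="Does the response follow the prompt's instructions?",
    options={
        "1": "Follows all instructions",
        "0": "Ignores or violates at least one instruction",
    },
)
```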
Code Validation#
Validates the syntax and execution of code produced by LLM code columns. The target_column parameter names the code column to check (here, a generated_python_code column assumed to be defined elsewhere in your configuration).
Using the typed API:

```python
config_builder.add_column(
    ValidationColumnConfig(
        name="python_validation",
        code_lang="python",
        target_column="generated_python_code",
    )
)
```

Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="python_validation",
    column_type="validation",
    code_lang="python",
    target_column="generated_python_code",
)
```
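Once your columns are configured, it is worth sanity-checking a few records before launching a full generation run. The sketch below follows the client's preview-then-create workflow (the num_records and wait_until_done arguments are illustrative; check your client version for the exact signatures):

```python
# Generate a small preview to verify prompts, schemas, and validations.
preview = data_designer_client.preview(config_builder)
preview.display_sample_record()

# Launch a full generation job once the preview looks right.
results = data_designer_client.create(
    config_builder,
    num_records=100,
    wait_until_done=True,
)
```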
Reference Table#
| Simplified API Type | Typed API Equivalent | Description |
|---|---|---|
| `llm-text` | `LLMTextColumnConfig` | LLM-generated text content |
| `llm-code` | `LLMCodeColumnConfig` | LLM-generated code content |
| `llm-structured` | `LLMStructuredColumnConfig` | LLM-generated structured content |
| `llm-judge` | `LLMJudgeColumnConfig` | LLM-based evaluation |
| `validation` | `ValidationColumnConfig` | Code validation |