LLM-Based Columns#
LLM-based columns use large language models to generate contextual content such as natural language text, code, and structured data; they can also evaluate the quality of content generated by other columns. All LLM-based columns support multi-modal context, allowing you to incorporate images into the generation process using vision-capable models.
Before You Start#
Setup#
Before getting started, ensure you have the Data Designer client and configuration builder set up:
```python
import os

from nemo_microservices.data_designer.essentials import (
    DataDesignerConfigBuilder,
    LLMCodeColumnConfig,
    LLMJudgeColumnConfig,
    LLMStructuredColumnConfig,
    LLMTextColumnConfig,
    NeMoDataDesignerClient,
    Score,
    ValidationColumnConfig,
)

data_designer_client = NeMoDataDesignerClient(
    base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"]
)

config_builder = DataDesignerConfigBuilder(model_configs="path/to/your/model_configs.yaml")
```
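The model aliases referenced throughout this page ("text", "code", "structured", "judge") must be defined in your model configs. As a rough sketch of what such a definition could look like inline (the `ModelConfig` and `InferenceParameters` helpers, the list-valued `model_configs` argument, and the model name are assumptions; check your installed package for the exact API):

```python
# Sketch only: assumes ModelConfig/InferenceParameters are exported from the
# essentials module and that DataDesignerConfigBuilder also accepts a list of
# model configs in place of a YAML path.
from nemo_microservices.data_designer.essentials import (
    InferenceParameters,
    ModelConfig,
)

model_configs = [
    ModelConfig(
        alias="text",  # referenced below via model_alias="text"
        model="meta/llama-3.3-70b-instruct",  # illustrative model name
        inference_parameters=InferenceParameters(
            temperature=0.6,
            top_p=0.95,
            max_tokens=1024,
        ),
    ),
    # ...additional configs for the "code", "structured", and "judge" aliases
]

config_builder = DataDesignerConfigBuilder(model_configs=model_configs)
```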
Note: For detailed information on using images with LLM columns, refer to the Multi-Modal Context guide.
LLM Text Generation#
Generates natural language text based on prompts and context from other columns.
Using the typed API:

```python
config_builder.add_column(
    LLMTextColumnConfig(
        name="product_description",
        model_alias="text",
        prompt="Generate a detailed description for a {{product_category}} product named {{product_name}}.",
    )
)
```

Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="product_description",
    column_type="llm-text",
    model_alias="text",
    prompt="Generate a detailed description for a {{product_category}} product named {{product_name}}.",
)
```
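The `{{...}}` placeholders are template references to other columns in the same configuration; each row's values are substituted into the prompt at generation time. As an illustration, the `product_name` referenced above could itself come from an upstream LLM text column (`product_category` is assumed to be supplied by yet another column, such as a sampler):

```python
# Illustrative upstream column: generates the product_name value that the
# product_description prompt above interpolates.
config_builder.add_column(
    LLMTextColumnConfig(
        name="product_name",
        model_alias="text",
        prompt="Invent a short, catchy name for a {{product_category}} product.",
    )
)
```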
LLM Code Generation#
Generates code in various programming languages.
Using the typed API:

```python
config_builder.add_column(
    LLMCodeColumnConfig(
        name="python_function",
        model_alias="code",
        prompt="Write a Python function that {{function_description}}.",
        code_lang="python",
    )
)
```

Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="python_function",
    column_type="llm-code",
    model_alias="code",
    prompt="Write a Python function that {{function_description}}.",
    output_format="python",
)
```
LLM Structured Generation#
Generates structured data that conforms to predefined schemas using Pydantic models or JSON schemas.
Using the typed API:

```python
from pydantic import BaseModel

class UserProfile(BaseModel):
    bio: str
    skills: list[str]
    experience_years: int

config_builder.add_column(
    LLMStructuredColumnConfig(
        name="user_profile",
        model_alias="structured",
        prompt="Generate a user profile for {{user_type}} in {{industry}}.",
        output_format=UserProfile,
    )
)
```

Or, equivalently, using the simplified API with the same `UserProfile` model:

```python
config_builder.add_column(
    name="user_profile",
    column_type="llm-structured",
    model_alias="structured",
    prompt="Generate a user profile for {{user_type}} in {{industry}}.",
    output_format=UserProfile,
)
```
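Since schemas can also be expressed as JSON schemas, a schema-dict variant of the same column might look like the following (a sketch; it assumes `output_format` accepts a JSON-schema dictionary in your version):

```python
# The same UserProfile shape expressed as raw JSON Schema (sketch).
user_profile_schema = {
    "type": "object",
    "properties": {
        "bio": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "experience_years": {"type": "integer"},
    },
    "required": ["bio", "skills", "experience_years"],
}

config_builder.add_column(
    LLMStructuredColumnConfig(
        name="user_profile",
        model_alias="structured",
        prompt="Generate a user profile for {{user_type}} in {{industry}}.",
        output_format=user_profile_schema,
    )
)
```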
For more details on structured outputs, see the Structured Outputs guide.
LLM Judge#
Evaluates and scores the quality of generated content using large language models.
Using the typed API:

```python
config_builder.add_column(
    LLMJudgeColumnConfig(
        name="response_quality",
        model_alias="judge",
        prompt="Evaluate the quality of this response: {{response_text}}",
        scores=[
            Score(
                name="accuracy",
                description="Is the response factually correct?",
                options={
                    "4": "Completely accurate with no errors",
                    "3": "Mostly accurate with minor errors",
                    "2": "Somewhat accurate with some errors",
                    "1": "Mostly inaccurate with major errors",
                    "0": "Completely inaccurate",
                },
            ),
            Score(
                name="clarity",
                description="Is the response clear and easy to understand?",
                options={
                    "4": "Extremely clear and well-structured",
                    "3": "Clear with good structure",
                    "2": "Reasonably clear with adequate structure",
                    "1": "Somewhat unclear with poor structure",
                    "0": "Very unclear and confusing",
                },
            ),
            Score(
                name="completeness",
                description="Does the response fully address the question?",
                options={
                    "4": "Completely addresses all aspects",
                    "3": "Addresses most aspects thoroughly",
                    "2": "Addresses some aspects adequately",
                    "1": "Addresses few aspects superficially",
                    "0": "Does not address the question",
                },
            ),
        ],
    )
)
```
Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="response_quality",
    column_type="llm-judge",
    model_alias="judge",
    prompt="Evaluate the quality of this response: {{response_text}}",
    scores=[
        {
            "name": "accuracy",
            "description": "Is the response factually correct?",
            "options": {
                "4": "Completely accurate with no errors",
                "3": "Mostly accurate with minor errors",
                "2": "Somewhat accurate with some errors",
                "1": "Mostly inaccurate with major errors",
                "0": "Completely inaccurate",
            },
        },
        {
            "name": "clarity",
            "description": "Is the response clear and easy to understand?",
            "options": {
                "4": "Extremely clear and well-structured",
                "3": "Clear with good structure",
                "2": "Reasonably clear with adequate structure",
                "1": "Somewhat unclear with poor structure",
                "0": "Very unclear and confusing",
            },
        },
        {
            "name": "completeness",
            "description": "Does the response fully address the question?",
            "options": {
                "4": "Completely addresses all aspects",
                "3": "Addresses most aspects thoroughly",
                "2": "Addresses some aspects adequately",
                "1": "Addresses few aspects superficially",
                "0": "Does not address the question",
            },
        },
    ],
)
```
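Each `Score` maps option keys to descriptions, and the judge model picks one option per score for every row. Rubrics are not limited to the 0-4 scales shown above; for instance, a minimal pass/fail rubric could look like this (illustrative):

```python
# Illustrative two-option rubric; pass it inside scores=[...] as above.
binary_score = Score(
    name="follows_instructions",
    description="Does the response follow the prompt's instructions?",
    options={
        "1": "Follows all instructions",
        "0": "Ignores or violates at least one instruction",
    },
)
```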
Code Validation#
Validates the syntax and execution of code produced by LLM code columns. The target_column parameter names the code column to check (here, a generated_python_code column assumed to be defined elsewhere in your configuration).
Using the typed API:

```python
config_builder.add_column(
    ValidationColumnConfig(
        name="python_validation",
        code_lang="python",
        target_column="generated_python_code",
    )
)
```

Or, equivalently, using the simplified API:

```python
config_builder.add_column(
    name="python_validation",
    column_type="validation",
    code_lang="python",
    target_column="generated_python_code",
)
```
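Once your columns are configured, it is worth sanity-checking a few records before launching a full generation run. The sketch below follows the client's preview-then-create workflow (the num_records and wait_until_done arguments are illustrative; check your client version for the exact signatures):

```python
# Generate a small preview to verify prompts, schemas, and validations.
preview = data_designer_client.preview(config_builder)
preview.display_sample_record()

# Launch a full generation job once the preview looks right.
results = data_designer_client.create(
    config_builder,
    num_records=100,
    wait_until_done=True,
)
```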
Reference Table#
| Simplified API Type | Typed API Equivalent | Description |
|---|---|---|
| `llm-text` | `LLMTextColumnConfig` | LLM-generated text content |
| `llm-code` | `LLMCodeColumnConfig` | LLM-generated code content |
| `llm-structured` | `LLMStructuredColumnConfig` | LLM-generated structured content |
| `llm-judge` | `LLMJudgeColumnConfig` | LLM-based evaluation |
| `validation` | `ValidationColumnConfig` | Code validation |