# Boltz-2 NIM Inference Example with Interactive Visualization
Copyright (c) 2025, NVIDIA CORPORATION. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

## Prerequisites
This notebook uses the NVIDIA BioNeMo Boltz-2 NIM hosted locally. The same workflow can also run against an NVIDIA-hosted NIM.  
Visit https://build.nvidia.com for instructions on running self-hosted or NVIDIA-hosted NIMs and for the system requirements of individual NIMs.

### Steps to launch the Boltz-2 NIM locally
Execute the following code snippets in a bash terminal.
```bash
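# Log in to the NGC registry. At the prompts, enter:
#   Username: $oauthtoken   (the literal string, not a shell variable)
#   Password: <PASTE_API_KEY_HERE>
docker login nvcr.io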

export NGC_API_KEY=<your personal NGC key>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p $LOCAL_NIM_CACHE

docker run -it \
    --runtime=nvidia \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE":/opt/nim/.cache \
    nvcr.io/nim/mit/boltz2:1.0.0
```

--- 

__This notebook demonstrates how to predict the structure of a protein-DNA complex using the Boltz-2 NIM running locally and visualize the results interactively.__

**Example**: Structure of a transcription factor and DNA complex (https://www.rcsb.org/structure/5GNJ)


### API Information
- **Local Endpoint**: `http://localhost:8000/biology/mit/boltz2/predict`
- **Documentation**: `http://localhost:8000/docs`
- **Output Format**: mmCIF (macromolecular Crystallographic Information File)

### Other Requirements
- `httpx` for async HTTP requests (will be auto-installed)
- `py3Dmol` for interactive 3D visualization (will be auto-installed)
- Local Boltz-2 NIM running on port 8000

## Setup and Imports

In [1]:
# Install required packages if not available
import subprocess
import sys

def install_package(package):
    try:
        __import__(package)
        print(f"✅ {package} is already installed")
    except ImportError:
        print(f"📦 Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✅ {package} installed successfully")

# Install required packages
install_package("httpx")
install_package("py3Dmol")

print("\n🎉 All packages are ready!")

✅ httpx is already installed
✅ py3Dmol is already installed

🎉 All packages are ready!


In [2]:
import asyncio
import json
import os
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, Any, Optional, List
import httpx
import py3Dmol
from IPython.display import display, HTML

print("All imports successful!")

All imports successful!


### Configuration

In [3]:
# Local Boltz-2 NIM endpoint
BOLTZ2_URL = "http://localhost:8000/biology/mit/boltz2/predict"
HEALTH_URL = "http://localhost:8000/v1/health/live"

print(f"Boltz-2 Endpoint: {BOLTZ2_URL}")
print(f"Health Check: {HEALTH_URL}")

Boltz-2 Endpoint: http://localhost:8000/biology/mit/boltz2/predict
Health Check: http://localhost:8000/v1/health/live


### Health Check

Let's first verify the NIM is running and accessible:

In [4]:
async def check_nim_health():
    """Check if the Boltz-2 NIM is running and accessible."""
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            response = await client.get(HEALTH_URL)
            if response.status_code == 200:
                print("✅ Boltz-2 NIM is running and accessible")
                return True
            else:
                print(f"⚠️ Health check returned status {response.status_code}")
                return False
    except Exception as e:
        print(f"❌ Cannot connect to Boltz-2 NIM: {e}")
        return False

# Check NIM health
nim_healthy = await check_nim_health()

✅ Boltz-2 NIM is running and accessible
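
A single health check can fail simply because the container is still starting, for example while it fills `LOCAL_NIM_CACHE` on first launch. The sketch below polls until the service answers or a retry budget runs out. It is a minimal illustration, not part of the NIM tooling: `http_probe` and `wait_until_ready` are hypothetical helper names, the retry counts are arbitrary, and it uses the standard library's `urllib` instead of `httpx` so that it runs outside an async context.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed to match the HEALTH_URL defined in the Configuration cell.
HEALTH_URL = "http://localhost:8000/v1/health/live"


def http_probe(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_ready(probe: Callable[[], bool] = http_probe,
                     attempts: int = 30, delay: float = 10.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if probe():
            print(f"NIM ready after {attempt} attempt(s)")
            return True
        if attempt < attempts:
            time.sleep(delay)
    print(f"NIM not ready after {attempts} attempt(s)")
    return False
```

Calling `wait_until_ready()` before the prediction cell would block for at most `attempts * delay` seconds. Passing a custom `probe` keeps the retry logic testable without a running container.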


### API Client Functions

In [5]:
async def make_boltz2_prediction(request_data: Dict[str, Any], timeout: int = 300) -> Optional[Dict]:
    """
    Make a prediction request to the local Boltz-2 NIM.
    
    Args:
        request_data: The prediction request payload
        timeout: Request timeout in seconds
    
    Returns:
        Response data or None if failed
    """
    headers = {
        "Content-Type": "application/json"
    }
    
    async with httpx.AsyncClient(timeout=timeout) as client:
        print(f"🚀 Making prediction request to {BOLTZ2_URL}")
        print(f"⏱️ Timeout set to {timeout} seconds")
        
        try:
            start_time = time.time()
            
            response = await client.post(BOLTZ2_URL, json=request_data, headers=headers)
            
            end_time = time.time()
            duration = end_time - start_time
            
            print(f"📡 Response received in {duration:.2f} seconds")
            print(f"📊 Status code: {response.status_code}")
            
            if response.status_code == 200:
                print("✅ Prediction successful!")
                return response.json()
            else:
                print(f"❌ Prediction failed: {response.status_code}")
                print(f"Error details: {response.text}")
                return None
                
        except httpx.TimeoutException:
            print(f"⏰ Request timed out after {timeout} seconds")
            return None
        except Exception as e:
            print(f"❌ Request failed: {e}")
            return None

print("API client functions defined successfully!")

API client functions defined successfully!


### 3D Visualization Functions

In [6]:
def visualize_structure(structure_data: str, title: str = "Protein Structure", 
                       width: int = 800, height: int = 600, 
                       style: str = "cartoon", color_scheme: str = "spectrum"):
    """
    Visualize a protein structure using py3Dmol.
    
    Args:
        structure_data: mmCIF structure data as string
        title: Title for the visualization
        width: Viewer width in pixels
        height: Viewer height in pixels
        style: Visualization style ('cartoon', 'stick', 'sphere', 'line');
            any other value falls back to 'cartoon'
        color_scheme: 'spectrum', or a 3Dmol colorscheme name such as 'chain'
    
    Returns:
        py3Dmol viewer object
    """
    # Create viewer
    viewer = py3Dmol.view(width=width, height=height)
    
    # Add structure
    viewer.addModel(structure_data, 'cif')
    
    # 'spectrum' is passed through the 'color' key; named schemes such as
    # 'chain' are passed through 'colorscheme'
    if color_scheme == "spectrum":
        color_spec = {'color': color_scheme}
    else:
        color_spec = {'colorscheme': color_scheme}
    
    # Fall back to cartoon for unrecognized styles
    if style not in ("cartoon", "stick", "sphere", "line"):
        style = "cartoon"
    viewer.setStyle({style: color_spec})
    
    # Center and zoom
    viewer.zoomTo()
    
    # Add title
    display(HTML(f"<h3 style='text-align: center; color: #2E86AB;'>{title}</h3>"))
    
    return viewer

def visualize_protein_ligand_complex(structure_data: str, title: str = "Protein-Ligand Complex",
                                   width: int = 800, height: int = 600):
    """
    Specialized visualization for protein-ligand complexes.
    
    Args:
        structure_data: mmCIF structure data as string
        title: Title for the visualization
        width: Viewer width in pixels
        height: Viewer height in pixels
    
    Returns:
        py3Dmol viewer object
    """
    viewer = py3Dmol.view(width=width, height=height)
    
    # Add structure
    viewer.addModel(structure_data, 'cif')
    
    # Style protein as cartoon
    viewer.setStyle({'and': [{'resn': ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 
                                      'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL']}]}, 
                    {'cartoon': {'color': 'spectrum'}})
    
    # Style ligands as sticks
    viewer.setStyle({'and': [{'not': {'resn': ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 
                                              'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL']}}]}, 
                    {'stick': {'color': 'red', 'radius': 0.3}})
    
    # Center and zoom
    viewer.zoomTo()
    
    # Add title
    display(HTML(f"<h3 style='text-align: center; color: #A23B72;'>{title}</h3>"))
    
    return viewer

def create_multi_view_visualization(structures: List[str], titles: Optional[List[str]] = None,
                                  width: int = 400, height: int = 400):
    """
    Display multiple structures one after another, each in its own viewer.
    
    Args:
        structures: List of mmCIF structure data strings
        titles: List of titles for each structure
        width: Width of each viewer
        height: Height of each viewer
    
    Returns:
        List of py3Dmol viewer objects
    """
    if titles is None:
        titles = [f"Structure {i+1}" for i in range(len(structures))]
    
    viewers = []
    
    # Render each structure in its own viewer, stacked in the cell output
    for structure, title in zip(structures, titles):
        viewer = py3Dmol.view(width=width, height=height)
        viewer.addModel(structure, 'cif')
        viewer.setStyle({'cartoon': {'color': 'spectrum'}})
        viewer.zoomTo()
        
        viewers.append(viewer)
        
        # Add title for each structure
        display(HTML(f"<h4 style='text-align: center; color: #F18F01;'>{title}</h4>"))
        viewer.show()
    
    return viewers

print("3D visualization functions defined successfully!")

3D visualization functions defined successfully!


## Prepare Request Data

Based on the API schema, here are the key parameters:

**Required**:
- `polymers`: List of polymers (DNA, RNA, or Protein) - max 5

**Optional**:
- `ligands`: List of ligands - max 5
- `constraints`: Pocket or bond constraints
- `recycling_steps`: 1-6 (default: 3)
- `sampling_steps`: 10-1000 (default: 50)
- `diffusion_samples`: 1-5 (default: 1)
- `step_scale`: 0.5-5.0 (default: 1.638)
- `without_potentials`: boolean (default: false)
- `output_format`: "mmcif" (default)
- `concatenate_msas`: boolean (default: false)
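
The ranges listed above can be checked client-side before a request is sent, which avoids waiting on a round trip only to get a validation error back. The following is a minimal sketch under the assumption that the limits are exactly those listed here; `validate_request` is a hypothetical helper, and the authoritative schema remains the one served at `http://localhost:8000/docs`.

```python
from typing import Any, Dict, List

# Inclusive (min, max) ranges copied from the parameter list above.
NUMERIC_RANGES = {
    "recycling_steps": (1, 6),
    "sampling_steps": (10, 1000),
    "diffusion_samples": (1, 5),
    "step_scale": (0.5, 5.0),
}
MAX_POLYMERS = 5
MAX_LIGANDS = 5


def validate_request(request: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    polymers = request.get("polymers") or []
    if not polymers:
        problems.append("polymers: at least one polymer is required")
    elif len(polymers) > MAX_POLYMERS:
        problems.append(f"polymers: {len(polymers)} given, max {MAX_POLYMERS}")

    ligands = request.get("ligands") or []
    if len(ligands) > MAX_LIGANDS:
        problems.append(f"ligands: {len(ligands)} given, max {MAX_LIGANDS}")

    # Optional numeric parameters are only checked when present.
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in request and not low <= request[name] <= high:
            problems.append(f"{name}: {request[name]} outside [{low}, {high}]")

    return problems
```

A typical use would be `problems = validate_request(request_data)` followed by printing each entry and skipping the request if the list is non-empty.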

In [7]:
# Example protein sequence
# sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"

# Prepare the request payload according to the API schema
request_data = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein", 
            "sequence": "MGREEPLNHVEAERQRREKLNQRFYALRAVVPNVSKMDKASLLGDAIAYINELKSKVVKTESEKLQIKNQLEEVKLELAGRLEHHHHHH",
            "cyclic": False,
            "modifications": []  # No modifications
        },
        {
            "id": "B",
            "molecule_type": "protein", 
            "sequence": "MGREEPLNHVEAERQRREKLNQRFYALRAVVPNVSKMDKASLLGDAIAYINELKSKVVKTESEKLQIKNQLEEVKLELAGRLEHHHHHH",
            "cyclic": False,
            "modifications": []  # No modifications
        },
        {
            "id": "C",
            "molecule_type": "dna", 
            "sequence": "TGGGTCACGTGTTCC",
            "cyclic": False,
            "modifications": []  # No modifications
        },
        {
            "id": "D",
            "molecule_type": "dna", 
            "sequence": "AGGAACACGTGACCC",
            "cyclic": False,
            "modifications": []  # No modifications
        }
    ],
    "constraints": [],  # No constraints
    "recycling_steps": 3,  # Default value
    "sampling_steps": 50,  # Default value
    "diffusion_samples": 1,  # Default value
    "step_scale": 1.638,  # Default value
    "without_potentials": False,  # Include potentials
    "output_format": "mmcif",  # mmCIF format
    "concatenate_msas": False  # Don't concatenate MSAs
}

print("📋 Request Summary:")
# print(f"   Protein sequence length: {len(sequence)} amino acids")
# print(f"   Ligand: {request_data['ligands'][0]['smiles']} (Aspirin)")
print(f"   Recycling steps: {request_data['recycling_steps']}")
print(f"   Sampling steps: {request_data['sampling_steps']}")
print(f"   Diffusion samples: {request_data['diffusion_samples']}")
print(f"   Step scale: {request_data['step_scale']}")
print(f"   Output format: {request_data['output_format']}")

📋 Request Summary:
   Recycling steps: 3
   Sampling steps: 50
   Diffusion samples: 1
   Step scale: 1.638
   Output format: mmcif


## Make the Prediction Request

In [8]:
# Only proceed if NIM is healthy
if nim_healthy:
    print(f"🎯 Starting Boltz-2 prediction at {datetime.now()}")
    # print(f"🧬 Sequence: {sequence[:30]}...{sequence[-10:]}")
    
    # Make the prediction
    prediction_result = await make_boltz2_prediction(request_data, timeout=600)  # 10 minute timeout
    
    if prediction_result:
        # Save the result
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_file = f"boltz2_prediction_{timestamp}.json"
        
        with open(output_file, 'w') as f:
            json.dump(prediction_result, f, indent=2)
        
        print(f"💾 Results saved to: {output_file}")
    else:
        print("❌ Prediction failed")
        prediction_result = None
else:
    print("❌ Cannot proceed - NIM is not accessible")
    prediction_result = None

🎯 Starting Boltz-2 prediction at 2025-06-05 11:38:15.259851
🚀 Making prediction request to http://localhost:8000/biology/mit/boltz2/predict
⏱️ Timeout set to 600 seconds
📡 Response received in 6.35 seconds
📊 Status code: 200
✅ Prediction successful!
💾 Results saved to: boltz2_prediction_20250605_113821.json
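
Because each response is written to a timestamped JSON file, a later session can reload the most recent result instead of calling the NIM again. The sketch below is one possible way to do that; `load_latest_prediction` is a hypothetical helper, and it assumes the files follow the `boltz2_prediction_YYYYMMDD_HHMMSS.json` naming used above.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_latest_prediction(directory: str = ".") -> Optional[Dict[str, Any]]:
    """Load the newest boltz2_prediction_*.json in `directory`, or None if absent."""
    # The YYYYMMDD_HHMMSS suffix sorts chronologically as plain text,
    # so the last name in sorted order is the most recent run.
    candidates = sorted(Path(directory).glob("boltz2_prediction_*.json"))
    if not candidates:
        return None
    with candidates[-1].open() as handle:
        return json.load(handle)
```

Assigning the return value to `prediction_result` would let the analysis and visualization cells run unchanged on a saved result.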


## Analyze Results and Create Visualizations

In [9]:
if prediction_result:
    print("\n🔬 === PREDICTION RESULTS ANALYSIS ===")
    
    # Show available keys in the response
    print(f"📊 Available data fields: {list(prediction_result.keys())}")
    
    # Analyze structures
    if 'structures' in prediction_result:
        structures = prediction_result['structures']
        print(f"\n🏗️ Structure Analysis:")
        print(f"   Number of structures: {len(structures)}")
        
        for i, structure in enumerate(structures):
            print(f"\n   Structure {i+1}:")
            print(f"     Format: {structure.get('format', 'Unknown')}")
            print(f"     Size: {len(structure.get('structure', ''))} characters")
            print(f"     Name: {structure.get('name', 'Unnamed')}")
            print(f"     Source: {structure.get('source', 'Unknown')}")
    
    # Analyze confidence scores
    if 'confidence_scores' in prediction_result:
        scores = prediction_result['confidence_scores']
        print(f"\n🎯 Confidence Analysis:")
        print(f"   Number of scores: {len(scores)}")
        
        if scores:
            print(f"   Average confidence: {sum(scores)/len(scores):.3f}")
            print(f"   Min confidence: {min(scores):.3f}")
            print(f"   Max confidence: {max(scores):.3f}")
            print(f"   All scores: {[f'{s:.3f}' for s in scores]}")
    
    # Show metrics if available
    if 'metrics' in prediction_result and prediction_result['metrics']:
        print(f"\n📈 Runtime Metrics:")
        for key, value in prediction_result['metrics'].items():
            print(f"   {key}: {value}")
    
    # Save individual structures
    structure_files = []
    if 'structures' in prediction_result:
        for i, structure in enumerate(prediction_result['structures']):
            if structure.get('format') == 'mmcif':
                structure_file = f"boltz2_structure_{i+1}_{timestamp}.cif"
                with open(structure_file, 'w') as f:
                    f.write(structure['structure'])
                structure_files.append(structure_file)
                print(f"💾 Structure {i+1} saved to: {structure_file}")
        
else:
    print("❌ No results to analyze - prediction failed or was not attempted.")
    structure_files = []


🔬 === PREDICTION RESULTS ANALYSIS ===
📊 Available data fields: ['structures', 'metrics', 'confidence_scores']

🏗️ Structure Analysis:
   Number of structures: 1

   Structure 1:
     Format: mmcif
     Size: 173710 characters
     Name: 
     Source: oJeTY8_model_0.cif

🎯 Confidence Analysis:
   Number of scores: 1
   Average confidence: 0.949
   Min confidence: 0.949
   Max confidence: 0.949
   All scores: ['0.949']
💾 Structure 1 saved to: boltz2_structure_1_20250605_113821.cif
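
With `diffusion_samples` greater than 1, the response holds several structures and it helps to order them by confidence before inspecting them. The sketch below assumes, as the visualization cell in this notebook does, that `confidence_scores[i]` belongs to `structures[i]`; `rank_by_confidence` is a hypothetical helper and the demo dictionary is made-up data, not NIM output.

```python
from typing import Any, Dict, List, Tuple


def rank_by_confidence(result: Dict[str, Any]) -> List[Tuple[int, float]]:
    """Return (structure_index, score) pairs, highest score first."""
    scores = result.get("confidence_scores") or []
    structures = result.get("structures") or []
    # Only rank indices that have both a structure and a score.
    paired = min(len(scores), len(structures))
    return sorted(enumerate(scores[:paired]), key=lambda pair: pair[1], reverse=True)


# Made-up response with three samples, for illustration only.
demo_result = {"structures": [{}, {}, {}], "confidence_scores": [0.81, 0.95, 0.90]}
for index, score in rank_by_confidence(demo_result):
    print(f"Structure {index + 1}: {score:.3f}")
```

The first pair in the returned list identifies the sample to visualize or export first.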


### 🎨 Interactive 3D Visualization

Now let's visualize the predicted structures interactively!

In [10]:
if prediction_result and 'structures' in prediction_result:
    structures = prediction_result['structures']
    
    print("🎨 Creating interactive 3D visualizations...\n")
    
    # Visualize each structure
    for i, structure in enumerate(structures):
        if structure.get('format') == 'mmcif':
            structure_data = structure['structure']
            
            # Get confidence score for this structure if available
            confidence_info = ""
            if 'confidence_scores' in prediction_result and i < len(prediction_result['confidence_scores']):
                confidence = prediction_result['confidence_scores'][i]
                confidence_info = f" (Confidence: {confidence:.3f})"
            
            title = f"Boltz-2 Predicted Structure {i+1}{confidence_info}"
            
            # Create main visualization
            print(f"\n📍 Structure {i+1} Visualization:")
            
            # Check if we have ligands in the request
            has_ligands = 'ligands' in request_data and len(request_data['ligands']) > 0
            
            if has_ligands:
                # Use protein-ligand complex visualization
                viewer = visualize_protein_ligand_complex(
                    structure_data, 
                    title=title + " - Protein-Ligand Complex",
                    width=900, 
                    height=600
                )
            else:
                # Use standard protein visualization
                viewer = visualize_structure(
                    structure_data, 
                    title=title,
                    width=900, 
                    height=600,
                    style="cartoon",
                    color_scheme="spectrum"
                )
            
            viewer.show()
            
            print("\n🎛️ Interactive controls:")
            print("   • Left-click + drag: Rotate structure")
            print("   • Scroll (or right-click + drag): Zoom in/out")
            print("   • Ctrl + drag (or middle-click + drag): Pan")
            
else:
    print("❌ No structures available for visualization")

🎨 Creating interactive 3D visualizations...


📍 Structure 1 Visualization:



🎛️ Interactive controls:
   • Left-click + drag: Rotate structure
   • Scroll (or right-click + drag): Zoom in/out
   • Ctrl + drag (or middle-click + drag): Pan


### 💾 Export Visualization

Save visualizations as images (requires additional setup):

In [11]:
# Note: Image export requires additional browser setup
# This cell provides instructions for manual export

print("📸 To save visualizations as images:")
print("\n1. Right-click on any 3D visualization above")
print("2. Select 'Save image as...' or 'Copy image'")
print("3. Choose your desired location and format")
print("\nAlternatively, you can:")
print("• Take screenshots of the visualizations")
print("• Use browser developer tools to export canvas")
print("• Load the .cif files in external software like PyMOL or ChimeraX")

if structure_files:
    print(f"\n📁 Structure files saved for external visualization:")
    for file in structure_files:
        print(f"   • {file}")
    print("\nThese can be opened in:")
    print("   • PyMOL: pymol structure_file.cif")
    print("   • ChimeraX: open structure_file.cif")
    print("   • VMD: vmd structure_file.cif")

📸 To save visualizations as images:

1. Right-click on any 3D visualization above
2. Select 'Save image as...' or 'Copy image'
3. Choose your desired location and format

Alternatively, you can:
• Take screenshots of the visualizations
• Use browser developer tools to export canvas
• Load the .cif files in external software like PyMOL or ChimeraX

📁 Structure files saved for external visualization:
   • boltz2_structure_1_20250605_113821.cif

These can be opened in:
   • PyMOL: pymol structure_file.cif
   • ChimeraX: open structure_file.cif
   • VMD: vmd structure_file.cif
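
Before opening a saved `.cif` file in an external tool, a quick chain-level count can confirm that every chain from the request made it into the output. The sketch below is a deliberately simple reader, not a full mmCIF parser: it assumes the file has an `_atom_site` loop with a `label_asym_id` column and that no field in that table contains spaces. `atoms_per_chain` is a hypothetical helper, and the sample text is hand-written for illustration rather than taken from Boltz-2 output.

```python
from collections import Counter
from typing import Dict, List


def atoms_per_chain(cif_text: str, column: str = "_atom_site.label_asym_id") -> Dict[str, int]:
    """Count atom records per chain in the first _atom_site loop of an mmCIF string."""
    columns: List[str] = []
    counts: Counter = Counter()
    in_atom_site = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            # Header line: remember the column order.
            columns.append(line.split()[0])
            in_atom_site = True
        elif in_atom_site:
            if not line or line == "#" or line.startswith(("loop_", "_", "data_")):
                break  # end of the atom_site table
            fields = line.split()
            if column in columns and len(fields) == len(columns):
                counts[fields[columns.index(column)]] += 1
    return dict(counts)


# Hand-written miniature mmCIF fragment, for illustration only.
sample_cif = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_asym_id
ATOM 1 N A
ATOM 2 CA A
ATOM 3 P C
#
"""
print(atoms_per_chain(sample_cif))
```

For the example in this notebook, the result would be expected to contain one entry per chain in the request (A to D), provided the output uses the same chain identifiers. For anything beyond a sanity check, a dedicated mmCIF library is the safer choice.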


## Summary

This notebook demonstrates an end-to-end Boltz-2 NIM workflow with interactive visualization:

### ✅ **Key Features:**
1. **Local NIM Integration** - Direct connection to your local Boltz-2 instance
2. **Health Checking** - Verifies NIM availability before making requests
3. **Interactive 3D Visualization** - py3Dmol integration for structure viewing
4. **Multiple Visualization Styles** - Cartoon, stick, sphere, and line representations
5. **Biomolecular Complex Prediction** - Predicts a protein-DNA complex, with a dedicated viewer for protein-ligand complexes
6. **Confidence Score Analysis** - Summary statistics for the per-structure confidence scores
7. **File Output** - Saves both JSON results and individual mmCIF structure files
8. **Parameter Flexibility** - Easy to adjust prediction quality vs. speed

### **API Parameters:**
- **recycling_steps**: 1-6 (affects accuracy, default: 3)
- **sampling_steps**: 10-1000 (affects quality, default: 50)
- **diffusion_samples**: 1-5 (multiple predictions, default: 1)
- **step_scale**: 0.5-5.0 (diffusion sampling temperature; lower values give more diverse samples, default: 1.638)

### 📁 **Output Files:**
- `boltz2_prediction_YYYYMMDD_HHMMSS.json` - Complete API response
- `boltz2_structure_N_YYYYMMDD_HHMMSS.cif` - Individual structure files

### 🚀 **Next Steps:**
1. Experiment with different protein sequences
2. Try various ligands using SMILES notation
3. Adjust parameters for your speed/quality needs
4. Export structures for external analysis
5. Compare multiple predictions side-by-side
6. Analyze confidence scores for structure quality assessment