Accessing AI Models
Overview
Use the GRID AI models to process the sensor data. Each model provides a different type of environmental understanding. In this section, we provide examples of models from several categories. Feel free to try other models, or combine them in creative ways to build intelligent capabilities!
Tip: You can find more information on the available AI models in the documentation.
Visual Language Model: MoonDream
Use the MoonDream VLM to interpret the scene by answering a natural language question based on the RGB image. GRID allows you to access state-of-the-art vision-language intelligence in just 2-3 lines of code.
```python
from grid.model.perception.vlm.moondream import MoonDream

vlm = MoonDream()
vlm.run(rgb_image.data, "What do you see?")
```
Try a few different prompts to see how it responds.
Segmentation Model: OneFormer
Segment the scene to distinguish different objects or regions. This code block performs panoptic segmentation, which labels every category visible in the scene.
```python
from grid.model.perception.segmentation.oneformer import OneFormer

seg_model = OneFormer()
seg_mask = seg_model.run(rgb_image.data, mode="panoptic")

import rerun as rr
rr.log("segmentation_model", rr.SegmentationImage(seg_mask))
```
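Once you have a segmentation mask, it is often useful to check which category IDs appear and how much of the image each one covers. Below is a minimal NumPy sketch on a synthetic mask; the actual IDs and their class names depend on OneFormer's label map, so treat the values here as placeholders.

```python
import numpy as np

# Synthetic stand-in for a panoptic mask; in practice, use the
# `seg_mask` returned by OneFormer above.
seg_mask = np.array([
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
])

# Count how many pixels belong to each category ID.
ids, counts = np.unique(seg_mask, return_counts=True)
coverage = {int(i): c / seg_mask.size for i, c in zip(ids, counts)}
print(coverage)  # fraction of the image covered by each category ID
```

This kind of summary can tell you, for example, whether a category of interest occupies enough of the view to be worth approaching.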
Depth Estimation Model: Metric3D
Monocular depth estimation, i.e., predicting depth directly from RGB images without a dedicated depth camera, is a rapidly advancing field. With this AI model, even a sensor-constrained robot can use a neural network to generate a depth map from a single RGB image and understand the distance to various parts of the scene.
Use the following snippet to import and run Metric3D.

```python
from grid.model.perception.depth.metric3d import Metric3D

depth_model = Metric3D()
depth_image = depth_model.run(rgb_image.data)
rr.log("depth_model", rr.DepthImage(depth_image))
```
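A depth map becomes useful once you act on it, for example by estimating how close the nearest obstacle is in the central part of the view. Here is a NumPy-only sketch on a synthetic depth map; it assumes per-pixel depth in meters, so check the units Metric3D actually produces in your configuration.

```python
import numpy as np

# Synthetic stand-in for `depth_image` (per-pixel depth in meters).
depth_image = np.array([
    [5.0, 4.0, 5.0, 6.0],
    [4.0, 2.5, 3.0, 5.0],
    [4.0, 2.0, 3.5, 5.0],
    [6.0, 5.0, 5.0, 6.0],
])

# Look at the central crop of the image, where a forward-facing
# obstacle would typically appear.
h, w = depth_image.shape
center = depth_image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]

nearest = float(center.min())
print(f"nearest obstacle in view: {nearest:.1f} m")
if nearest < 2.5:
    print("obstacle close: consider stopping or turning")
```

The 2.5 m threshold is arbitrary here; pick one that matches your robot's speed and stopping distance.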
Feel free to experiment with this workflow or try out different AI models.
Object Detection Model: OWLv2
OWLv2 is an object detection model that can enable your robot to detect objects specified by a text prompt (in this example, a forklift) from the RGB image.
```python
from grid.model.perception.detection.owlv2 import OWLv2

det_model = OWLv2()
boxes, scores, labels = det_model.run(rgbimage=rgb_image.data, prompt="forklift")
```
The forklift might not be visible from where the robot currently is. Try combining this with rotation or other movement commands to 'search' for the forklift!
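Inside such a search loop, you would typically filter the detections by a confidence threshold and keep the best match. The sketch below uses synthetic stand-ins for the values returned by the model; the exact types and shapes OWLv2 returns may differ, so adapt the indexing to your data.

```python
import numpy as np

# Synthetic stand-ins for the outputs of det_model.run(...).
boxes = np.array([[10, 20, 50, 80], [30, 40, 90, 120], [0, 0, 5, 5]])
scores = np.array([0.35, 0.82, 0.10])
labels = ["forklift", "forklift", "forklift"]

threshold = 0.5
keep = scores >= threshold  # drop low-confidence detections

if keep.any():
    # Index of the highest-scoring detection above the threshold.
    best = int(np.argmax(np.where(keep, scores, -1.0)))
    print(f"found {labels[best]} at {boxes[best].tolist()} "
          f"(score {scores[best]:.2f})")
else:
    print("no confident detection: rotate and try again")
```

The "rotate and try again" branch is where you would issue a movement command and re-run the detector on the new frame.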