Accessing AI Models
Overview
Use the GRID AI models to process the sensor data. Each model provides a different type of environmental understanding. In this section, we provide examples of models from several categories. Feel free to try other models, or combine them in creative ways to build intelligent capabilities!
Tip: You can find more information on the available AI models in the documentation.
Visual Language Model: MoonDream
Use the MoonDream VLM to interpret the scene by answering a natural language question based on the RGB image. GRID allows you to access state-of-the-art vision-language intelligence in just 2-3 lines of code.
```python
from grid.model.perception.vlm.moondream import MoonDream

vlm = MoonDream()
vlm.run(rgb_image.data, "What do you see?")
```
Try a few different prompts to see how it responds.
Segmentation Model: OneFormer
Segment the scene to distinguish different objects or regions. This code block performs panoptic segmentation, which labels every category visible in the scene.
```python
from grid.model.perception.segmentation.oneformer import OneFormer

seg_model = OneFormer()
seg_mask = seg_model.run(rgb_image.data, mode="panoptic")

import rerun as rr
rr.log("segmentation_model", rr.SegmentationImage(seg_mask))
```
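Once you have a segmentation mask, it is often useful to check which category IDs appear and how much of the image each one covers. Below is a minimal NumPy sketch on a synthetic mask; the actual IDs and their class names depend on OneFormer's label map, so treat the values here as placeholders.

```python
import numpy as np

# Synthetic stand-in for a panoptic mask; in practice, use the
# `seg_mask` returned by OneFormer above.
seg_mask = np.array([
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
])

# Count how many pixels belong to each category ID.
ids, counts = np.unique(seg_mask, return_counts=True)
coverage = {int(i): c / seg_mask.size for i, c in zip(ids, counts)}
print(coverage)  # fraction of the image covered by each category ID
```

This kind of summary can tell you, for example, whether a category of interest occupies enough of the view to be worth approaching.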
Depth Estimation Model: Metric3D
Monocular depth estimation, i.e., predicting depth directly from RGB images without a dedicated depth camera, is a rapidly advancing field. With this AI model, even a sensor-constrained robot can use a neural network to generate a depth map from a single RGB image and understand the distance to various parts of the scene.
Use the following snippet to import and run Metric3D.

```python
from grid.model.perception.depth.metric3d import Metric3D

depth_model = Metric3D()
depth_image = depth_model.run(rgb_image.data)
rr.log("depth_model", rr.DepthImage(depth_image))
```
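A depth map becomes useful once you act on it, for example by estimating how close the nearest obstacle is in the central part of the view. Here is a NumPy-only sketch on a synthetic depth map; it assumes per-pixel depth in meters, so check the units Metric3D actually produces in your configuration.

```python
import numpy as np

# Synthetic stand-in for `depth_image` (per-pixel depth in meters).
depth_image = np.array([
    [5.0, 4.0, 5.0, 6.0],
    [4.0, 2.5, 3.0, 5.0],
    [4.0, 2.0, 3.5, 5.0],
    [6.0, 5.0, 5.0, 6.0],
])

# Look at the central crop of the image, where a forward-facing
# obstacle would typically appear.
h, w = depth_image.shape
center = depth_image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]

nearest = float(center.min())
print(f"nearest obstacle in view: {nearest:.1f} m")
if nearest < 2.5:
    print("obstacle close: consider stopping or turning")
```

The 2.5 m threshold is arbitrary here; pick one that matches your robot's speed and stopping distance.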
Feel free to experiment with this workflow or try out different AI models.
Object Detection Model: OWLv2
OWLv2 is an object detection model that can enable your robot to detect objects specified by a text prompt (in this example, a forklift) from the RGB image.
```python
from grid.model.perception.detection.owlv2 import OWLv2

det_model = OWLv2()
boxes, scores, labels = det_model.run(rgbimage=rgb_image.data, prompt="forklift")
```
The forklift might not be visible from where the robot currently is. Try combining this with rotation or other movement commands to 'search' for the forklift!
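Inside such a search loop, you would typically filter the detections by a confidence threshold and keep the best match. The sketch below uses synthetic stand-ins for the values returned by the model; the exact types and shapes OWLv2 returns may differ, so adapt the indexing to your data.

```python
import numpy as np

# Synthetic stand-ins for the outputs of det_model.run(...).
boxes = np.array([[10, 20, 50, 80], [30, 40, 90, 120], [0, 0, 5, 5]])
scores = np.array([0.35, 0.82, 0.10])
labels = ["forklift", "forklift", "forklift"]

threshold = 0.5
keep = scores >= threshold  # drop low-confidence detections

if keep.any():
    # Index of the highest-scoring detection above the threshold.
    best = int(np.argmax(np.where(keep, scores, -1.0)))
    print(f"found {labels[best]} at {boxes[best].tolist()} "
          f"(score {scores[best]:.2f})")
else:
    print("no confident detection: rotate and try again")
```

The "rotate and try again" branch is where you would issue a movement command and re-run the detector on the new frame.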