Similarity Search
The perception service, a prerequisite for this service, continuously publishes predictions to the message broker. The similarity search service consumes those messages and updates its cache (i.e., a buffer of the most recent predictions). On an event, the similarity search service performs two searches:
1. Finding the matching prediction in the cache.
2. Finding the closest item in the similarity search database.
The result of the second search is published to the message broker. The sections below provide more information about some of the concepts mentioned above.
Continuously Analyzing the Video Feed and Caching Results
The perception service continuously detects objects in video frames. For each detected object, an embedding vector is calculated. An embedding model is trained to separate visually different objects, and an embedding vector is the inference result of such a model. You can also think of an embedding vector as the mathematical representation of a detected object in a very high-dimensional space. The distance between the embeddings of two objects serves as a (dis)similarity metric for those objects. This relies on the assumption that similar items lie closer to each other than dissimilar items do.
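The distance-as-similarity idea above can be sketched in a few lines. The toy 3-dimensional vectors below are illustrative only; real embedding vectors typically have hundreds of dimensions.

```python
import math

def euclidean_distance(a, b):
    """Distance between two embedding vectors: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy embeddings for illustration (not real model outputs).
apple_1 = [0.9, 0.1, 0.0]
apple_2 = [0.8, 0.2, 0.1]
banana = [0.1, 0.9, 0.3]

# A well-trained embedding model places similar objects closer together
# than dissimilar ones, which is what the search relies on.
assert euclidean_distance(apple_1, apple_2) < euclidean_distance(apple_1, banana)
```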
The caching part simply keeps a buffer of the most recent predictions from the perception service for easy access. The cache follows a first-in-first-out (FIFO) policy: a queue holds only the most recent predictions, and when a prediction expires, a new one takes its place.
On an Event
The similarity search service acts upon receiving an event signal: a barcode read that is passed on to the service. Conventional barcode scanners read 1D or 2D barcodes and are usually attached to a computer of some sort. When an item is held near the barcode scanner, the machine recognizes the barcode and transmits that information to the computer (it might also send it to a remote location as a message).
The similarity search service needs the barcode and the time of the read to function. An API endpoint is allocated for this purpose: /ext/signals/barcode. A small piece of code can forward the data from the barcode scanner to the similarity search service. Check out Quickstart for an example.
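A forwarding script might look like the following sketch. The endpoint path comes from above, but the payload field names (`barcode`, `timestamp`) and the base URL are assumptions; consult the Quickstart for the exact schema.

```python
import json
import time
import urllib.request

def build_barcode_signal(barcode, timestamp=None):
    """Build the JSON body for a barcode-read event.
    Field names here are hypothetical -- check the API reference."""
    return json.dumps({
        "barcode": barcode,
        "timestamp": timestamp if timestamp is not None else time.time(),
    }).encode("utf-8")

def forward_barcode(base_url, barcode, timestamp=None):
    """POST a barcode read to the similarity search service."""
    request = urllib.request.Request(
        url=f"{base_url}/ext/signals/barcode",
        data=build_barcode_signal(barcode, timestamp),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# Example call (assumes the service listens locally):
# forward_barcode("http://localhost:8000", "4006381333931")
```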
Similarity Search
When the service receives the signal, it uses the timestamp attached to the barcode information to search the cache for the prediction that happened around the same time. Remember that the system is continuously analyzing the video feed, finding bounding boxes, and calculating embedding vectors.
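The time-based cache lookup can be sketched as follows; the tolerance window and prediction fields are illustrative assumptions, not the service's actual values.

```python
def find_prediction_at(cache, event_timestamp, tolerance=0.5):
    """Return the cached prediction closest in time to the barcode read,
    or None if nothing falls within the tolerance window (seconds)."""
    best = None
    best_gap = tolerance
    for prediction in cache:
        gap = abs(prediction["timestamp"] - event_timestamp)
        if gap <= best_gap:
            best, best_gap = prediction, gap
    return best

# Toy cache contents: timestamps in seconds, 2-D embeddings for brevity.
cache = [
    {"timestamp": 10.0, "embedding": [0.1, 0.9]},
    {"timestamp": 10.4, "embedding": [0.2, 0.8]},
    {"timestamp": 11.1, "embedding": [0.9, 0.1]},
]

# A barcode read at t=10.5 pairs with the prediction from t=10.4.
match = find_prediction_at(cache, event_timestamp=10.5)
assert match["timestamp"] == 10.4
```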
Once the prediction is identified in the cache, the application uses its embedding to perform a search in the similarity search database, which holds the embeddings of many known items. The result of this search gives us the name/class of the item. In other words, the query embedding vector is labeled by its closest (i.e., most similar) match in the similarity search database. Once we have a visual prediction, we can compare it with the barcode read. If they agree, that is great. If there is a misalignment, there is likely an issue, such as the barcode data not belonging to the scanned object.
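The database search and the barcode comparison can be sketched as below. This is a brute-force nearest-neighbour scan over hypothetical database entries; a production similarity search database would typically use an approximate index instead.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_item(query_embedding, database):
    """Return the database item whose embedding is closest to the query."""
    return min(database, key=lambda item: euclidean(item["embedding"], query_embedding))

# Hypothetical database entries: name, barcode, and (toy 2-D) embedding.
database = [
    {"name": "apple", "barcode": "111", "embedding": [0.9, 0.1]},
    {"name": "banana", "barcode": "222", "embedding": [0.1, 0.9]},
]

query = [0.85, 0.15]      # embedding of the object found in the cache
scanned_barcode = "222"   # barcode read at the scanner

visual = nearest_item(query, database)
# The visual prediction says "apple", but the barcode belongs to "banana":
# a misalignment worth flagging.
if visual["barcode"] != scanned_barcode:
    print(f"mismatch: camera saw {visual['name']}, barcode says {scanned_barcode}")
```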