YOLO World

An LLM/VLM tool to aid blind and visually impaired (BVI) people in day-to-day navigation

    Python

    PyTorch

    LLM

    Jupyter

    Magic Leap

Key Achievements

  • Designed a pipeline to generate activity graphs from video input (see the sketch after this list)

  • Integrated the pipeline with Magic Leap hardware
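
A minimal sketch of what that pipeline stage could look like, assuming the Ultralytics YOLO-World open-vocabulary detector and a networkx graph; the class list, file names, and graph schema here are illustrative assumptions, not the project's actual design:

    # Hypothetical sketch: turn sampled video frames into a simple activity graph.
    # Assumes the Ultralytics YOLO-World detector; all names below are illustrative.
    import cv2
    import networkx as nx
    from ultralytics import YOLOWorld

    model = YOLOWorld("yolov8s-world.pt")              # open-vocabulary detector
    model.set_classes(["book", "desk", "cup", "car", "stop sign"])

    graph = nx.MultiDiGraph()                          # objects + when they were seen
    cap = cv2.VideoCapture("walk_to_coffee_shop.mp4")  # placeholder input video
    frame_idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % 30 == 0:                        # sample ~1 frame/s at 30 fps
            result = model.predict(frame, verbose=False)[0]
            for box in result.boxes:
                label = result.names[int(box.cls)]
                graph.add_node(label)                  # one node per object class
                graph.add_edge(label, f"frame_{frame_idx}", relation="seen_at")
        frame_idx += 1
    cap.release()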

Project Background

This project is an application that takes real-time video data, describes what it sees, and lets the user query past observations in natural language.
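
As a rough illustration of that describe-and-remember flow, here is a self-contained sketch; caption_frame and speak are placeholder stubs standing in for a real VLM captioner and text-to-speech system, not the project's actual API:

    # Hypothetical sketch of the describe-and-remember loop; caption_frame and
    # speak are placeholder stubs, not the project's real components.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Observation:
        timestamp: datetime
        description: str      # e.g. "There is a desk with a red book sitting on it"

    memory: list[Observation] = []    # running history of scene descriptions

    def caption_frame(frame) -> str:
        """Stub for a VLM captioner (an image-to-text model in practice)."""
        return "There is a desk with a red book sitting on it"

    def speak(text: str) -> None:
        """Stub for text-to-speech audio output."""
        print(text)

    def on_new_frame(frame) -> None:
        """Describe the incoming frame, store it, and read it aloud."""
        description = caption_frame(frame)
        memory.append(Observation(datetime.now(), description))
        speak(description)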

Use Case

Imagine a blind or visually impaired individual using the YOLO World application to navigate their daily life: they scan their home and receive the description, "There is a desk with a red book sitting on it".

They then walk to a coffee shop, receiving audio descriptions along the way: "There is a white car stopped next to a stop sign". At the coffee shop, they order a drink and sit down: "There is a cup sitting on a wooden table".

Suddenly, they think, "A book would go perfectly with this coffee", but they cannot remember where they left their book.

They could ask YOLO World, "Where did I leave my book?" and YOLO World would reply, "The book is sitting on a desk at your home".
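
Continuing the sketch above, a query like this could be answered over the stored observations; the keyword retrieval below is a stand-in for illustration, whereas the project intends to answer via an LLM in natural language:

    # Hypothetical sketch: answer "Where did I leave my book?" by returning the
    # most recent stored description mentioning the object in question.
    def answer_query(question: str) -> str:
        # Naive keyword retrieval; a real system would hand the observation
        # history and the question to an LLM instead.
        keywords = [w for w in question.lower().rstrip("?").split() if len(w) > 3]
        for obs in reversed(memory):                  # newest observation first
            if any(k in obs.description.lower() for k in keywords):
                return obs.description
        return "I have not seen that object."

    memory.append(Observation(datetime.now(),
                              "The book is sitting on a desk at your home"))
    print(answer_query("Where did I leave my book?"))
    # -> "The book is sitting on a desk at your home"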

Affiliated With

ACM Research