YOLO World
An LLM/VLM tool to aid blind and visually impaired (BVI) people in day-to-day navigation


Python
PyTorch
LLM
Jupyter
Magic Leap
Key Achievements
Designed a pipeline to generate activity graphs from video input
Integrated pipeline with Magic Leap hardware
Project Background
This project is an application that takes real-time video data, describes its contents, and lets the user query past information in natural language.
Use Case
Imagine a blind or visually impaired individual using the YOLO World application to navigate their daily life: they scan their home and receive the description, "There is a desk with a red book sitting on it".
They then walk to a coffee shop, receiving audio descriptions along the way, such as "There is a white car stopped next to a stop sign".
At the coffee shop, they order a drink and sit down: "There is a cup sitting on a wooden table".
Suddenly, they think, "A book would go perfectly with this coffee", but they cannot remember where they left their book.
They could ask YOLO World, "Where did I leave my book?", and it would reply, "The book is sitting on a desk at your home".
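The lookup behind that last query can be sketched in miniature. This is a hypothetical, simplified Python sketch, not the project's actual implementation: the class and field names (`Observation`, `ActivityGraph`, `last_seen`) are illustrative assumptions, and in the real pipeline a VLM would produce the observations and an LLM would translate the free-form question into the lookup.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical sketch of the activity-graph idea: each detection
# becomes a record of an object, its spatial relation, and where
# it was seen, so past observations can be recalled by object name.

@dataclass
class Observation:
    obj: str        # detected object, e.g. "book"
    relation: str   # spatial relation, e.g. "sitting on a desk"
    place: str      # coarse location, e.g. "home"
    seen_at: datetime = field(default_factory=datetime.now)

class ActivityGraph:
    def __init__(self) -> None:
        self.observations: List[Observation] = []

    def add(self, obs: Observation) -> None:
        self.observations.append(obs)

    def last_seen(self, obj: str) -> str:
        # Walk backwards so the most recent sighting wins.
        for obs in reversed(self.observations):
            if obs.obj == obj:
                return f"The {obs.obj} is {obs.relation} at your {obs.place}"
        return f"I haven't seen a {obj} yet"

graph = ActivityGraph()
graph.add(Observation("book", "sitting on a desk", "home"))
graph.add(Observation("cup", "sitting on a wooden table", "coffee shop"))

print(graph.last_seen("book"))
# → The book is sitting on a desk at your home
```

In the full system, the hard parts this sketch glosses over are turning raw video into reliable observations and mapping a natural-language question onto the right graph query.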
Affiliated With
