Vidi2 advances video understanding with fine-grained spatio-temporal grounding and extends its capabilities to video question answering, enabling comprehensive multimodal reasoning.
Driving the next generation of video understanding and creation.
Duration distribution of videos in the proposed VUE-STG evaluation benchmark.
Duration distribution of videos in the proposed VUE-TR-V2 evaluation benchmark.
The distribution of query modality and format in the VUE-TR-V2 benchmark.
Demonstration of Vidi2's capabilities
The man wearing a brown suit who is playing drums in an indoor setting
The gorilla which is driving with two men.
a woman in glasses who is walking on street
The boy who stands outside a charming house with warm lights, beneath a starry night sky featuring a full moon.
The glowing blue water beads in which the mango seed is placed, with its germination into a root and shoot visualized through a time-lapse sequence against a dark background
basketball statue
gymnasium
people assembling sculptures on beach
Euripides, has most surviving work like 'Medea' and 'The Bacchae', debut in 455 BC. He is a cornerstone of Greek education in the Hellenistic period.
Jennifer Nagel self-introduction
divine wind
FTC resources
North Devon Marine Pioneer
Locate objects and events in both space and time with precision.
Find specific moments in videos using natural language queries.
Answer complex questions about video content.
Process and understand videos up to 30 minutes long.
Generate timestamps and bounding boxes for target objects.
Assist in video editing, reframing, and content generation.
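To make "timestamps and bounding boxes for target objects" concrete, here is a minimal Python sketch of how such an output could be represented and compared against a reference. Everything in it is an assumption for illustration: the names Box, TubeSample, tube_span, box_iou, and temporal_iou are hypothetical, the normalized [0, 1] coordinate convention is a guess, and the generic IoU measures shown are not necessarily the metrics used by VUE-STG or VUE-TR-V2, nor Vidi2's actual output format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; normalized [0, 1] coordinates are an assumed convention."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate or inverted boxes count as empty.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


@dataclass(frozen=True)
class TubeSample:
    """One (timestamp, box) sample of a hypothetical spatio-temporal tube."""
    t: float  # seconds from the start of the video
    box: Box


def tube_span(samples: List[TubeSample]) -> Tuple[float, float]:
    """Return the (start, end) time range covered by a non-empty tube."""
    times = [s.t for s in samples]
    return min(times), max(times)


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection-over-union of two (start, end) time ranges in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Illustrative values only, not taken from any benchmark.
predicted = [
    TubeSample(t=5.0, box=Box(0.25, 0.25, 0.75, 0.75)),
    TubeSample(t=15.0, box=Box(0.30, 0.25, 0.80, 0.75)),
]
reference_span = (0.0, 10.0)
reference_box = Box(0.0, 0.0, 0.5, 0.5)

span = tube_span(predicted)
t_iou = temporal_iou(span, reference_span)
s_iou = box_iou(predicted[0].box, reference_box)
```

In this sketch a grounding result is a list of timestamped boxes, so the temporal range falls out of the samples themselves and the spatial overlap can be checked per timestamp. A real evaluation would also need to decide how to align predicted and reference samples in time, which is left out here.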