About

Over the years, a variety of visual reasoning tasks have been proposed to evaluate machines’ ability to understand and reason about visual scenes. However, these benchmarks mostly focus on classifying the objects and items that exist in a scene. Common sense reasoning – an understanding of what might happen next, or what gave rise to the scene – is often absent from these benchmarks. Humans, on the other hand, are highly versatile and adept at numerous high-level, cognition-related visual reasoning tasks that go beyond pattern recognition and require common sense (e.g., physics, causality, functionality, psychology, etc.).

In order to design systems with human-like visual understanding of the world, we would like to emphasize benchmarks and tasks that evaluate common sense reasoning across a variety of domains, including but not limited to:

Video recordings


Challenge Tracks

There will be two core tracks in the machine vision common sense challenge:

STAR Challenge

STAR (accepted at NeurIPS 2021) is a novel benchmark for situated reasoning. It provides challenging question-answering tasks, symbolic situation descriptions, and logic-grounded diagnosis via real-world video situations. Reasoning in the real world is not divorced from situations: a key challenge is to capture the knowledge present in surrounding situations and reason accordingly. Situated reasoning is a cognitive process in which a person uses context and environmental cues to make decisions and solve problems.

Download Link
STAR Challenge Evaluation

CLEVRER & ComPhy Challenge

This track is built on CLEVRER (accepted at ICLR 2020) and ComPhy (accepted at ICLR 2022). CLEVRER is a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., “what color”), explanatory (“what’s responsible for”), predictive (“what will happen next”), and counterfactual (“what if”). ComPhy goes a step further and requires machines to learn new compositional visible and hidden physical properties from only a few examples. ComPhy includes three types of questions: factual questions about the composition of visible and hidden physical properties, counterfactual questions about objects’ physical properties such as mass and charge, and predictive questions about objects’ future movement.

Download Link (CLEVRER)
Download Link (ComPhy)
Evaluation Server (CLEVRER)
Evaluation Server (ComPhy)

Invited Speakers

Jitendra Malik

Jacob Andreas

Jiasen Lu

Rowan Zellers


Timeline

Organizers

Yining Hong

Bo Wu

Zhenfang Chen

Qinhong Zhou

Mingyu Ding

Chuang Gan


Contact Info

E-mail: yininghong@cs.ucla.edu