Over the years, a variety of visual reasoning tasks have been proposed to evaluate machines' ability to understand and reason about visual scenes. However, these benchmarks mostly focus on classifying the objects and items present in a scene. Common sense reasoning – an understanding of what might happen next, or what gave rise to the scene – is often absent from these benchmarks. Humans, on the other hand, are highly versatile, adept at numerous high-level cognitive visual reasoning tasks that go beyond pattern recognition and require common sense (e.g., about physics, causality, functionality, and psychology).

In order to design systems with human-like visual understanding of the world, we would like to emphasize benchmarks and tasks that evaluate common sense reasoning across a variety of domains, including but not limited to:

Challenge Winners & Papers

Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen, The Lame Can’t Go Far: Visual Stream Limits The Video Question Answering

Alice Hein, Klaus Diepold, Winning Solution of the BIB MVCS Challenge 2022

Xin Huang, Jung Jae Kim, Hui Li Tan, Comparing classification and generation approaches to situated reasoning with vision-language pre-trained models

Ziyi Wu, Nikita Dvornik, Klaus Greff, Thomas Kipf, Animesh Garg, SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models

Challenge Tracks

There will be six tracks in the machine vision common sense challenge:


This track is on the Physion dataset, accepted by the NeurIPS 2021 benchmark track. Physion measures machines' ability to make predictions about commonplace real-world physical events, covering a wide variety of physical phenomena – rigid- and soft-body collisions, stable multi-object configurations, rolling and sliding, and projectile motion.

Download Link
Evaluation Server


This track is on the PTR dataset, accepted by NeurIPS 2021. The PTR dataset focuses on common sense reasoning about parts and objects. It includes five types of questions: concept, relation (geometric and spatial), analogy, arithmetic, and physics. PTR requires machines to answer these questions based on synthetic RGBD scenes.

Download Link
Evaluation Server


This track is on the AGENT benchmark, accepted by ICML 2021. AGENT (Action, Goal, Efficiency, coNstraint, uTility) is a dataset for machine social common sense: a large collection of procedurally generated 3D animations structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs) that probe key concepts of core intuitive psychology.

Download Link
Evaluation Server


This track is the challenge on the CLEVRER dataset, accepted by ICLR 2020, and the ComPhy dataset, accepted by ICLR 2022. CLEVRER is a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., "what color"), explanatory ("what's responsible for"), predictive ("what will happen next"), and counterfactual ("what if"). ComPhy takes a step further and requires machines to learn compositional visible and hidden physical properties from only a few examples. ComPhy includes three types of questions: factual questions about the composition of visible and hidden physical properties, counterfactual questions about objects' physical properties such as mass and charge, and predictive questions about objects' future movement.

Download Link (CLEVRER)
Download Link (ComPhy)
Evaluation Server (CLEVRER)
Evaluation Server (ComPhy)
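As an illustration only (not the official evaluation protocol of this track), per-question-type accuracy for a CLEVRER-style benchmark can be tallied as in the sketch below. The data layout and field names ("type", "answer") are hypothetical; the actual submission format is defined by the evaluation server.

```python
from collections import defaultdict

def accuracy_by_type(predictions, ground_truth):
    """Compute per-question-type accuracy for QA predictions.

    predictions: dict mapping question id -> predicted answer.
    ground_truth: dict mapping question id -> {"type": ..., "answer": ...},
    where "type" is one of "descriptive", "explanatory", "predictive",
    or "counterfactual" (hypothetical field names).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, truth in ground_truth.items():
        qtype = truth["type"]
        total[qtype] += 1
        if predictions.get(qid) == truth["answer"]:
            correct[qtype] += 1
    return {t: correct[t] / total[t] for t in total}

# Toy example in the hypothetical format:
gt = {
    "q1": {"type": "descriptive", "answer": "red"},
    "q2": {"type": "counterfactual", "answer": "yes"},
}
pred = {"q1": "red", "q2": "no"}
print(accuracy_by_type(pred, gt))  # {'descriptive': 1.0, 'counterfactual': 0.0}
```

Reporting accuracy per question type, rather than a single aggregate, is what lets diagnostic benchmarks like CLEVRER separate descriptive scene understanding from causal and counterfactual reasoning.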


This track is on the BIB benchmark, accepted by NeurIPS 2021. The Baby Intuitions Benchmark (BIB) challenges machines to predict the plausibility of an agent's behavior based on the underlying causes of its actions.

Download Link
Evaluation Server


This track is the challenge on the STAR benchmark, accepted by NeurIPS 2021. Reasoning in the real world is not divorced from situations: a key challenge is to capture the present knowledge from surrounding situations and reason accordingly. STAR is a novel benchmark for situated reasoning, providing challenging question-answering tasks, symbolic situation descriptions, and logic-grounded diagnosis via real-world video situations.

Download Link
Evaluation Server

Invited Speakers

Jiajun Wu

Leslie Kaelbling

Nick Haber

Tao Gao

Moira Dillon

Jitendra Malik


Yining Hong

Fish Tung

Kevin Smith

Zhenfang Chen

Tianmin Shu

Elias Wang

Kanishk Gandhi

Bo Wu

Qinhong Zhou

Senior Organizers

Joshua B. Tenenbaum

Antonio Torralba

Dan Yamins

Judith Fan

Chuang Gan

Contact Info

E-mail: yininghong@cs.ucla.edu