Introduction
Over the years, a variety of visual reasoning tasks have been proposed to evaluate machines' ability to understand and reason about visual scenes. However, these benchmarks mostly focus on recognizing the objects and items that appear in a scene. Common sense reasoning, such as an understanding of what might happen next or what gave rise to the scene, is often absent from these benchmarks. Humans, on the other hand, are highly versatile and adept at numerous high-level, cognition-related visual reasoning tasks that go beyond pattern recognition and require common sense (e.g., physics, causality, functionality, and psychology). To design systems with human-like visual understanding of the world, we would like to emphasize benchmarks and tasks that evaluate common sense reasoning across a variety of domains, including but not limited to:
- Intuitive Physics: A general understanding of, and expectations about, the physical world (e.g., how things support, collide, fall, contain, or become unstable).
- Intuitive Psychology & Social Science: A basic understanding of the relations and interactions among agents; an understanding of instrumental actions (e.g., assistance, imitation, speech); the ability to reason about the hidden mental variables that drive observable actions.
- Affordance & Functionality: Which actions agents can apply to objects; what functions objects provide to agents.
- Causality & Counterfactual Thinking: An understanding of causes and effects; mental representations of alternatives to past or future events, actions, or states.
Challenges
The machine vision common sense challenge will include the following tracks:
- Physion: This track is based on the Physion dataset, accepted to the NeurIPS 2021 benchmark track. Physion measures machines' ability to make predictions about commonplace real-world physical events, covering a wide variety of physical phenomena: rigid- and soft-body collisions, stable multi-object configurations, rolling and sliding, and projectile motion. (A minimal scoring sketch for this track appears after this list.)
Download link: https://github.com/cogtoolslab/physics-benchmarking-neurips2021
- PTR: This track is based on the PTR dataset, accepted to NeurIPS 2021. PTR focuses on common sense reasoning about parts and objects. It includes five types of questions: concept, relation (geometric & spatial), analogy, arithmetic, and physics. Machines must answer these questions based on synthetic RGB-D scenes.
Download link: http://ptr.csail.mit.edu/
Evaluation Server: https://eval.ai/web/challenges/challenge-page/1428/overview
- AGENT & BIB: This track is based on the AGENT benchmark, accepted to ICML 2021, and the BIB benchmark, accepted to NeurIPS 2021. AGENT (Action, Goal, Efficiency, coNstraint, uTility) is a large dataset of procedurally generated 3D animations for machine social common sense, structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs) that probe key concepts of core intuitive psychology. The Baby Intuitions Benchmark (BIB) challenges machines to predict the plausibility of an agent's behavior based on the underlying causes of its actions.
Download links: https://www.tshu.io/AGENT/ and https://www.kanishkgandhi.com/bib
- CLEVRER & ComPhy: This track is based on the CLEVRER dataset, accepted to ICLR 2020, and the ComPhy dataset, accepted to ICLR 2022. CLEVRER is a diagnostic video dataset for the systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., "what color"), explanatory ("what's responsible for"), predictive ("what will happen next"), and counterfactual ("what if"). ComPhy takes a step further and requires machines to learn compositional visible and hidden physical properties from only a few examples. ComPhy includes three types of questions: factual questions about the composition of visible and hidden physical properties, counterfactual questions about objects' physical properties such as mass and charge, and predictive questions about objects' future movement. (A minimal answer-collection sketch for the question-answering tracks appears after this list.)
Download links: http://clevrer.csail.mit.edu/#Dataset/ and https://comphyreasoning.github.io/#dataset
Evaluation Server: https://eval.ai/web/challenges/challenge-page/667/overview
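For the Physion track above, the following is a minimal scoring sketch in Python; it is not the official evaluation code. It assumes a hypothetical predictions file, predictions.csv, with columns stimulus (an identifier for the video), prob_contact (the model's predicted probability that the two cued objects come into contact), and label (1 if they do in the ground-truth outcome).

import csv

def physion_accuracy(path="predictions.csv", threshold=0.5):
    # Count a prediction as correct when the thresholded probability
    # matches the ground-truth contact label for that stimulus.
    correct, total = 0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            pred = float(row["prob_contact"]) >= threshold
            correct += int(pred == bool(int(row["label"])))
            total += 1
    return correct / max(total, 1)

print(f"Contact-prediction accuracy: {physion_accuracy():.3f}")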
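For the question-answering tracks (PTR, CLEVRER, and ComPhy), results are uploaded to the EvalAI evaluation servers linked above; the exact submission format is specified on each challenge page. The sketch below only illustrates the general pattern of collecting a model's answers into a JSON file. The field names (question_id, choice_id, answer), the structure of the questions list, and the model.predict interface are assumed placeholders, not the official specification.

import json

def build_submission(model, questions, out_path="submission.json"):
    # One record per open-ended question, and one record per option
    # of a multiple-choice question (each option is judged separately).
    results = []
    for q in questions:
        if q.get("choices"):
            for c in q["choices"]:
                results.append({
                    "question_id": q["question_id"],
                    "choice_id": c["choice_id"],
                    "answer": model.predict(q["question"], choice=c["text"]),
                })
        else:
            results.append({
                "question_id": q["question_id"],
                "answer": model.predict(q["question"]),
            })
    with open(out_path, "w") as f:
        json.dump(results, f)
    return out_path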
Organizers
Yining Hong, Fish Tung, Kevin Smith, Zhenfang Chen, Tianmin Shu, Elias Wang, Kanishk Gandhi
Senior Organizers
Joshua B. Tenenbaum, Antonio Torralba, Dan Yamins, Judith Fan, Chuang Gan
Invited Speakers
Jiajun Wu, Leslie Kaelbling, Nick Haber, Tao Gao, Moira Dillon
Contact Info
E-mail: yininghong@cs.ucla.edu