Difficulty-designated Dataset for Active Recognition

Important notice

The proposed dataset for active recognition contains only JSON files with locations, class labels, difficulty levels, etc.
To use this dataset, you will need the original Matterport3D scene dataset along with the Habitat simulator.
This page does not provide or grant access to the original Matterport3D scene dataset. Please review the Matterport3D dataset agreement and contact the dataset team to download the original scene dataset.

Overview

This dataset is designed to evaluate active recognition agents in indoor environments. It contains a total of 13,200 testing samples covering 27 common object categories.
Moreover, to show the advantage of active recognition in handling recognition challenges through intelligent movement, we assign a recognition difficulty level to each testing sample, considering visibility, distance and observed pixels.

Two testing examples from the proposed dataset.

In the above figure, the target object is covered by a green mask, allowing for box, point or mask queries during testing. The two targets are occluded by the wall and the bed, respectively. Visibility is calculated as the ratio of observed pixels to total pixels belonging to the target. The agent is then allowed to move freely in the scene to avoid unfavorable recognition conditions and achieve better recognition performance.

Active Recognition

The agent is given the ability to move intelligently in order to perceive.

An agent can intelligently adjust its viewpoint to correct errors in recognition.

Why did we build this dataset?

Active recognition is intended to address recognition challenges that cannot be resolved by passive recognition, such as:

Ambiguous viewpoints

Heavy occlusions

Out-of-view conditions

Distant views

However, there has been no dataset for evaluating the advantage of active recognition in handling these challenges. To facilitate the evaluation of active recognition in indoor simulators, we collected and propose this dataset.

How to use this dataset?

Obtain access to and download the Matterport3D scene dataset.
Set up a simulator that allows exploring the Matterport3D dataset. We recommend the Habitat simulator, which allows you to build a recognition agent that moves freely in indoor scenes.
Download the proposed difficulty-designated dataset.
Write a custom loading function to load the JSON files in the proposed dataset, then place the agent at the location given in the JSON file.
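The last step above can be sketched as follows. This is a minimal sketch, not the authors' loader: the function names `load_episodes` and `start_pose` are hypothetical, and the top-level layout of each JSON file (a bare list of episodes, or a dict wrapping the list under an `episodes` key) is an assumption that should be checked against the downloaded files.

```python
import json


def load_episodes(json_path):
    """Load the testing samples (episodes) stored in one per-scene JSON file."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file is either a list of episodes or a dict that wraps
    # the list under an "episodes" key. Adjust to the actual file layout.
    episodes = data["episodes"] if isinstance(data, dict) else data
    return list(episodes)


def start_pose(episode):
    """Return the (position, rotation) at which the agent should be placed."""
    position = [float(v) for v in episode["start_position"]]  # meters
    rotation = [float(v) for v in episode["start_rotation"]]  # quaternion
    if len(position) != 3 or len(rotation) != 4:
        raise ValueError(
            f"malformed pose in episode {episode.get('episode_id')}"
        )
    return position, rotation
```

The returned position and rotation can then be passed to whatever agent-placement call your simulator provides. The component order of the quaternion is not stated on this page, so verify it against a rendered view before relying on it.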

Each JSON file contains all testing samples (episodes) for one scene. Each testing sample looks like this:


                {
                  'episode_id': 0, # The episode id or testing sample id.
                  'scene_id': '17DRP5sb8fy', # The corresponding Matterport3D scene id of this testing sample.
                  'start_position': [-4.457194805145264, 0.07244700193405151, -0.34131553769111633], # Starting location of the agent in meters.
                  'start_rotation': [0.7880107536067219, 0.0, 0.6156614753256583, 0.0], # Starting rotation of the agent as a quaternion.
                  'target_categrory': 'shower', # Semantic label of the target.
                  'target_id': 51, # Instance id in the current scene.
                  'target_center': [-7.461709976196289, 0.07809627056121826, 1.849059820175171], # The location of the target center in meters.
                  'euclidean_distance': 3.7181832017830274, # Distance between the agent and the target in meters.
                  'unocc_percent': 0.6022278117139778, # The ratio between visible pixels and all pixels (visible or occluded) belonging to the target in the current viewing window.
                  'vis_percent': 0.1463435931019428, # The ratio between visible pixels and all pixels (visible or occluded) belonging to the target. It also considers pixels that are out of the viewing window.
                  'difficulty': 2, # Designated recognition difficulty level. 0 - Easy | 1 - Moderate | 2 - Hard
                  'obs_pixels_unocc': 3352, # Number of unoccluded pixels.
                  'obs_pixels_vis_all': 5566, # Number of pixels belonging to the target in the current viewing window.
                  'obs_pixels_all': 22905 # Number of pixels belonging to the target.
                }

The corresponding visualization of this testing sample. The target 'shower' is covered by a green mask.
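The ratio and distance fields of this sample appear to follow directly from its pixel counts and positions. The sketch below recomputes them as a sanity check for a custom loader. The relations used (`unocc_percent = obs_pixels_unocc / obs_pixels_vis_all`, `vis_percent = obs_pixels_unocc / obs_pixels_all`, and a straight-line distance between `start_position` and `target_center`) are inferred from the one sample shown here, not taken from a stated definition, and the helper names are hypothetical.

```python
import math

# Values copied from the sample episode shown above.
sample = {
    "start_position": [-4.457194805145264, 0.07244700193405151, -0.34131553769111633],
    "target_center": [-7.461709976196289, 0.07809627056121826, 1.849059820175171],
    "euclidean_distance": 3.7181832017830274,
    "unocc_percent": 0.6022278117139778,
    "vis_percent": 0.1463435931019428,
    "obs_pixels_unocc": 3352,
    "obs_pixels_vis_all": 5566,
    "obs_pixels_all": 22905,
}


def derived_fields(ep):
    """Recompute the ratio and distance fields from raw counts and positions."""
    return {
        # Unoccluded pixels over target pixels inside the viewing window.
        "unocc_percent": ep["obs_pixels_unocc"] / ep["obs_pixels_vis_all"],
        # Unoccluded pixels over all target pixels, including out-of-view ones.
        "vis_percent": ep["obs_pixels_unocc"] / ep["obs_pixels_all"],
        # Straight-line distance between agent start and target center (meters).
        "euclidean_distance": math.dist(ep["start_position"], ep["target_center"]),
    }


def is_consistent(ep, tol=1e-3):
    """True if every stored field matches its recomputed value within tol."""
    derived = derived_fields(ep)
    return all(abs(derived[key] - ep[key]) <= tol for key in derived)


print(is_consistent(sample))
```

If the dataset contains samples whose target has no pixels inside the viewing window, the first ratio would need a guard against division by zero.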

For the agent, we have the following setup.


                'camera_resolution': [640, 800],
                'camera_height': 1.0

Please send me an email if you have any questions or encounter any trouble using this dataset.

How did we assign recognition difficulty levels?

We consider three aspects, i.e., visibility, relative distance and observed pixels, to assign the difficulty level. For details, please refer to Section 1.1 of our supplementary material.
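Because every sample carries a difficulty level, recognition results can be reported separately for each level. The following is a minimal sketch of such a tally, not an official evaluation script. The `predictions` mapping, keyed by `(scene_id, episode_id)` on the assumption that episode ids restart in each scene, is hypothetical. The category key is spelled `target_categrory`, as in the sample episode.

```python
from collections import defaultdict

DIFFICULTY_NAMES = {0: "Easy", 1: "Moderate", 2: "Hard"}


def accuracy_by_difficulty(episodes, predictions):
    """Compute recognition accuracy per difficulty level.

    episodes:    iterable of episode dicts as stored in the JSON files.
    predictions: hypothetical dict mapping (scene_id, episode_id) to the
                 predicted category name.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ep in episodes:
        level = ep["difficulty"]
        total[level] += 1
        key = (ep["scene_id"], ep["episode_id"])
        # A missing prediction counts as an error.
        if predictions.get(key) == ep["target_categrory"]:
            correct[level] += 1
    return {
        DIFFICULTY_NAMES.get(level, level): correct[level] / total[level]
        for level in sorted(total)
    }
```

Levels with no samples are omitted from the result rather than reported as zero.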

More Statistics

Instances for each category.

Instances of different distance ranges.

Instances of different difficulty levels for each category.

Instances of different distances for each category.

Instances of different visibility ranges for each category.

Instances of different occlusion ranges for each category.

BibTeX

@article{fan2023evidential,
  title={Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception},
  author={Fan, Lei and Liang, Mingfu and Li, Yunxuan and Hua, Gang and Wu, Ying},
  journal={arXiv preprint arXiv:2311.13793},
  year={2023}
}