This dataset evaluates active recognition agents in indoor environments. It contains 13,200 testing samples covering 27 common object categories.
To show how active recognition can handle recognition challenges through intelligent movement, we assign a recognition difficulty level to each testing sample based on visibility, distance, and observed pixels.
The agent is given the ability to move intelligently in order to perceive.
Active recognition is supposed to address recognition challenges that cannot be resolved by passive recognition.
{
'episode_id': 0, # The episode id or testing sample id.
'scene_id': '17DRP5sb8fy', # The corresponding Matterport3D scene id of this testing sample.
'start_position': [-4.457194805145264, 0.07244700193405151, -0.34131553769111633], # Starting location of the agent in meters.
'start_rotation': [0.7880107536067219, 0.0, 0.6156614753256583, 0.0], # Starting rotation of the agent in quaternion.
'target_categrory': 'shower', # Semantic label of the target.
'target_id': 51, # Instance id in the current scene.
'target_center': [-7.461709976196289, 0.07809627056121826, 1.849059820175171], # The location of the target center in meters.
'euclidean_distance': 3.7181832017830274, # Distance between the agent and the target in meters.
'unocc_percent': 0.6022278117139778, # The ratio between visible pixels and all pixels (visible or occluded) belonging to the target in the current viewing window.
'vis_percent': 0.1463435931019428, # The ratio between visible pixels and all pixels (visible or occluded) belonging to the target. It also considers pixels that are out of the viewing window.
'difficulty': 2, # Designated recognition difficulty level. 0 - Easy | 1 - Moderate | 2 - Hard
'obs_pixels_unocc': 3352, # Number of unoccluded (visible) pixels belonging to the target.
'obs_pixels_vis_all': 5566, # Number of pixels (visible or occluded) belonging to the target in the current viewing window.
'obs_pixels_all': 22905 # Number of pixels belonging to the target, including those outside the viewing window.
}
'camera_resolution': [640, 800],
'camera_height': 1.0
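The derived fields in the sample record above can be cross-checked against the raw ones. The following is a minimal sketch, not part of the dataset tooling: the helper `recompute_fields` is hypothetical, and the formulas are assumptions inferred from the field descriptions (distance from `start_position` to `target_center`, and ratios of the pixel counts). For the sample shown, the recomputed values appear to agree with the stored ones.

```python
import math

# Fields copied from the sample record above; nothing here is read from disk.
sample = {
    'start_position': [-4.457194805145264, 0.07244700193405151, -0.34131553769111633],
    'target_center': [-7.461709976196289, 0.07809627056121826, 1.849059820175171],
    'euclidean_distance': 3.7181832017830274,
    'unocc_percent': 0.6022278117139778,
    'vis_percent': 0.1463435931019428,
    'obs_pixels_unocc': 3352,
    'obs_pixels_vis_all': 5566,
    'obs_pixels_all': 22905,
}


def recompute_fields(record):
    """Hypothetical helper: rederive distance and ratios from the raw fields.

    Assumed formulas (inferred from the field comments, not confirmed):
      euclidean_distance = ||start_position - target_center||
      unocc_percent      = obs_pixels_unocc / obs_pixels_vis_all
      vis_percent        = obs_pixels_unocc / obs_pixels_all
    """
    distance = math.dist(record['start_position'], record['target_center'])
    unocc = record['obs_pixels_unocc'] / record['obs_pixels_vis_all']
    vis = record['obs_pixels_unocc'] / record['obs_pixels_all']
    return {
        'euclidean_distance': distance,
        'unocc_percent': unocc,
        'vis_percent': vis,
    }


derived = recompute_fields(sample)
for key, value in derived.items():
    # Print stored and recomputed values side by side for comparison.
    print(f"{key}: stored={sample[key]:.6f} recomputed={value:.6f}")
```

A check like this can serve as a quick sanity test after loading the records, before any evaluation is run.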
We consider three aspects when assigning the difficulty level: visibility, relative distance, and observed pixels. For details, please refer to Section 1.1 of the supplementary material of our paper.
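Because each sample carries a `difficulty` field, results can be broken down by level. The sketch below assumes the samples are already loaded as a list of dictionaries with the fields shown in the sample record; the helper names and the toy records are hypothetical and only illustrate the grouping, not the actual difficulty assignment rule.

```python
from collections import Counter

# Mapping taken from the field description: 0 - Easy | 1 - Moderate | 2 - Hard.
DIFFICULTY_NAMES = {0: 'Easy', 1: 'Moderate', 2: 'Hard'}


def count_by_difficulty(samples):
    """Return the number of samples per difficulty name (zero if absent)."""
    counts = Counter(DIFFICULTY_NAMES[s['difficulty']] for s in samples)
    return {name: counts.get(name, 0) for name in DIFFICULTY_NAMES.values()}


def filter_by_difficulty(samples, level):
    """Return only the samples whose 'difficulty' equals the given level."""
    return [s for s in samples if s['difficulty'] == level]


# Toy records for illustration only; these are NOT real dataset entries.
toy_samples = [
    {'episode_id': 0, 'difficulty': 2},
    {'episode_id': 1, 'difficulty': 0},
    {'episode_id': 2, 'difficulty': 2},
]

print(count_by_difficulty(toy_samples))
print([s['episode_id'] for s in filter_by_difficulty(toy_samples, 2)])
```

Reporting accuracy separately on each subset makes it easier to see whether an active agent helps most on the hard samples.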
@article{fan2023evidential,
title={Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception},
author={Fan, Lei and Liang, Mingfu and Li, Yunxuan and Hua, Gang and Wu, Ying},
journal={arXiv preprint arXiv:2311.13793},
year={2023}
}