Data Card
Learn more about the contents and data collection methodology of the FHIBE Dataset.
| Fields | Details |
|---|---|
| Key application for use | Human-centric computer vision, machine learning fairness and robustness testing |
| Intended use cases (e.g., the CV tasks one can evaluate against) | FHIBE is an evaluation dataset collected with the following tasks in mind: human pose estimation, face and body detection, face and body alignment, face and body parsing, face verification, and image editing. The dataset was created to evaluate fairness and robustness across dimensions such as data subject attributes, instruments, and environments. Except for the development of tools designed only to assess fairness and mitigate biases, FHIBE is strictly an evaluation dataset and may not be used as training data for an AI system. |
| Primary Data Type | Images |
| Nature of content | |
| Dataset Characteristics | |
| License | FHIBE is owned by Sony AI, which distributes the dataset under applicable terms of use and a custom license agreement. Use of the dataset is restricted to evaluation, with the one exception that it can be used to train bias detection or mitigation methods, for both research and commercial purposes. |
| Data collection sources | Crowdsourced data collection via third-party vendors |
| Data selection criteria | Data vendors were presented with a set of diversity specifications that the collected images (in the aggregate) had to meet. These diversity specifications included attributes such as age, ancestry, pronouns, head and body poses, and subject-object interactions, as well as various environment and camera conditions. Data subjects were recruited and tasked to submit images that met the specifications as closely as possible. All appropriate rights and consents to the data had to be provided in writing by eligible data subjects (e.g., at least the age of majority, proficient in English). |
| Sampling Methods, data distribution | The dataset contains many demographic attributes for image subjects. FHIBE was specified to have a diverse distribution across a wide variety of attributes, including age, pronouns, and ancestry. FHIBE is approximately balanced on pronouns and apparent skin color, although lighter skin colors are underrepresented. Similarly, African and Asian ancestries and younger age groups are overrepresented, while older age groups are underrepresented. |
| Validation methods | Validated by automated and manual means. Manual checks were done for: |
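
To make the intended use concrete, the following is a minimal sketch of a disaggregated evaluation: a model's per-image results are grouped by one data subject attribute and the accuracy of each group is compared. The record layout, the `age_group` key, the group values, and the `correct` flag are assumptions made for illustration only; they are not FHIBE's actual schema or file format.

```python
from collections import defaultdict


def disaggregated_accuracy(records, attribute):
    """Return accuracy per group of `attribute`.

    `records` is a list of dicts; each dict holds the attribute value and a
    boolean "correct" flag (hypothetical layout, not the FHIBE format).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[attribute]
        totals[group] += 1
        correct[group] += int(rec["correct"])
    return {group: correct[group] / totals[group] for group in totals}


def max_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy results for illustration only; the group names are made up.
records = [
    {"age_group": "18-29", "correct": True},
    {"age_group": "18-29", "correct": True},
    {"age_group": "18-29", "correct": False},
    {"age_group": "50+", "correct": True},
    {"age_group": "50+", "correct": False},
]

per_group = disaggregated_accuracy(records, "age_group")
gap = max_gap(per_group)
```

The same grouping could be applied to instrument or environment attributes; the gap between the best and worst group is only one of several possible summary statistics.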
Annotation Labels and Labelling Methods
| Labelling Method | Label Categories |
|---|---|
| From photograph’s EXIF metadata | Camera manufacturer, Camera model, Capture time, Capture date, Capture place, Image width, Image height, Shutter speed, Aperture, ISO, Focal length |
| Self-identified / reported by each consensual image subject | Ancestry, Natural skin tone, Natural eye color(s), Natural head hair type, Natural head hair color(s), Natural face hair color(s) |
| Self-identified / reported by each consensual image subject at the time of data capture | Age (biological), Pronouns, Nationality(ies), Country of residence, Disability(ies) (optional), Height, Weight, Pregnancy status (optional), Biologically related image subjects, Body pose, Apparent skin tone, Apparent eye color(s), Apparent head hair type, Head hairstyle, Apparent head hair color(s), Facial hairstyle, Apparent face hair color(s), Facial marks, Subject-object interaction(s), Subject-subject interaction(s) |
| Reported by primary consensual image subject only at the time of data capture | Weather, Camera position, Illumination, Image capture scene, Image capture date, Image capture time window, Image capture place, Subject position |
| Obtained from human annotators | Head pose, Segmentation masks, Facial bounding box, Camera distance, Keypoints, Nonconsensual person segmentation mask, Nonconsensual person bounding box |
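
Self-reported labels such as pronouns or ancestry can be tallied to check how balanced a subset of the data is on one attribute. The sketch below assumes each annotation is a plain dict keyed by a label name; the `pronouns` key and its values are hypothetical placeholders, not the dataset's actual encoding.

```python
from collections import Counter


def attribute_shares(annotations, attribute):
    """Return the share of each observed value of `attribute`.

    Records where the attribute is missing or None are skipped, which suits
    labels marked optional. Returns an empty dict if nothing was observed.
    """
    counts = Counter(
        ann[attribute] for ann in annotations if ann.get(attribute) is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


# Toy annotations for illustration only; the values are made up.
annotations = [
    {"pronouns": "she/her"},
    {"pronouns": "she/her"},
    {"pronouns": "he/him"},
    {"pronouns": None},
]

shares = attribute_shares(annotations, "pronouns")
```

Comparing these shares against a target distribution is one simple way to see which groups are over- or underrepresented in a given subset.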
