Data Card

Learn more about the contents and data collection methodology of the FHIBE Dataset.

For detailed information, refer to the FHIBE Datasheet and the FHIBE Crowdwork Sheet.

Fields

Details

Fields

Key application for use

Details

Human-centric computer vision, machine learning fairness and robustness testing

Fields

Intended use cases (e.g., the CV tasks one can evaluate against)

Details

FHIBE is an evaluation dataset collected with the following tasks in mind: human pose estimation, face and body detection, face and body alignment, face and body parsing, face verification, and image editing. The dataset was created to evaluate fairness and robustness across dimensions such as data subject attributes, instruments, and environments. FHIBE is strictly an evaluation dataset and may not be used as training data for an AI system; the only exception is the development of tools designed solely to assess fairness and mitigate biases.
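As an illustration of what evaluating fairness across a dimension can look like in practice, the sketch below computes a model's accuracy separately for each group of a subject attribute and reports the largest gap between groups. It is a minimal example under stated assumptions: the record layout, the `correct` field, the `age_group` key, and the toy values are all hypothetical and are not taken from FHIBE or its tooling.

```python
from collections import defaultdict


def disaggregated_accuracy(records, group_key):
    """Return accuracy per group for a list of evaluation records.

    Each record is a dict holding a boolean "correct" field and a group
    attribute stored under `group_key` (hypothetical field names).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        hits[group] += int(rec["correct"])
    return {group: hits[group] / totals[group] for group in totals}


def max_gap(per_group):
    """Largest accuracy difference between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy records for illustration only; these are not FHIBE data.
records = [
    {"age_group": "18-29", "correct": True},
    {"age_group": "18-29", "correct": True},
    {"age_group": "60+", "correct": True},
    {"age_group": "60+", "correct": False},
]

per_group = disaggregated_accuracy(records, "age_group")
print(per_group)
print(max_gap(per_group))
```

The same pattern extends to other dimensions named above (instruments, environments) by changing the grouping key; accuracy can be swapped for any per-image metric appropriate to the task.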


Fields

Primary Data Type

Details

Images 

Fields

Nature of content

Details

  • Diverse, crowdsourced digital photographs of one or two people performing actions in real-world environments around the world, captured under varied environmental conditions and with a variety of camera devices.
  • In addition to the primary dataset, there are two derivative face datasets: an unaligned face-cropped dataset and an aligned face-cropped dataset.

Fields

Dataset Characteristics

Details

  • Total number of images: 10,318
  • Total number of data subjects: 1,981
  • Maximum number of images/subject: 10
  • Total dataset size
    • Compressed: 182.5 GB
    • Full resolution: 532.2 GB
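Two derived figures follow directly from the numbers above: the ratio of images to data subjects and the size of the compressed release relative to the full-resolution one. The short calculation below uses only the values listed in this card; the variable names are illustrative.

```python
# Values copied from the Dataset Characteristics list.
total_images = 10_318
total_subjects = 1_981
compressed_gb = 182.5
full_resolution_gb = 532.2

# Images per data subject. Because an image may contain two subjects,
# the mean number of images in which a subject appears is at least this.
images_per_subject = total_images / total_subjects

# Size of the compressed release relative to the full-resolution one.
compression_ratio = compressed_gb / full_resolution_gb

print(f"{images_per_subject:.2f} images per subject")  # 5.21
print(f"{compression_ratio:.1%} of full-resolution size")  # 34.3%
```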

Fields

License

Details

FHIBE is owned by Sony AI, which distributes the dataset under applicable terms of use and a custom license agreement. Use of the dataset is restricted to evaluation, with one exception: it can be used to train bias detection or mitigation methods, for both research and commercial purposes.

Fields

Data collection sources

Details

Crowdsourced data collection via third-party vendors

Fields

Data selection criteria

Details

Data vendors were presented with a set of diversity specifications that the collected images had to meet in the aggregate. These specifications covered attributes such as age, ancestry, pronouns, head and body poses, and subject-object interactions, as well as various environment and camera conditions. Data subjects were recruited and asked to submit images that met the specifications as closely as possible. All appropriate rights and consents to the data had to be provided in writing by eligible data subjects (e.g., having reached the age of majority and being proficient in English).

Fields

Sampling Methods, data distribution

Details

The dataset contains many demographic attributes for image subjects. FHIBE was specified to have a diverse distribution across a wide variety of attributes, including age, pronouns, and ancestry. FHIBE is approximately balanced on pronouns and apparent skin color, though lighter skin colors are underrepresented. Similarly, African and Asian ancestries and younger age groups are overrepresented, while older age groups are underrepresented.
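A simple way to inspect balance of this kind is to compute the share of each attribute value and compare it with a uniform share. The sketch below is a minimal illustration: the annotation layout, the attribute name, the tolerance, and the toy values are hypothetical, and a uniform share is only one possible reference point (a target population distribution is another).

```python
from collections import Counter


def attribute_shares(annotations, attribute):
    """Fraction of annotation records holding each value of `attribute`."""
    counts = Counter(record[attribute] for record in annotations)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def flag_imbalance(shares, tolerance=0.1):
    """Label each value as over, under, or balanced versus a uniform share."""
    uniform = 1 / len(shares)
    flags = {}
    for value, share in shares.items():
        if share > uniform + tolerance:
            flags[value] = "over"
        elif share < uniform - tolerance:
            flags[value] = "under"
        else:
            flags[value] = "balanced"
    return flags


# Toy annotations for illustration only; these are not FHIBE data.
annotations = (
    [{"group": "a"}] * 6 + [{"group": "b"}] * 3 + [{"group": "c"}] * 1
)

shares = attribute_shares(annotations, "group")
flags = flag_imbalance(shares)
print(shares)
print(flags)
```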

Fields

Validation methods

Details

The dataset was validated by automated and manual means. Manual checks covered:

  • Systematic errors on self-reported attributes
  • Correctness of labels of observable attributes
  • Removal of PII, IP and objectionable content
  • Validation of consent forms

Annotation Labels and Labelling Methods

Labelling Method

Label Categories

Labelling Method

From the photograph’s EXIF metadata

Label Categories

Camera manufacturer

Camera model

Capture time

Capture date

Capture place

Image width

Image height

Shutter speed

Aperture

ISO

Focal length
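To make the relationship between EXIF tags and the label categories above concrete, the sketch below maps an already-decoded EXIF dictionary to label names. It is illustrative only: the tag names are standard EXIF tag names, but the choice of which tag feeds which label, the output key names, and the sample values are assumptions and do not describe how FHIBE labels were actually extracted. Capture place is omitted because decoding GPS tags is more involved.

```python
def exif_to_labels(exif):
    """Map a decoded EXIF dict (tag name -> value) to label names.

    The tag-to-label mapping and the output keys are assumptions made
    for illustration, not the dataset's own schema.
    """
    # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
    date_part, _, time_part = exif.get("DateTimeOriginal", "").partition(" ")
    return {
        "camera_manufacturer": exif.get("Make"),
        "camera_model": exif.get("Model"),
        "capture_date": date_part.replace(":", "-") or None,
        "capture_time": time_part or None,
        "image_width": exif.get("PixelXDimension"),
        "image_height": exif.get("PixelYDimension"),
        "shutter_speed": exif.get("ExposureTime"),  # seconds (assumed tag)
        "aperture": exif.get("FNumber"),
        "iso": exif.get("ISOSpeedRatings"),
        "focal_length": exif.get("FocalLength"),  # millimetres
    }


# Made-up sample values for illustration only.
sample_exif = {
    "Make": "ExampleCam",
    "Model": "X100",
    "DateTimeOriginal": "2023:05:17 14:03:22",
    "PixelXDimension": 4000,
    "PixelYDimension": 3000,
    "ExposureTime": 0.008,
    "FNumber": 1.8,
    "ISOSpeedRatings": 100,
    "FocalLength": 4.2,
}

labels = exif_to_labels(sample_exif)
print(labels["capture_date"], labels["capture_time"])
```

Missing tags map to `None` rather than raising, since EXIF fields are often absent or stripped from crowdsourced photographs.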

Labelling Method

Self-identified / reported by each consensual image subject

Label Categories

Ancestry

Natural skin tone

Natural eye color(s)

Natural head hair type

Natural head hair color(s)

Natural face hair color(s)

Labelling Method

Self-identified / reported by each consensual image subject at the time of data capture

Label Categories

Age (biological)

Pronouns

Nationality(ies)

Country of residence

Disability(ies) (optional)

Height

Weight

Pregnancy status (optional)

Biologically related image subjects

Body pose

Apparent skin tone

Apparent eye color(s)

Apparent head hair type

Head hairstyle

Apparent head hair color(s)

Facial hairstyle

Apparent face hair color(s)

Facial marks

Subject-object interaction(s)

Subject-subject interaction(s)

Labelling Method

Reported by primary consensual image subject only at the time of data capture

Label Categories

Weather

Camera position

Illumination

Image capture scene

Image capture date

Image capture time window

Image capture place

Subject position

Labelling Method

Obtained from human annotators

Label Categories

Head pose

Segmentation masks

Facial bounding box

Camera distance

Keypoints

Nonconsensual person segmentation mask

Nonconsensual person bounding box