Fair Human-Centric Image Benchmark (FHIBE) Datasheet
Document authors: Wiebke Hutiri, Austin Hoag
Document reviewers: Jerone Andrews, Rebecca Bourke, William Thong, Victoria Matthews, Shruti Nagpal, Aida Rahmattalabi, Jinru Xue, Tiffany Georgievski, Alice Xiang
Last updated: 12 September 2025
1. Motivation
1.1 For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.
The Fair Human-Centric Image Benchmark (FHIBE) dataset was created to evaluate fairness and robustness in human-centric computer vision applications across dimensions that are known to influence model performance, such as data subjects (e.g., demographic information), instruments (e.g., camera hardware and software), and environments (e.g., illumination, camera distance).
The creation of FHIBE is motivated by the need for ethically-sourced, human-centric image datasets to audit and mitigate bias in computer vision models for a variety of relevant tasks. FHIBE is the first publicly available, demographically balanced, fairness benchmark with appropriate informed consent, licensing, compensation, and diverse, global representation. FHIBE was created intentionally with the following human-centric applications in mind:
- Human Pose Estimation
- Face and Body Detection
- Face and Body Alignment
- Face and Body Parsing
- Face Verification
- Image Editing
FHIBE is strictly an evaluation dataset and may not be used to train machine learning (ML) or artificial intelligence (AI) software, algorithms, or other technologies. The one exception is that the dataset can be used for training bias detection or mitigation methods. The dataset is intended to support research, development, evaluation and improvement of commercial and noncommercial ML and AI software, algorithms, and other technologies, provided that these are aligned with the terms of use of the dataset. FHIBE can enable researchers and practitioners to carry out disaggregated analysis of computer vision models along relevant axes, such as ancestry and pronouns, to identify the specific groups for which a model may be underperforming. Furthermore, FHIBE can enable efforts to develop corrective measures that address these biases and minimize the potential harm that model performance disparities can cause.
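For example, disaggregated analysis of this kind reduces to averaging a per-image metric separately within each demographic group. The following is a minimal sketch in Python; the metadata field names (e.g., "pronouns", "ancestry") are illustrative assumptions rather than FHIBE's actual schema:

```python
from collections import defaultdict

def disaggregated_accuracy(results, group_key):
    """Average a per-image score separately for each demographic group.

    results   -- iterable of (metadata_dict, score) pairs, where score is
                 e.g. 1.0 for a correct prediction and 0.0 otherwise
    group_key -- metadata field to disaggregate on, e.g. "pronouns" or
                 "ancestry" (field names here are assumptions)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for metadata, score in results:
        group = metadata.get(group_key, "unknown")
        totals[group] += score
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical usage: surface the group for which a model underperforms.
# scores = disaggregated_accuracy(results, "ancestry")
# worst_group = min(scores, key=scores.get)
```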
1.2 Who created this dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?
The dataset was created by Alice Xiang, Jerone T.A. Andrews, Rebecca L. Bourke, William Thong, Julienne M. LaChance, Tiffany Georgievski, Apostolos Modas, Aida Rahmattalabi, Yunhao Ba, Shruti Nagpal, Orestis Papakyriakopoulos, Dora Zhao, Jinru Xue, Victoria Matthews, Linxia Gong, Austin T. Hoag, Mircea Cimpoi, Swami Sankaranarayanan, Wiebke Hutiri, Morgan K. Scheuerman, Albert S. Abedi, Peter Stone, Peter R. Wurman, Hiroaki Kitano, Michael Spranger at Sony AI.
1.3 What support was needed to make this dataset? (e.g. who funded the creation of the dataset? If there is an associated grant, provide the name of the grantor and the grant name and number, or if it was supported by a company or government agency, give those details.)
The dataset was funded by Sony AI.
1.4 Any other comments?
FHIBE can be accessed free of charge at https://ai.sony/fairness-benchmark.
2. Composition
2.1 What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? Are there multiple types of instances (e.g., movies, users, and ratings; people and interactions between them; nodes and edges)? Please provide a description.
FHIBE consists of diverse, crowdsourced digital photographs of one or two people performing actions in real-world environments around the world, captured under varied environmental conditions and with a range of camera devices. In addition to the primary dataset, we provide two derivative face datasets: an unaligned face-cropped dataset and an aligned face-cropped dataset.
2.2 How many instances are there in total (of each type, if appropriate)?
FHIBE contains 10,318 images of 1,981 unique consensual image subjects: 1,711 appear as primary image subjects and 417 as secondary subjects (in rare cases an individual appears in both roles; see Section 2.9). On average, there are six images per primary subject. The cropped face dataset contains 10,941 images of 1,981 image subjects, and the cropped-and-aligned set contains 8,370 images of 1,824 subjects.
2.3 Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g., geographic coverage)? If so, please describe how this representativeness was validated/verified. If it is not representative of the larger set, please describe why not (e.g., to cover a more diverse range of instances, because instances were withheld or unavailable).
FHIBE contains unique instances that were directly collected from human subjects. The data collection from which FHIBE was curated contained more images than those included in the dataset. We excluded submitted images where consent could not be validated or that materially contained third-party IP, including logos and copyrighted materials.
2.4 What data does each instance consist of? “Raw” data (e.g., unprocessed text or images) or features? In either case, please provide a description.
Each instance consists of a digital photograph in PNG format. Accompanying each photograph is a separate metadata file stored in JavaScript Object Notation (JSON). The JSON metadata file contains information about the image subjects, such as annotations of their physical attributes and actions, the camera settings used, and environmental factors such as lighting and weather. An example photograph and its accompanying metadata are shown in Fig. 1 of the paper introducing FHIBE, and an example JSON metadata file can be found on the FHIBE web portal where the dataset will be hosted.
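As a minimal sketch of how one instance might be loaded (the paths and field layout here are illustrative assumptions; the example file on the FHIBE web portal is authoritative):

```python
import json
from PIL import Image  # pip install Pillow

def load_instance(image_path, metadata_path):
    """Load one FHIBE instance: a PNG photograph plus its JSON metadata file."""
    image = Image.open(image_path)  # the PNG photograph
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)  # subject attributes, camera settings, environment
    return image, metadata

# Hypothetical usage; actual paths follow FHIBE's naming convention (Section 2.7).
# image, metadata = load_instance("subject_0001/images/img_001.png",
#                                 "subject_0001/metadata/img_001.json")
```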
2.5 Is there a label or target associated with each instance? If so, please provide a description.
The JSON files contain the following pixel-level observational annotations for each consensual image subject in each image, obtained from a demographically diverse set of human annotators:
- Face and body keypoint landmarks
- Facial bounding box
- Segmentation mask(s)
To accommodate fairness evaluations of a wide range of computer vision tasks, there are many labels associated with each instance, all contained within the JSON metadata file accompanying each image file. Refer to Section 3.1 for a full list of the additional labels collected for each image.
2.6 Is any information missing from individual instances? If so, please provide a description, explaining why this information is missing (e.g., because it was unavailable). This does not include intentionally removed information, but might include, e.g., redacted text.
No information is missing from individual instances.
2.7 Are relationships between individual instances made explicit (e.g., users’ movie ratings, social network links)? If so, please describe how these relationships are made explicit.
The relationship between each image and its corresponding JSON metadata file is made explicit through FHIBE’s file structure and file naming convention. Each primary consensual image subject has a unique identifier and folder, which contains subfolders for metadata and images of that subject. In addition, within each JSON metadata file, the relationship between images containing the same subject is made explicit.
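Illustratively, this pairing can be recovered by matching filename stems within a subject's folder. The folder names and glob patterns below are assumptions for illustration, not the official layout:

```python
from pathlib import Path

def pair_images_with_metadata(subject_dir):
    """Pair each image of a subject with its JSON metadata file by filename stem."""
    subject_dir = Path(subject_dir)
    images = {p.stem: p for p in (subject_dir / "images").glob("*.png")}
    metadata = {p.stem: p for p in (subject_dir / "metadata").glob("*.json")}
    # Stems shared between the two subfolders link a photograph to its metadata.
    return [(images[s], metadata[s]) for s in sorted(images.keys() & metadata.keys())]
```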
2.8 Are there recommended data splits (e.g., training, development/validation, testing)? If so, please provide a description of these splits, explaining the rationale behind them.
FHIBE is strictly an evaluation dataset (with the narrow exception of training bias detection and mitigation methods) and may not be used to train machine learning or artificial intelligence software, algorithms, or other technologies. There are thus no recommended dataset splits.
2.9 Are there any errors, sources of noise, or redundancies in the dataset? If so, please provide a description.
The quality of the images and annotations was validated as described in the Methods section of the FHIBE paper. The self-reported attributes in the dataset, such as gender pronouns and age, are taken at face value, as there is no way to validate them. The dataset also contains pixel-level annotations drawn by humans, which carry some inherent noise.
An intentional redundancy is that the same subject appears in multiple photographs, as this is useful for evaluating some computer vision tasks. Furthermore, we recommend excluding images with two image subjects from face verification tasks, as primary and secondary image subject IDs have been allocated separately, meaning that in rare cases, a primary image subject can also be a secondary image subject with a different subject ID.
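A minimal sketch of the recommended filter, assuming a hypothetical metadata field ("num_subjects") that records the number of consensual subjects in an image:

```python
def verification_eligible(instances):
    """Keep only single-subject images for face verification evaluation.

    instances -- iterable of (image_path, metadata) pairs; the "num_subjects"
    field name is an assumption for illustration, not FHIBE's actual schema.
    """
    return [(path, meta) for path, meta in instances
            if meta.get("num_subjects", 1) == 1]
```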
2.10 Is the dataset self-contained, or does it link to or otherwise rely on external resources (e.g., websites, tweets, other datasets)? If it links to or relies on external resources, a) are there guarantees that they will exist, and remain constant, over time; b) are there official archival versions of the complete dataset (i.e., including the external resources as they existed at the time the dataset was created); c) are there any restrictions (e.g., licenses, fees) associated with any of the external resources that might apply to a future user? Please provide descriptions of all external resources and any restrictions associated with them, as well as links or other access points, as appropriate.
FHIBE is entirely self-contained.
2.11 Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor-patient confidentiality, data that includes the content of individuals’ non-public communications)? If so, please provide a description.
No.
2.12 Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?
Vendors were explicitly instructed that images must not contain objectionable content, including, but not limited to, graphic violence, nudity, explicit sexual activity and/or themes, cruelty, and obscene gestures. The vendors' annotators flagged any images containing such content, and all flagged images were removed before delivery. As an additional check, the FHIBE authors and QA specialists hired by Sony reviewed the images delivered by the vendors and removed any remaining objectionable images. Before releasing the dataset, we performed an automated check for child sexual abuse material (CSAM) against the National Center for Missing & Exploited Children's hashed database of known CSAM.
2.13 Does the dataset relate to people? If not, you may skip the remaining questions in this section.
Yes, FHIBE is a human-centric image dataset. Each image contains 1 or 2 consensual image subjects.
2.14 Does the dataset identify any subpopulations (e.g., by age, gender)? If so, please describe how these subpopulations are identified and provide a description of their respective distributions within the dataset.
The dataset contains many demographic attributes for image subjects. While FHIBE was specified to have an approximately uniform distribution across 50 intersectional groups based on age, pronouns, and ancestry, the actual distribution diverges from this target as a result of the crowdsourced data collection process. FHIBE is approximately balanced on pronouns and apparent skin color, though lighter skin colors are under-represented; African and Asian ancestries and younger age groups are over-represented, while older age groups are under-represented. The distribution of FHIBE across intersectional groups is shown in Section G.2 of the Supplementary Information accompanying the paper.
Subpopulations were identified based on the rationales described in the FHIBE paper and “Ethical Considerations for Responsible Data Curation”, Andrews et al., 2023.
2.15 Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset? If so, please describe how.
Subjects' and annotators' names and email addresses are not shared publicly, reducing the risk of re-identification. Additionally, we took privacy-enhancing measures to further reduce this risk, including eliminating sensitive information related to non-consensual individuals through full-body anonymization and text/multimodal personally identifiable information (PII) redaction, as well as redacting PII from images with consensual subjects. We used a Subject ID and an Annotator ID for unique (but not personal) identification:
- Subject ID: Unique subject identifier which can be used for matching images of the same subject within the public dataset. The identifier is distinct from any identifier used internally by the vendor and Sony AI.
- Annotator ID: Unique annotator identifier which can be used for matching annotations performed by the same annotator within the public dataset. The identifier is distinct from any identifier used internally by the vendor and Sony AI.
While these measures reduce the risk of re-identification and our terms of use prohibit re-identification, anonymity of image subjects and annotators cannot be guaranteed. The dataset contains personally identifiable information (namely, images), and it will be made public. Image subjects and annotators were informed of the low risk of re-identification when they read and signed their consent forms to participate in the dataset.
2.16 Does the dataset contain data that might be considered sensitive in any way (e.g., data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)? If so, please provide a description.
Yes, for each consensual image subject, FHIBE contains the following data that is sensitive:
- Age
- Ancestry
- Nationality
- Pronouns
- Skin tone
- Height and Weight (combined into BMI and only reported in aggregate)
- Difficulty/disability (optionally provided & only reported in aggregate)
- Pregnancy status (optionally provided & only reported in aggregate)
- Country of residence (optionally provided & only reported in aggregate)
- Biologically related subjects (only reported in aggregate)
Each annotator optionally provided the following sensitive demographic information:
- Age
- Ancestry
- Nationality
- Pronouns
- Country of residence (optionally provided & only reported in aggregate)
2.17 Any other comments?
No.
3. Collection
3.1 How was the data associated with each instance acquired? Was the data directly observable (e.g., raw text, movie ratings), reported by subjects (e.g., survey responses), or indirectly inferred/derived from other data (e.g., part-of-speech tags, model-based guesses for age or language)? If data was reported by subjects or indirectly inferred/derived from other data, was the data validated/verified? If so, please describe how.
The auxiliary data associated with each image was acquired using one of the following means:
- Obtained from EXIF metadata of the photograph OR
- Self-identified / reported by the primary consensual image subject OR
- Self-identified / reported by the secondary consensual image subject OR
- Obtained from human annotators OR
- Annotated by the authors of FHIBE.
The table below lists the means through which each specific metadata instance was acquired. Further details are available in Section A of FHIBE’s supplementary materials.
| Labelling Method | Label Categories |
|---|---|
| From photograph's EXIF metadata | Camera manufacturer; camera model; capture time; capture date; capture place; image width; image height; shutter speed; aperture; ISO; focal length |
| Self-identified / reported by each consensual image subject | Ancestry; natural skin tone; natural eye color(s); natural head hair type; natural head hair color(s); natural face hair color(s) |
| Self-identified / reported by each consensual image subject at the time of data capture | Age (biological); pronouns; nationality(ies); country of residence; disability(ies) (optional); height; weight; pregnancy status (optional); biologically related image subjects; body pose; apparent skin tone; apparent eye color(s); apparent head hair type; head hairstyle; apparent head hair color(s); facial hairstyle; apparent face hair color(s); facial marks; subject-object interaction(s); subject-subject interaction(s) |
| Reported by primary consensual image subject only at the time of data capture | Weather; camera position; illumination; image capture scene; image capture date; image capture time window; image capture place; subject position |
| Obtained from human annotators | Head pose; segmentation masks; facial bounding box; camera distance; keypoints; nonconsensual person segmentation mask; nonconsensual person bounding box |
In addition to metadata, contact information (names and email addresses) and a signed consent form were collected from data subjects. These will not be released; they are stored securely and separately from the remaining dataset.
The following demographic information has been self-reported by data annotators:
- Annotator Age
- Annotator Pronouns
- Annotator Ancestry
- Annotator Nationality
- Annotator Country of Residence [only released as aggregate statistics or on request]
- Annotator Contact Information [not released and stored separately]
- Annotator Consent Form [not released and stored separately]
While all data entries were validated by vendors, quality assurance (QA) workers, and the dataset creators, checks of many self-reported annotations were for systematic errors only, as these attributes cannot be validated by looking at the submitted image. For apparent skin tone, eye color(s), head hair type, face and head hair color(s), facial and head hairstyles, facial marks, and body pose, QA workers could provide their own annotations if they believed the original annotator's work to be incorrect. Similarly, QA workers could provide their own annotations for labels reported by primary consensual image subjects only, and for labels obtained from human annotators.
3.2 Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the data associated with the instances (e.g., recent crawl of old news articles)? If not, please describe the timeframe in which the data associated with the instances was created. Finally, list when the dataset was first published.
The images were collected from 23 April 2023 onwards, and the dataset was first published in 2025. Data subjects were able to submit photos they had previously taken, provided the photos were captured with a digital device released in or after 2011. The images in the dataset thus span 2011 to 2024.
3.3 What mechanisms or procedures were used to collect the data (e.g., hardware apparatus or sensor, manual human curation, software program, software API)? How were these mechanisms or procedures validated?
Data vendors were carefully selected and vetted by Sony, following a three-stage process. The initial vendor vetting process assessed the crowd diversity, annotation offering, timelines, and pricing of 10 candidate vendors. In addition, we checked online for any significant complaints or legal action against vendors by any previously contracted or employed image subjects and annotators.
Six shortlisted vendors then participated in a small-scale trial, where the quality of their deliverables, adherence to requirements and specifications, privacy standards and operational efficiency were evaluated. In the final stage, four vendors were selected after an in-depth review of their logistical setup for security-related risks and ability to meet relevant regulations, in particular with regards to privacy.
The following criteria were considered in the selection of vendors:
- Services offered
- Ability to collect informed consent from data subjects
- Diversity of crowd worker pool
- Diversity of sample images, including diversity of human actions and groups of people
- Data and annotation quality of sample
- Lead time
- Cost
Prior to launching the data collection process, Sony validated for each participating vendor that:
- Consent could be properly collected from image subjects and data annotators.
- Mechanisms existed for excluding residents from restricted or sanctioned countries from data collection and annotation.
- Image subjects were informed of payment rates in advance of starting the project.
- Data annotators and image subjects would be paid at least the minimum wage in their country of residence for their work.
- Mechanisms were in place for image subjects to agree not to submit data to multiple vendors for this project.
- Mechanisms were in place to collect demographic metadata, labels, and reporting requirements for data annotations as per the specifications provided by Sony.
Image subject and data annotator recruitment
The recruitment of data subjects was conducted by data vendors. Sony provided the data vendors with a set of diversity specifications that the collected images had to meet. These diversity requirements were placed at the aggregate level, not at the individual subject level. Vendors were required to abide by Sony's guidelines for recruitment.
Data subjects were eligible to participate if:
- they were above the age of majority in their country of residence.
- they did not reside in California, Illinois, Washington, or Texas; in countries sanctioned by the US Office of Foreign Assets Control (OFAC); or in mainland China.
All project materials were provided in English. Data vendors were required to validate the English language ability of all data subjects. The English language proficiency test consisted of the vendor selecting three multiple choice questions to present to each data subject before they began working on the project. Vendors were provided with a question bank, and required to randomly select three questions to ask each data subject. In order for a data subject to qualify to participate in the project, they had to answer at least two out of the three questions correctly. Data subjects had two opportunities to take and pass the test. The results of the English language test were not shared with Sony.
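The qualification logic amounts to the following sketch (the question bank and its format were held by the vendors; the dictionary structure below is an assumption for illustration):

```python
import random

def passes_english_test(question_bank, answer_fn, attempts=2):
    """Three randomly selected questions per attempt; pass on >= 2 correct,
    with at most two attempts, as specified above."""
    for _ in range(attempts):
        questions = random.sample(question_bank, 3)
        correct = sum(1 for q in questions if answer_fn(q) == q["answer"])
        if correct >= 2:
            return True
    return False
```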
In total, 296 annotators and 184 QA workers participated in the project.
In addition to vendor-contracted workers, we contracted QA specialists who worked with us internally to validate collected data.
Image creation
Image subjects could use a camera of their choice to create photographic images. Because the type of camera used provides important instrument-level metadata, image subjects had to report it, and it is included as metadata with each data instance. To ensure that images are of high enough quality for the dataset, data subjects were instructed to capture images with a device that meets the following requirements:
- The device must be digital such as a smartphone, DSLR camera, or compact camera. For example, an iPhone XR, Google Pixel 5, Fujifilm X-T4, Sony A7 III, Nikon Coolpix B500, Ricoh GR II.
- The device must be able to record Exif data when capturing images.
- The device must capture images using at least an 8-megapixel camera.
- The device must have been released in the year 2011 or later.
In addition:
- Images should be uploaded using the device's default viewable output format.
- Images must not be post-processed, including by any additional compression.
- Images must not be panoramas.
- The aspect ratio of an image must be less than 2:1.
- Images must not be captured using a fisheye lens or any other lens that results in spherical distortion.
- Images should not be captured using digital zoom; optical zoom is permitted.
- Images must not be captured using filters.
- Excessively blurry images, caused by motion blur or otherwise, were rejected unless some degree of motion blur was unavoidable due to the action the subject was performing (e.g., "Running/jogging").
- Images must not be captured using the "Bokeh" (shallow depth-of-field) effect; that is, the majority of the image, in particular the background and the primary subject (and, if relevant, the secondary subject), must be in focus.
- Image subjects were informed that they should not submit any images that are reflections or drawings of themselves.
Uploaded images were retrospectively checked to ensure that they met these requirements. For the majority of requirements, the checks were automated, as sketched below.
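A sketch of such automated checks for a subset of the requirements, using Pillow; the thresholds come from the list above, and treating a missing Exif block as a rejection is our assumption:

```python
from PIL import Image  # pip install Pillow

MIN_PIXELS = 8_000_000  # at least an 8-megapixel camera
MAX_ASPECT = 2.0        # aspect ratio must be less than 2:1

def passes_basic_checks(path):
    """Automated checks for resolution, aspect ratio, and Exif presence."""
    img = Image.open(path)
    width, height = img.size
    if width * height < MIN_PIXELS:
        return False  # resolution too low
    if max(width, height) / min(width, height) >= MAX_ASPECT:
        return False  # panorama-like aspect ratio
    if len(img.getexif()) == 0:
        return False  # device must record Exif data
    return True
```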
3.4 What was the resource cost of collecting the data? (e.g. what were the required computational resources, and the associated financial costs, and energy consumption) - estimate the carbon footprint.
The collection of the images cost around $308,500 USD. There were additional fixed costs of around $450,000 USD for quality assurance, legal services, and the cost of building the data platform. These cost estimates do not include the work of the 25 researchers, engineers, and program managers who contributed to the project.
3.5 If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)?
We included in the released dataset all collected images that met our quality assessment, consent, and privacy criteria. Furthermore, data records were selected to ensure that the final dataset has a balanced representation across intersectional demographic groups: for intersectional groups that had more images than were required, we selected images using random downsampling.
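A sketch of the downsampling step, assuming each record carries a precomputed intersectional group key:

```python
import random

def downsample_balanced(records, group_of, target_per_group, seed=0):
    """Randomly downsample over-represented intersectional groups.

    records          -- list of candidate data records
    group_of         -- function mapping a record to its intersectional group
    target_per_group -- maximum number of records to keep per group
    """
    rng = random.Random(seed)
    by_group = {}
    for record in records:
        by_group.setdefault(group_of(record), []).append(record)
    kept = []
    for members in by_group.values():
        if len(members) > target_per_group:
            members = rng.sample(members, target_per_group)
        kept.extend(members)
    return kept
```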
3.6 Who was involved in the data collection process (e.g., students, crowd workers, contractors) and how were they compensated (e.g., how much were crowd workers paid)?
The data collection was conducted by external, professional data vendors, including their in-house annotators and QA workers, and by existing and new crowd workers recruited by data vendors. The crowd workers are image subjects and may also be data annotators. We refer to image subjects and data annotators who provide personal information to the project as data subjects. Data vendors were permitted to subcontract data collection with prior authorization from Sony. All data subjects were compensated for their time at rates at or above the equivalent hourly minimum wage in their country of residence. Data vendors reported on their data collection progress on a regular basis.
3.7 Were any ethical review processes conducted (e.g., by an institutional review board)? If so, please provide a description of these review processes, including the outcomes, as well as a link or other access point to any supporting documentation.
Yes, ethical review processes and a data privacy impact assessment were conducted by in-house legal counsel, outside counsel, and Sony's AI Ethics Office. Before the start of the data collection, the project obtained IRB approval, which was independently reviewed by WCG (https://www.wcgclinical.com/about/). The collection of the data was considered to pose no more than minimal risk to data subjects, as it is unlikely that the action of image capture would result in physical or financial harm beyond the risks associated with the data subject's own daily life. The project also underwent Sony Research’s internal Research Ethics Review Process.
3.8 Does the dataset relate to people? If not, you may skip the remainder of the questions in this section.
Yes, FHIBE is a human-centric image dataset.
3.9 Did you collect the data from the individuals in question directly, or obtain it via third parties or other sources (e.g. websites)?
The dataset was collected directly from individuals, through a compensated crowdsourcing approach. The crowd workers were recruited and managed by data vendors.
3.10 Were the individuals in question notified about the data collection? If so, please describe (or show with screenshots or other information) how notice was provided, and provide a link or other access point to, or otherwise reproduce, the exact language of the notification itself.
Yes, data subjects were aware of the data collection. Further, crowdsourced data subjects could only participate in the data collection after signing a consent form, in which they were informed that their data was being collected on behalf of Sony. No data was collected from image subjects whose ability to protect themselves from neglect, abuse, or violence could have been significantly impaired on account of disability, illness, cognitive impairment, or otherwise (i.e., vulnerable adults). Sony provided specifications for the content to be included in messaging sent to potential primary image data subjects. Messages sent to image subjects had to state the following:
- An invitation to participate in an image collection and annotation project as image subjects.
- The goal of the project, namely to build a large set of annotated images to help ML and AI systems perform more fairly and ethically for diverse people.
- That image subjects’ images, demographics and related annotations would be shared publicly in order to develop and evaluate ML and AI software, algorithms and other technologies.
- That the image subject’s name and email address will not be shared publicly.
- That re-identification of image subjects is prohibited, and data users must consent to this.
- Requirements for the images to be submitted, and the type of self-reported demographics to be provided.
- Requirements for the recording device (i.e. camera or smartphone).
- Requirement for image subjects to sign a consent form, and to be of majority age in their country of residence.
- Pay range.
- Potential risks from participating in the project where their images would be made publicly available such as possible use of their images for unauthorized or unintended purposes, such as catfishing, deep fakes, or other commercial or political activities.
- Contact details for further information.
3.11 Did the individuals in question consent to the collection and use of their data? If so, please describe (or show with screenshots or other information) how consent was requested and provided, and provide a link or other access point to, or otherwise reproduce, the exact language to which the individuals consented.
Sony required explicit informed consent from each person depicted in or otherwise identifiable by the dataset using a form designated by Sony. Image data subjects voluntarily consented to the following:
- The sharing of their data, including their facial, body, biometric, and other images and information about them and their surroundings.
- The use of their data for the purposes outlined in the Motivation section.
- The transfer of their data to the US and other jurisdictions that may not provide an equivalent level of protection as their home country.
The consent form was made available to, and signed by, image subjects electronically. All subjects were provided with a hardcopy or an electronic version of the consent form.
3.12 If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses? If so, please provide a description, as well as a link or other access point to the mechanism (if appropriate)
Withdrawal from the project during data collection
While data collection was ongoing, data subjects could discontinue their participation without any repercussions by exiting out of the user interface. All of a data subject's data was deleted upon request.
Consent revocation after data collection
Sony has made provisions to fulfill ongoing operational obligations to service consent revocation requests from data subjects within a 30-day timeframe for as long as the dataset is available. In this way, we uphold a relationship with, and enable ongoing participation from, data subjects, as well as dataset users, who must agree to and comply with data processing obligations while they are licensees of the dataset.
The provisions that have been made entail the following:
- Individuals can submit consent revocation and deletion requests via email and the FHIBE platform. We will also process revocation requests submitted in a different format.
- Deletion requests can be for all or some of the individual’s personal data that Sony controls.
- Deletion requests will be processed in a timeframe that corresponds with applicable law and Sony’s data handling and retention protocols.
- Data subjects can contact Sony via the email address included in the consent form or via the publicly available FHIBE website.
- Before deleting information, the identity of the data subject will be verified and the consent revocation will be confirmed.
- An updated version of the dataset will be uploaded, omitting the relevant images and related annotations, and all dataset users will be instructed via email to delete the prior version. Consent revocation will also be applied to the data controlled by Sony (e.g., internal records), as appropriate.
- Consent withdrawal and removal from the dataset has no impact on any data subject compensation.
Data subjects were informed and agreed that while a revocation of consent would result in the deletion of their data from the dataset maintained by Sony, the revocation could not affect any publications, software, algorithms or technology derived from their data by third parties prior to the revocation.
In addition to the right to revoke consent, individuals can request a copy of their data, can submit complaints, and may have other rights depending on the applicable law of their country of residence. Data subjects were provided with a copy of their Participant Informed Consent and Release Form.
3.13 Has an analysis of the potential impact of the dataset and its use on data subjects (e.g., a data protection impact analysis) been conducted? If so, please provide a description of this analysis, including the outcomes, as well as a link or other access point to any supporting documentation.
The various assessments that Sony conducted are described in Question 3.7. Here we list risks that have been identified, and actions taken by Sony to mitigate these risks.
Risk for data subjects
There exists a possibility that, upon public release of the dataset, image information or annotations could be:
- Used for unauthorized or unintended purposes, such as catfishing or creating deep fakes.
- Sold to third parties for various purposes, such as advertising or other commercial or political activities.
- Matched to data subjects' images via the accompanying metadata and annotations. These metadata can pose a risk to data subjects; for example, in certain jurisdictions, information about someone's self-identified pronouns may be sensitive and create the potential for discrimination on the basis of gender identity.
The likelihood of harm from these activities is comparable to the risks that individuals take in everyday life when posting images and personal information to public websites or on social media platforms. Sony has taken extensive mitigating actions to protect data subjects:
- Self-identification: Data subjects could moderate their risk by self-selecting categories for personal and demographic information that is acceptable to their context and circumstances.
- Voluntary disclosure: For several self-reported demographic attributes (pronouns, difficulty/disability, pregnancy, sub-continental ancestry) a “Prefer not to say” response option was provided during data collection to make the disclosure of sensitive information voluntary.
- Risk awareness: Data subjects were informed of the potential, but unlikely risks associated with submitting their data.
- Consent revocation: Data subjects can revoke consent to have their personal information removed.
- Removal of sensitive metadata: Certain collected metadata will not be made publicly available and only provided in aggregate form. These metadata are: disability, pregnancy, height, weight, country of residence, biologically related subjects.
- Secure data storage: All data is stored and hosted securely on an Amazon S3 server located in the United States, following Sony's established data security protocols.
- Anonymization: Data subjects’ names and email addresses are stored separately from the images and metadata and will not be shared publicly. The unique identifiers with which data subjects can be linked to their names and email addresses have been anonymized.
- Terms of use: Individuals accessing the image dataset will be contractually restricted from attempting to re-identify subjects, from using FHIBE for training AI software, algorithms, or other technologies (with the narrow exception of training bias detection or mitigation methods), and from using FHIBE for specific purposes that fall outside of its intended scope.
- Data access control: The dataset will be made available on request to users after they agree to our terms of use.
Risk for annotators
- Data annotators might be exposed to images that are offensive or triggering.
- It is possible that metadata of annotators could be acquired by third parties with malicious intent. Without access to internal documents, however, the risk of re-identification and subsequent harm from such activities is very low.
Mitigating actions taken by Sony to protect data annotators:
- Restricted image content: Sony explicitly prohibited data subjects from submitting images containing any offensive content, including, but not limited to: explicit nudity or sexual content; violence; visually disturbing content; rude gestures; drugs; hate symbols; and vulgar text.
- Manual checks for offensive content: All images were reviewed manually by Sony's internal QA specialists.
- Automated content checks: Automated checks were performed for violent and explicit content.
- Consent revocation: Annotators can revoke consent to have their demographic metadata removed.
- Secure data storage: All data is stored and hosted securely, following established data security protocols of Sony.
- Anonymization: Data annotators’ names and email addresses are stored separately from their demographic information, and will not be shared publicly. The unique identifiers with which data annotators can be linked to their names and email addresses have been anonymized.
Other risks
There exists a possibility that some of the images that appear in the FHIBE dataset were submitted by bad actors who scraped them from the web.
Mitigating actions taken by Sony to protect users and people from other risks:
Sony deployed a reverse image match using Google's image web search API. While helpful, such automated checks are not fully accurate when applied to images of people.
3.14 Any other comments?
No.
4. Preprocessing / Cleaning / Labeling
4.1 Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values)? If so, please provide a description. If not, you may skip the remainder of the questions in this section.
Preprocessing
Prior to the release of the dataset, we took extensive privacy-preserving measures to reduce the risk of re-identification of data subjects. This includes the following:
- Data subjects’ names and email addresses in the metadata have been removed.
- The exact GPS location information from each image’s Exif metadata has been replaced with the city and country of image capture, as self-reported by the primary image subject (a sketch of the GPS-removal step follows the list of redacted attributes below).
- All sensitive information related to non-consensual individuals has been eliminated through full-body anonymization.
- Personally identifiable information in text and images has been redacted from images with consensual subjects.
In particular, based on a list of 42 image-level privacy attributes assembled from sources such as the US Privacy Act 1974 and the EU Data Protection Directive 95/46/EC, the following image-level attributes were identified as containing personally identifiable information, and have been redacted:
- Access card/badge
- Birth dates
- Business cards
- Computer/Phone/Tablet screen (if the screen is turned on)
- Credit cards, cheque books
- Email addresses
- Exact addresses
- Health insurance cards
- License plates
- Medical records
- Names
- Passports, driving licenses, ID cards
- Phone numbers
- Post/mail
- QR codes
- Receipts, prescriptions
- Signatures
- Street address plate (in residential areas)
- Street number
- Text on documents/notebooks
- Text on labels/tags (e.g., on a drawer/filing cabinet)
- Text on envelopes
- Text on postcards
- Text on sticky notes
- Text on whiteboards
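As an illustration of the Exif-level GPS removal described at the start of this section, here is a minimal sketch using Pillow; this is not the pipeline Sony actually used, and it only removes the GPS block rather than substituting the self-reported city and country:

```python
from PIL import Image  # pip install Pillow

GPS_IFD_TAG = 0x8825  # pointer to the GPS IFD within Exif metadata

def strip_gps(src_path, dst_path):
    """Rewrite an image with the GPS block removed from its Exif metadata."""
    img = Image.open(src_path)
    exif = img.getexif()
    if GPS_IFD_TAG in exif:
        del exif[GPS_IFD_TAG]
    # Assumes an output format whose Pillow writer accepts exif (e.g., PNG, JPEG).
    img.save(dst_path, exif=exif)
```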
Labeling
Image subjects self-identified the following personal attributes:
- identity, age, gender identity, biological sex, nationality, ancestry, skin tone, disability, pregnancy status, eye color, head hair type, head hairstyle, head hair color, facial hairstyle, facial hair color, height, weight, and facial marks.
The primary image subject self-reported the following image attributes:
- body pose, subject-object interactions, subject-subject interactions, date, time, weather, illumination, scene, camera, camera position, and city and country of image capture.
Furthermore, the following observational annotations and target labels were obtained from a demographically diverse set of annotators:
- head pose, face and body keypoint landmarks, face and body bounding boxes, and segmentation mask
All annotator demographic data was self-identified; for annotators employed by vendors, providing it was optional.
4.2 Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)? If so, please provide a link or other access point to the “raw” data.
Yes, the raw data has been saved but is not publicly available.
4.3 Is the software used to preprocess/clean/label the instances available? If so, please provide a link or other access point.
4.4 Any other comments?
Users are able to create derivatives of the dataset annotations, provided that these annotations are explicitly identified as having been added by the user and that they comply with the dataset's acceptable uses. We do not prescribe a license for derived annotations. For clarity, users are not able to redistribute or create derivative works of the dataset images themselves.
5. Uses
5.1 Has the dataset been used for any tasks already? If so, please provide a description.
Yes, the dataset has been validated for use as an evaluation dataset on the following tasks:
- Pose estimation
- Person segmentation
- Person detection
- Face detection
- Face parsing
- Face verification
- Face reconstruction
- Face super-resolution
- Visual question answering
5.2 Is there a repository that links to any or all papers or systems that use the dataset? If so, please provide a link or other access point.
The dataset and all supplementary information are available at https://ai.sony/fairness-benchmark. We will keep an internal record of all third parties who download the dataset, their affiliation and intended use so that third parties can receive dataset updates in connection with consent revocations. We will not track technical usage of the dataset.
5.3 What (other) tasks could the dataset be used for?
The dataset could also be used for:
- Body and Face Landmark Detection
- Face Alignment
- Image Editing
- Person parsing
- Face hallucination
5.4 Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses? For example, is there anything that a future user might need to know to avoid uses that could result in unfair treatment of individuals or groups (e.g., stereotyping, quality of service issues) or other undesirable harms (e.g., financial harms, legal risks) If so, please provide a description. Is there anything a future user could do to mitigate these undesirable harms?
Images capture a moment in time. In this regard, FHIBE reflects the image subjects and their environment as they were at the moment of data capture, not as they will be in the future. This may be particularly relevant to ornamental, cultural, contextual, and environmental information contained in images. While these changes may only be observable over time horizons of a decade or more, they may result in distribution shifts that lead to outdated and stereotypical representations of people.
It is unlikely that the action of image capture would result in physical or financial harm beyond the risks associated with the subject's own daily life, as subjects are able to submit pre-existing images. Subjects’ names and email addresses will not be shared publicly, and individuals accessing the image set will be contractually restricted from attempting to re-identify subjects, along with other reasonable use restrictions.
5.5 Are there tasks for which the dataset should not be used? If so, please provide a description.
FHIBE is strictly to be used for evaluation and for fairness purposes. FHIBE may not be used to train machine learning or artificial intelligence software, algorithms, or other technologies. The sole exception is the use of the data for the explicit development of tools to assess fairness and mitigate biases in machine learning models and implementations.
It is prohibited to attempt to re-identify data subjects. Furthermore, the extraction, inference or prediction of image subjects' sensitive or subjective, non-observable features, such as gender, race, ethnicity, political opinion, religious beliefs, genetic data and any other group membership is forbidden.
The dataset may not be used to evaluate tools that perform biometric categorization, emotion recognition, or physiognomic recognition; tools used for law enforcement purposes or suspicious-behavior detection; or specific high-risk applications in settings such as education, work, social credit systems, immigration, and law.
5.6 Any other comments?
No.
6. Distribution
6.1 Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created? If so, please provide a description.
Yes, FHIBE will be publicly available for download and use by varying audiences, including research, academic, and for-profit entities.
6.2 How will the dataset be distributed (e.g., tarball on website, API, GitHub)?
FHIBE is accessible via an online platform. To download the data, users will need to sign up and create a user account on the platform by providing their name, organization, intention for using the data, and email address. Users will also need to agree to the license and terms of use of the FHIBE dataset. The FHIBE platform contains additional information on how to obtain the dataset.
6.3 Does the dataset have a digital object identifier (DOI)?
FHIBE will be released with a DOI.
6.4 When will the dataset be distributed?
FHIBE will be made publicly available at https://ai.sony/fairness-benchmark.
6.5 Will the dataset be distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)?
FHIBE is entirely owned by Sony AI and is distributed under applicable terms of use and a custom license.
6.6 If so, please describe this license and/or ToU, and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms or ToU, as well as any fees associated with these restrictions.
FHIBE Terms of Use can be found here.
6.7 Have any third parties imposed IP-based or other restrictions on the data associated with the instances? If so, please describe these restrictions, and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms, as well as any fees associated with these restrictions.
No.
6.8 Do any export controls or other regulatory restrictions apply to the dataset or to individual instances?
Users are restricted from accessing FHIBE from a country that is subject to a U.S. Government embargo.
6.9 If so, please describe these restrictions, and provide a link or other access point to, or otherwise reproduce, any supporting documentation.
N/A
6.10 Any other comments?
No.
7. Maintenance
7.1 Who is supporting/hosting/maintaining the dataset?
FHIBE is supported, maintained, and hosted by Sony AI.
7.2 How can the owner/curator/manager of the dataset be contacted (e.g., email address)?
Sony AI can be contacted using the contact form on the FHIBE platform.
7.3 Is there an erratum? If so, please provide a link or other access point.
There is not an explicit erratum. In the event that any errors are found, FHIBE will be updated and a new version will be released. Updates regarding FHIBE will be communicated via the FHIBE platform and via email to FHIBE users (i.e., those who agreed to the FHIBE license) if necessary.
7.4 Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)? If so, please describe how often, by whom, and how updates will be communicated to users (e.g., mailing list, GitHub)?
FHIBE will be updated when image subjects withdraw their consent. In such a case, instances where consent has been revoked will be replaced and a new dataset will be released under a new version. When FHIBE is updated, FHIBE users will be contacted via email and instructed to delete the prior version of FHIBE. Further details on how images will be replaced in the case of consent revocation are detailed in FHIBE’s Methods.
7.5 If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced.
FHIBE is designed to comply with the GDPR, including its data minimization principles. The retention of an individual's particular data in FHIBE is subject to applicable law and data subject revocation, and in no case will data be retained for more than 20 years. The FHIBE project itself may exist beyond 20 years, as data gets replaced over time.
7.6 Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users.
Older versions of FHIBE will be withdrawn. FHIBE users must delete the prior version of FHIBE, as agreed in the Terms of Use.
7.7 If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/verified? If so, please describe how. If not, why not? Is there a process for communicating/distributing these contributions to other users? If so, please provide a description.
Yes, the terms of use permit the addition and modification of the annotations, provided that the modifications do not otherwise violate the terms of use. Any modified version of the annotations must clearly identify the modifications from the official FHIBE dataset. Sony does not intend to validate or verify versions of the dataset created by others containing modified annotations.
7.8 Any other comments?
No.