Active Evaluation Software for Selection of Ground Truth Labels

This software repository contains Aegis (Active Evaluator Germane Interactive Selector), a Python package for evaluating a machine learning system's performance (according to a metric such as accuracy). Instead of labeling an entire test set, Aegis adaptively selects which trials from an unlabeled test set to label, with the goal of minimizing the number of ground truth labels needed. The repository also includes sample (public) data and a simulation script that compares different label-selection strategies on already labeled test sets. Users can add their own data and system outputs to test evaluation.
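The description does not say which selection strategies Aegis implements. To illustrate what "testing label-selection strategies on an already labeled test set" can look like, here is a minimal, self-contained sketch that does not use Aegis itself. It hides the known labels, "buys" a fixed budget of them under two strategies, and measures how far each accuracy estimate lands from the true accuracy. All function names are hypothetical, the test set is synthetic, and score-stratified sampling is a standard survey-sampling technique chosen for illustration. It is not necessarily a strategy Aegis provides.

```python
import random
import statistics


def make_labeled_test_set(n, seed=0):
    """Hypothetical synthetic test set: each trial is (system confidence score,
    whether the system's output was correct according to ground truth)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        score = rng.random()
        # Assumption for illustration: higher-scored outputs are more often correct.
        correct = rng.random() < 0.5 + 0.45 * score
        trials.append((score, correct))
    return trials


def random_sampling_estimate(trials, budget, rng):
    """Label `budget` trials chosen uniformly at random; return their accuracy."""
    sample = rng.sample(trials, budget)
    return sum(correct for _, correct in sample) / budget


def stratified_estimate(trials, budget, rng, n_strata=4):
    """Split trials into equal-width score strata, label a proportional share of
    each stratum, and combine per-stratum accuracies weighted by stratum size.
    Rounding means the labels used can differ slightly from `budget`."""
    strata = [[] for _ in range(n_strata)]
    for score, correct in trials:
        idx = min(int(score * n_strata), n_strata - 1)
        strata[idx].append((score, correct))
    total = len(trials)
    estimate = 0.0
    for stratum in strata:
        if not stratum:
            continue
        k = min(len(stratum), max(1, round(budget * len(stratum) / total)))
        sample = rng.sample(stratum, k)
        stratum_acc = sum(c for _, c in sample) / k
        estimate += (len(stratum) / total) * stratum_acc
    return estimate


def simulate(trials, budget, reps=200, seed=1):
    """Repeat each strategy `reps` times; return the true accuracy and the
    mean absolute error of each strategy's estimate."""
    true_acc = sum(c for _, c in trials) / len(trials)
    rng = random.Random(seed)
    errors = {"random": [], "stratified": []}
    for _ in range(reps):
        errors["random"].append(
            abs(random_sampling_estimate(trials, budget, rng) - true_acc))
        errors["stratified"].append(
            abs(stratified_estimate(trials, budget, rng) - true_acc))
    return true_acc, {name: statistics.mean(e) for name, e in errors.items()}


if __name__ == "__main__":
    trials = make_labeled_test_set(2000)
    true_acc, mae = simulate(trials, budget=100)
    print(f"true accuracy: {true_acc:.3f}")
    for name, err in mae.items():
        print(f"{name:>10}: mean absolute error {err:.4f}")
```

With the same label budget, a strategy with lower mean absolute error reaches a given precision with fewer labels. That reduction in labeling effort is what an active evaluator aims for. Swapping in a different selection function is how one would compare further strategies in this sketch.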

Data and Resources

DOI Access for Active Evaluation Software for...

Field	Value
accessLevel	public
bureauCode	{006:55}
catalog_@context	https://project-open-data.cio.gov/v1.1/schema/data.json
catalog_conformsTo	https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy	https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier	ark:/88434/mds2-2227
issued	2020-07-09
landingPage	https://github.com/usnistgov/active-evaluation
language	{en}
license	https://www.nist.gov/open/license
modified	2020-04-28 00:00:00
programCode	{006:045}
publisher	National Institute of Standards and Technology
resource-type	Dataset
source_datajson_identifier	true
source_hash	12e1285c4b4fb916d3e1c426cbfd61768bc581e0
source_schema_version	1.1
theme	{"Information Technology:Data and informatics"}
Groups	AmeriGEOSS National Provider North America
Tags	active-evaluation amerigeo amerigeoss ar ckan geo geoss machine-learning national north-america united-states
isopen	False
license_id	other-license-specified
license_title	other-license-specified
maintainer	Peter Fontana
maintainer_email	peter.fontana@nist.gov
metadata_created	2025-11-22T12:30:42.607955
metadata_modified	2025-11-22T12:30:42.607959
num_resources	1
num_tags	11
title	Active Evaluation Software for Selection of Ground Truth Labels