POPMAPS: An R package to estimate ancestry probability surfaces

This software was developed to estimate the probability that individuals found at a geographic location belong to the same genetic cluster as individuals at the nearest empirical sampling location for which ancestry is known. POPMAPS includes five main functions to calculate and visualize these results (see Table 1 for functions and arguments). Population assignment coefficients must be estimated, and a raster surface prepared, before using POPMAPS functions (see Fig. 1a and b). With these data in hand, users can run a jackknife function to choose the parameter combination that best reconstructs the empirical data (Figs. 2 and S2). The pertinent parameters are 1) how many empirical sampling localities are used to estimate ancestry coefficients and 2) how the influence of empirical sites on ancestry coefficient estimation changes as distance increases (Fig. 2). After choosing these parameters, a user can estimate the entire ancestry probability surface (Fig. 1c and d, Fig. 3).

This package can be used to estimate ancestry coefficients from empirical genetic data across a user-defined geospatial layer. Estimated ancestry coefficients are used to calculate ancestry probabilities, which, together with 'hard population boundaries,' compose an ancestry probability surface. Within a hard boundary, the ancestry probability tells a user how confident they can be that individuals of the focal organism found at a location match the genetic identity of the principal population. Confidence across the ancestry probability surface can be modified by changing the parameters that control the contribution of empirical data to the estimation of ancestry coefficients. This information may help inform decision-making for organisms with management needs.
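
The two parameters described above (number of sampling localities used, and how site influence falls off with distance) can be illustrated with a small sketch. This is not the POPMAPS API and is written in Python rather than R. It assumes inverse-distance weighting as the distance-decay rule, which the description does not confirm, and every function name, parameter name, and data value below is hypothetical. It shows a leave-one-out jackknife that scores each (k, power) combination by how well held-out sites are reconstructed from the remaining ones.

```python
import math
from itertools import product


def idw_ancestry(target, sites, coeffs, k, power):
    """Estimate ancestry coefficients at `target` from the k nearest sites.

    Assumption: each site is weighted by 1 / distance**power (inverse-distance
    weighting). POPMAPS may use a different weighting rule.
    """
    nearest = sorted(zip(sites, coeffs),
                     key=lambda sq: math.dist(target, sq[0]))[:k]
    weights = []
    for xy, q in nearest:
        d = math.dist(target, xy)
        if d == 0.0:
            return list(q)  # target coincides with a sampled site
        weights.append(d ** -power)
    total = sum(weights)
    n_clusters = len(coeffs[0])
    return [sum(w * q[c] for w, (_, q) in zip(weights, nearest)) / total
            for c in range(n_clusters)]


def jackknife_error(sites, coeffs, k, power):
    """Hold out each site in turn, predict it from the rest, and return the
    mean absolute error between predicted and observed coefficients."""
    errors = []
    for i in range(len(sites)):
        rest_sites = sites[:i] + sites[i + 1:]
        rest_coeffs = coeffs[:i] + coeffs[i + 1:]
        pred = idw_ancestry(sites[i], rest_sites, rest_coeffs, k, power)
        errors.extend(abs(p - o) for p, o in zip(pred, coeffs[i]))
    return sum(errors) / len(errors)


def choose_parameters(sites, coeffs, ks, powers):
    """Return the (k, power) pair with the lowest jackknife error."""
    return min(product(ks, powers),
               key=lambda kp: jackknife_error(sites, coeffs, *kp))


# Hypothetical toy data: five sites on a transect, two genetic clusters.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
coeffs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.0, 1.0]]

best_k, best_power = choose_parameters(sites, coeffs, ks=[2, 3, 4], powers=[1, 2, 3])
estimate = idw_ancestry((0.5, 0.0), sites, coeffs, best_k, best_power)
```

Applying `idw_ancestry` to every cell of a raster, rather than to a single point, would give a surface of estimated coefficients. How POPMAPS converts those coefficients into ancestry probabilities and hard boundaries is not covered by this sketch.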

Data and Resources

Original Metadata (XML)
The metadata in its original format
Digital Data (XML)
Landing page for access to the data

Field	Value
accessLevel	public
bureauCode	{010:12}
catalog_@context	https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_conformsTo	https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy	https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier	USGS:627e7b24d34e3bef0c9a2cc2
metadata_type	geospatial
modified	20220524
old-spatial	-180.000, 90.0000, 180.000, 90.0000
publisher	U.S. Geological Survey
publisher_hierarchy	Department of the Interior > U.S. Geological Survey
resource-type	Dataset
source_datajson_identifier	true
source_hash	627ae16774f20acb9d41310260fb4906ae182130
source_schema_version	1.1
spatial	{"type": "Polygon", "coordinates": [[[-180.000, 90.0000], [-180.000, 90.0000], [ 180.000, 90.0000], [ 180.000, 90.0000], [-180.000, 90.0000]]]}
theme	{geospatial}
Groups	AmeriGEOSS National Provider North America
Tags	amerigeo amerigeoss biogeography biota ckan demographics environmental-gradients evolution genetic-diversity geo geoss national native-plant-materials-development native-species north-america phylogeny phylogeography restoration seed-transfer-guidelines united-states usgs-627e7b24d34e3bef0c9a2cc2
isopen	False
license_id	notspecified
license_title	License not specified
maintainer	Robert T Massatti
maintainer_email	rmassatti@usgs.gov
metadata_created	2025-11-21T19:37:14.285020
metadata_modified	2025-11-21T19:37:14.285025
num_resources	2
num_tags	21
title	POPMAPS: An R package to estimate ancestry probability surfaces