
Overview

CatMapper is a platform for organizing and harmonizing category systems across datasets. It currently includes two applications: SocioMap, which focuses on social-science categories (for example ethnicities, languages, religions, and administrative units), and ArchaMap, which focuses on archaeological categories (for example artifact types and cultural periods). CatMapR is an API wrapper for the CatMapper API, and all package functions are also available through the CatMapper web application at https://catmapper.org.

This vignette shows a standard CatMapR workflow:

  1. list available dataset metadata,
  2. inspect metadata helpers,
  3. search and inspect categories,
  4. translate external labels,
  5. run basic quality checks,
  6. join harmonized tables.

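By default, chunks run in offline demo mode (run_live is FALSE) so the vignette renders reproducibly without network access. Set run_live to TRUE to send the same calls to the live CatMapper API.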

Why Use CatMapper?

Researchers often work with datasets that describe the same category systems in different and incompatible ways. CatMapper helps by:

  • identifying likely matches across naming variants and coding systems,
  • preserving context needed for disambiguation (for example place, domain, or period),
  • reducing manual harmonization effort before analysis,
  • making joins across independently structured datasets more reproducible.

CatMapR brings these capabilities into scriptable R workflows, so translation, quality checks, and joins can be rerun consistently as the underlying data are updated.
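To make the problem concrete, the base-R sketch below uses the country labels from the worked example later in this vignette. Both tables are hypothetical demo data, not CatMapper output. An exact-match merge() silently drops the row whose spelling differs, which is the kind of naming-variant gap CatMapper's matching is meant to close.

```r
# Hypothetical demo data: an external table and a reference list that spell
# one country differently ("Cote dIvoire" vs "Cote d'Ivoire").
external <- data.frame(
  country_label = c("Ghana", "Cote dIvoire", "Tanzania"),
  indicator_value = c(10.2, 9.7, 8.4),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  country_label = c("Ghana", "Cote d'Ivoire", "Tanzania"),
  ref_code = c("GHA", "CIV", "TZA"),
  stringsAsFactors = FALSE
)

# An exact-match inner join keeps only labels spelled identically in both tables
naive <- merge(external, reference, by = "country_label")
nrow(naive)
#> [1] 2

# Labels with no exact partner, which would otherwise need manual harmonization
setdiff(external$country_label, reference$country_label)
#> [1] "Cote dIvoire"
```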

What CatMapR Returns

CatMapR wrappers return CatMapper API responses, including:

  • dataset catalog metadata (for example CMID, CMName, citation fields, years),
  • category/entity metadata and relationship details,
  • translation and join outputs,
  • metadata/property discovery tables.

CatMapR does not fetch raw dataset source files managed outside CatMapper. User-owned/raw datasets are external inputs to your R workflow.
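The demo objects in this vignette come back in several shapes: list_datasets() as a data frame, search_database() as a list with count and data elements, and translate_rows() as a list with a file element. The helper below is hypothetical (not part of CatMapR) and pulls out the tabular part under those assumptions; check str() on a live response before relying on it.

```r
# Hypothetical helper (not a CatMapR function): return the tabular part of a
# response, whether the wrapper returned a data frame directly or a list that
# holds one under a known element name.
response_table <- function(x, fields = c("data", "file")) {
  if (is.data.frame(x)) return(x)
  if (is.list(x)) {
    for (f in fields) {
      # x[[f]] is NULL when the element is absent, so this check is safe
      if (is.data.frame(x[[f]])) return(x[[f]])
    }
  }
  stop("No data frame found in response; inspect it with str().", call. = FALSE)
}

# Demo object shaped like the offline search_database() result
demo_hits <- list(
  count = 1,
  data = data.frame(CMID = "SM1", CMName = "Ghana", stringsAsFactors = FALSE)
)
response_table(demo_hits)$CMID
#> [1] "SM1"
```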

UI-to-R Function Mapping

  • In routes like /:database/explore, :database means the app path segment, sociomap or archamap (for example /sociomap/explore).
| CatMapperJS route | Typical UI step | CatMapR function(s) |
|---|---|---|
| /:database/explore | Search categories/entities and inspect details | search_database(), get_cmid_info(), get_domains() |
| /:database/translate | Upload labels, run matching, review translated outputs | translate_rows() |
| /:database/merge | Propose merge keys and join aligned datasets | propose_merge_links(), join_datasets() |
| /:database/edit | Authenticated edit upload, with automatic waiting for completion and a contextual USES refresh | get_upload_properties(), upload_rows() |
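Because :database is the lowercase path segment while CatMapR calls use database = "SocioMap" or "ArchaMap", a small hypothetical helper can build the web page that matches a given R step. It assumes the app lives at https://catmapper.org with the routes listed above; it is a convenience sketch, not a CatMapR function.

```r
# Hypothetical helper: build a CatMapper web-app URL for one of the routes above.
# Assumes the path segment is the lowercase database name, as in /sociomap/explore.
catmapper_ui_url <- function(database,
                             page = c("explore", "translate", "merge", "edit"),
                             base = "https://catmapper.org") {
  page <- match.arg(page)  # defaults to "explore" when page is omitted
  segment <- tolower(database)
  if (!segment %in% c("sociomap", "archamap")) {
    stop("database must be 'SocioMap' or 'ArchaMap'", call. = FALSE)
  }
  paste(base, segment, page, sep = "/")
}

catmapper_ui_url("SocioMap", "translate")
#> [1] "https://catmapper.org/sociomap/translate"
```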

UI Workflow Crosswalk (Detailed)

| User intent | CatMapper UI path | UI action | CatMapR call pattern |
|---|---|---|---|
| Browse available dataset records | /:database and /:database/explore | Open database and review dataset catalog entries | list_datasets(database) |
| Inspect one known dataset CMID | /:database/:cmid | Open dataset info page | get_dataset_metadata(database, cmid, domain, children) |
| Browse metadata helpers | /:database/explore and /:database/edit | Inspect available domains and property fields | get_domains(database), get_upload_properties(database) |
| Find entities/categories | /:database/explore | Run search filters and inspect hits | search_database(...), then get_cmid_info(...) |
| Harmonize labels | /:database/translate | Upload table and run translation | translate_rows(rows, database, term, ...) |
| Build merge candidates | /:database/merge | Propose merge linkfile | propose_merge_links(categoryLabel, datasetChoices, ...) |
| Join aligned sources | /:database/merge | Run join on matched keys | join_datasets(database, joinLeft, joinRight, ...) |
| Submit metadata/category edits | /:database/edit | Upload edit rows and auto-trigger contextual USES refresh | upload_rows(...) |

Setup

library(CatMapR)

# Optional API override:
# Sys.setenv(CATMAPR_API_URL = "https://api.catmapper.org")

# Offline demo mode by default; set to TRUE to query the live CatMapper API.
run_live <- FALSE

run_live
#> [1] FALSE
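If you render the vignette in more than one setting, you may prefer to read the switch from an environment variable instead of editing the code. CATMAPR_RUN_LIVE is a name chosen for this sketch; CatMapR itself does not read it.

```r
# Hypothetical convention: CATMAPR_RUN_LIVE is not a CatMapR setting, just a
# variable name chosen here. Unset or any other value means offline demo mode.
live_mode <- function(var = "CATMAPR_RUN_LIVE") {
  tolower(Sys.getenv(var, unset = "false")) %in% c("true", "1", "yes")
}

run_live <- live_mode()
```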

Candidate Dataset Pairings

Suggested pairings for a worked example:

  • SocioMap ADM0: SD1 (GADM 3.6) + SD2 (GeoNames 202005)
  • SocioMap ETHNICITY: SD2176 (GeoEPR 2021) + SD461930 (LEDA)
  • SocioMap LANGUAGE: SD3 (Glottolog 4.2.1) + SD2196 (Wikidata)
  • ArchaMap artifact/period: AD37767 (Andrefsky 2005) + AD37770 (DAI_FeatureTypes)

Dataset IDs above were confirmed on 2026-03-18 and may change as catalogs update.

1) List Datasets

if (run_live) {
  datasets <- list_datasets(database = "SocioMap")
} else {
  datasets <- data.frame(
    CMID = c("SD1", "SD2", "SD2176", "SD461930"),
    CMName = c("GADM 3.6", "GeoNames Version 202005", "GeoEPR 2021", "LEDA"),
    ApplicableYears = c("2018", "2020", "1946-2021", NA),
    stringsAsFactors = FALSE
  )
}

head(datasets)
#>       CMID                  CMName ApplicableYears
#> 1      SD1                GADM 3.6            2018
#> 2      SD2 GeoNames Version 202005            2020
#> 3   SD2176             GeoEPR 2021       1946-2021
#> 4 SD461930                    LEDA            <NA>
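ApplicableYears is stored as text, either a single year or a start-end range in the demo catalog. The sketch below finds datasets that cover a target year, assuming only those two formats; years written with a leading minus sign would need different parsing. The catalog is a copy of the offline demo data above, and both helpers are hypothetical.

```r
# Copy of the offline demo catalog shown above
catalog <- data.frame(
  CMID = c("SD1", "SD2", "SD2176", "SD461930"),
  CMName = c("GADM 3.6", "GeoNames Version 202005", "GeoEPR 2021", "LEDA"),
  ApplicableYears = c("2018", "2020", "1946-2021", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "2018" or "1946-2021" into numeric start/end years.
# A single year becomes start == end; NA stays NA.
parse_years <- function(x) {
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  data.frame(
    year_start = vapply(parts, function(p) as.numeric(p[1]), numeric(1)),
    year_end = vapply(parts, function(p) as.numeric(p[length(p)]), numeric(1))
  )
}

# Catalog rows whose applicable years include `year`; rows with missing years are dropped
covering <- function(catalog, year) {
  yrs <- parse_years(catalog$ApplicableYears)
  keep <- !is.na(yrs$year_start) & yrs$year_start <= year & yrs$year_end >= year
  catalog[keep, c("CMID", "CMName"), drop = FALSE]
}

covering(catalog, 2019)
```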

2) Inspect Metadata Helpers

if (run_live) {
  domains <- get_domains(database = "SocioMap")
  upload_props <- get_upload_properties(database = "SocioMap")
} else {
  domains <- data.frame(
    domain = c("DISTRICT", "DISTRICT", "ETHNICITY"),
    subdomain = c("ADM0", "ADM1", "ETHNICITY"),
    description = c("Administrative district", "Administrative district", "Ethnicity category"),
    stringsAsFactors = FALSE
  )
  upload_props <- list(
    database = "SocioMap",
    nodeProperties = data.frame(property = "DatasetCitation", description = "Dataset citation", stringsAsFactors = FALSE),
    usesProperties = data.frame(property = "yearStart", description = "Starting date", stringsAsFactors = FALSE)
  )
}

head(domains)
#>      domain subdomain             description
#> 1  DISTRICT      ADM0 Administrative district
#> 2  DISTRICT      ADM1 Administrative district
#> 3 ETHNICITY ETHNICITY      Ethnicity category
head(upload_props$nodeProperties)
#>          property      description
#> 1 DatasetCitation Dataset citation
head(upload_props$usesProperties)
#>    property   description
#> 1 yearStart Starting date
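The search example in the next step filters with domain = "ADM0", which appears in the subdomain column of the demo metadata. This vignette does not spell out whether the search filter expects domain or subdomain labels, so the hypothetical check below accepts either and helps catch typos before a live call.

```r
# Demo metadata with the same shape as the offline get_domains() result above
domains_demo <- data.frame(
  domain = c("DISTRICT", "DISTRICT", "ETHNICITY"),
  subdomain = c("ADM0", "ADM1", "ETHNICITY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: is `value` listed as either a domain or a subdomain?
is_known_domain <- function(value, domains) {
  value %in% c(domains$domain, domains$subdomain)
}

is_known_domain("ADM0", domains_demo)
#> [1] TRUE
is_known_domain("ADM9", domains_demo)
#> [1] FALSE

# Subdomains grouped by domain, handy when choosing a search filter
split(domains_demo$subdomain, domains_demo$domain)
```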

get_properties() is also available for API deployments that expose /metadata/properties/<database>; that endpoint may not yet be rolled out on every deployment.
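Where get_properties() may not be deployed yet, one option is to try it first and fall back to get_upload_properties(). The helper below is generic base R and is demonstrated offline with stand-in functions; the commented live call assumes get_properties() takes database like the other helpers, which this vignette does not confirm.

```r
# Generic fallback helper: try `primary`; if it errors, report why and call `fallback`.
with_fallback <- function(primary, fallback) {
  tryCatch(
    primary(),
    error = function(e) {
      message("Primary call failed (", conditionMessage(e), "); using fallback.")
      fallback()
    }
  )
}

# Offline demonstration with stand-in functions
unavailable <- function() stop("endpoint not found")
available <- function() data.frame(property = "DatasetCitation", stringsAsFactors = FALSE)
props <- with_fallback(unavailable, available)

# Live mode (assumed signature for get_properties()):
# props <- with_fallback(
#   function() get_properties(database = "SocioMap"),
#   function() get_upload_properties(database = "SocioMap")$nodeProperties
# )
```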

3) Search and Inspect a Category

if (run_live) {
  hits <- search_database(
    database = "SocioMap",
    domain = "ADM0",
    term = "Ghana",
    property = "Name"
  )
  details <- get_cmid_info(database = "SocioMap", cmid = "SM1")
} else {
  hits <- list(
    count = 1,
    data = data.frame(
      CMID = "SM1",
      CMName = "Ghana",
      domain = "ADM0",
      stringsAsFactors = FALSE
    )
  )
  details <- list(node = list(CMID = "SM1", CMName = "Ghana", domain = "ADM0"))
}

hits
#> $count
#> [1] 1
#> 
#> $data
#>   CMID CMName domain
#> 1  SM1  Ghana   ADM0
str(details)
#> List of 1
#>  $ node:List of 3
#>   ..$ CMID  : chr "SM1"
#>   ..$ CMName: chr "Ghana"
#>   ..$ domain: chr "ADM0"
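Before passing a CMID from search results into get_cmid_info() or later steps, it can help to insist on a single unambiguous hit. The sketch below assumes the count/data shape of the demo hits and treats zero or multiple rows as needing manual review; single_cmid() is a hypothetical helper.

```r
# Hypothetical helper: return the CMID only when exactly one hit came back;
# otherwise warn and return NA so the hits can be reviewed by hand.
single_cmid <- function(hits) {
  n <- nrow(hits$data)  # NULL when there is no data element
  if (is.null(n) || n == 0) {
    warning("No hits returned.", call. = FALSE)
    return(NA_character_)
  }
  if (n > 1) {
    warning(n, " hits returned; review them before choosing a CMID.", call. = FALSE)
    return(NA_character_)
  }
  hits$data$CMID
}

# Demo object shaped like the offline search_database() result above
demo_hits <- list(
  count = 1,
  data = data.frame(CMID = "SM1", CMName = "Ghana", domain = "ADM0", stringsAsFactors = FALSE)
)
single_cmid(demo_hits)
#> [1] "SM1"
```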

4) Prepare External Input

raw_df <- data.frame(
  country_label = c("Ghana", "Cote dIvoire", "Tanzania"),
  year = c(2019, 2019, 2019),
  indicator_value = c(10.2, 9.7, 8.4),
  stringsAsFactors = FALSE
)

raw_df
#>   country_label year indicator_value
#> 1         Ghana 2019            10.2
#> 2  Cote dIvoire 2019             9.7
#> 3      Tanzania 2019             8.4
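Real inputs often repeat the same label across years or carry stray whitespace. One design choice is to translate each distinct, trimmed label once and attach the result back to every row afterwards. This sketch builds that label table with base R from hypothetical panel data.

```r
# Hypothetical panel input: a repeated label (one with a trailing space) and a missing one
panel_df <- data.frame(
  country_label = c("Ghana", "Ghana ", "Tanzania", NA),
  year = c(2018, 2019, 2019, 2019),
  stringsAsFactors = FALSE
)

# Trim stray whitespace, drop missing or empty labels, keep one row per label
labels <- trimws(panel_df$country_label)
label_table <- data.frame(
  country_label = unique(labels[!is.na(labels) & labels != ""]),
  stringsAsFactors = FALSE
)
label_table$country_label
```

If you trim labels here, trim the original column the same way before merging CMIDs back, so the join keys agree.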

5) Translate Labels

if (run_live) {
  translated <- translate_rows(
    rows = raw_df,
    database = "SocioMap",
    domain = "ADM0",
    term = "country_label",
    property = "Name",
    query = "false"
  )
} else {
  translated <- list(
    file = data.frame(
      country_label = c("Ghana", "Cote dIvoire", "Tanzania"),
      generated_CMID = c("SM1", "SM2", "SM3"),
      generated_CMName = c("Ghana", "Cote d'Ivoire", "Tanzania"),
      score = c(1, 1, 1),
      stringsAsFactors = FALSE
    )
  )
}

translated$file
#>   country_label generated_CMID generated_CMName score
#> 1         Ghana            SM1            Ghana     1
#> 2  Cote dIvoire            SM2    Cote d'Ivoire     1
#> 3      Tanzania            SM3         Tanzania     1
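In the demo output, translated$file echoes the input label column next to generated_CMID. Assuming live output does the same, a left join on that column attaches CMIDs to every original row and keeps unmatched rows with NA. The data below are copies of the demo objects above.

```r
# Copies of the demo input and translation output shown above
raw_df <- data.frame(
  country_label = c("Ghana", "Cote dIvoire", "Tanzania"),
  year = c(2019, 2019, 2019),
  indicator_value = c(10.2, 9.7, 8.4),
  stringsAsFactors = FALSE
)
translated_file <- data.frame(
  country_label = c("Ghana", "Cote dIvoire", "Tanzania"),
  generated_CMID = c("SM1", "SM2", "SM3"),
  generated_CMName = c("Ghana", "Cote d'Ivoire", "Tanzania"),
  score = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# One row per label/CMID pair, then a left join so no input row is lost
lookup <- unique(translated_file[, c("country_label", "generated_CMID")])
harmonized <- merge(raw_df, lookup, by = "country_label", all.x = TRUE, sort = FALSE)
harmonized
```

unique() stops exact duplicate label rows from multiplying the data, but a label assigned two different CMIDs would still duplicate rows, so review those cases first.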

6) Post-Translation Checks

translated_df <- translated$file

if (!is.data.frame(translated_df) || !"generated_CMID" %in% names(translated_df)) {
  message("No `generated_CMID` column found in translation output; verify the API output shape before running QA checks.")
} else {
  # Rows without a CMID assignment; report explicitly, because only the last
  # value in a braced block is auto-printed
  unmatched <- translated_df[is.na(translated_df$generated_CMID), , drop = FALSE]
  cat("Rows without a CMID:", nrow(unmatched), "\n")

  # Simple duplicate check: number of input rows assigned to each CMID
  sort(table(translated_df$generated_CMID), decreasing = TRUE)
}
#> Rows without a CMID: 0
#> 
#> SM1 SM2 SM3 
#>   1   1   1
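Beyond missing CMIDs and duplicates, two cheap flags are low scores and labels that differ from the assigned CMName. This sketch assumes a higher score means a closer match (the demo scores are all 1) and uses an arbitrary, hypothetical cut-off of 0.9. On the demo data it flags only the Cote dIvoire row for a spot check.

```r
# Copy of the demo translation output shown above
translated_df <- data.frame(
  country_label = c("Ghana", "Cote dIvoire", "Tanzania"),
  generated_CMID = c("SM1", "SM2", "SM3"),
  generated_CMName = c("Ghana", "Cote d'Ivoire", "Tanzania"),
  score = c(1, 1, 1),
  stringsAsFactors = FALSE
)

min_score <- 0.9  # hypothetical cut-off; choose one that suits your data

# Flag rows with no CMID, a low score, or a CMName that differs from the input label
needs_review <- is.na(translated_df$generated_CMID) |
  translated_df$score < min_score |
  translated_df$country_label != translated_df$generated_CMName

# which() drops NA flags so they do not produce all-NA rows
translated_df[which(needs_review), c("country_label", "generated_CMName", "score")]
```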

7) Join Harmonized Tables

if (run_live) {
  left_df <- data.frame(
    datasetID = "SD1",
    country = c("Ghana", "Tanzania"),
    GID = c("GHA", "TZA"),
    metric_left = c(10, 20),
    stringsAsFactors = FALSE
  )

  right_df <- data.frame(
    datasetID = "SD2",
    country = c("Ghana", "Tanzania"),
    geonameid = c("2300660", "149590"),
    metric_right = c(2, 5),
    stringsAsFactors = FALSE
  )

  joined <- join_datasets(
    database = "SocioMap",
    joinLeft = left_df,
    joinRight = right_df,
    domain = "CATEGORY"
  )
} else {
  joined <- data.frame(
    generated_CMID = c("SM1", "SM3"),
    metric_left = c(10, 20),
    metric_right = c(2, 5),
    stringsAsFactors = FALSE
  )
}

joined
#>   generated_CMID metric_left metric_right
#> 1            SM1          10            2
#> 2            SM3          20            5
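A quick local comparison of CMIDs on each side shows which categories will not find a partner. The two tables below are hypothetical demo data already keyed by generated_CMID (SM4 is an invented placeholder). This is a sanity check, not a substitute for join_datasets(), which in the live example receives dataset keys rather than precomputed CMIDs.

```r
# Hypothetical demo tables, already translated to CMIDs
left_tr <- data.frame(
  generated_CMID = c("SM1", "SM3", "SM4"),
  metric_left = c(10, 20, 30),
  stringsAsFactors = FALSE
)
right_tr <- data.frame(
  generated_CMID = c("SM1", "SM3"),
  metric_right = c(2, 5),
  stringsAsFactors = FALSE
)

# Full outer join keeps unmatched rows from both sides, with NA for missing metrics
full <- merge(left_tr, right_tr, by = "generated_CMID", all = TRUE)

# CMIDs present on only one side
setdiff(left_tr$generated_CMID, right_tr$generated_CMID)
#> [1] "SM4"
setdiff(right_tr$generated_CMID, left_tr$generated_CMID)
#> character(0)
```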

Optional: Authenticated Upload Flow

Write operations require a valid API key tied to a registered CatMapper account and are shown as a template only. The server identifies the acting user from the API key and enforces permissions for write endpoints. CatMapR does not manage username/password login flows.
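Sys.getenv() returns an empty string for an unset variable, so a missing key would otherwise surface only as a server-side rejection. A small hypothetical guard fails early instead; storing the key in ~/.Renviron keeps it out of scripts.

```r
# Hypothetical guard: fail early with a clear message if no API key is configured.
require_api_key <- function(var = "CATMAPR_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Set ", var, " (for example in ~/.Renviron) before running write operations.",
         call. = FALSE)
  }
  invisible(key)
}

# Usage before a write call:
# api_key <- require_api_key()
# upload_rows(..., api_key = api_key)
```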

Upload calls always use standard key expressions: each value in the selected key column must be a full expression of the form VARIABLE == VALUE (for example Type == Adamana Brown).
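When your source table keeps the key variable and value in separate columns, the expressions can be assembled with paste(). Both helpers below are hypothetical. The format check covers only the single VARIABLE == VALUE form shown in this vignette and rejects values that themselves contain =.

```r
# Hypothetical helper: build "VARIABLE == VALUE" key expressions.
# `variable` may be a single name recycled across all values.
make_key <- function(variable, value) {
  stopifnot(length(variable) == 1 || length(variable) == length(value))
  paste(variable, "==", value)
}

make_key("Type", "Adamana Brown")
#> [1] "Type == Adamana Brown"

# Hypothetical format check: one "==" with non-empty text on both sides
is_key_expression <- function(x) grepl("^[^=]+==[^=]+$", x)
```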

upload_rows() triggers contextual USES relationship refresh in the graph database based on USES properties that connect to other CMIDs. This trigger is fire-and-forget and is not polled for completion.

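# Template values below are placeholders; replace them with your own dataset ID,
# category names, and key expressions.
upload_payload <- data.frame(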
  CMName = "Yoruba",
  Name = "Yoruba",
  CMID = "",
  Key = "Type == Adamana Brown",
  datasetID = "SD1",
  label = "ETHNICITY",
  stringsAsFactors = FALSE
)

result <- upload_rows(
  df = upload_payload,
  database = "SocioMap",
  form_data = list(
    domain = "ETHNICITY",
    subdomain = "ETHNICITY",
    datasetID = "SD1",
    cmNameColumn = "CMName",
    categoryNamesColumn = "Name",
    alternateCategoryNamesColumns = character(0),
    cmidColumn = "CMID",
    keyColumn = "Key"
  ),
  action = "add_uses",
  api_key = Sys.getenv("CATMAPR_API_KEY"),
  poll_interval_seconds = 1,
  timeout_seconds = 600
)

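head(result)

The second template targets an existing category (SM123 is a placeholder CMID) with action = "update_add", listing the property column(s) to add in properties:

update_add_payload <- data.frame(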
  CMID = "SM123",
  Key = "Type == Adamana Brown",
  datasetID = "SD1",
  variable = "CeramicType",
  stringsAsFactors = FALSE
)

upload_rows(
  df = update_add_payload,
  database = "SocioMap",
  form_data = list(
    domain = "ETHNICITY",
    subdomain = "ETHNICITY",
    datasetID = "SD1",
    cmNameColumn = "CMName",
    categoryNamesColumn = "Name",
    cmidColumn = "CMID",
    keyColumn = "Key"
  ),
  action = "update_add",
  properties = c("variable"),
  api_key = Sys.getenv("CATMAPR_API_KEY")
)
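The second template's form_data names CMName and Name columns that update_add_payload does not contain. Whether the server requires them for update_add is not stated here, but a local pre-flight check makes such mismatches visible before a write call. The helper is hypothetical and assumes the column-mapping entries are the ones whose names end in Column or Columns.

```r
# Hypothetical pre-flight check: return the columns named in form_data
# (entries ending in "Column" or "Columns") that are missing from df.
check_form_columns <- function(df, form_data) {
  is_col_field <- grepl("Columns?$", names(form_data))
  wanted <- unique(as.character(unlist(form_data[is_col_field], use.names = FALSE)))
  setdiff(wanted, names(df))
}

# Copies of the update_add template objects above
update_add_payload <- data.frame(
  CMID = "SM123",
  Key = "Type == Adamana Brown",
  datasetID = "SD1",
  variable = "CeramicType",
  stringsAsFactors = FALSE
)
update_form <- list(
  domain = "ETHNICITY",
  subdomain = "ETHNICITY",
  datasetID = "SD1",
  cmNameColumn = "CMName",
  categoryNamesColumn = "Name",
  cmidColumn = "CMID",
  keyColumn = "Key"
)

# Names of mapped columns that the payload does not contain
check_form_columns(update_add_payload, update_form)
```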