WHG::Architecture

Pipeline, architecture, and software

Software stack

Django 4 (Python framework)
Python 3.10.7
PostgreSQL 15 (relational database)
Elasticsearch 8 (index)
Nginx, Gunicorn (web server)
Celery/Redis (task queueing)
Ubuntu 22.10 (operating system)
Front-end: JavaScript, MapLibre GL, Webpack, Leaflet, jQuery, Bootstrap 5, D3

Data workflow

WHG has two data stores: a relational database (db) and a high-speed index (idx).
Interfaces to this data include a graphical web application (GUI) and APIs.
Contributed data in Linked Places or LP-TSV format is uploaded by registered users to the database.
Once uploaded, datasets are managed in a private set of screens, where they can be browsed and reconciled against Wikidata.
Reconciliation entails initiating a task managed by Celery/Redis and reviewing the candidate matches returned.
Confirming matches to Wikidata records augments the contributed dataset by adding new place_link and, if desired, place_geom records. NOTE: The original contribution can always be retrieved exactly as it was uploaded.
Once an uploaded dataset has been reconciled and as many place_link records as possible have been generated for it, it can be flagged as "public"; at that point it becomes a browsable and searchable data publication.
As a further step, published datasets can be accessioned to the WHG index, a process that links individual records for the same (or "closely matched") places from multiple datasets.
The accessioning task is initiated by WHG staff, but results are reviewed by the dataset owner and any designated collaborators.
Accessioning to the WHG index is another reconciliation process with two steps: initiating the task and reviewing results. Incoming records that share a link to an external gazetteer (e.g. Wikidata or GeoNames) with a record already in the index are queued separately and can be added automatically; each is associated with its match and with any other similarly linked "siblings."
Incoming records that don't share one or more links to existing index items become new "seed" records in the index, referred to internally as "parents."
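
The accessioning rule described above can be illustrated with a minimal Python sketch. It is not the WHG implementation: the index is stood in for by an in-memory list, and the names IndexEntry and accession, the link identifier strings, and the choice to merge an attached record's links into its parent are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical stand-in for a "parent" record in the index."""
    place_id: int
    links: set[str]                                    # external gazetteer links
    children: list[int] = field(default_factory=list)  # ids of linked "siblings"


def accession(incoming, index):
    """Add incoming (place_id, links) records to `index`.

    A record sharing at least one external link with an existing parent
    is attached to that parent; otherwise it becomes a new "seed" parent.
    Returns (attached_ids, new_parent_ids).
    """
    attached, new_parents = [], []
    for place_id, links in incoming:
        links = set(links)
        # First parent with any link in common, or None if there is none.
        match = next((p for p in index if p.links & links), None)
        if match is not None:
            match.children.append(place_id)
            # Assumption: the parent accumulates the links of its children.
            match.links |= links
            attached.append(place_id)
        else:
            index.append(IndexEntry(place_id, links))
            new_parents.append(place_id)
    return attached, new_parents


# Illustrative data: one existing parent and three incoming records.
index = [IndexEntry(1, {"wd:Q90"})]
incoming = [
    (101, {"wd:Q90", "gn:2988507"}),  # shares wd:Q90 with parent 1
    (102, {"gn:2988507"}),            # shares a link merged in from 101
    (103, {"wd:Q64"}),                # no shared link: becomes a parent
]
attached, new_parents = accession(incoming, index)
print(attached, new_parents)  # [101, 102] [103]
```

In this sketch the second incoming record matches only because the first one's links were merged into the parent, which is one way "similarly linked siblings" could end up grouped together; a real index would use a query rather than a linear scan.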

System Architecture and Technical Summary