Empowering discovery by connecting place names across time and language
The mission of the World Historical Gazetteer is to build a platform for open linked historical place data in order to foster deeper understanding of place history and to improve discovery of places that have had many names over time. The WHG place index is accessible through tools and services that permit users to search and browse information about places, augment and contribute their own place records, assemble and publish information about places, and use information about places in teaching and communication.
The first step for working with your own data in WHG is to prepare an upload file in one of two formats: Linked Places format (LPF) if your data is relatively complex, or LP-TSV, a simpler delimited file format (e.g. a spreadsheet, CSV, or TSV). See Choosing an upload data format: LPF or LP-TSV? for more information.
After making that choice, the steps are as follows:
The Dataset Collection feature in WHG allows an individual or collaborative group to create a historical gazetteer resource from multiple datasets, linking those records that refer to the same place. Some examples, currently at various stages of development, include:
Published Dataset Collections can be browsed in a rich publication page including an explanatory essay, and in the near future will be independently searchable via the WHG API, for example as a resource in annotation software. See the step-by-step guide, Create and publish a Dataset Collection.
Note that in order for a dataset to be added to a Dataset Collection, it must have been fully accessioned in the WHG union index. See the "WHG Union Index" section in the Explaining the WHG indexes guide.
The Place Collection feature in WHG is designed for two use scenarios: as a teaching or workshop exercise, or as a form of authored publication. The steps for creating a Place Collection are the same in all cases, as detailed in the Create and publish a Place Collection guide.
Registered WHG users can request "group leader" permissions, which allow them to create and manage a WHG Collection Group. This is a private space where students or workshop participants can create and share collections of places, annotated with custom keywords, notes, dates, and images. The group leader can review and comment on the collections, and can nominate exceptional collections for inclusion in the WHG Student Gallery. Students or workshop participants join the group by entering an access key created and distributed by the instructor or workshop leader.
See "Create and manage a class or workshop Collection Group".
For members of a Collection Group, an option appears on all of their collection builder pages to submit a completed collection for review by the instructor or workshop leader.
Any registered WHG user can create a thematic annotated Place Collection (guide), and request its publication on the WHG platform, subject to review and approval by WHG editorial staff.
World Historical Gazetteer supports uploads of both Linked Places format (LPF; v1.2.2 specification) and its delimited file derivative, LP‑TSV, which is more useful for relatively simple data (v0.5 specification). In both cases, some level of transformation has to happen between your source data and the chosen format. Both formats require that there be one record per place. The main distinctions can be summarized this way:
Choose LPF if:
Choose LP-TSV if:
If you have a list of distinct places, each with a name or names and basic attributes such as coordinates and place type, in a spreadsheet, database table, etc., the task of preparing an upload file for WHG is straightforward. In almost all cases your format choice will be LP-TSV, and you can copy/paste columns from your file into WHG's LP-TSV spreadsheet template, as explained in the file itself. See also "Quick Start" on the "Upload dataset" page.
However, the data for most spatial historical projects is not only about places or locations, but principally about events or artifacts for which location is an important dimension.
Both LPF and LP-TSV require that there be one record per place. But for many projects, a single place can have multiple rows in a spreadsheet, or multiple features in a shapefile, each recording, for example, a change in some attribute at a given time. Data often takes the form of one row per event, artifact, or observation of some kind, with a column for place name and/or columns for latitude and longitude. In this case, location information is often repeated on each row that concerns that event, artifact, etc. The task is to extract the distinct places into a separate places-only table or worksheet.
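The extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (event_id, place_name, lat, lon), the sample rows, and the rule for deciding that two rows refer to the same place (same name ignoring case and surrounding spaces, plus identical coordinates) are all hypothetical. Real data usually needs more careful disambiguation, and those decisions belong to the data creator.

```python
import csv
import io

# Hypothetical one-row-per-event data; column names and values are illustrative only.
EVENTS_CSV = """event_id,year,place_name,lat,lon
e1,1850,Glasgow,55.86,-4.25
e2,1862,Glasgow,55.86,-4.25
e3,1870,Edinburgh,55.95,-3.19
e4,1871, glasgow ,55.86,-4.25
"""


def extract_places(rows):
    """Return one record per distinct place, remembering which events mention it.

    Assumption: rows refer to the same place when the trimmed, case-folded
    name and the coordinate strings are identical.
    """
    places = {}
    for row in rows:
        name = row["place_name"].strip()
        lat, lon = row["lat"].strip(), row["lon"].strip()
        key = (name.casefold(), lat, lon)
        if key not in places:
            places[key] = {
                "id": f"p{len(places) + 1}",  # locally assigned place id
                "title": name,                # first spelling encountered
                "lat": lat,
                "lon": lon,
                "source_events": [],
            }
        places[key]["source_events"].append(row["event_id"])
    return list(places.values())


places = extract_places(csv.DictReader(io.StringIO(EVENTS_CSV)))
for place in places:
    print(place["id"], place["title"], place["source_events"])
```

Keeping the list of source event ids on each place record makes it possible to write the new place id back to the original event table afterwards.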
Conflating multiple place references to a single place record often requires disambiguation or normalization, with several kinds of decisions only the data creator can make, e.g.:
Apart from conflating multiple place references to a single place record, converting data from a delimited format, such as a spreadsheet or shapefile attribute table, to the JSON-based LPF will almost certainly require a script, using e.g. Python, or SQL if a database is involved. A how-to for this is beyond the scope of this document, but this CSV > JSON tool demonstrates how this will look, and a web search will locate other tools that may help.
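As a rough idea of what such a script might look like, the following Python sketch turns a small places-only table into a GeoJSON-style feature collection. It is not a complete or validated LPF writer: the input column names and sample rows are hypothetical, and the output keys used here ("@id", "properties.title", "names" with "toponym", "geometry") reflect one reading of the format, so they should be checked against the LPF v1.2.2 specification, which also defines further required and optional elements.

```python
import csv
import io
import json

# Hypothetical places-only table, one row per place (tab-delimited).
PLACES_TSV = (
    "id\ttitle\tlon\tlat\n"
    "P1\tGlasgow\t-4.25\t55.86\n"
    "P2\tEdinburgh\t-3.19\t55.95\n"
)


def row_to_feature(row):
    """Map one table row to an LPF-like feature (key names are assumptions)."""
    return {
        "@id": row["id"],
        "type": "Feature",
        "properties": {"title": row["title"]},
        "names": [{"toponym": row["title"]}],
        "geometry": {
            "type": "Point",
            # GeoJSON coordinate order is longitude, then latitude.
            "coordinates": [float(row["lon"]), float(row["lat"])],
        },
    }


def rows_to_collection(rows):
    """Wrap the converted rows in a GeoJSON-style FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [row_to_feature(row) for row in rows],
    }


reader = csv.DictReader(io.StringIO(PLACES_TSV), delimiter="\t")
collection = rows_to_collection(reader)
lpf_json = json.dumps(collection, indent=2, ensure_ascii=False)
print(lpf_json)
```

In practice the rows would come from a file or database query rather than an inline string, and each additional attribute (name variants, place types, time spans, links) would need its own mapping.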
WHG maintains three high-speed indexes for use in the platform: "Wikidata+GeoNames," the "WHG Union Index," and the "Pub" index.
This index of over 13 million place records from Wikidata (3.6m) and GeoNames (10m) is used for initial reconciliation of uploaded datasets, enabling their augmentation with
The WHG Union Index is where individual records for the same or "closely matched" places coming from different datasets are linked. Search results privilege these linked sets or "clusters" of records, and present them in Place Portal pages like this one for Glasgow.
Records from published datasets make their way into the union index by means of a second reconciliation step, following that for the Wikidata+GeoNames index. This step is initiated by WHG editorial staff, and when complete the dataset is considered fully accessioned. See "Accessioning to the WHG Index" in Individual datasets for details.
When a dataset has been reconciled to the Wikidata+GeoNames index and published, it is automatically added to the "Pub" index so that its records can be discovered not only by browsing its publication page, but also in search and via our Application Programming Interface (API). If and when the dataset is reconciled to the union index, its records are removed from "Pub," as they are now linked where possible and will appear in Place Portal pages.
After a reconciliation task is run, the prospective matches to your records are presented for review. For each of your records that had one or more "hits," those hit records from Wikidata and/or GeoNames are presented in a list on the right of the screen, with your record on the left. The dataset owner and any designated collaborators decide, for each of the dataset's records, whether one or more of the hits is a "close match." Clicking the save button records those closeMatch/no match decisions and advances to the next record from the dataset. It is also possible to defer a decision, in which case the record is placed in a separate queue that can be revisited, possibly by someone with more relevant expertise. A note can also be added to the record for future reference.
The information displayed and options offered are explained below.
The meaning of closeMatch derives from the Simple Knowledge Organization System (SKOS) vocabulary, a widely used data model. Practically speaking, for WHG, asserting a "closeMatch" serves as a linking "glue." Specifically, records that share one or more links asserted as closeMatch are joined in our "union index" and returned together in response to queries. For example, records for Abyssinia and Ethiopia share two closeMatch links, to a DBpedia record and a TGN record. Therefore, they appear together when searching for either Abyssinia or Ethiopia. We have determined that the distinction from SKOS:exactMatch is not clear enough to offer that choice.
From the SKOS specification:
Furthermore, closeMatch is a super-property of exactMatch; that is, every exactMatch is also a closeMatch. Remember, the purpose of the assertion is to ensure records that should intuitively appear together, do.
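The "glue" effect of shared closeMatch links can be illustrated with a small Python sketch. It is not WHG's actual indexing code; it only shows one common way (a union-find structure) to group records that share at least one link. The record ids, the "close_matches" field name, and the link identifiers are all made-up placeholders.

```python
from collections import defaultdict

# Hypothetical records from three datasets; identifiers are placeholders.
RECORDS = {
    "datasetA:1": {"title": "Abyssinia",
                   "close_matches": {"dbpedia:EXAMPLE-1", "tgn:EXAMPLE-2"}},
    "datasetB:7": {"title": "Ethiopia",
                   "close_matches": {"dbpedia:EXAMPLE-1", "tgn:EXAMPLE-2"}},
    "datasetC:3": {"title": "Glasgow",
                   "close_matches": {"wd:EXAMPLE-3"}},
}


def cluster_by_shared_links(records):
    """Group record ids so that records sharing any closeMatch link end up together."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_record_for_link = {}
    for rid, record in records.items():
        for link in record["close_matches"]:
            if link in first_record_for_link:
                union(first_record_for_link[link], rid)
            else:
                first_record_for_link[link] = rid

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(members) for members in clusters.values())


clusters = cluster_by_shared_links(RECORDS)
print(clusters)
```

Because grouping is transitive, a record that shares a link with any member of a cluster joins the whole cluster, which is one reason to assert closeMatch only when the records should intuitively appear together.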
Review of results for accessioning to the WHG index is similar to review for reconciliation but differs in the following ways:
Place Collections in the WHG are annotated sets of place records from published datasets. Places can be added to a collection in three ways:
Once places have been added, they can be annotated in the following way:
At any time, add the following elements to the collection as a whole:
Choose visualization options to control how temporal information will appear in the collection's map and table displays (you can preview how your collection will display at any time). Options include:
If you have joined a collection group class or workshop, you have the option to submit your collection to the instructor or workshop leader for review. If the group has a gallery, once reviewed, the collection will appear there.
Instructors have the option to nominate exceptional collections for the WHG Student Gallery.
If your collection is
A WHG Dataset Collection is a set of published, indexed datasets in WHG, whose place records have been linked with others for the same place where they occur. Its potential purposes and possibilities are outlined in the Multiple datasets pathway section of this documentation.
All datasets in a Dataset Collection must be published and fully accessioned — that is, indexed in the WHG union index. This is because the linking of records for the same place from multiple datasets occurs during the final indexing step. See "Accessioning to the WHG union index" in the Individual datasets section.
The steps in creating a Dataset Collection are as follows:
The Collection Group feature in WHG is designed primarily for instructional scenarios, but can also be used for workshops. Any registered user can request "group leader" permissions, which allow them to create and manage a WHG Collection Group. This is a private space where students or workshop participants can create and share collections of places (WHG Place Collections), annotated with custom keywords, notes, dates, and images. The group leader can review submitted collections, and can nominate exceptional collections for inclusion in the WHG Student Gallery. Students or workshop participants join the group by entering an access key created and distributed by the instructor or workshop leader.
The workflow in both cases is very similar: