World Historical Gazetteer Beta 0.5: A Tour

Welcome to our third public beta release (February 2020). The Version 1 launch is planned for spring 2020; in the meantime we will be making further beta releases to give early adopters and contributors a preview of the platform's capabilities and a chance to experiment with it. Notes about future improvements and work in progress appear highlighted like this.

Home page

Register/Login/Profile

Registration and login are required in order to upload datasets, designate collaborators, use our reconciliation services to find matches in modern placename authorities (Getty Thesaurus of Geographic Names (TGN) and Wikidata so far), and ultimately contribute your data to the WHG index. Once logged in, top-level menu options for "My Data" and the user profile appear.

Places

Place search runs against a union index of ~1.8m core records and all contributed datasets. Autosuggest normally presents possibilities after a few characters are typed; for performance reasons, it has been temporarily disabled for place search. Typing a name and pressing the {Enter} key performs a search, presents a list of results, and maps those that have geometry (not all do). Clicking a result item highlights it on the map, providing further context. Clicking the name link takes you to the index record's "portal" page, where any number of attestation "cards" drawn from our core datasets (grey banner) and multiple contributed datasets (beige banner) are gathered.

Traces

Integration of trace data is at an early experimental stage, and a Linked Traces annotation format (LT-Anno) is in active development. We welcome suggestions for how trace data can be better integrated into the WHG interface.

Search for trace data runs against a separate index. At this time there are only a few example trace records. Try typing 'buddhist' to see a couple of examples. Selecting an auto-suggestion performs the search, and the places in the trace record are mapped. As with places, clicking a result item highlights the place on the map, and clicking its name takes you to its place portal page.

My Data page

The My Data page presents three lists of user-generated items: Datasets, Study Areas, and Collections (future). Upon registering, only two read-only "core" datasets are listed, which can be viewed but not acted upon. In each case, clicking the "add new" link starts the process of creating and managing a dataset.

Datasets
  • Place data

    Registered users can create Place datasets by uploading files in one of two formats: the expressive, GeoJSON-LD-based Linked Places format ("LP format" for short), or the simpler LP-TSV format. Considerations for making the choice are found in the "Choosing an upload data format" tutorial; a minimal sketch of an LP format record appears after this list.

    Help icons for each field in the "Upload dataset" form provide popup instructions for filling it in.

    Uploaded data files are validated for adherence to the relevant format spec, LP or LP-TSV. If there are formatting errors, their details are displayed; after correcting them, the upload can be attempted again.

    Upon successful upload, a Dataset Portal page is displayed.

  • Trace data

    NOTE: Trace data uploads are not yet enabled.
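
To make the two Place data formats mentioned above concrete, here is a minimal sketch in Python that writes a one-feature LP format file. The field names and identifiers are illustrative only, not a complete or authoritative template; the LP and LP-TSV specifications and the "Choosing an upload data format" tutorial define the actual requirements.

    import json

    # A minimal, illustrative Linked Places (LP format) upload: a GeoJSON-LD
    # FeatureCollection containing a single feature. Field names follow the
    # spirit of the LP format but are not exhaustive, and all identifiers
    # here are made up.
    lp_upload = {
        "type": "FeatureCollection",
        "features": [
            {
                "@id": "http://example.org/places/p_0001",  # contributor's own URI (hypothetical)
                "type": "Feature",
                "properties": {"title": "Abyssinia", "ccodes": ["ET"]},
                "names": [{"toponym": "Abyssinia", "lang": "en"}],
                "types": [{"label": "empire"}],
                # geometry is optional; reconciliation can supply it later
                "geometry": {"type": "Point", "coordinates": [39.0, 9.0]},
            }
        ],
    }

    with open("my_places_lp.json", "w", encoding="utf-8") as f:
        json.dump(lp_upload, f, ensure_ascii=False, indent=2)

    # LP-TSV expresses roughly the same core attributes as one delimited row
    # per place, e.g. columns such as id, title, ccodes, types, lon, lat.
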

Study Areas

These are user-created, named polygon bounds that constrain searches in the reconciliation process. See Reconciliation below.

Collections

Not yet available; to be implemented by the v1 launch. Users will be able to add place and trace records to personal collections, which can be mapped, edited, and optionally shared.

Dataset Portal page

This page contains several tabbed sections for managing datasets: Metadata, Browse, Reconciliation, and Sharing.

Metadata

This tab section provides metadata about the dataset and its most recently uploaded data file source. The title, base URI, and description fields can be edited. An update of the dataset can be initiated (only for LP-TSV delimited files at present). Statistics are displayed on the right side of the screen: initial counts of rows, name variants, links, and geometries, as well as counts of link and geometry records added during the reconciliation review process.

Browse

This section combines a sortable, searchable list of the records currently in the dataset with a map displaying any geometry they include. Note that links and geometry from authority records matched during the reconciliation review process are reflected here, as dataset augmentations written to new place_link and place_geom records.

Reconciliation

Reconciliation "tasks" are initiated and managed from this tab section.

Upon completion of each reconciliation task, a summary is generated and displayed in a list, with links to (a) review its results, and (b) clear the matches confirmed in review work so far, or delete the task and its results entirely. Caution! There is no recovery from these clearing actions!

Sharing

Owners of a dataset can name registered users in WHG as collaborators, giving them permission to view the dataset and to perform review of prospective matches generated by a reconciliation task.

Contributing a dataset to WHG

After a dataset has been uploaded and, using our reconciliation services, augmented with as many links (matches) to modern authorities as possible (see below), it can be considered for accessioning. In that step, each record is compared with the WHG index to see if the referenced place already has one or more attestations from another dataset. If it does, it is marked as a "child" record of the first attestation we received. If it does not, it is considered a new "parent." Accessioning relies on Place records having as many associated "place_link" records as can be obtained. At this stage, all accessioning is performed internally by the WHG project team.
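
Conceptually, the parent/child decision resembles the sketch below. It is a simplification with made-up identifiers; the real accessioning step is performed internally against the WHG index.

    # Simplified sketch of the parent/child decision made during accessioning.
    # Identifiers are made up; the real step runs against the WHG union index.

    # Authority link -> id of the index record that first attested the place
    index_by_link = {"tgn:1234567": "whg:123", "wd:Q115": "whg:123"}

    def accession(record_id, place_links):
        """Return ('child', parent_id) if any of the record's place_link
        identifiers is already in the index; otherwise register the links
        and return ('parent', None)."""
        for link in place_links:
            if link in index_by_link:
                return "child", index_by_link[link]
        for link in place_links:
            index_by_link[link] = record_id
        return "parent", None

    print(accession("mygaz:42", ["wd:Q115"]))     # ('child', 'whg:123')
    print(accession("mygaz:43", ["gn:2950159"]))  # ('parent', None)
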

Initiate Reconciliation page

Reconciliation is the process of identifying matches between your Place records and existing records in online place name authorities. So far, reconciliation against Getty TGN and Wikidata is offered; DBpedia and GeoNames will likely be added before the v1 launch. The purpose is to augment a dataset with associated "place_link" records and, optionally, geometry ("place_geom" records) derived from the authority. It is therefore possible to upload a dataset having no geometry and use this reconciliation service to make it mappable, at least in part.

NOTE: Making your dataset as rich with links to authorities as possible is a crucial step in making it ultimately a solid contribution to the WHG index.

In each case, the authority data store is queried for matches with your dataset records, one by one. Each query actually consists of multiple "passes," at first including as much context as your records may contain: name plus all variants; place type; one or more modern country codes; a user-defined Study Area; coordinate geometry for the feature; and name(s) of "parent" entities. Subsequent passes (two for TGN, one for Wikidata) relax the query if no potential matches (hits) are found. Resulting hits for all records are queued for review by the dataset creator.
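
Schematically, the pass structure can be pictured as below. The constraint sets and the helper function are hypothetical stand-ins, not WHG's actual query templates.

    # Schematic of multi-pass reconciliation with query relaxation.
    # Constraint sets and run_query are stand-ins, not WHG's actual templates.

    def run_query(authority, record, constraints):
        # Stand-in for a query against the local TGN index or the Wikidata
        # SPARQL endpoint; it only reports which constraints were applied.
        print(f"{authority}: '{record['title']}' with {sorted(constraints)}")
        return []  # pretend there were no hits, so every pass runs

    def reconcile_record(record, authority):
        """Run successively less constrained queries until hits are found."""
        passes = [
            {"names", "type", "ccodes", "area", "geometry", "parents"},  # full context
            {"names", "type", "ccodes"},                                 # relaxed
            {"names"},                                                   # most relaxed
        ]
        n_passes = 3 if authority == "tgn" else 2  # TGN: 2 relaxations; Wikidata: 1
        for constraints in passes[:n_passes]:
            hits = run_query(authority, record, constraints)
            if hits:
                return hits  # queued for review by the dataset creator
        return []

    reconcile_record({"title": "Abyssinia"}, "tgn")
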

Getty TGN

WHG maintains a locally indexed copy of the 2.5 million place records retrieved from a TGN dump file in March 2018. Because the index is local, the process is considerably faster than for Wikidata. We hope to update this index periodically in the future, or to use the newly announced TGN OpenRefine endpoint if its results are comparable and its speed is acceptable.

Almost all TGN records include a point geometry, but no concordances with other authorities.

Wikidata

The Wikidata reconciliation is performed against its SPARQL endpoint (https://query.wikidata.org/). At approximately one second per record, it is slower than reconciliation against TGN. Wikidata records often contain geometry and concordances with other authorities. When you confirm a Wikidata match, we create a "place_link" record not only for the Wikidata ID but also for TGN, GeoNames, VIAF, and Library of Congress IDs, if found.
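
As a rough illustration of the kind of lookup this involves, the sketch below sends a simple SPARQL query to the endpoint named above. It is not WHG's actual reconciliation query; P625 is Wikidata's coordinate property, and P1667, P1566, P214, and P244 are its standard TGN, GeoNames, VIAF, and Library of Congress concordance properties.

    import requests

    # Illustrative query against the public Wikidata SPARQL endpoint; not
    # WHG's actual reconciliation query. An exact English-label match is the
    # simplest possible lookup and would miss many real matches.
    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERY = """
    SELECT ?item ?coord ?tgn ?gn ?viaf ?loc WHERE {
      ?item rdfs:label "Abyssinia"@en .
      OPTIONAL { ?item wdt:P625  ?coord }   # coordinate location
      OPTIONAL { ?item wdt:P1667 ?tgn }     # Getty TGN ID
      OPTIONAL { ?item wdt:P1566 ?gn }      # GeoNames ID
      OPTIONAL { ?item wdt:P214  ?viaf }    # VIAF ID
      OPTIONAL { ?item wdt:P244  ?loc }     # Library of Congress ID
    } LIMIT 5
    """

    resp = requests.get(ENDPOINT,
                        params={"query": QUERY, "format": "json"},
                        headers={"User-Agent": "whg-tour-example/0.1"})
    for row in resp.json()["results"]["bindings"]:
        print({k: v["value"] for k, v in row.items()})
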

Study Areas

Many toponyms appear repeatedly in multiple locations, referring to different places, often far apart (e.g. Latin America and the Iberian Peninsula). To aid the reconciliation process for Getty TGN, users can define a Study Area that will constrain the search for matches to particular areas. Therefore, when TGN is selected as the authority, interface options appear allowing users to define a Study Area and use it for that reconciliation task.

Pre-defined areas can be chosen from a dropdown menu, or you can define your own either by a) entering a series of 2-letter country codes, which will generate a hull shape, or b) drawing a polygon on a map. Any study areas you create will appear in the 'user-defined' dropdown following their creation.
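
The "hull from country codes" option can be pictured roughly as in the sketch below, assuming country outlines are available from some dataset keyed by 2-letter (ISO 3166-1 alpha-2) codes. The toy boxes stand in for real outlines, and WHG's actual implementation may differ.

    from shapely.geometry import box
    from shapely.ops import unary_union

    # Toy stand-ins for country outlines, keyed by 2-letter country code;
    # a real implementation would load actual country geometries.
    country_geoms = {
        "ES": box(-9.3, 36.0, 3.3, 43.8),
        "PT": box(-9.5, 37.0, -6.2, 42.2),
    }

    def study_area_hull(ccodes, geoms=country_geoms):
        """Convex hull covering the union of the selected countries."""
        return unary_union([geoms[c] for c in ccodes]).convex_hull

    print(study_area_hull(["ES", "PT"]).wkt)
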

Reconciliation Review page

Prospective matches to external authorities are not automatically added to the WHG database during reconciliation; i.e. "place_link" records are created only by the Reconciliation Review step performed by the dataset creator or specified collaborators.

This page presents those dataset records that have one or more prospective matches ("hits") on the left of the screen, with a list of those hits on the right. A small map displays the record's geometry with a green marker, and that of all hits with orange markers. Hovering over the globe symbol in a hit item highlights its position on the map.

The objective is to determine, for each dataset record, whether any of the hits in fact refer to the same place: make a match/no-match choice, then click save to record the choice and advance to the next record. The default choice is "no match." Links on either side provide additional context to assist the assessment. If a match is asserted, the default relation is "closeMatch"; the alternatives are "exactMatch" and "related."

closeMatch vs. exactMatch vs. related

The meanings of closeMatch and exactMatch derive from the Simple Knowledge Organization System (SKOS) vocabulary, a data model commonly used in linked data applications. NOTE: The "related" relation is not yet formally defined, and assertions of it will not yet appear in the interface. For WHG, a Place record refers to a skos:Concept, so an assertion of a match between your record and that of an external authority indicates:

  • closeMatch: "...(the) two concepts are sufficiently similar that they can be used interchangeably in some information retrieval applications"
  • exactMatch: "...a high degree of confidence that two concepts can be used interchangeably across a wide range of information retrieval applications."

Furthermore, closeMatch is a super-property of exactMatch; that is, every exactMatch is also a closeMatch. Clear? Oh well. Practically speaking, for WHG both of these serve as linking "glue." Specifically, when we generate our union index, records that share one or more common links (closeMatch OR exactMatch) will be joined/linked, and returned together in response to queries. For example, records for Abyssinia and Ethiopia share two links, to a DBpedia record and a TGN record; therefore, they appear together when searching for either Abyssinia or Ethiopia.
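
A toy illustration of that joining behaviour: treat each shared closeMatch/exactMatch link as a connection between records and group connected records together. The identifiers are made up, and the real union index is built quite differently; this only shows the "shared link means joined" idea.

    from collections import defaultdict

    # Toy illustration: records sharing at least one authority link end up
    # in the same group when the union index is built. Identifiers are made up.
    records = {
        "rec:abyssinia": {"dbp:Abyssinia", "tgn:1234567"},
        "rec:ethiopia":  {"dbp:Abyssinia", "tgn:1234567", "wd:Q115"},
        "rec:addis":     {"wd:Q3624"},
    }

    # Union-find over records connected by a common link
    parent = {rec: rec for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    link_to_recs = defaultdict(list)
    for rec, links in records.items():
        for link in links:
            link_to_recs[link].append(rec)

    for recs in link_to_recs.values():
        for other in recs[1:]:
            parent[find(other)] = find(recs[0])

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec)].append(rec)
    print(list(groups.values()))
    # e.g. [['rec:abyssinia', 'rec:ethiopia'], ['rec:addis']]
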

About pages

Basic information about the project.

Tutorials

Two have been completed so far; more to come...