World Historical Gazetteer Beta 0.2: A Tour

Welcome to our first public beta release. The Version 1 launch is planned for Spring 2020. In the meantime we will be making a series of beta releases to give early adopters and contributors an early view of the platform's capabilities and a chance to experiment. Notes about future improvements and work in progress appear highlighted like this.

Home page

Register/Login/Profile

Registration and login are required to upload datasets, to use our reconciliation service to find matches in modern placename authorities (Getty Thesaurus of Geographic Names (TGN) and Wikidata at this point), and ultimately to contribute your data to our index. Once logged in, top-level menu options for "My Places" and user profile appear.

These functions are rudimentary now and will be upgraded soon. Features will include sharing dataset access with other registered users, granting authority to review reconciliation results, etc.

Search for Places and Traces
Places

Place search runs against a union index of ~1.8m core records and all contributed datasets. Autosuggest normally presents possibilities after a few characters are typed; for performance reasons, it has been temporarily disabled for places. Typing a name and then pressing the {Enter} key performs a search, presents a list of results, and maps those that have geometry (not all do). Clicking a result item highlights it on the map, providing further context. Clicking the place name takes you to its "place portal" page. The index record displayed can include any number of attestation "cards" drawn from multiple contributed datasets.

Traces

Search for Trace data runs against a separate index. At this time there are only a few example trace records. A Linked Traces annotation format (LT-Anno) is in active development. Try typing 'buddhist' to see a couple of examples. Selecting an auto-suggestion performs the search, and the places in the trace record are mapped. As with places, clicking a result item highlights the place on the map, and clicking its name takes you to its place portal page. We welcome suggestions for how trace data can be better integrated into the WHG interface.

Map layer selector

A few alternative base maps are available via the layer icon in the upper right corner.

My Data page

The My Data page presents three lists of user-generated items: Datasets, Study Areas, and Collections (future). Upon registering, only two read-only "core" datasets are listed, which can be viewed but not acted upon. In each case, clicking a plus icon starts the process of creating and managing datasets.

Datasets
  • Place data

    Registered users can create Place datasets by uploading files in one of two formats: the expressive GeoJSON-LD based Linked Places format ("LP format" for short), or the simpler LP-TSV format. Details of these and considerations for making the choice are found in the "Choosing an upload data format" tutorial.

    Help icons for each field in the "Upload dataset" form provide popup instructions for filling it in.

    Uploaded data files are validated for adherence to the relevant format spec, LP or LP-TSV. If there are formatting errors, their details are displayed and after correcting them the file upload can be attempted again.

    Upon successful upload, a Dataset Portal page is displayed.

  • Trace data

    NOTE: Trace data uploads are not yet enabled.

Study Areas

These are user-created named polygon bounds used to constrain the searches used in the reconciliation process. See Reconciliation below.

Collections

Not yet available; to be implemented by v1 launch. Users will be able to add place and trace records to personal collections, which can be mapped, edited, and optionally, shared.

Dataset Portal page

This page provides a summary of the uploaded data file, a link to initiate a reconciliation task, and maintains counts of Link and Geometry augmentations generated by the reconciliation process (i.e. records added to the place_link and place_geom tables; see the "System Details" page for more information).

Upon completion of each reconciliation task, a summary is generated and displayed in a column on the right, with links to (a) review the reconciliation results, and (b) clear matches confirmed in review work so far or delete the task and its results entirely. Caution, no recovery from these actions!

The page also contains links to browse the dataset records and to delete the dataset entirely (caution, no recovery from this action!). In the future, an option to update the dataset with a new file upload will be available.

Contributing a dataset to WHG

After a dataset has been uploaded and, using our reconciliation services, augmented with as many links (matches) to modern authorities as possible (see below), it can be considered for accessioning. In that step, each record is compared with the WHG index to see if the referenced place already has one or more attestations from another dataset. If it does, it is marked as a "child" record of the first attestation we received. If it does not, it is considered a new "parent." Accessioning relies on Place records having as many associated "place_link" records as can be obtained. At this stage, all accessioning is performed internally by the WHG project team.
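The parent/child decision described above can be sketched roughly as follows. This is an illustration only, not the project's accessioning code: the in-memory dictionary standing in for the WHG index, the function name accession, the record shape, and the placeholder identifiers are all assumptions.

```python
from typing import Dict, List, Optional

def accession(record: Dict, index: Dict[str, int]) -> Optional[int]:
    """Add a record to a toy index keyed by authority link.

    Returns the id of the parent record if the place is already attested,
    or None if this record becomes a new parent.
    """
    links: List[str] = record["links"]
    # A shared link means another dataset already attests this place.
    parent = next((index[link] for link in links if link in index), None)
    owner = parent if parent is not None else record["id"]
    for link in links:
        # Keep the first attestation as the owner of each link.
        index.setdefault(link, owner)
    return parent

# Placeholder identifiers, not real authority IDs.
index: Dict[str, int] = {}
first = accession({"id": 1, "links": ["tgn:placeholder-1"]}, index)
second = accession({"id": 2, "links": ["tgn:placeholder-1", "wd:placeholder-1"]}, index)
third = accession({"id": 3, "links": ["wd:placeholder-1"]}, index)
print(first, second, third)
```

In this sketch the third record has no link in common with the first, yet it still becomes a child of record 1 because record 2 carried both links. This is one reason why having as many "place_link" records as possible matters.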

Dataset Browse page

This page presents a paged, sortable, and searchable table display of all records in a dataset, and a map of those records that have geometry. Hovering over a table row highlights the place on the map, and clicking the row updates the record contents displayed underneath the map. Note that for large datasets it may take several seconds to load geometry; longer for linestring and polygon data than for points.

Future features for this page: (a) download dataset as Linked Places GeoJSON for mapping in other software; (b) make the map full-screen w/clickable markers; (c) download a PDF of the current viewport; and ??? (ideas welcome).

Initiate Reconciliation page

Reconciliation is the process of identifying matches of your Place records to existing records in online place name authorities. So far, reconciliation to Getty TGN and Wikidata is offered. DBpedia and GeoNames will likely be added before v1 launch. The purpose is to augment a dataset with associated "place_link" records and, optionally, geometry ("place_geom" records) derived from the authority. It is therefore possible to upload a dataset having no geometry and use this reconciliation service to make it mappable, at least in part.

NOTE: Making your dataset as rich with links to authorities as possible is a crucial step in making it ultimately a solid contribution to the WHG index.

In each case, the authority data store is queried for matches with your dataset records, one by one. Each query actually consists of multiple "passes," at first including as much context as your records may contain (name plus all variants; place type; one or more modern countries; a user-defined Study Area; coordinate geometry for the feature; and name(s) of "parent" entities). Subsequent passes (two for TGN, one for Wikidata) relax the query if no potential matches (hits) are found. Resulting hits for all records are queued for review by the dataset creator.
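The multi-pass approach can be sketched as follows. This is a minimal illustration, not WHG's implementation: the search callable, the field names, and the order in which constraints are dropped are all assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical passes, strictest first. Each pass lists the record fields
# used as query constraints; the real passes and their order may differ.
PASSES = [
    ["names", "place_type", "countries", "study_area", "geometry", "parents"],
    ["names", "place_type", "countries"],
    ["names"],
]

def reconcile(record: Dict, search: Callable[[Dict], List[Dict]]) -> Dict:
    """Run progressively relaxed queries, stopping at the first pass with hits."""
    for number, fields in enumerate(PASSES, start=1):
        # Only constrain on fields the record actually contains.
        query = {f: record[f] for f in fields if record.get(f)}
        hits = search(query)
        if hits:
            return {"record_id": record["id"], "pass": number, "hits": hits}
    return {"record_id": record["id"], "pass": None, "hits": []}

def demo_search(query: Dict) -> List[Dict]:
    """Stand-in for an authority index: it matches only a name-only query."""
    if set(query) == {"names"} and "Abyssinia" in query["names"]:
        return [{"authority_id": "example:1", "title": "Ethiopia"}]
    return []

result = reconcile(
    {"id": 42, "names": ["Abyssinia"], "place_type": "region", "countries": ["ET"]},
    demo_search,
)
print(result["pass"], len(result["hits"]))
```

Here the stricter passes find nothing, so the hit is only returned by the final, name-only pass. The returned dictionary is what would be queued for review.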

Getty TGN

WHG maintains a locally indexed copy of the 2.5 million place records retrieved from a TGN dump file in March 2018. Because it is local, the process is considerably faster than for Wikidata. We hope to periodically update this index in the future, or to use the newly announced TGN OpenRefine endpoint if its results are comparable and speed is acceptable.

Almost all TGN records include a point geometry, but no concordances with other authorities.

Wikidata

The Wikidata reconciliation is performed against its SPARQL endpoint (https://query.wikidata.org/). At approximately 1 second per record it is slower than that for TGN. Wikidata records often contain geometry and concordances with other authorities. When you confirm a Wikidata match, we create a "place_link" record not only for the Wikidata ID but for TGN, GeoNames, VIAF and Library of Congress IDs, if found.
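As a rough sketch of how one confirmed Wikidata match can yield several "place_link" records: the property IDs below are the Wikidata properties for the four external identifiers mentioned, but the shape of the hit, the row fields, the identifier prefixes, and the function name place_links are assumptions for illustration, and the identifier values are placeholders.

```python
from typing import Dict, List

# Wikidata properties that hold identifiers in other authorities.
# The short prefixes on the right are hypothetical.
CONCORDANCE_PROPS = {
    "P1667": "tgn",   # Getty Thesaurus of Geographic Names ID
    "P1566": "gn",    # GeoNames ID
    "P214": "viaf",   # VIAF ID
    "P244": "loc",    # Library of Congress authority ID
}

def place_links(place_id: int, hit: Dict, match_type: str = "closeMatch") -> List[Dict]:
    """Build place_link rows for a confirmed Wikidata hit and its concordances."""
    rows = [{"place_id": place_id, "type": match_type,
             "identifier": f"wd:{hit['qid']}"}]
    for prop, prefix in CONCORDANCE_PROPS.items():
        # A hit may carry zero, one, or several values per property.
        for value in hit.get("claims", {}).get(prop, []):
            rows.append({"place_id": place_id, "type": match_type,
                         "identifier": f"{prefix}:{value}"})
    return rows

# Placeholder values, not real identifiers.
hit = {"qid": "Q0", "claims": {"P1566": ["0000000"], "P214": ["0000000000"]}}
rows = place_links(101, hit)
for row in rows:
    print(row)
```

In this example the hit has GeoNames and VIAF concordances but no TGN or Library of Congress ID, so three rows are produced: one for Wikidata itself and one for each concordance found.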

Study Areas

Many toponyms appear repeatedly in multiple locations, referring to different places, often far apart. To aid the reconciliation process for Getty TGN, users can define a Study Area that will constrain the search for matches to particular areas. Therefore, when TGN is selected as the authority, interface options appear allowing users to define a Study Area and use it for that reconciliation task. Options to draw study areas on a map or choose them from region features existing in the WHG index are not yet operational. For the time being, you must enter a series of 2-letter country codes, which will generate a hull shape.
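To illustrate how a list of country codes could become a hull shape, here is a minimal sketch that computes a convex hull (Andrew's monotone chain algorithm) around the outline points of the selected countries. Whether WHG's hull is convex is an assumption, as are the function names and the OUTLINES lookup; the codes "AA" and "BB" and their coordinates are placeholders, not real countries.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def study_area(codes: List[str], outlines: Dict[str, List[Point]]) -> List[Point]:
    """Hull around the outline points of every listed country code."""
    points = [pt for code in codes for pt in outlines[code.upper()]]
    return convex_hull(points)

# Placeholder outlines as (x, y) pairs, not real geography.
OUTLINES = {
    "AA": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "BB": [(3, 1), (5, 1), (5, 3), (3, 3)],
}
hull = study_area(["aa", "bb"], OUTLINES)
print(hull)
```

The resulting polygon encloses both placeholder countries as well as the gap between them, which is the kind of loose constraint a Study Area is meant to provide.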

Reconciliation Review page

Prospective matches to external authorities are never automatically added to the WHG database; i.e. "place_link" records are created only by the Reconciliation Review step performed by the dataset creator or authorized members of their team.

This page presents those dataset records that have one or more prospective matches ("hits") on the left of the screen, with a list of those hits on the right. A small map displays geometry for the record with a green marker, and that of all hits with orange markers. Hovering over the globe symbol in a hit item highlights its position on the map.

The objective is to determine, for each dataset record, whether any of the hits in fact refers to the same place, making a match/no-match choice, then clicking save to record the choice and advance to the next record. The default is "no match." Links on either side provide additional context to assist in making the assessment. When a hit is marked as a match, the default match type is "closeMatch"; the alternatives are "exactMatch" and "related."

closeMatch vs. exactMatch vs. related

The meaning of closeMatch and exactMatch derives from the Simple Knowledge Organization System (SKOS) vocabulary, a popular data model. The "related" relation is not yet defined formally, and assertions of it do not yet appear in the interface. For WHG, a Place record refers to a SKOS:Concept, so assertions of a match between your record and that of an external authority indicate:

  • closeMatch: "...(the) two concepts are sufficiently similar that they can be used interchangeably in some information retrieval applications"
  • exactMatch: "...a high degree of confidence that two concepts can be used interchangeably across a wide range of information retrieval applications."

Furthermore, closeMatch is a super-property of exactMatch; that is, every exactMatch is also a closeMatch. Clear? Oh well. Practically speaking, for WHG both of these will serve as a linking "glue." Specifically, when we generate our union index, records that share one or more common links (closeMatch OR exactMatch) will be joined/linked, and returned together in response to queries. For example, records for Abyssinia and Ethiopia share two links, to a DBpedia record and a TGN record. Therefore, they appear together when searching for either Abyssinia or Ethiopia.
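The "glue" idea can be sketched with a small union-find (disjoint set) structure that groups records sharing any link. This is an illustration under stated assumptions, not the way the WHG union index is actually built: the function name link_records, the record keys, and the placeholder link identifiers are hypothetical, and treating links as transitive is our assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set

def link_records(records: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group record ids that share at least one link, directly or transitively."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen: Dict[str, str] = {}  # link -> first record that carried it
    for rid, links in records.items():
        for link in links:
            if link in first_seen:
                parent[find(rid)] = find(first_seen[link])
            else:
                first_seen[link] = rid

    groups = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return list(groups.values())

# Placeholder link identifiers; closeMatch and exactMatch are treated alike.
records = {
    "dataset_a:Abyssinia": {"dbpedia:placeholder-1", "tgn:placeholder-1"},
    "dataset_b:Ethiopia": {"dbpedia:placeholder-1", "tgn:placeholder-1"},
    "dataset_c:Elsewhere": {"tgn:placeholder-2"},
}
groups = link_records(records)
print(groups)
```

The Abyssinia and Ethiopia records end up in one group, so a search matching either name could return both, while the unrelated record stays on its own.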

Maps page

The heat maps presently appearing here serve only to indicate the spatial extent of some datasets either already in WHG (HGIS de las Indias) or in the queue to enter soon. In time we will provide maps that summarize WHG index contents in some way. Ideas (and coding help) are welcome!

About pages

Basic information about the

Tutorials

Two completed so far, more to come...