World Historical Gazetteer Version 2.1: A Guide

Welcome to Version 2.1 of the World Historical Gazetteer (WHG) platform, featuring several significant changes from Version 1. WHG remains a work in progress, so expect further refinements, new features, and new data on an ongoing basis. Notes about planned near-future updates appear highlighted like this.

Geography in Historical Research

Many scholars and students of the human past ask explicitly geospatial questions in their research, for example about distribution patterns over time and connectivity. Others simply want to make maps—to visualize the geography inherent in their source material. Either case requires making lists of historical place names and determining the coordinates of those places, rendering them mappable and amenable to spatial analysis. Finding even estimated locations for historical places is often a difficult and time-consuming task. A central goal of the World Historical Gazetteer project is to make it as easy as possible. WHG is a platform for gathering in one system, over time, contributions of place name data—large and small, and for all regions and historical periods. WHG provides services for geolocating place names, linking individual attestations for "closely matched" places, and sharing the results of that work as searchable linked datasets.

Teaching

WHG is an excellent resource for teachers and students alike. Teachers can use information in WHG to refine lessons and to make custom maps for lectures or resources. The services offered by WHG will help make historical mapping easier and more accessible to everyone. In short, developing gazetteer datasets is an excellent classroom tool. The "Collections" feature will soon be expanded to facilitate building sets of individual records, and annotating them with explanatory abstracts or blog posts.

Reference

A "union index" of place attestations drawn from historical sources will over time increasingly link the disparate research of contributors on the dimension of place. Furthermore, by bringing together all the known references for a place without privileging any particular one, it decenters colonized name making. For instance, the WHG index contains 133 modern and historical name variants for the contemporary city of Beijing drawn from multiple sources, and 96 for the city of Istanbul. Searching for either Al Quds or Jerusalem returns the contributed references for both.

Data

Two Place Data Stores: Database and Index

Data from files uploaded to WHG in either the full Linked Place format (LP) or abbreviated LP-TSV format are imported into a PostgreSQL relational database and made available to the uploader ("owner") for viewing and augmenting via our reconciliation process. Dataset owners can designate other WHG users as co-owners or collaborating "members." Data cannot be edited in WHG, but datasets can be updated. Semi-automated update functionality is a work-in-progress. A staff-assisted process is possible in the interim.

The reconciliation process finds prospective matches to your records in Wikidata and/or Getty TGN. When dataset owner(s) accept an external record as a "close match," one or more place_link and place_geom records are added to the database, reflecting that match. This augments your dataset within the WHG database with concordance identifiers (links), and geographical coordinates, but does not alter the original uploaded content in any way. When a dataset is flagged as public, its records become publicly accessible via the Search page, a public Browse page, and the WHG API.
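As a rough illustration of this augmentation model, the Python sketch below keeps the uploaded record untouched and stores accepted matches in separate link and geometry lists. Only the place_link and place_geom names come from the text above; the class names, field names, and the identifier "wd:Q-example" are illustrative assumptions, not the actual WHG database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class UploadedPlace:
    """An uploaded record; frozen so its original content cannot be altered."""
    src_id: str
    title: str


@dataclass
class Augmentation:
    """Rows added by accepted matches, stored apart from the uploaded record."""
    place_links: List[Tuple[str, str]] = field(default_factory=list)
    place_geoms: List[Tuple[str, Tuple[float, float]]] = field(default_factory=list)


def accept_close_match(
    place: UploadedPlace,
    aug: Augmentation,
    authority_id: str,
    lon_lat: Optional[Tuple[float, float]] = None,
) -> None:
    """Record an accepted match: always a link, and a geometry if one exists."""
    aug.place_links.append((place.src_id, authority_id))
    if lon_lat is not None:
        aug.place_geoms.append((place.src_id, lon_lat))


# Hypothetical record, identifier, and coordinates, for illustration only.
place = UploadedPlace(src_id="p1", title="Example Town")
aug = Augmentation()
accept_close_match(place, aug, "wd:Q-example", (10.0, 20.0))
```

After the call, the augmentation holds one link and one geometry for "p1", while the uploaded record itself is unchanged.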

Separately from the database, WHG maintains a high-speed index that links records for closely matched places from multiple datasets. In the accessioning step, records from a new dataset are either linked with existing records for a given place (if any), or become a new "seed" record if not. The result, for example, is that a search of the index for any of Istanbul, Constantinople, or Byzantium will lead users to the same Place Portal page listing attestations for that place by any of those names, as supplied by multiple sources.

Place

As of July 2021, World Historical Gazetteer includes about 60,000 contributed historical place records and about 1.8 million "core" non-historical records.

Public historical datasets (all are in the database; indexing is partial and in progress for some, as noted)

Core (non-historical) data

Several additional historical datasets are in queue for accessioning at this time. We welcome further contributions, large and small. If you have one in the works or in mind, please let us know via the contact form. We will publicize these additions as they occur via our blog and our Twitter account (@WHGazetteer).

Trace annotations (experimental)

Trace annotations are records annotating web resources about historical events, people, works and objects (“traces”) with identifiers for the places that are relevant to them. The annotations assert a relation between the trace and the place (for instance, they note that a place was the waypoint on a journey or the birthplace of an individual) and they attest a year or timespan during which the trace and the place were connected. Each annotation record joins one trace with any number of places. In connection with one another, one or multiple trace records link places together to create spatially explicit historical narratives. Early example datasets include:

  • Hernán Cortés and the Conquest of the Aztec Empire
  • The Journey of Xuanzang, a seventh-century Buddhist monk
  • The Lifepath of Gautama Buddha
  • The Empire of Alexander
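
The structure of a trace annotation described above (one trace, any number of places, each with a relation and a year or timespan) can be sketched in Python as follows. This is a minimal illustration only: the class names, field names, identifiers, and years are placeholders invented for the example, not the WHG trace data model or real historical data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaceRef:
    """One place referenced by a trace, with its relation and timespan."""
    place_id: str                  # hypothetical place identifier
    relation: str                  # e.g. "waypoint" or "birthplace"
    start_year: int
    end_year: Optional[int] = None


@dataclass
class TraceAnnotation:
    """Joins one trace (a web resource) with any number of places."""
    trace_uri: str
    places: List[PlaceRef]


def itinerary(annotation: TraceAnnotation) -> List[str]:
    """Return place ids ordered by the year they were connected to the trace."""
    ordered = sorted(annotation.places, key=lambda p: p.start_year)
    return [p.place_id for p in ordered]


# Placeholder values, for illustration only.
journey = TraceAnnotation(
    trace_uri="https://example.org/traces/a-journey",
    places=[
        PlaceRef("whg:place-b", "waypoint", 20, 25),
        PlaceRef("whg:place-a", "waypoint", 10),
        PlaceRef("whg:place-c", "waypoint", 30),
    ],
)
```

Ordering the referenced places by year is one simple way such a record could yield a spatially explicit narrative, such as the sequence of waypoints on a journey.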

Details for how to use the following services appear in the "Page by Page" section of this Guide and in help screens throughout the application.

Linking and geocoding via reconciliation

A core feature of WHG is its reconciliation services, which allow you to upload place records drawn from your historical sources and find potential matches for them in Wikidata. Records from Wikidata include geographic coordinates in almost all cases. Potential matches are queued for review, and when you accept a match, your original dataset is augmented with geometry from the authority record as well as the authority identifier. Many Wikidata records also include concordances: identifiers from GeoNames, VIAF, and Library of Congress among others. Those are also added to the original dataset.

The scripts that we have developed to suggest potential matches do not simply match names; they make use of context provided in both the uploaded and authority records, including: a) all name variants, b) modern country or study area bounds, c) place type, and d) any provided coordinates.

Downloading augmented data

Once a dataset has been augmented with additional geometry and links, its owner can download it for mapping, further development, or any other research purpose. Note that the downloaded file is a revised version of the original, so users must manage the different versions themselves.

Sharing (publication)

Registered users can request a dataset be flagged as "public," and after a brief review by WHG editorial staff, it is effectively published as Linked Open Data. Upon upload, each record is assigned a unique permanent numerical identifier within WHG. It also inherits a unique dataset label identifier upon upload, which can be combined with the unique src_id you had given it, forming another permanent identifier within WHG. The Using the API page explains options for that service.

Contributing

All of the above occurs in your private workspace. Once your dataset has as many geometries and links to external authorities as the reconciliation process can discover, you can take the important extra step of accessioning it as a contribution to the WHG union index. Some reasons to do this include:

  • Your place attestations now appear in index search results, in many cases linked with other attestations for the same place.
  • Because of this, researchers concerned with a particular place or set of places can learn about other people also interested in the same place(s).
  • If any of your records refer to places that were not previously in the WHG index, they will become new "seed" records. In time, people will spend far less effort geocoding their own place records.
  • The WHG API allows the index to be connected to tools such as Pelagios' Recogito. When it is, you and others will have a far easier time annotating historical texts with authority records (and coordinates) for place references.
  • While simply flagging an uploaded dataset as "public" effectively publishes it, the extra step of accessioning to the index makes it that much more useful in the growing ecosystem of linked historical geodata.

Community

The WHG project belongs to a growing community interested in linking information about historical places and linking historical information from multiple disciplines via place. As such we are active partners in the Pelagios Network.

Domains of Interest

Pleiades, the "community-built gazetteer and graph of ancient places," is a trailblazing project that more than a decade ago began gathering, curating and sharing contributed data, focused on the Mediterranean region in antiquity. Its success has been instructive in several ways. The project was and is the product of a community—in its case classicists, archaeologists, and historians of the region and period. It has a dozen volunteer editors, numerous reviewers, and continues to grow in depth and breadth.

We at World Historical Gazetteer anticipate our own data aggregation and publishing platform will grow by virtue of similar geographic and temporal "domains of interest." We aim to prioritize projects about the Global South. Our list of early and prospective contributors bears this out—it is spatiotemporally clustered. For example, Werner Stangl's HGIS de las Indias has seeded an early Latin American domain, and several other contributions of colonial and pre-colonial Latin America data are expected soon. Other emerging clusters include: Dutch History, the "Atlantic World," the Ottoman Empire, the Islamic World, Central Eurasia, and China.

Expanding coverage of linked data resources to include under-represented areas like the Global South is an important priority.

Register/Login/User Profile

Registration and login are required to be able to upload datasets, designate collaborators, to use our reconciliation services to find matches in modern placename authorities (Wikidata and Getty Thesaurus of Geographic Names (TGN) so far), and ultimately, to contribute your data to the WHG index. Once logged in, top-level menu options for "Data", and a user profile appear.

Search :: Places

There are two place data stores in WHG, therefore two search options: our "union index," and the WHG database.

Our union index holds records for about 1.8 million places (having over 3 million names) that have been fully accessioned. That is, to the extent possible, records for the same place are linked, and returned together in a set. Typing a name then pressing the {Enter} key performs a search, presents a list of results, and maps those which have geometry (not all do). Clicking a result item will highlight it on the map, providing further context. Clicking the name link takes you to the index record's "portal" page where any number of attestation "cards" drawn from our core datasets (grey banner) and multiple contributed datasets (beige banner) are gathered.

The database search option queries records in the WHG database from all datasets flagged as "public," whether they have been fully accessioned (i.e. reconciled against the union index) or not. Datasets are made public on request by their owners, and following a review by WHG editorial staff.

Pre-filters are available to constrain both kinds of place searches: a) broad feature class, b) temporal (earliest, latest years), and c) spatial (bounds of world regions, modern countries, and user "study areas"). Results can be sorted and further filtered by specific place type or modern country bounds.

Search :: Traces

Search for trace data runs against a separate index. At this time there are only a few dozen example trace records. Try typing "empire" or "buddhist" to see a few examples. Selecting one of the "auto-suggestions" will list and map the places referenced in the trace record. As with places, clicking a result item highlights the place on the map, and clicking its name takes you to its place portal page.

Integration of trace data is at an early experimental stage. We welcome suggestions for how trace data can be better integrated into the WHG interface.

The Data dashboard page lists any Datasets created by a user or for which they are a collaborator, as well as any Study Areas and Collections they have created. Clicking an "add new" or "create new" link starts the process of creating and managing a Dataset, Study Area, or Collection.

Datasets
Place data

Registered users can create Place datasets by uploading files in one of two formats: the expressive GeoJSON-LD based Linked Places format ("LP format" for short), or the simpler LP-TSV format. Considerations for making the choice are found in the "Choosing an upload data format" tutorial.

Help icons for each field in the "Upload dataset" form provide popup instructions for filling it in.

Uploaded data files are validated for adherence to the relevant format spec, LP or LP-TSV. If there are formatting errors, details of the errors are displayed (insofar as possible) and after correcting them the file upload can be attempted again.

Upon successful upload, a Dataset Portal page is displayed.

Trace data

Automated trace data uploads are not yet enabled. Contact the WHG team with inquiries about creating and adding new trace datasets.

Study Areas

These are user-created, named polygon bounds used to constrain the searches run in the reconciliation process. See the Reconciliation section of this Guide.

Collections

Planned, but not yet available. Users would be able to add place and trace records to personal collections, which can be mapped, edited, and optionally, shared. Would be useful for teaching scenarios.

The owner's Dataset page has several tabbed sections: Metadata, Browse, Reconciliation, Sharing, and Log/Comments.

Metadata

This section displays user-created and auto-generated metadata for a dataset and its most recent uploaded data file source. Several fields can be edited. Status statistics are displayed on the right side of the screen: initial counts of rows, name variants, links, and geometries, as well as counts of link and geometry records added during the reconciliation review process. A feature for updating a dataset with a new file upload is in progress. In the meantime, contact us if this is a consideration.

Browse

This section combines a sortable, searchable list of the records currently in the dataset, and a map displaying any geometry it includes. Once a reconciliation task has been run, a column will appear here to view and filter by review status. Note that new links and geometry from authority records matched in the reconciliation review process are reflected here, as new place_link and/or place_geom records are written with each match.

Reconciliation

Reconciliation tasks are initiated from this section, following a process outlined in a later section. A summary of the initial results of each task is generated and displayed in a list, with links to the corresponding review screens.

It is also possible to delete the task, its associated hits, and any match records created in review work. Caution! There is no recovery from these clearing actions!

Contributing a dataset to WHG

After a dataset has been uploaded and, using our reconciliation services, augmented with as many links (matches) to modern authorities as possible (see Reconciliation), it can be considered for accessioning—that is, contributed to the WHG "union index." At this time all accessioning will be initiated by the WHG project team, and review performed by the dataset owner and collaborators.

Accessioning a dataset to WHG entails reconciling its records to the WHG index. Each record is compared with the WHG index to see if the referenced place already has one or more attestations from another dataset. If an incoming record has a concordance "link" in common with one already in the index, it will be automatically indexed as a "child" or "sibling" of the matched record (owners have the option to review these or auto-accept them).

Records for which there are no prospective matches are automatically indexed as a "parent," in effect a new seed record for a place. Records for which there are prospective matches are queued for review, just as with the Wikidata and TGN reconciliation tasks. Matched records become part of the set for a given place. Unmatched records become parent "seeds."

In this way, accessioning relies on incoming Place records having as many associated "place_link" records as can be obtained.
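
The accessioning rules above can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the function name, the dictionary-based index, and the return labels are hypothetical and do not describe WHG's actual implementation.

```python
from typing import Dict, Set


def accession_decision(
    incoming_links: Set[str],
    index_links: Dict[str, Set[str]],
    has_prospective_matches: bool,
) -> str:
    """Decide how an incoming record enters the union index.

    index_links maps an indexed record id to its concordance links.
    """
    # A concordance link in common: index automatically under that record.
    for indexed_id, links in index_links.items():
        if incoming_links & links:
            return f"child_of:{indexed_id}"
    # Prospective matches without a shared link are queued for review.
    if has_prospective_matches:
        return "queue_for_review"
    # Nothing to match against: the record becomes a new parent "seed".
    return "parent"


# Hypothetical index contents and link identifiers.
index = {"idx:1": {"wd:Q-a", "tgn:b"}, "idx:2": {"wd:Q-c"}}
```

For example, a record carrying the link "tgn:b" would be indexed under "idx:1", while a record with no links and no prospective matches would become a new seed. The sketch also shows why the number of place_link records matters: without shared links, every record falls through to manual review or seeding.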

Sharing

Owners of a dataset can designate any number of registered WHG users as collaborators, in either a "co-owner" or "member" role. Co-owners have complete control over a dataset; members can view the dataset and perform review of prospective matches generated by reconciliation tasks.

Log & Comments

Actions related to datasets are logged and listed here.

On the Reconciliation Review and Place Portal pages, users can create comments specific to a database record. Comments for all places in a dataset are listed here. NOTE: Comments suggest followup action, e.g. correction of errors. We are contemplating how such corrections might be accomplished within the WHG interface.

Reconciliation is the process of linking your place records to existing records in online place name authorities, including, as a last step, the WHG union index. The external reconciliation sources offered by WHG are Wikidata and Getty TGN. DBpedia and GeoNames may be added in the future. The purpose of reconciliation to external sources is to augment a dataset with new concordances ("place_link" records in WHG), and optionally, new geometry ("place_geom" records) derived from the authority.

The primary motivation for many users is finding geographic coordinates for unlocated place names, in order to make their data more fully mappable and amenable to spatial and network analyses. Beyond that, making a dataset as rich with links to external authorities as possible is a crucial step in making it a solid contribution to the WHG index. Once historical place names are geolocated, it is to everyone's benefit if that work is shared!

In each case, the authority data store is queried for matches with your records, one by one. In fact, each query consists of multiple "passes." The first looks for any authority identifiers in common. If found, the records can be automatically linked. The next pass includes all the context your records may contain: a primary name plus all variants; one or more place types; bounding modern countries or user-defined Study Area as a spatial constraint; coordinate geometry for the feature; and name(s) of "parent" entities. Subsequent passes relax the query if no potential matches (hits) are found. Resulting hits for all records are queued for review by the dataset creator, in batches labeled "pass 0," "pass 1" and so on.
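
The idea of successively relaxed passes can be sketched as below. This is an illustrative simplification, not WHG's query logic: the record layout, the choice of which constraints are dropped in which pass, and all names and identifiers are assumptions made for the example. Coordinates and parent names are omitted for brevity.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def reconcile(record: Record, authority: List[Record]) -> Tuple[str, List[str]]:
    """Return (pass label, ids of hits) from the first pass that finds hits."""
    names = {n.lower() for n in record.get("names", [])}
    types = set(record.get("types", []))
    countries = set(record.get("countries", []))
    links = set(record.get("links", []))

    def name_ok(a: Record) -> bool:
        return bool(names & {n.lower() for n in a["names"]})

    passes = [
        # pass 0: any authority identifier in common
        ("pass 0", lambda a: bool(links & set(a.get("links", [])))),
        # pass 1: full context (names, place type, country constraint)
        ("pass 1", lambda a: name_ok(a) and a["type"] in types
            and a["country"] in countries),
        # pass 2: relaxed, place type dropped (an assumed relaxation order)
        ("pass 2", lambda a: name_ok(a) and a["country"] in countries),
        # pass 3: relaxed further, names only
        ("pass 3", lambda a: name_ok(a)),
    ]
    for label, matches in passes:
        hits = [a["id"] for a in authority if matches(a)]
        if hits:
            return label, hits
    return "no hits", []


# Hypothetical authority records.
authority = [
    {"id": "auth:1", "names": ["Springfield"], "type": "settlement",
     "country": "AA", "links": ["x:1"]},
    {"id": "auth:2", "names": ["Springfield"], "type": "settlement",
     "country": "BB", "links": []},
    {"id": "auth:3", "names": ["Springfield"], "type": "river",
     "country": "AA", "links": []},
]
```

A record with name "Springfield", type "settlement", and country "AA" is resolved in pass 1 to "auth:1" alone; the same name with no type or country information is only resolved in pass 3, where all three candidates come back for review.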

Wikidata

WHG maintains a locally indexed copy of about 3.6 million Wikidata place records. Wikidata reconciliation tasks process 150-180 records per minute. Almost all Wikidata records contain geometry and concordances with other authorities. When you confirm a Wikidata match, we create a "place_link" record not only for the Wikidata ID but for concordances with several other authorities if found, including Getty TGN, Bibliothèque nationale de France, Pleiades, Wikipedia, GeoNames, VIAF, Deutsche Nationalbibliothek, and Library of Congress.

Getty TGN

WHG also maintains a locally indexed copy of about 1.8 million place records retrieved from a TGN dump file in March, 2018. Almost all TGN records include a point geometry, but have no concordances with other authorities and no structured temporal attributes.

We hope to periodically update these indexes in the future.

Study Areas

Many toponyms appear repeatedly in multiple locations, referring to different places, often far apart (e.g. Latin America and the Iberian Peninsula). To aid the reconciliation process, users can define a Study Area that will constrain the search for matches to its bounds, by a) entering a series of 2-letter country codes, which will generate a hull shape, or b) by drawing a polygon on a map. Alternatively, a pre-defined region can be chosen from a separate dropdown menu.
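
To illustrate how a hull shape can act as a spatial constraint, the sketch below computes a convex hull from a set of points and tests whether a candidate location falls inside it. It is a simplified sketch: it treats longitude and latitude as planar coordinates, uses made-up points rather than real country boundaries, and is not the method WHG uses to generate its hulls.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def within(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on a counter-clockwise convex hull."""
    n = len(hull)
    return all(_cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


# Made-up boundary points standing in for the selected countries.
study_area = convex_hull([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)])
```

A candidate match at (2, 1) would pass the `within` test and be kept, while one at (5, 5) would be excluded from the results.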

Reviewing "hits"

Prospective matches to external authorities are not automatically added to the WHG database during reconciliation; i.e. new "place_geom" and "place_link" records augmenting the dataset are created only by the Reconciliation Review step performed by the dataset creator or specified collaborators.

The Review page presents those dataset records that got one or more prospective matches ("hits") on the left of the screen, one by one, and a list of those hits on the right. A small map displays geometry for your record with a green marker, and geometries of all hits with orange markers. Hovering over the globe symbol in a hit item highlights its position on the map.

The objective is to determine, for each dataset record, whether any of the hits are a "closeMatch" to it. By default, "no match" is selected for each hit. The reviewer can optionally change the selection to "closeMatch" for one or more hits. In any case, clicking save records the choice and advances to the next record. Attributes of the dataset record and the hits provide context to assist making the assessment.

What is a Match?

The term closeMatch used by WHG comes from the Simple Knowledge Organization System (SKOS) vocabulary, a data model commonly used in linked data applications. For WHG, a Place is considered a skos:Concept, described by data records in Linked Places format, so an assertion of a skos:closeMatch between your record and that of an external authority indicates:

"...(the) two concepts are sufficiently similar that they can be used interchangeably in some information retrieval applications"

For WHG, records that share one or more such links are joined in our "union index" only, and returned together in response to queries. For example, records for Abyssinia and Ethiopia from different sources share two links, to a DBpedia record and a TGN record. Therefore, they appear together when searching for either Abyssinia or Ethiopia. They are not conflated or linked in any way within the WHG database.
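
One way to picture this kind of grouping by shared links is the union-find sketch below. It is an illustration only: the record ids and link identifiers are placeholders, and treating shared links transitively is an assumption of this sketch rather than a documented WHG behavior.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_by_shared_links(records: Dict[str, Set[str]]) -> List[List[str]]:
    """Group record ids that share at least one link (transitively)."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        # Follow parent pointers to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen: Dict[str, str] = {}
    for rid, links in records.items():
        for link in links:
            if link in first_seen:
                # Same link seen before: merge the two groups.
                parent[find(rid)] = find(first_seen[link])
            else:
                first_seen[link] = rid

    groups: Dict[str, Set[str]] = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return sorted(sorted(g) for g in groups.values())


# Placeholder link identifiers standing in for the DBpedia and TGN records.
records = {
    "ds1:abyssinia": {"dbp:example", "tgn:example"},
    "ds2:ethiopia": {"dbp:example", "tgn:example", "wd:example"},
    "ds3:elsewhere": {"wd:other"},
}
```

Here the two records that share links land in one group and would be returned together, while the third stays on its own. The input records themselves are left unmodified, mirroring the point that linking happens in the index and not in the database.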

What now?

WHG v2.1 includes several significant improvements to v1, but development of the platform is still at a relatively early stage. The highest immediate priority is adding significantly more historical data. Apart from adding content, several additional features are planned, and grant applications to support those activities are pending. We are hopeful that WHG can be sustained and improved over the long term.

What is a gazetteer?

In its simplest form, a gazetteer is a list of place names. Typically, digital gazetteers provide some level of description for each listed place, e.g. its type and geographic coordinates. Historical gazetteers include prior names and some level of temporal information. The Linked Places format (spec; tutorial) used by WHG allows us to record temporally scoped name variants, coordinates, place types, and relations with other places, as well as related descriptions and depictions.

Who uses historical gazetteers?

Historical gazetteers are useful for anyone concerned with the history of a place or group of places, including researchers, teachers, and students. They help connect our present to our past.

What is a place?

The term has multiple related meanings. A few we like: (i) one answer to a where question, (ii) a setting for events and activity, (iii) "...an object resulting from a shared identification of a location. As an object, it may become a part of a network and participate in events" (Purves, Winter & Kuhn 2019). Attributes of places include names, locations, and types, all of which routinely change over time.

What is a place in WHG?

A place record represents one or more attestations of a place found in historical sources or in modern gazetteers and name authorities. Due to our use of Linked Places format, a WHG place record may include any number of names, types, locations (geometry), relations, descriptions, depictions, and links with other records. The entire record can be temporally scoped with a "when" assertion, as can any individual name, type, geometry or relation.
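
As a rough picture of such a record, the sketch below builds a simplified, Linked Places-like structure as a Python dictionary and filters its names by year. The key names and the nesting of the "when" object reflect one reading of the format and are simplified assumptions; the place, names, years, and identifiers are placeholders. Consult the Linked Places specification for the authoritative structure.

```python
from typing import Any, Dict, List

# A simplified, hypothetical record; all values are placeholders.
record: Dict[str, Any] = {
    "@id": "https://example.org/places/1",
    "type": "Feature",
    "properties": {"title": "Old Name"},
    # "when" scoping the entire record
    "when": {"timespans": [{"start": {"in": "1000"}, "end": {"in": "1500"}}]},
    "names": [
        {"toponym": "Old Name",
         "when": {"timespans": [{"start": {"in": "1000"}, "end": {"in": "1200"}}]}},
        {"toponym": "New Name",
         "when": {"timespans": [{"start": {"in": "1201"}, "end": {"in": "1500"}}]}},
    ],
    "types": [{"label": "settlement"}],
    "geometry": {"type": "Point", "coordinates": [10.0, 20.0]},
    "links": [{"type": "closeMatch", "identifier": "wd:Q-example"}],
}


def names_in_year(rec: Dict[str, Any], year: int) -> List[str]:
    """Return toponyms whose own "when" covers the year (or is absent)."""
    result = []
    for name in rec.get("names", []):
        spans = name.get("when", {}).get("timespans", [])
        covered = any(
            int(s["start"]["in"]) <= year <= int(s["end"]["in"]) for s in spans
        )
        if not spans or covered:
            result.append(name["toponym"])
    return result
```

With these placeholder values, a query for the year 1100 yields only "Old Name" and a query for 1300 yields only "New Name", which shows how temporal scoping on individual names differs from scoping the record as a whole.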

What is the geographic and temporal scope of WHG?

The geographic scope is global, and the temporal scope is roughly the span of written history.

Are some traces places, and vice versa?

A number of spatial-temporal entities could be modeled as either. For example: a dynasty, or an historical route.

Are there "preferred names" in WHG?

No. A place record can include any number of name and language variants; in fact they are encouraged. But we ask that each record have an assigned "title," which serves principally as a headword in lists. The title of a record in our union index is the title of its "seed" record.

Does WHG support multiple languages?

The name variants found in WHG are of numerous languages and scripts. Unfortunately, to date most of the contributed names do not arrive tagged with language-script codes. Separately, the internationalization of the WHG site is a high priority for a future phase of work.

How and why does WHG use modern country boundaries?

Modern country boundaries are used in WHG primarily to filter queries and spatially constrain reconciliation results. Country codes are included in place records for this purpose and do not reflect a given place's historical "containment" in, or association with, administrative areas at any given time in the past.

What is the WHG vocabulary of place types?

We have adopted a set of 163 place type concepts drawn from the approximately 900 contained in the Getty Art & Architecture Thesaurus. Our focus in selecting these was on settlements (inhabited places), administrative divisions, sites, and natural features. Place types are an important facet for search and filtering of place records.

Can contributions include urban-scale features?

We are not actively soliciting urban-scale place data (e.g. buildings, streets, monuments, plazas), but recognize there is growing interest in systems to manage such information.