The things OpenAlex is made out of
The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. There are five types of entities, each with its own object: Works, Authors, Venues, Institutions, and Concepts.
The OpenAlex ID is the primary key for all entities. It's a URL shaped like this: https://openalex.org/<OpenAlex_key>. Here's a real-world example: https://openalex.org/W2741809807
The OpenAlex ID has two parts. The first part is the Base; it's always https://openalex.org/. The second part is the Key; it's the unique primary key that identifies a given resource in our database.
The Key starts with a letter; that letter tells you what kind of entity you've got: W(ork), A(uthor), V(enue), I(nstitution), or C(oncept). The IDs are not case-sensitive, so w2741809807 is just as valid as W2741809807. So in the example above, the Key is W2741809807, and the W at the front tells us that this is a Work.
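The ID anatomy described above can be sketched in a few lines of Python. This is only an illustration: the names `parse_openalex_id` and `ParsedID` are made up for this example, and the check that a Key is one letter followed by digits is an assumption based on the example ID, not a documented rule.

```python
from typing import NamedTuple

BASE = "https://openalex.org/"

# Prefix letters listed in the text above.
ENTITY_TYPES = {
    "W": "Work",
    "A": "Author",
    "V": "Venue",
    "I": "Institution",
    "C": "Concept",
}


class ParsedID(NamedTuple):
    key: str          # normalized to upper case, e.g. "W2741809807"
    entity_type: str  # e.g. "Work"


def parse_openalex_id(openalex_id: str) -> ParsedID:
    """Split an OpenAlex ID (full URL or bare Key) into Key and entity type."""
    text = openalex_id.strip()
    # Drop the Base if present, comparing case-insensitively.
    if text.lower().startswith(BASE):
        text = text[len(BASE):]
    key = text.upper()  # IDs are not case-sensitive
    digits = key[1:]
    # Assumption: a Key is one known prefix letter followed by ASCII digits.
    if key[:1] not in ENTITY_TYPES or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an OpenAlex ID: {openalex_id!r}")
    return ParsedID(key=key, entity_type=ENTITY_TYPES[key[0]])
```

Both `parse_openalex_id("https://openalex.org/W2741809807")` and `parse_openalex_id("w2741809807")` yield the Key W2741809807 and the entity type Work.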
Because OpenAlex was launched as a replacement for Microsoft Academic Graph (MAG), OpenAlex IDs are designed to be backwards-compatible with MAG IDs, where they exist. To find the MAG ID, just take the first letter off the front of the unique part of the ID (so in the example above, the MAG ID is 2741809807). Of course, this won't yield anything useful for entities that don't have a MAG ID.
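As a rough sketch of that rule, the helper below strips the leading letter from the Key. The function name `mag_id_candidate` is hypothetical, and the result is only a candidate: nothing here checks whether the entity actually has a MAG ID.

```python
def mag_id_candidate(openalex_id: str) -> str:
    """Return the numeric part of an OpenAlex ID, which is the MAG ID when one exists."""
    # Take the last path segment so both full URLs and bare Keys work.
    key = openalex_id.strip().rstrip("/").rsplit("/", 1)[-1]
    digits = key[1:]
    if not key[:1].isalpha() or not digits.isdigit():
        raise ValueError(f"unexpected OpenAlex ID: {openalex_id!r}")
    return digits
```

For the example above, `mag_id_candidate("https://openalex.org/W2741809807")` returns the string 2741809807.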
An OpenAlex ID is a URL that identifies a resource (data about an entity). You can use content negotiation to request this same resource in different formats. Currently this means either using the ID in its default form to get a webpage, or appending .json to it to get a JSON API response:
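A minimal sketch of building the JSON form of an ID, following the description above. The helper name `json_url` is made up for this example, and it only constructs the URL string; it makes no request and assumes the `.json` suffix behaves as described.

```python
def json_url(openalex_id: str) -> str:
    """Return the URL for the JSON representation of an OpenAlex ID."""
    url = openalex_id.strip().rstrip("/")
    # Leave the URL alone if it already ends in .json.
    if url.lower().endswith(".json"):
        return url
    return url + ".json"
```

With the example ID, `json_url("https://openalex.org/W2741809807")` gives https://openalex.org/W2741809807.json.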
Sometimes we have two Entities, and thus two IDs, that refer to the same person or thing in the real world. This poses a problem: If we learn that Entities A and B refer to the same thing, for example if two author IDs refer to the same person, what do we do with them? We can't delete one since both IDs need to work forever, but it needs to be clear that both IDs represent the same person.
Our solution to this problem is to "merge" the IDs. If authors A1234 and A5678 are the same person, and we decide to keep A5678 as this person's canonical ID, we change all internal references to A1234 to A5678 and update all relevant data; for example, A5678 will be credited with A1234's Works. Inside OpenAlex, A1234 is effectively deleted, but we have to take a few extra steps to keep A1234 working in our API and in any copies of the snapshot.
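One way to picture a merge is a redirect table that maps each retired ID to its canonical ID. The sketch below is purely illustrative: the `AuthorStore` class and its fields are hypothetical and are not a description of how OpenAlex implements merging internally.

```python
from typing import Dict, Set


class AuthorStore:
    """Toy in-memory store that keeps merged-away IDs resolvable."""

    def __init__(self) -> None:
        self.works_by_author: Dict[str, Set[str]] = {}
        self.merged_into: Dict[str, str] = {}  # retired ID -> canonical ID

    def resolve(self, author_id: str) -> str:
        """Follow redirects until a canonical (non-merged) ID is reached."""
        while author_id in self.merged_into:
            author_id = self.merged_into[author_id]
        return author_id

    def merge(self, old_id: str, canonical_id: str) -> None:
        """Fold old_id into canonical_id while keeping old_id resolvable."""
        old_id = self.resolve(old_id)
        canonical_id = self.resolve(canonical_id)
        if old_id == canonical_id:
            return  # already the same entity
        # Credit the canonical author with the retired author's Works.
        works = self.works_by_author.pop(old_id, set())
        self.works_by_author.setdefault(canonical_id, set()).update(works)
        # The retired ID keeps working by redirecting to the canonical one.
        self.merged_into[old_id] = canonical_id
```

After `store.merge("A1234", "A5678")`, a lookup of A1234 resolves to A5678, and A5678 holds the Works of both IDs.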
Every entity has an OpenAlex ID, and most entities also have IDs in other systems. There are hundreds of different ID systems, but we've selected a single external ID system for each entity type to provide the Canonical External ID: the ID in the system that's been most fully adopted by the community and is most frequently used in the wild. We support other external IDs as well, but the canonical ones get a privileged spot in the API and dataset.
These are the Canonical External IDs:
The full entity objects can get pretty unwieldy, especially when you're embedding a list of them in another object (for instance, a list of Concepts in a Work). For these cases, all the entities except Works have a dehydrated version. This is a stripped-down representation of the entity that carries only its most essential properties. These properties are documented individually on their respective entity pages.
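To make the idea concrete, here is a small sketch of dehydrating an entity represented as a dictionary. The property names in `ESSENTIAL_KEYS` are placeholders chosen for illustration; the real set of dehydrated properties differs per entity type and is listed on each entity's page.

```python
from typing import Any, Dict, Iterable

# Hypothetical "essential" properties, used only for this example.
ESSENTIAL_KEYS = ("id", "display_name")


def dehydrate(entity: Dict[str, Any],
              keep: Iterable[str] = ESSENTIAL_KEYS) -> Dict[str, Any]:
    """Return a stripped-down copy of an entity with only the kept properties."""
    return {name: entity[name] for name in keep if name in entity}
```

Embedding `dehydrate(concept)` for each Concept in a Work, rather than the full Concept object, keeps the Work compact while still identifying each Concept.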