Last updated
For most use cases, the REST API is your best option. However, you can also download and install a complete copy of the OpenAlex database on your own server, using the database snapshot. The snapshot consists of seven files (split into smaller files for convenience), with one file for each of our seven entity types. The files are in the JSON Lines format: each line is a JSON object, exactly the same as the one the API returns. The properties of these JSON objects are documented in each entity's object section (for example, the Work object).
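As a rough illustration of the one-JSON-object-per-line layout, here is a minimal Python sketch that reads such a stream record by record. The sample lines and the field names `id` and `display_name` are made up for the example and are not taken from the snapshot; a real part file would be opened from disk instead of from an in-memory string.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(lines: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a one-object-per-line stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            yield json.loads(line)


# Two made-up records standing in for one snapshot part file (hypothetical fields).
sample = io.StringIO(
    '{"id": "X1", "display_name": "First record"}\n'
    '{"id": "X2", "display_name": "Second record"}\n'
)

records = list(iter_records(sample))
print(len(records), records[0]["id"])
```

Because the function is a generator, it never holds more than one line in memory at a time, which matters for files of this size.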
The snapshot is updated about once per month; you can read
If you've worked with a dataset like this before, the may be all you need to get going. If not, read on.
The rest of this guide will tell you how to (a) download the snapshot and (b) upload it to your own database. We’ll cover two general approaches:
Load the intact OpenAlex records to a data warehouse (we’ll use as an example) and use native JSON functions to query the entity objects directly.
Flatten the records into a normalized schema in a relational database (we’ll use ) while preserving the relationships between objects.
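To make the second approach more concrete, here is a small Python sketch of what "flattening while preserving relationships" can look like. Everything in it is an assumption for illustration: the record, its `title` and `authors` fields, and the table names `works` and `works_authors` are hypothetical, and an in-memory SQLite database stands in for whatever relational database you actually use.

```python
import sqlite3

# A made-up nested record; the field names are hypothetical, not the real schema.
record = {
    "id": "W1",
    "title": "An example work",
    "authors": [
        {"id": "A1", "name": "Ada"},
        {"id": "A2", "name": "Grace"},
    ],
}


def flatten(rec: dict) -> tuple[tuple, list[tuple]]:
    """Split one nested record into a parent row and its child rows."""
    parent = (rec["id"], rec["title"])
    # Each child row carries the parent's id, which preserves the relationship.
    children = [(rec["id"], a["id"], a["name"]) for a in rec.get("authors", [])]
    return parent, children


conn = sqlite3.connect(":memory:")  # stand-in database for the sketch
conn.execute("CREATE TABLE works (id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE works_authors ("
    "work_id TEXT REFERENCES works(id), author_id TEXT, author_name TEXT)"
)

parent, children = flatten(record)
conn.execute("INSERT INTO works VALUES (?, ?)", parent)
conn.executemany("INSERT INTO works_authors VALUES (?, ?, ?)", children)

# Joining the two tables recovers the nesting that the original record had.
rows = conn.execute(
    "SELECT w.title, wa.author_name FROM works w "
    "JOIN works_authors wa ON wa.work_id = w.id ORDER BY wa.author_id"
).fetchall()
print(rows)
```

The trade-off between the two approaches is visible here: flattening costs extra work up front, but afterwards ordinary joins replace JSON path expressions.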
We'll assume you're initializing a fresh snapshot. To keep it up to date, you'll have to take the information from and generalize from the steps in the guide.
This is hard. Working with such a big and complicated dataset hardly ever goes according to plan. If it gets scary, try the . In fact, try the REST API first. It can answer most of your questions and has a much lower barrier to entry.
There’s more than one way to do everything. We’ve tried to pick one reasonable default way to do each step, so if something doesn’t work in your environment or with the tools you have available, let us know.