OpenAlex technical documentation

OpenAlex snapshot


Last updated 1 year ago

For most use cases, the REST API is your best option. However, you can also download (instructions here) and install a complete copy of the OpenAlex database on your own server, using the database snapshot. The snapshot consists of seven files (split into smaller files for convenience), with one file for each of our seven entity types. The files are in the JSON Lines format: each line is a JSON object, exactly the same as the one you'd get from our API. The properties of these JSON objects are documented in each entity's object section (for example, the Work object).
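Because every line is a standalone JSON object, you can stream a whole entity type without loading it into memory. The sketch below assumes, as a hypothetical layout, that the smaller part files for one entity type sit somewhere under a single directory and are gzip-compressed with a `.gz` suffix; `iter_records` is an illustrative helper, not part of any OpenAlex tool.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_records(entity_dir: str) -> Iterator[dict]:
    """Yield one parsed JSON object per line from every part file under entity_dir.

    Assumption: part files are gzip-compressed and end in ".gz".
    """
    # Sort so files are read in a stable, repeatable order.
    for part in sorted(Path(entity_dir).rglob("*.gz")):
        # "rt" decompresses on the fly and yields text lines.
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines, e.g. a trailing newline
                    yield json.loads(line)
```

For example, `for work in iter_records("snapshot/data/works"): ...` would visit each Work once; the path there is a placeholder for wherever you saved the files.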

The snapshot is updated about once per month; you can read release notes for each new update here.

If you've worked with a dataset like this before, the snapshot data format may be all you need to get going. If not, read on.

The rest of this guide will tell you how to (a) download the snapshot and (b) upload it to your own database. We’ll cover two general approaches:

  • Load the intact OpenAlex records into a data warehouse (we’ll use BigQuery as an example) and use native JSON functions to query the Work, Author, Source, Institution, Concept, and Publisher objects directly.

  • Flatten the records into a normalized schema in a relational database (we’ll use PostgreSQL) while preserving the relationships between objects.
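To make the second approach concrete, here is a minimal sketch of flattening one Work into relational rows. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere, and the three-table schema is hypothetical: it covers only a few Work fields (`id`, `display_name`, `publication_year`, and `authorships` with its nested `author`), not the full documented object. The `works_authorships` join table is what preserves the work-to-author relationship.

```python
import sqlite3

# Hypothetical minimal schema. REFERENCES documents the relationships;
# SQLite does not enforce foreign keys unless you enable them.
SCHEMA = """
CREATE TABLE works (id TEXT PRIMARY KEY, display_name TEXT, publication_year INTEGER);
CREATE TABLE authors (id TEXT PRIMARY KEY, display_name TEXT);
CREATE TABLE works_authorships (
    work_id TEXT REFERENCES works(id),
    author_id TEXT REFERENCES authors(id),
    author_position TEXT
);
"""


def load_work(conn: sqlite3.Connection, work: dict) -> None:
    """Flatten one Work record into the works, authors and works_authorships tables."""
    conn.execute(
        "INSERT OR REPLACE INTO works VALUES (?, ?, ?)",
        (work["id"], work.get("display_name"), work.get("publication_year")),
    )
    # Drop any previous authorship rows so reloading a work is idempotent.
    conn.execute("DELETE FROM works_authorships WHERE work_id = ?", (work["id"],))
    for authorship in work.get("authorships", []):
        author = authorship.get("author") or {}
        if not author.get("id"):
            continue  # skip authorships without an author ID
        conn.execute(
            "INSERT OR REPLACE INTO authors VALUES (?, ?)",
            (author["id"], author.get("display_name")),
        )
        conn.execute(
            "INSERT INTO works_authorships VALUES (?, ?, ?)",
            (work["id"], author["id"], authorship.get("author_position")),
        )
```

Create the tables once with `conn.executescript(SCHEMA)`, then call `load_work` for every record. A real PostgreSQL load would more likely write CSV files and bulk-load them, but the row-splitting logic is the same.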

We'll assume you're initializing a fresh snapshot. To keep it up to date, you'll have to take the information from Downloading updated Entities and generalize from the steps in this guide.
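As a rough illustration of that generalization, the sketch below assumes (this page does not say so) that part files are grouped into folders named like `updated_date=YYYY-MM-DD`, and that a newer copy of a record should replace the older one with the same `id`. Both helpers are hypothetical: one picks out partitions newer than your last sync, the other upserts records by ID.

```python
import re
from datetime import date
from typing import Iterable, List

# Assumption: folders are named like "updated_date=2024-01-31".
PARTITION_RE = re.compile(r"updated_date=(\d{4}-\d{2}-\d{2})")


def partitions_since(paths: Iterable[str], last_sync: date) -> List[str]:
    """Return paths whose updated_date partition is strictly after last_sync, oldest first."""
    selected = []
    for p in paths:
        m = PARTITION_RE.search(p)
        if m:
            updated = date.fromisoformat(m.group(1))
            if updated > last_sync:
                selected.append((updated, p))
    # Oldest first, so that applying them in order lets newer versions win.
    return [p for _, p in sorted(selected)]


def merge_by_id(current: dict, updates: Iterable[dict]) -> dict:
    """Upsert records keyed by their "id"; a later record replaces an earlier one."""
    for record in updates:
        current[record["id"]] = record
    return current
```

In a real database, `merge_by_id` corresponds to an upsert (for example, insert-or-update on the primary key) rather than an in-memory dictionary.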

This is hard. Working with such a big and complicated dataset hardly ever goes according to plan. If it gets scary, try the REST API. In fact, try the REST API first: it can answer most of your questions and has a much lower barrier to entry.
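For comparison, a question like "how many works were published in 2020?" is a single HTTP request against the API. This standard-library sketch builds a `/works` URL with comma-separated `key:value` filters and reads the `count` from the response's `meta` object; the filter names in the usage note are examples, so check the Filter works page for the exact ones you need.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.openalex.org"


def works_url(filters: dict, per_page: int = 25) -> str:
    """Build a /works list URL; multiple filters are joined with commas."""
    filter_param = ",".join(f"{k}:{v}" for k, v in filters.items())
    query = urllib.parse.urlencode({"filter": filter_param, "per-page": per_page})
    return f"{API_BASE}/works?{query}"


def count_works(filters: dict) -> int:
    """Return the number of matching works, read from the response's meta.count."""
    # One result per page is enough, since only the count is needed.
    with urllib.request.urlopen(works_url(filters, per_page=1)) as resp:
        payload = json.load(resp)
    return payload["meta"]["count"]
```

`count_works({"publication_year": 2020})` makes one network call; no snapshot download is needed.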

There’s more than one way to do everything. We’ve tried to pick one reasonable default way to do each step, so if something doesn’t work in your environment or with the tools you have available, let us know.

Up next: the snapshot data format, downloading the data, and getting it into your database.
