Last updated
First off: anyone can get the data for free. While the files are hosted on Amazon S3 and we’ll be using Amazon tools in these instructions, you don’t need an Amazon account.
Many thanks to the AWS Open Data program. They cover the data-transfer fees (about $70 per download!) so users don't have to.
Before you load the snapshot contents to your database, you’ll need to get the files that make it up onto your own computer. There are exceptions, such as loading directly from S3 into a data warehouse or using an ETL product with an S3 connector. If either of these applies to you, see if the description of the snapshot data format is enough to get you started.
The easiest way to get the files is with the Amazon Web Services Command Line Interface (AWS CLI). Sample commands in this documentation will use the AWS CLI. You can find instructions for installing it on your system in the AWS CLI user guide.
You can also browse the snapshot files in your web browser using the AWS console. Both the browser view and the CLI work without an account.
A single AWS CLI sync command will copy everything in the openalex S3 bucket to a local folder named openalex-snapshot. It'll take up roughly 300 GB of disk space.
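The copy described above can be sketched as follows. This is a minimal example, assuming the AWS CLI is installed and that s3://openalex is the bucket URI. The download_snapshot function name is a hypothetical wrapper added for illustration and is not part of the AWS CLI. The --no-sign-request option tells the CLI to send anonymous requests, which is why no account is needed.

```shell
# Bucket and destination folder names are taken from the text above.
BUCKET="s3://openalex"
DEST="openalex-snapshot"

# Hypothetical helper that wraps a plain "aws s3 sync" call.
download_snapshot() {
  # --no-sign-request sends anonymous requests, so no AWS credentials are needed.
  aws s3 sync "$BUCKET" "$DEST" --no-sign-request
}

# Start the download (roughly 300 GB) by calling:
#   download_snapshot
```

Because sync only copies files that are missing or changed locally, calling the function again after an interrupted download should resume rather than start over.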
If you download the snapshot into an existing folder, you'll need to use the --delete flag to remove files left over from any previous downloads. You can also remove the contents of the destination folder manually. If you don't, you will see duplicate Entities that have moved from one file to another between snapshot updates.
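A refresh into an existing folder might look like the sketch below. It assumes the same bucket URI and folder name as the text above. The refresh_snapshot function name is hypothetical.

```shell
# Bucket and destination folder names are taken from the text above.
BUCKET="s3://openalex"
DEST="openalex-snapshot"

# Hypothetical helper that refreshes an existing local copy.
refresh_snapshot() {
  # --delete removes local files that no longer exist in the bucket,
  # so Entities that moved between files are not loaded twice.
  aws s3 sync "$BUCKET" "$DEST" --no-sign-request --delete
}

# Refresh the local copy by calling:
#   refresh_snapshot
```

Since --delete removes local files, it can be worth adding the AWS CLI's --dryrun option first to preview what would be copied and deleted.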
The size of the snapshot will change over time. You can check the current size before downloading by listing the bucket contents with the AWS CLI and reading the summary at the end of the output.
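One way to run that size check is sketched here, assuming the same s3://openalex bucket URI. The snapshot_size function name and the use of tail to trim the listing are illustrative choices, not part of the original instructions.

```shell
# Bucket name is taken from the text above.
BUCKET="s3://openalex"

# Hypothetical helper that prints only the summary of a recursive listing.
snapshot_size() {
  # --recursive lists every object in the bucket.
  # --summarize appends the total object count and total size.
  # --human-readable prints sizes in units such as MiB or GiB.
  # tail keeps the summary lines printed at the end of the listing.
  aws s3 ls "$BUCKET/" --recursive --summarize --human-readable --no-sign-request | tail -n 2
}

# Check the current size by calling:
#   snapshot_size
```

The listing walks every object in the bucket, so it may take a little while, but it transfers no snapshot data.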
You should get a file structure like this (edited for length - there are more objects in the actual bucket):