Filter and group with API
We'll use Python and requests in these examples. You can use the code as a starting point, or just read the API results in your browser and follow along. How to build the API requests is the important part - you probably won't learn any new Python tricks.
There's now a Jupyter notebook version of this tutorial! And we'll be adding more Jupyter notebook tutorials soon.
More specifically, of the works that:
- were published in the last 10 years
The first thing we'll need to do is filter Works by institution. Looking at the available works filters, that's easy to do using institutions.id or institutions.ror.
But what's our institution's ID? First, let's say our institution is the University of Florida. If we want to use institutions.ror, we can look that up here: https://ror.org/search?page=1&query=university%20of%20florida
https://ror.org/02y3ad647 looks right, so we can use that as the first part of our filter:
https://api.openalex.org/works?filter=institutions.ror:https://ror.org/02y3ad647
We can also use the institution's OpenAlex ID. To get that, we'll have to take a detour and filter Institutions using display_name.search:
https://api.openalex.org/institutions?filter=display_name.search:university of florida
In Python:
import requests

# Search Institutions by name and keep the first (best-matching) result.
institution = requests.get(
    'https://api.openalex.org/institutions?filter=display_name.search:university of florida'
).json()['results'][0]
print(institution['display_name'])
print(institution['id'])
University of Florida
https://openalex.org/I33213144
The first result looks like the one we want, so we can use that to filter on institutions.id.
https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I33213144
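The institution lookup above puts a display name containing spaces directly into the URL. As a minimal sketch, the same URL can be built with the spaces percent-encoded explicitly, using only the standard library. The helper name institution_search_url and the constant API_ROOT are made up for this illustration and are not part of the API.

```python
from urllib.parse import quote

API_ROOT = "https://api.openalex.org"


def institution_search_url(name: str) -> str:
    """Return an Institutions URL filtered by display_name.search (hypothetical helper)."""
    # quote() percent-encodes spaces and other unsafe characters in the name;
    # the colon between the filter name and its value is left as-is.
    return f"{API_ROOT}/institutions?filter=display_name.search:{quote(name)}"


url = institution_search_url("university of florida")
print(url)
# https://api.openalex.org/institutions?filter=display_name.search:university%20of%20florida
```

The resulting string can be passed to requests.get() in the same way as the hand-written URL.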
Adding our other criteria, we build the filter clause:
institutions.id:https://openalex.org/I33213144,
is_paratext:false,
type:journal-article,
from_publication_date:2012-04-20
This will give us a list of about 76,000 works. Again, in Python:
# Only the 'meta' part of the response is needed to get the total count.
response_meta = requests.get(
    'https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I33213144,is_paratext:false,type:journal-article,from_publication_date:2012-04-20'
).json()['meta']
print(response_meta['count'])
76247
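Long filter strings like the one above are easy to mistype. One possible approach, sketched here, is to keep the criteria in a dictionary and join them into the comma-separated name:value form the filter parameter uses. The names build_filter and works_url are invented for this example; the criteria themselves are the ones used in this tutorial.

```python
WORKS_ENDPOINT = "https://api.openalex.org/works"


def build_filter(criteria: dict) -> str:
    """Join criteria into the 'name:value,name:value' form (hypothetical helper)."""
    return ",".join(f"{name}:{value}" for name, value in criteria.items())


def works_url(criteria: dict) -> str:
    """Return a Works URL with the given filter criteria (hypothetical helper)."""
    return f"{WORKS_ENDPOINT}?filter={build_filter(criteria)}"


criteria = {
    "institutions.id": "https://openalex.org/I33213144",
    "is_paratext": "false",
    "type": "journal-article",
    "from_publication_date": "2012-04-20",
}

# Dictionaries keep insertion order, so the clause matches the order above.
print(works_url(criteria))
```

Keeping the criteria in a dictionary makes it simple to swap in a different institution ID or start date without editing the URL by hand.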
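To find out how many of these works are open access, we can add group_by=is_oa to the request. The response then groups the works by their is_oa value:
{
    "meta": {
        "count": 2,
        "db_response_time_ms": 76,
        "page": 1,
        "per_page": 200
    },
    "results": [],
    "group_by": [
        {
            "key": "true",
            "key_display_name": "true",
            "count": 40949
        },
        {
            "key": "false",
            "key_display_name": "false",
            "count": 35298
        }
    ]
}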
In Python, we can calculate the percentage of the works that are OA:
r = requests.get(
    'https://api.openalex.org/works?filter=institutions.id:https://openalex.org/I33213144,is_paratext:false,type:journal-article,from_publication_date:2012-04-20&group_by=is_oa'
)
groups = r.json()['group_by']

# Sum the counts of all groups; the group whose key is 'true' holds the OA works.
total_works = 0
oa_works = 0
for group in groups:
    total_works += group['count']
    if group['key'] == 'true':
        oa_works += group['count']

print('total works: %d' % total_works)
print('oa works: %d' % oa_works)
print('oa percentage: %f' % (100 * oa_works/total_works))
total works: 76299
oa works: 40969
oa percentage: 53.695330
So from one API call, we know that 53.7% of the journal articles published by authors at the University of Florida in the last 10 years are OA.
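The calculation above can also be wrapped in a small function that works on any group_by list of this shape. This is only a sketch: the function name oa_percentage is made up here, and the sample data is copied from the JSON response shown earlier, so it runs without calling the API.

```python
def oa_percentage(groups: list) -> float:
    """Return the percentage of works in the group whose key is 'true' (hypothetical helper)."""
    total = sum(group["count"] for group in groups)
    if total == 0:
        # No works matched the filter; avoid dividing by zero.
        return 0.0
    oa = sum(group["count"] for group in groups if group["key"] == "true")
    return 100 * oa / total


# Sample 'group_by' list taken from the JSON response shown earlier.
sample_groups = [
    {"key": "true", "key_display_name": "true", "count": 40949},
    {"key": "false", "key_display_name": "false", "count": 35298},
]

print("oa percentage: %.1f" % oa_percentage(sample_groups))
# oa percentage: 53.7
```

Because the counts change as new works are indexed, running the same request at a different time can give slightly different totals.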