Work object
Last updated
There's a lot of useful data inside a work. When you use the API to get a single work or a list of works, this is what's returned.
abstract_inverted_index
Object: The abstract of the work, as an inverted index, which encodes information about the abstract's words and their positions within the text. OpenAlex doesn't include plaintext abstracts due to legal constraints.
Newer works are more likely to have an abstract inverted index. For example, over 60% of works in 2022 have abstract data, compared to 45% for works older than 2000. Full chart is below:
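Since the abstract arrives as an inverted index rather than plaintext, you may want to rebuild the text yourself. Here is a minimal sketch, assuming the index is a mapping from each word to the list of positions where it occurs; the function name is hypothetical and not part of the API.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plaintext from a mapping of word -> list of positions (assumed shape)."""
    if not inverted_index:
        return None  # no abstract data for this work
    word_at = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    # Join words in position order; any gaps in the positions are skipped.
    return " ".join(word_at[p] for p in sorted(word_at))


sample = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
print(abstract_from_inverted_index(sample))  # the cat chased the mouse
```

Works without abstract data come back as None here, so callers can tell "no abstract" apart from an empty string.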
alternate_host_venues (deprecated)
authorships
apc_list
value: Integer
currency: String
provenance: String, the source of this data. Currently the only value is “doaj” (DOAJ)
value_usd: Integer, the APC converted into USD
apc_paid
value: Integer
currency: String
provenance: String, currently either openapc or doaj, but more will be added; see below for details.
value_usd: Integer, the APC converted into USD
best_oa_location
We score open locations to determine which is best using these factors:
Must have is_oa: true
type: "publisher" is better than "repository".
version: "publishedVersion" is better than "acceptedVersion", which is better than "submittedVersion".
pdf_url: A location with a direct PDF link is better than one without.
repository rankings: Some major repositories like PubMed Central and arXiv are ranked above others.
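The factors above can be sketched as a sort key. This is only an illustration, not the actual scoring code: the relative priority of the factors is an assumption (host type first, then version, then PDF availability), the repository rankings are left out because their order isn't specified, and the flat dictionary keys mirror the labels in the list rather than the exact shape of a location object.

```python
# Assumed ordering of versions, taken from the list above.
VERSION_RANK = {"publishedVersion": 3, "acceptedVersion": 2, "submittedVersion": 1}


def _rank(location):
    """Hypothetical sort key; a higher tuple means a better location."""
    return (
        location.get("type") == "publisher",           # publisher beats repository
        VERSION_RANK.get(location.get("version"), 0),  # published > accepted > submitted
        bool(location.get("pdf_url")),                 # a direct PDF link is better
    )


def pick_best_oa_location(locations):
    """Return the highest-ranked location with is_oa true, or None."""
    candidates = [loc for loc in locations if loc.get("is_oa")]
    return max(candidates, key=_rank) if candidates else None


locations = [
    {"is_oa": False, "type": "publisher", "version": "publishedVersion", "pdf_url": None},
    {"is_oa": True, "type": "repository", "version": "acceptedVersion", "pdf_url": "https://example.org/b.pdf"},
    {"is_oa": True, "type": "repository", "version": "submittedVersion", "pdf_url": None},
]
print(pick_best_oa_location(locations)["version"])  # acceptedVersion
```

The closed publisher copy is skipped because it fails the is_oa requirement, so the accepted repository copy wins.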
biblio
Object: Old-timey bibliographic info for this work. This is mostly useful only in citation/reference contexts. These are all strings because sometimes you'll get fun values like "Spring" and "Inside cover."
volume (String)
issue (String)
first_page (String)
last_page (String)
citation_normalized_percentile
cited_by_api_url
cited_by_count
Integer: The number of citations to this work. These are the times that other works have cited this work: Other works ➞ This work.
concepts
Each Concept object in the list also has one additional property:
Concepts with a score of at least 0.3 are assigned to the work. However, ancestors of an assigned concept are also added to the work, even if the ancestor scores are below 0.3.
corresponding_author_ids
corresponding_institution_ids
countries_distinct_count
counts_by_year
Citations more than ten years old aren't included. Years with zero citations have been removed, so you will need to add those back in if you need them.
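If you need a continuous series, the zero-citation years can be restored on the client side. Here is a minimal sketch, assuming each entry is a dictionary with year and cited_by_count keys and that the window is the current year plus the nine before it; the function name and the span parameter are hypothetical.

```python
def fill_missing_years(counts_by_year, current_year, span=10):
    """Return one entry per year, newest first, with 0 for years that were omitted."""
    by_year = {row["year"]: row["cited_by_count"] for row in counts_by_year}
    years = range(current_year, current_year - span, -1)
    return [{"year": y, "cited_by_count": by_year.get(y, 0)} for y in years]


sparse = [{"year": 2022, "cited_by_count": 3}, {"year": 2020, "cited_by_count": 1}]
for row in fill_missing_years(sparse, current_year=2023, span=4):
    print(row["year"], row["cited_by_count"])
```

With the sample data this yields 2023 and 2021 with a count of 0, alongside the original counts for 2022 and 2020.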
created_date
display_name
doi
fulltext_origin
This attribute is only available for works with has_fulltext:true.
fwci
grants
has_fulltext
host_venue (deprecated)
id
ids
Object: All the external identifiers that we know about for this work. IDs are expressed as URIs whenever possible. Possible ID types:
indexed_in
List: The sources this work is indexed in. Possible values: arxiv, crossref, doaj, pubmed.
institutions_distinct_count
is_paratext
In our context, paratext is stuff that's in a scholarly venue (like a journal) but is about the venue rather than a scholarly work properly speaking. Some examples and nonexamples:
yep it's paratext: front cover, back cover, table of contents, editorial board listing, issue information, masthead.
no, not paratext: research paper, dataset, letters to the editor, figures
Turns out there is a lot of paratext in registries like Crossref. That's not a bad thing... but we've found that it's good to have a way to filter it out.
We determine is_paratext algorithmically using title heuristics.
is_retracted
Boolean: True if we know this work has been retracted.
keywords
The score for each keyword represents the similarity score of that keyword to the title and abstract text of the work.
We provide up to 5 keywords per work, for all keywords with scores above a certain threshold.
language
A few things to keep in mind about this:
We don't assign a language when we don't have enough words available to guess accurately.
We report the language of the metadata, not the full text. For example, if a work is in French, but the title and abstract are in English, we report the language as English.
In some cases, abstracts are in two different languages. Unfortunately, when this happens, what we report will not be accurate.
license
String: The license applied to this work at this host. Most toll-access works don't have an explicit license (they're under "all rights reserved" copyright), so this field generally has content only if is_oa is true.
locations
locations_count
mesh
open_access
primary_location
primary_topic
Object
publication_date
Where different publication dates exist, we usually select the earliest available date of electronic publication.
publication_year
Integer: The year this work was published.
referenced_works
related_works
sustainable_development_goals
List: List of objects
We display all of the SDGs with a prediction score higher than 0.4.
topics
List: List of objects
title
String: The title of this work.
type
String: The type of the work.
(Note that distinguishing between journals and conferences is a hard problem, one we often get wrong. We are working on improving this, but we also point out that the two have a lot of overlap in terms of their roles as hosts of research publications.)
Works that are hosted primarily on a preprint server, or that are identified specifically as preprints in the metadata we receive, are assigned the type preprint rather than article.
Works that represent stuff that is about the venue (such as a journal), rather than a scholarly work properly speaking, have type paratext. These include things like front covers, back covers, tables of contents, and the journal itself (e.g., https://openalex.org/W4232230324).
We also have types for letter, editorial, erratum (corrections), libguides, supplementary-materials, and review (currently, articles that come from journals that exclusively publish review articles). Coverage is low on these but will improve.
type_crossref
String: Legacy type information, using Crossref's "type" controlled vocabulary.
Where possible, we just pass along Crossref's type value for each work. When that's impossible (e.g., the work isn't in Crossref), we do our best to figure out the type ourselves.
updated_date
OpenAccess object
The OpenAccess object describes access options for a given work. It's only found as part of the Work object.
any_repository_has_fulltext
is_oa
Boolean: True if this work is Open Access (OA).
oa_status
String: The Open Access (OA) status of this work. Possible values are:
gold: Published in a fully OA journal.
bronze: Free to read on the publisher landing page, but without any identifiable license.
closed: All other articles.
oa_url
String: The best Open Access (OA) URL for this work.
This URL might be a direct link to a PDF, or it might be a link to a landing page that links to the free PDF.
The host_venue and alternate_host_venues properties have been deprecated in favor of primary_location and locations. The attributes host_venue and alternate_host_venues are no longer available in the Work object, and trying to access them in filters or group-bys will return an error.
List: List of objects, each representing an author and their institution. Limited to the first 100 authors to maintain API performance.
For more information, see the Authorship object page.
Object: Information about this work's APC (article processing charge). The object contains:
This value is the APC list price: the price as listed by the journal’s publisher. That’s not always the price actually paid, because publishers may offer various discounts to authors. Unfortunately we don’t always know this discounted price, but when we do you can find it in apc_paid.
Currently our only source for this data is DOAJ, and so doaj is the only value for apc_list.provenance, but we’ll add other sources over time.
We currently don’t have information on the list price for hybrid journals (toll-access journals that also provide an open-access option), but we will add this at some point. We do have information for hybrid OA works occasionally.
You can use this attribute to find works published in Diamond open access journals by looking at works where apc_list.value is zero.
Object: Information about the paid APC (article processing charge) for this work. The object contains:
You can find the listed APC price (when we know it) for a given work using apc_list. However, authors don’t always pay the listed price; often they get a discounted price from publishers. So it’s useful to know the APC actually paid by authors, as distinct from the list price. This field is our effort to provide that.
Our best source for the actually paid price is the OpenAPC project. Where available, we use that data, and so apc_paid.provenance is openapc. Where OpenAPC data is unavailable (and unfortunately this is common) we make our best guess by assuming the author paid the APC list price, and apc_paid.provenance will be set to wherever we got the list price from.
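Because apc_paid mixes reported prices with list-price guesses, it can help to label which one you are looking at. Here is a minimal sketch, assuming the work is a plain dictionary shaped like the fields above; the function name and the "basis" labels are hypothetical.

```python
def describe_apc_paid(work):
    """Label an apc_paid entry as a reported price or a list-price estimate."""
    apc = work.get("apc_paid")
    if not apc:
        return None  # no APC information for this work
    # Only the openapc provenance reflects a price that was actually reported as paid.
    reported = apc.get("provenance") == "openapc"
    return {
        "value_usd": apc.get("value_usd"),
        "basis": "reported" if reported else "list-price estimate",
    }


work = {"apc_paid": {"value": 1500, "currency": "EUR", "provenance": "doaj", "value_usd": 1600}}
print(describe_apc_paid(work))
```

For the sample work, the provenance is doaj, so the price is flagged as an estimate rather than a reported payment.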
Object: A Location object with the best available open access location for this work.
Object: The percentile of this work's citation count normalized by work type, publication year, and subfield. This field represents the same information as the FWCI expressed as a percentile.
String: A URL that uses the cites filter to display a list of works that cite this work. This is a way to expand cited_by_count into an actual list of works.
List: List of dehydrated Concept objects.
score (Float): The strength of the connection between the work and this concept (higher is stronger). This number is produced by AWS Sagemaker, in the last layer of the machine learning model that assigns concepts.
List: OpenAlex IDs of any authors for which authorships.is_corresponding is true.
List: OpenAlex IDs of any institutions found within an authorship for which authorships.is_corresponding is true.
Integer: Number of distinct country_codes among the authorships for this work.
List: cited_by_count for each of the last ten years, binned by year. To put it another way: for each year, you can see how many times this work was cited.
String: The date this Work object was created in the OpenAlex dataset, expressed as an ISO 8601 date string.
String: Exactly the same as title. It's useful for Works to include a display_name property, since all the other entities have one.
String: The DOI for the work. This is the canonical external ID for works.
Occasionally, a work has more than one DOI. For example, there might be one DOI for a preprint version hosted on a preprint server, and another DOI for the published version. However, this field always has just one DOI, the DOI for the published work.
String: If a work's full text is searchable in OpenAlex (has_fulltext is true), this tells you how we got the text. This will be one of:
pdf: We got the text from an open-access PDF.
ngrams: Full text search is enabled using n-grams.
Float: The Field-weighted Citation Impact (FWCI), calculated for a work as the ratio of citations received to citations expected in the year of publication and the three following years.
List: List of grant objects, which include the funder and the award ID, if available. Our grants data comes from Crossref, and is currently fairly limited.
Boolean: Set to true if the work's full text is searchable in OpenAlex. This does not necessarily mean that the full text is available to you, dear reader; rather, it means that we have indexed the full text and can use it to help power searches. If you are trying to find the full text for yourself, try looking in open_access.oa_url.
We get access to the full text in one of two ways: either using an open-access PDF, or using n-grams. You can learn where a work's full text came from at fulltext_origin.
The host_venue and alternate_host_venues properties have been deprecated in favor of primary_location and locations. The attributes host_venue and alternate_host_venues are no longer available in the Work object, and trying to access them in filters or group-bys will return an error.
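String: The OpenAlex ID for this work.
doi (String: The DOI. Same as Work.doi)
mag (Integer: The Microsoft Academic Graph ID)
openalex (String: The OpenAlex ID. Same as Work.id)
pmid (String: The PubMed identifier)
pmcid (String: The PubMed Central identifier)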
Integer: Number of distinct institutions among the authorships for this work.
Boolean: True if we think this work is paratext.
We identify works that have been retracted using the public Retraction Watch database, a public resource made possible by a partnership between Crossref and The Center for Scientific Integrity.
List of objects: Short phrases identified based on works' Topics.
String: The language of the work in ISO 639-1 format. The language is automatically detected using the information we have about the work. We run a language-detection software library on the words in the work's abstract, or the title if we do not have the abstract. Keep in mind that this method is not perfect, and that in some cases the language of the title or abstract could be different from the body of the work.
List: A list of Location objects describing all unique places where this work lives.
Integer: Number of locations for this work.
List: List of MeSH tag objects. Only works found in PubMed have MeSH tags; for all other works, this is an empty list.
Object: Information about the access status of this work, as an OpenAccess object.
Object: A Location object with the primary location of this work.
The primary_location is where you can find the best (closest to the version of record) copy of this work. For a peer-reviewed journal article, this would be a full text published version, hosted by the publisher at the article's DOI URL.
The top ranked Topic for this work. This is the same as the first item in topics.
String: The day when this work was published, formatted as an ISO 8601 date.
This date applies to the version found at primary_location. The other versions, found in locations, may have been published at different (earlier) dates.
This year applies to the version found at primary_location. The other versions, found in locations, may have been published in different (earlier) years.
List: OpenAlex IDs for works that this work cites. These are citations that go from this work out to another work: This work ➞ Other works.
List: OpenAlex IDs for works related to this work. Related works are computed algorithmically; the algorithm finds recent papers with the most concepts in common with the current paper.
The United Nations' Sustainable Development Goals are a collection of goals at the heart of a global "shared blueprint for peace and prosperity for people and the planet." We tag works with their relevance to these goals using an mBERT machine learning model. The score represents the model's predicted probability of the work's relevance for a particular goal.
The top ranked Topics for this work. We provide up to 3 topics per work.
This is exactly the same as display_name. We include both attributes with the same information because we want all entities to have a display_name, but there's a longstanding tradition of calling this the "title," so we figured you'll be expecting works to have it as a property.
You can see all of the different types along with their counts by grouping works by type in the OpenAlex API.
Most works are type article. This includes what was formerly (and currently in type_crossref) labeled as journal-article, proceedings-article, and posted-content. We consider all of these to be article type works, and the distinctions between them to be more about where they are published or hosted:
Journal articles will have a primary_location.source.type of journal
Conference proceedings will have a primary_location.source.type of conference
Preprints or "posted content" will have a primary_location.version of submittedVersion
Other work types follow the Crossref "type" controlled vocabulary; see type_crossref.
These are the work types that we used to use, before switching to our current system (see type).
You can see all possible values of Crossref's "type" controlled vocabulary via the Crossref API.
String: The last time anything in this Work object changed, expressed as an ISO 8601 date string (in UTC). This date is updated for any change at all, including increases in various counts.
Boolean: True if any of this work's locations has location.is_oa=true and location.source.type=repository.
Use case: researchers want to track Green OA, using a definition of "any repository hosts this." OpenAlex's definition (as used in oa_status) doesn't support this, because as soon as there's a publisher-hosted copy (bronze, hybrid, or gold), oa_status is set to that publisher-hosted status.
So there's a lot of repository-hosted content that oa_status can't tell you about. We call this "shadowed Green." This feature makes it possible to track shadowed Green.
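The rule above translates almost directly into code. Here is a minimal sketch, assuming works and locations are plain dictionaries with the nested keys named in the rule; both function names are hypothetical, and the "shadowed Green" check follows the use case described here rather than any official implementation.

```python
def any_repository_has_fulltext(locations):
    """True if any location is OA and is hosted by a source of type repository."""
    return any(
        bool(loc.get("is_oa")) and (loc.get("source") or {}).get("type") == "repository"
        for loc in locations
    )


def is_shadowed_green(work):
    """True if a repository copy exists but oa_status reports a publisher-hosted status."""
    status = (work.get("open_access") or {}).get("oa_status")
    publisher_hosted = status in {"bronze", "hybrid", "gold"}
    return publisher_hosted and any_repository_has_fulltext(work.get("locations") or [])


work = {
    "open_access": {"oa_status": "gold"},
    "locations": [
        {"is_oa": True, "source": {"type": "journal"}},
        {"is_oa": True, "source": {"type": "repository"}},
    ],
}
print(is_shadowed_green(work))  # True
```

The sample work is gold at the publisher but also has an OA repository copy, which is exactly the case oa_status alone would hide.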
There are many ways to define OA. OpenAlex uses a broad definition: having a URL where you can read the fulltext of this work without needing to pay money or log in. You can use the locations and oa_status fields to narrow your results further, accommodating any definition of OA you like.
diamond: Published in a fully OA journal (one that is indexed by the DOAJ or that we have determined to be OA) with no article processing charges (i.e., free for both readers and authors).
green: Toll-access on the publisher landing page, but there is a free copy in an OA repository.
hybrid: Free under an open license in a toll-access journal.
Although there are many ways to define OA, in this context an OA URL is one where you can read the fulltext of this work without needing to pay money or log in. The "best" such URL is the one closest to the version of record.