Get N-grams
N-grams are groups of sequential words that occur in the text of a Work.
N-grams list the words and phrases that occur in the full text of a Work. We obtain them from Internet Archive's publicly (and generously 👏) available General Index and use them to enable fulltext searches on the Works that have them, through both the fulltext.search filter and as an element of the more holistic search parameter.
Note that while n-grams are derived from the fulltext of a Work, the presence of n-grams for a given Work doesn't imply that the fulltext is available to you, the reader. It only means the fulltext was available to Internet Archive for indexing. Work.open_access is the place to go for information on public fulltext availability.
In addition to enabling fulltext search capabilities, a Work's n-grams are viewable directly through an endpoint that accepts either an OpenAlex ID or a DOI.
Unlike other API endpoints, n-grams are cached via CDN, which means this one is super fast, and you can call it as fast as you want - rate limits don't apply.
{
meta: {
count: 1068,
doi: "https://doi.org/10.1103/physrevb.37.785",
openalex_id: "https://openalex.org/W2023271753"
},
ngrams: [
{
ngram: "energy formula into a functional",
ngram_tokens: 5,
ngram_count: 1,
term_frequency: 0.0005452562704471102
},
{
ngram: "functional of the electron density",
ngram_tokens: 5,
ngram_count: 1,
term_frequency: 0.0005452562704471102
},
...
]
}
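As a rough illustration of how a response shaped like the one above might be consumed, the sketch below builds a request URL from an OpenAlex ID and ranks the returned n-grams by term frequency. The URL pattern `https://api.openalex.org/works/<id>/ngrams` and the helper names `ngrams_url` and `top_ngrams` are assumptions made for this example and are not spelled out on this page; where available, the link in Work.ngrams_url is the safer source for the URL.

```python
# Assumed base URL; this page does not state the endpoint path explicitly.
API_BASE = "https://api.openalex.org/works"


def ngrams_url(work_id: str) -> str:
    """Build an n-grams URL from an OpenAlex ID in short (W...) or full URL form."""
    prefix = "https://openalex.org/"
    if work_id.startswith(prefix):
        work_id = work_id[len(prefix):]
    return f"{API_BASE}/{work_id}/ngrams"


def top_ngrams(payload: dict, limit: int = 10) -> list:
    """Return up to `limit` (ngram, term_frequency) pairs, highest frequency first."""
    ranked = sorted(
        payload.get("ngrams", []),
        key=lambda item: item["term_frequency"],
        reverse=True,
    )
    return [(item["ngram"], item["term_frequency"]) for item in ranked[:limit]]


# The two n-grams shown in the sample response, written as a Python dict.
sample = {
    "meta": {
        "count": 1068,
        "doi": "https://doi.org/10.1103/physrevb.37.785",
        "openalex_id": "https://openalex.org/W2023271753",
    },
    "ngrams": [
        {
            "ngram": "energy formula into a functional",
            "ngram_tokens": 5,
            "ngram_count": 1,
            "term_frequency": 0.0005452562704471102,
        },
        {
            "ngram": "functional of the electron density",
            "ngram_tokens": 5,
            "ngram_count": 1,
            "term_frequency": 0.0005452562704471102,
        },
    ],
}

print(ngrams_url(sample["meta"]["openalex_id"]))
for phrase, frequency in top_ngrams(sample, limit=5):
    print(f"{frequency:.6f}  {phrase}")
```

To work with live data, the parsed JSON body of a real response could be passed to `top_ngrams` in place of `sample`.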
The ID-based link is provided in Work.ngrams_url if n-grams are available. Works with n-grams can be found using the Work.has_ngrams filter, which can be combined with other filters using logical expressions.
About 57 million works have n-grams coverage through Internet Archive. OurResearch is the first organization to host this data in a highly usable way, and we are proud to integrate it into OpenAlex!
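A minimal sketch of how the has_ngrams and fulltext.search filters mentioned above might be assembled into a request URL. Several details here are assumptions rather than facts stated on this page: the base URL `https://api.openalex.org/works`, the `attribute:value` syntax with commas joining filters as a logical AND, the `publication_year` filter used as the second condition, and the helper name `works_filter_url`.

```python
from urllib.parse import urlencode

# Assumed base URL for the works list endpoint.
WORKS_ENDPOINT = "https://api.openalex.org/works"


def works_filter_url(filters: dict) -> str:
    """Join attribute:value pairs with commas (assumed to mean logical AND)."""
    expression = ",".join(f"{key}:{value}" for key, value in filters.items())
    # Keep ':' and ',' unescaped so the filter expression stays readable.
    return f"{WORKS_ENDPOINT}?{urlencode({'filter': expression}, safe=':,')}"


# Works that have n-grams, narrowed by an assumed second filter.
with_ngrams = works_filter_url({"has_ngrams": "true", "publication_year": "2020"})

# Fulltext search over the n-grams of the Works that have them.
fulltext = works_filter_url({"fulltext.search": "electron density"})

print(with_ngrams)
print(fulltext)
```

Building the expression from a dict keeps each condition separate, so individual filters can be added or dropped without editing a hand-written query string.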
Curious about n-grams used in search? Browse them all via the API. Highly-cited works and less recent works are more likely to have n-grams, as shown by the coverage charts below: