Get N-grams
N-grams are groups of sequential words that occur in the text of a Work.
N-grams list the words and phrases that occur in the full text of a Work. We obtain them from Internet Archive's publicly (and generously 👏) available General Index and use them to enable full-text searches on the Works that have them, both through the fulltext.search filter and as an element of the more holistic search parameter.
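As a rough sketch, the two search routes could be written as request URLs against the works endpoint. This assumes the base URL `https://api.openalex.org/works` and the `filter=name:value` query convention. The helper names `fulltext_search_url` and `holistic_search_url` are hypothetical, and the code only builds URLs; it does not send requests.

```python
from urllib.parse import urlencode

# Assumed base URL of the OpenAlex works endpoint.
BASE_URL = "https://api.openalex.org/works"


def fulltext_search_url(phrase: str) -> str:
    """Hypothetical helper: URL that uses the fulltext.search filter."""
    # Filters are passed as "name:value"; keep ":" unescaped for readability.
    query = urlencode({"filter": f"fulltext.search:{phrase}"}, safe=":")
    return f"{BASE_URL}?{query}"


def holistic_search_url(text: str) -> str:
    """Hypothetical helper: URL that uses the broader search parameter."""
    return f"{BASE_URL}?{urlencode({'search': text})}"


if __name__ == "__main__":
    print(fulltext_search_url("climate change"))
    print(holistic_search_url("climate change"))
```

The first URL restricts matching to the n-gram-backed full text. The second hands the same words to the general `search` parameter, where full text is only one of the inputs.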
Note that while n-grams are derived from the full text of a Work, the presence of n-grams for a given Work doesn't imply that the full text is available to you, the reader. It only means the full text was available to Internet Archive for indexing. Work.open_access is the place to go for information on public full-text availability.
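Here is a minimal sketch of that distinction. It assumes Work records shaped like the API's JSON, with `open_access.is_oa` and `open_access.oa_url` fields and a `has_fulltext` flag. The sample records are made up for illustration, and `public_fulltext_url` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def public_fulltext_url(work: Dict[str, Any]) -> Optional[str]:
    """Return a reader-accessible full-text URL, or None.

    Only Work.open_access is consulted: n-gram coverage (has_fulltext)
    says nothing about whether a reader can open the text.
    """
    oa = work.get("open_access") or {}
    if oa.get("is_oa") and oa.get("oa_url"):
        return oa["oa_url"]
    return None


# Hand-made example records, not real OpenAlex data.
indexed_but_closed = {
    "has_fulltext": True,
    "open_access": {"is_oa": False, "oa_url": None},
}
indexed_and_open = {
    "has_fulltext": True,
    "open_access": {"is_oa": True, "oa_url": "https://example.org/paper.pdf"},
}

print(public_fulltext_url(indexed_but_closed))  # None
print(public_fulltext_url(indexed_and_open))    # https://example.org/paper.pdf
```

Both records are indexed for search, but only the second one points the reader to a copy they can open.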
The n-gram API endpoint is not currently in service. The n-grams are still used on our backend to help power fulltext search. If you have any questions about this, please submit a support ticket.
You can see which works we have full text for using the has_fulltext filter. This does not necessarily mean that the full text is available to you, dear reader; rather, it means that we have indexed the full text and can use it to help power searches. If you are trying to find the full text for yourself, try looking in open_access.oa_url.
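For example, a request for works with indexed full text might pair the `has_fulltext` filter with other filters. This sketch assumes the same `https://api.openalex.org/works` base URL and that several filters are joined with commas. `works_filter_url` is a hypothetical helper that only builds the URL.

```python
from typing import Dict
from urllib.parse import urlencode

# Assumed base URL of the OpenAlex works endpoint.
BASE_URL = "https://api.openalex.org/works"


def works_filter_url(filters: Dict[str, str]) -> str:
    """Build a works URL from "name:value" filter pairs (hypothetical helper)."""
    # Assumption: several filters are comma-separated and combined with AND.
    filter_value = ",".join(f"{name}:{value}" for name, value in filters.items())
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"{BASE_URL}?{urlencode({'filter': filter_value}, safe=':,')}"


print(works_filter_url({"has_fulltext": "true"}))
print(works_filter_url({"has_fulltext": "true", "fulltext.search": "coral bleaching"}))
```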
We get access to the full text in one of two ways: from an open-access PDF, or from n-grams obtained from Internet Archive. You can see where a work's full text came from in its fulltext_origin field.
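Here is a small sketch of reading that field. It assumes `fulltext_origin` takes the values `"pdf"` or `"ngrams"` and is null or absent for works without indexed full text. The records below are invented for illustration, and both helpers are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Assumed values of fulltext_origin, mapped to the two sources described above.
ORIGIN_LABELS = {
    "pdf": "open-access PDF",
    "ngrams": "Internet Archive n-grams",
}


def describe_origin(work: Dict[str, Any]) -> str:
    """Say where a work's indexed full text came from."""
    return ORIGIN_LABELS.get(work.get("fulltext_origin"), "no indexed full text")


def count_origins(works: Iterable[Dict[str, Any]]) -> Counter:
    """Tally works by full-text source; missing or None origins count as "none"."""
    return Counter(work.get("fulltext_origin") or "none" for work in works)


# Invented example records, not real OpenAlex data.
sample = [
    {"id": "W1", "fulltext_origin": "ngrams"},
    {"id": "W2", "fulltext_origin": "pdf"},
    {"id": "W3", "fulltext_origin": "ngrams"},
    {"id": "W4", "fulltext_origin": None},
]

for work in sample:
    print(work["id"], "->", describe_origin(work))
print(count_origins(sample))
```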
About 57 million works have n-gram coverage through Internet Archive. OurResearch is the first organization to host this data in a highly usable way, and we are proud to integrate it into OpenAlex!
Curious about which n-grams are used in search? Browse them all via the API. Highly cited works and older works are more likely to have n-grams, as shown by the coverage charts below: