pybliometrics.scopus.AbstractRetrieval

AbstractRetrieval() implements the Scopus Abstract Retrieval API.

It takes an identifier as its main argument. Most of the time this will be a Scopus EID, but a DOI, a Scopus ID (the last part of the EID), a PubMed identifier or a Publisher Item Identifier (PII) work as well. AbstractRetrieval tries to infer the ID type itself; to speed this up, you can specify the type via id_type.
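
To make the idea of inferring an ID type concrete, here is a rough heuristic based only on the shape of the identifier. It is an illustrative sketch: the function name guess_id_type and the length threshold are assumptions, and the library's own inference may work differently.

```python
def guess_id_type(identifier):
    """Guess the Scopus ID type from the shape of an identifier.

    Illustrative heuristic only; not the library's implementation.
    """
    sid = str(identifier)
    if sid.startswith("2-s2.0-"):
        return "eid"
    if sid.isdigit():
        # Assumption: PubMed IDs are shorter than Scopus IDs.
        return "pubmed_id" if len(sid) < 10 else "scopus_id"
    if sid.startswith("10.") and "/" in sid:
        return "doi"
    # Assumption: any remaining identifier is treated as a PII.
    return "pii"


print(guess_id_type("2-s2.0-85068268027"))
print(guess_id_type("10.1016/j.softx.2019.100263"))
```

Passing id_type explicitly, for example id_type='doi', avoids relying on any such guess.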

The Abstract Retrieval API offers differing depths of information via views, some of which are restricted. The view 'META_ABS' is the highest unrestricted view and contains all information from the other unrestricted views; it is therefore the default. The view with the most information is 'FULL', which includes everything available with 'META_ABS' but is restricted. In general, you should try view='FULL' when downloading an abstract and fall back to the default otherwise.
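
The "try FULL, then fall back" advice can be wrapped in a small helper. This is a minimal sketch: retrieve_with_fallback is a hypothetical name, the class to instantiate is passed in so the helper can be tried without an API key, and catching Exception by default is an assumption, since the exact error raised for a restricted view is not stated here.

```python
def retrieve_with_fallback(retriever, identifier, errors=(Exception,), **kwds):
    """Try the restricted FULL view first, then fall back to META_ABS.

    `retriever` is the class to instantiate (AbstractRetrieval in real use).
    `errors` lists the exceptions that signal a refused FULL request;
    catching Exception by default is an assumption made for this sketch.
    """
    try:
        return retriever(identifier, view="FULL", **kwds)
    except errors:
        # The default view is unrestricted, so retry with it.
        return retriever(identifier, view="META_ABS", **kwds)
```

With the library installed and configured, a call might look like retrieve_with_fallback(AbstractRetrieval, "2-s2.0-85068268027"). Narrowing errors to the specific exception your setup raises avoids masking unrelated failures.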

Documentation

class pybliometrics.scopus.AbstractRetrieval(identifier=None, refresh=False, view='META_ABS', id_type=None, **kwds)

Interaction with the Abstract Retrieval API.

Parameters
  • identifier (Union[int, str, None], optional) – The identifier of a document. Can be the Scopus EID, the Scopus ID, the PII, the PubMed ID or the DOI.

    Default: None

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists or not. If int is passed, cached file will be refreshed if the number of days since last modification exceeds that value.

    Default: False

  • id_type (Optional[str], optional) – The type of ID used. Allowed values: None, ‘eid’, ‘pii’, ‘scopus_id’, ‘pubmed_id’, ‘doi’. If the value is None, the class tries to infer the ID type itself.

    Default: None

  • view (str, optional) – The view of the file that should be downloaded. Allowed values: META, META_ABS, REF, FULL, where FULL includes all information of META_ABS view and META_ABS includes all information of the META view. For details see https://dev.elsevier.com/sc_abstract_retrieval_views.html.

    Default: 'META_ABS'

  • kwds (str) – Keywords passed on as query parameters. Must contain fields and values listed in the API specification at https://dev.elsevier.com/documentation/AbstractRetrievalAPI.wadl.

Raises

ValueError – If any of the parameters id_type, refresh or view is not one of the allowed values.

Return type

None

Notes

The directory for cached results is {path}/{view}/{identifier}, where path is specified in your configuration file. In case identifier is a DOI, an underscore replaces the forward slash.
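
The path rule above can be sketched as follows. The function name cache_file and the base directory in the example are illustrative assumptions; the real base path comes from your configuration file.

```python
from pathlib import Path


def cache_file(path, view, identifier):
    """Build {path}/{view}/{identifier}, replacing '/' in DOIs with '_'."""
    safe_id = str(identifier).replace("/", "_")
    return Path(path) / view / safe_id


# Hypothetical base directory, used only for illustration.
print(cache_file("/tmp/scopus", "META_ABS", "10.1016/j.softx.2019.100263").as_posix())
```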

property abstract

The abstract of a document. Note: If this is empty, try property description instead.

property affiliation

A list of namedtuples representing listed affiliations in the form (id, name, city, country).

property aggregationType

Aggregation type of source the document is published in.

property authkeywords

List of author-provided keywords of the document.

property authorgroup

A list of namedtuples representing the article’s authors organized by affiliation, in the form (affiliation_id, dptid, organization, city, postalcode, addresspart, country, auid, indexed_name, surname, given_name). If “given_name” is not present, it falls back to initials. Note: Affiliation information might be missing or mal-assigned even when it looks correct in the web view. In this case please request a correction.

property authors

A list of namedtuples representing the article’s authors, in the form (auid, indexed_name, surname, given_name, affiliation). In case multiple affiliation IDs are given, they are joined on “;”. Note: The affiliation referred to here is what Scopus’ algorithm determined as the main affiliation. Property authorgroup provides all affiliations.
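
Because multiple affiliation IDs arrive as one string joined on ";", splitting them is a common first step. The sketch below uses a stand-in namedtuple with the field names listed above; the helper name affiliation_ids and the sample values are hypothetical.

```python
from collections import namedtuple

# Stand-in with the same fields as the entries of the authors property.
Author = namedtuple("Author", "auid indexed_name surname given_name affiliation")


def affiliation_ids(author):
    """Return the author's affiliation IDs as a list of strings."""
    if not author.affiliation:
        return []
    return [part.strip() for part in author.affiliation.split(";")]


# Made-up author record for illustration only.
example = Author(1, "Doe J.", "Doe", "Jane", "60105007;60027950")
print(affiliation_ids(example))
```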

property citedby_count

Number of articles citing the document.

URL to Scopus page listing citing documents.

property chemicals

List of namedtuples representing chemical entities in the form (source, chemical_name, cas_registry_number). In case multiple numbers given, they are joined on “;”.

property confcode

Code of the conference the document belongs to.

property confdate

Date range of the conference the document belongs to represented by two tuples in the form (YYYY, MM, DD).

property conflocation

Location of the conference the document belongs to.

property confname

Name of the conference the document belongs to.

property confsponsor

Sponsor(s) of the conference the document belongs to.

property contributor_group

List of namedtuples representing contributors compiled by Scopus, in the form (given_name, initials, surname, indexed_name, role).

property correspondence

List of namedtuples representing the authors to whom correspondence should be addressed, in the form (surname, initials, organization, country, city_group). Multiple organizations are joined on a semicolon.

property coverDate

The date of the cover the document is in.

property description

Return the description of a record. Note: If this is empty, try property abstract instead.

property doi

DOI of the document.

property eid

EID of the document.

property endingPage

Ending page. If this is empty, try .pageRange instead.

property funding

List of namedtuples representing parsed funding information, in the form (agency, string, id, acronym, country).

property funding_text

The raw text from which Scopus derives funding information.

property isbn

ISBNs belonging to publicationName as a tuple of varying length (e.g. ISBN-10 or ISBN-13).

property issn

ISSN belonging to the publicationName. Note: If E-ISSN is known to Scopus, this returns both ISSN and E-ISSN in random order separated by blank space.

property identifier

ID of the document (same as EID without “2-s2.0-“).

property idxterms

List of index terms (these are just one category of those Scopus provides in the web version).

property issueIdentifier

Number of the issue the document was published in.

property issuetitle

Title of the issue the document was published in.

property language

Language of the article.

property openaccess

The openaccess status encoded in single digits.

property openaccessFlag

Whether the document is available via open access or not.

property pageRange

Page range. If this is empty, try .startingPage and .endingPage instead.

property pii

The PII (Publisher Item Identifier) of the document.

property publicationName

Name of source the document is published in.

property publisher

Name of the publisher of the document. Note: Information provided in the FULL view of the article might be more complete.

property publisheraddress

Address of the publisher of the document.

property pubmed_id

The PubMed ID of the document.

property refcount

Number of references of an article. Note: Requires either the FULL view or REF view.

property references

List of namedtuples representing references listed in the document, in the form (position, id, doi, title, authors, authors_auid, authors_affiliationid, sourcetitle, publicationyear, volume, issue, first, last, citedbycount, type, text, fulltext). position is the number at which the reference appears in the document, id is the Scopus ID of the referenced document (EID without the “2-s2.0-“), authors is a string of the names of the authors in the format “Surname1, Initials1; Surname2, Initials2”, authors_auid is a string of the author IDs joined on “; “, authors_affiliationid is a string of the authors’ affiliation IDs joined on “; “, sourcetitle is the name of the source (e.g. the journal), publicationyear is the year of the publication as a string, volume and issue are strings referring to the volume and issue, first and last refer to the page range, citedbycount is a string for the total number of citations of the cited item, type describes the parsing status of the reference (resolved or not), text is Scopus-provided information on the publication, fulltext is the text the authors used for the reference.

Note: Requires either the FULL view or REF view. Might be empty even if refcount is positive. Specific fields can be empty. Author lists (authors, authors_auid, authors_affiliationid) may contain duplicates but None’s have been filtered out.

URL to the document page on Scopus.

URL to Scopus API page of this document.

property sequencebank

List of namedtuples representing biological entities defined or mentioned in the text, in the form (name, sequence_number, type).

property source_id

Scopus source ID of the document.

property sourcetitle_abbreviation

Abbreviation of the source the document is published in. Note: Requires the FULL view of the article.

property srctype

Aggregation type of source the document is published in (short version of aggregationType).

property startingPage

Starting page. If this is empty, try .pageRange instead.

property subject_areas

List of namedtuples containing subject areas of the article in the form (area, abbreviation, code). Note: Requires the FULL view of the article.

property subtype

Type of the document. Refer to the Scopus Content Coverage Guide for a list of possible values. Short version of subtypedescription.

property subtypedescription

Type of the document. Refer to the Scopus Content Coverage Guide for a list of possible values. Long version of subtype.

property title

Title of the document.

property url

URL to the API view of the document.

property volume

Volume for the document.

property website

Website of publisher.

get_bibtex()

Bibliographic entry in BibTeX format.

Raises

ValueError – If the item’s aggregationType is not Journal.

Return type

str

get_html()

Bibliographic entry in HTML format.

Return type

str

get_latex()

Bibliographic entry in LaTeX format.

Return type

str

get_ris()

Bibliographic entry in RIS (Research Information System) format for journal articles.

Raises

ValueError – If the item’s aggregationType is not Journal.

Return type

str

get_cache_file_age()

Return the age of the cached file in days.

Return type

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type

Optional[str]

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type

Optional[str]

Examples

You initialize the class with an ID that Scopus uses, e.g. the EID:

>>> from pybliometrics.scopus import AbstractRetrieval
>>> ab = AbstractRetrieval("2-s2.0-85068268027", view='FULL')

You can obtain basic information just by printing the object:

>>> print(ab)
Michael E. Rose and John R. Kitchin: "pybliometrics: Scriptable bibliometrics using
a Python interface to Scopus", SoftwareX, 10, (no pages found)(2019). https://doi.org/10.1016/j.softx.2019.100263.
12 citation(s) as of 2021-04-27
  Affiliation(s):
   Max Planck Institute for Innovation and Competition
   Carnegie Mellon University

There are 52 attributes and 8 methods to interact with. For example, to obtain bibliographic information:

>>> ab.publicationName
'SoftwareX'
>>> ab.aggregationType
'Journal'
>>> ab.coverDate
'2019-07-01'
>>> ab.volume
'10'
>>> ab.issueIdentifier
None
>>> ab.pageRange
None
>>> ab.doi
'10.1016/j.softx.2019.100263'
>>> ab.openaccessFlag
True

Attributes idxterms, subject_areas and authkeywords (if provided) give an idea of the content of a document:

>>> ab.idxterms
['Bibliometrics', 'Python', 'Python interfaces', 'Reproducibilities',
 'Scientometrics', 'Scopus', 'Scopus database', 'User friendly interface']
>>> ab.subject_areas
[Area(area='Software', abbreviation='COMP', code=1712),
 Area(area='Computer Science Applications', abbreviation='COMP', code=1706)]
>>> ab.authkeywords
['Bibliometrics', 'Python', 'Scientometrics', 'Scopus', 'Software']

To obtain the total citation count (at the time the abstract was retrieved and cached):

>>> ab.citedby_count
7

You get the authors as a list of namedtuples, which pair conveniently with pandas:

>>> ab.authors
[Author(auid=57209617104, indexed_name='Rose M.E.', surname='Rose',
 given_name='Michael E.', affiliation='60105007'),
 Author(auid=7004212771, indexed_name='Kitchin J.R.', surname='Kitchin',
 given_name='John R.', affiliation='60027950')]

>>> import pandas as pd
>>> print(pd.DataFrame(ab.authors))
          auid  indexed_name  surname  given_name affiliation
0  57209617104     Rose M.E.     Rose  Michael E.  60105007
1   7004212771  Kitchin J.R.  Kitchin     John R.  60027950

The same structure applies for the attributes affiliation and authorgroup:

>>> ab.affiliation
[Affiliation(id=60105007, name='Max Planck Institute for Innovation and Competition',
             city='Munich', country='Germany'),
 Affiliation(id=60027950, name='Carnegie Mellon University',
             city='Pittsburgh', country='United States')]

>>> ab.authorgroup
[Author(affiliation_id=60105007, dptid=None,
 organization='Max Planck Institute for Innovation and Competition',
 city=None, postalcode=None, addresspart=None, country='Germany',
 auid=57209617104, indexed_name='Rose M.E.', surname='Rose', given_name='Michael E.'),
 Author(affiliation_id=60027950, dptid=110785688,
 organization='Carnegie Mellon University, Department of Chemical Engineering',
 city=None, postalcode=None, addresspart=None, country='United States',
 auid=7004212771, indexed_name='Kitchin J.R.', surname='Kitchin', given_name='John R.')]

Keep in mind that Scopus might not pair authors and affiliations exactly as in the original document, even if the pairing looks correct in the web view. In this case, please request a correction from Scopus.

The references of an article (useful for building citation networks) are only available if you downloaded the article with 'FULL' or 'REF' as the view parameter.

>>> ab.refcount
25
>>> refs = ab.references
>>> refs[0]
Reference(position='1', id='38949137710', doi='10.1007/978-94-007-7618-0_310',
title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar:
strengths and weaknesses',
authors='Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G.',
authors_auid=None, authors_affiliationid=None, sourcetitle='FASEB J',
publicationyear='2007', volume=None, issue=None, first=None, last=None,
citedbycount=None, type=None, text=None,fulltext='Falagas, M.E., Pitsouni,
E.I., Malietzis, G.A., Pappas, G., Comparison of PubMed, Scopus, Web of
Science, and Google Scholar: strengths and weaknesses. FASEB J 22:2 (2007),
338–342, 10.1007/978-94-007-7618-0_310.')

>>> df = pd.DataFrame(refs)
>>> df.columns
Index(['position', 'id', 'doi', 'title', 'authors', 'authors_auid',
       'authors_affiliationid', 'sourcetitle', 'publicationyear', 'volume',
       'issue', 'first', 'last', 'citedbycount', 'type', 'text', 'fulltext'],
      dtype='object')
>>> df['eid'] = '2-s2.0-' + df['id']
>>> df['eid'].tolist()
['2-s2.0-38949137710', '2-s2.0-84956635108', '2-s2.0-84954384742',
 '2-s2.0-85054706190', '2-s2.0-84978682989', '2-s2.0-85047117387',
 '2-s2.0-85068267813', '2-s2.0-84959420483', '2-s2.0-85041892797',
 '2-s2.0-85019268211', '2-s2.0-85059309053', '2-s2.0-85033499871',
 nan, '2-s2.0-85068268189', '2-s2.0-84958069531', '2-s2.0-84964429621',
 '2-s2.0-84977619412', '2-s2.0-85068262994', nan, '2-s2.0-23744500479',
 '2-s2.0-70349549313', nan, '2-s2.0-85042855814', '2-s2.0-85068258349',
 '2-s2.0-84887264733']
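
Since some references carry no Scopus ID (the nan entries above), a citation network should skip them. The following is a minimal sketch without pandas; the helper name citation_edges is hypothetical, and the stand-in namedtuple keeps only the two fields the helper needs.

```python
from collections import namedtuple

# Stand-in for the namedtuples returned by the references property,
# reduced to the fields used here.
Reference = namedtuple("Reference", "position id")


def citation_edges(source_eid, references):
    """Return (citing EID, cited EID) pairs, skipping references without an ID."""
    return [(source_eid, "2-s2.0-" + ref.id)
            for ref in (references or []) if ref.id]


refs = [Reference("1", "38949137710"), Reference("13", None)]
print(citation_edges("2-s2.0-85068268027", refs))
```

In real use, the second argument would be ab.references; the `or []` guard is there in case no references are available.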

Setting view='REF' accesses the REF view of the article, which provides more information on the referenced items (but less on other attributes of the document):

>>> ab_ref = AbstractRetrieval("2-s2.0-85068268027", view='REF')
>>> ab_ref.references[0]
Reference(position='1', id='38949137710', doi='10.1096/fj.07-9492LSF',
title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar:
Strengths and weaknesses', authors='Falagas, Matthew E.; Pitsouni, Eleni I.;
Malietzis, George A.; Falagas, Matthew E.; Pappas, Georgios; Falagas, Matthew E.',
authors_auid='7003962139; 16240046300; 43761284000; 7003962139; 7102070422; 7003962139',
authors_affiliationid='60033272; 60033272; 60033272; 60015849; 60081865; 60033272',
sourcetitle='FASEB Journal', publicationyear=None, volume='22', issue='2', first='338',
last='342', citedbycount='1232', type='resolvedReference', text=None, fulltext=None)
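
The author fields above contain repeated entries, because an author is listed once per affiliation. A small helper can split such a string and drop duplicates while keeping the original order; unique_ids is a hypothetical name used for illustration.

```python
def unique_ids(joined, sep="; "):
    """Split a joined ID string and drop duplicates, preserving order."""
    if not joined:
        return []
    # dict keys keep insertion order, so the first occurrence wins.
    return list(dict.fromkeys(joined.split(sep)))


auids = "7003962139; 16240046300; 43761284000; 7003962139; 7102070422; 7003962139"
print(unique_ids(auids))
# ['7003962139', '16240046300', '43761284000', '7102070422']
```

Note that deduplicating authors_auid and authors_affiliationid independently loses the positional pairing between the two strings, so zip the split lists first if you need author and affiliation pairs.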

For conference proceedings, Scopus also collects information on the conference:

>>> cp = AbstractRetrieval("2-s2.0-0029486824", view="FULL")
>>> cp.confname
'Proceedings of the 1995 34th IEEE Conference on Decision and Control. Part 1 (of 4)'
>>> cp.confcode
'44367'
>>> cp.confdate
((1995, 12, 13), (1995, 12, 15))
>>> cp.conflocation
'New Orleans, LA, USA'
>>> cp.confsponsor
'IEEE'
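
The two tuples in confdate convert directly to datetime.date objects, which makes date arithmetic straightforward. This is a small sketch; the helper name conference_dates is an assumption.

```python
from datetime import date


def conference_dates(confdate):
    """Turn ((YYYY, MM, DD), (YYYY, MM, DD)) into a pair of date objects."""
    start, end = confdate
    return date(*start), date(*end)


start, end = conference_dates(((1995, 12, 13), (1995, 12, 15)))
print((end - start).days + 1)  # conference length in days, inclusive: 3
```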

Some articles have information on funding, chemicals and genome banks:

>>> ab_fund = AbstractRetrieval("2-s2.0-85053478849", view="FULL")
>>> ab_fund.funding
[Funding(agency=None, string='CNRT “Nickel et son Environnement',
 id=None, acronym=None, country=None)]
>>> ab_fund.funding_text
'The authors gratefully acknowledge CNRT “Nickel et son Environnement” for
providing the financial support. The results reported in this publication
are gathered from the CNRT report “Ecomine BioTop”.'
>>> ab_fund.chemicals
[Chemical(source='esbd', chemical_name='calcium', cas_registry_number='7440-70-2;14092-94-5'),
 Chemical(source='esbd', chemical_name='magnesium', cas_registry_number='7439-95-4'),
 Chemical(source='nlm', chemical_name='Fertilizers', cas_registry_number=None),
 Chemical(source='nlm', chemical_name='Sewage', cas_registry_number=None),
 Chemical(source='nlm', chemical_name='Soil', cas_registry_number=None)]
>>> ab_fund.sequencebank
[Sequencebank(name='GENBANK', sequence_number='MH150839:MH150870', type='submitted')]
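
As with affiliations, multiple CAS registry numbers are joined on ";" and some entries have none. A sketch for unpacking them follows; the namedtuple is a stand-in for the entries of the chemicals property, and cas_numbers is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in with the same fields as the entries of the chemicals property.
Chemical = namedtuple("Chemical", "source chemical_name cas_registry_number")


def cas_numbers(chemical):
    """Return the CAS registry numbers of a chemical as a list (possibly empty)."""
    number = chemical.cas_registry_number
    return number.split(";") if number else []


calcium = Chemical("esbd", "calcium", "7440-70-2;14092-94-5")
print(cas_numbers(calcium))
# ['7440-70-2', '14092-94-5']
```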

You can print the document's bibliographic entry in a variety of formats, including LaTeX, BibTeX, HTML, and RIS. For BibTeX entries, the key is the first author’s surname, the year, and the first and last word of the title:

>>> print(ab.get_bibtex())
@article{Rose2019Pybliometrics:Scopus,
  author = {Michael E. Rose and John R. Kitchin},
  title = {{pybliometrics: Scriptable bibliometrics using a Python interface to Scopus}},
  journal = {SoftwareX},
  year = {2019},
  volume = {10},
  number = {None},
  pages = {-},
  doi = {10.1016/j.softx.2019.100263}}
>>> print(ab.get_ris())
TY  - JOUR
TI  - pybliometrics: Scriptable bibliometrics using a Python interface to Scopus
JO  - SoftwareX
VL  - 10
DA  - 2019-07-01
PY  - 2019
SP  - None
AU  - Rose M.E.
AU  - Kitchin J.R.
DO  - 10.1016/j.softx.2019.100263
UR  - https://doi.org/10.1016/j.softx.2019.100263
ER  -
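
Because get_bibtex() and get_ris() raise a ValueError for items whose aggregationType is not Journal, batch exports usually need a guard. A minimal sketch, with bibtex_or_none as a hypothetical helper name:

```python
def bibtex_or_none(doc):
    """Return the BibTeX entry of `doc`, or None for non-journal items."""
    try:
        return doc.get_bibtex()
    except ValueError:
        # Raised when the item's aggregationType is not Journal.
        return None
```

Applied to a list of retrieved documents, this lets you collect the entries that exist and skip conference papers or books without interrupting the loop.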

Downloaded results are cached to speed up subsequent analysis. This information may become outdated. To refresh the cached results if they exist, set refresh=True, or provide an integer that will be interpreted as maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use ab.get_cache_file_mdate() to get the date of last modification, and ab.get_cache_file_age() the number of days since the last modification.
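
The refresh rule described above can be written out as a small decision function. This is a sketch of the stated behaviour, not the library's code; needs_refresh is a hypothetical name.

```python
def needs_refresh(refresh, age_in_days):
    """Decide whether a cached file should be downloaded again.

    `refresh` is either a bool or a maximum age in days.
    """
    # bool is a subclass of int in Python, so check for it first.
    if isinstance(refresh, bool):
        return refresh
    # Refresh only if the file is older than the allowed number of days.
    return age_in_days > refresh


print(needs_refresh(100, 101))   # True
print(needs_refresh(100, 100))   # False
print(needs_refresh(False, 500)) # False
```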