pybliometrics.scopus.AbstractRetrieval

AbstractRetrieval() implements the Scopus Abstract Retrieval API.

It accepts any identifier as the main argument. Most commonly, this will be a Scopus EID, but a DOI, a Scopus ID (the last part of the EID), a PubMed identifier or a Publisher Item Identifier (PII) work as well. AbstractRetrieval tries to infer the ID type itself; to speed this up, you can specify the ID type via id_type.
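To illustrate why passing id_type can help, here is a rough heuristic of the kind such inference might rely on. It is a hypothetical sketch: the function guess_id_type is not part of pybliometrics, it is not the library's actual inference logic, and the PII pattern in particular is a simplification.

```python
import re
from typing import Optional, Union


def guess_id_type(identifier: Union[int, str]) -> Optional[str]:
    """Rough guess of the identifier type (hypothetical heuristic).

    Returns None when the type cannot be told apart from the string alone.
    """
    s = str(identifier).strip()
    if s.startswith("2-s2.0-"):
        return "eid"
    if s.startswith("10.") and "/" in s:
        return "doi"
    # Simplified PII pattern: a leading 'S' followed by digits, 'X', '-', '(' or ')'
    if re.fullmatch(r"S[0-9X()\-]{10,}", s):
        return "pii"
    # Scopus IDs and PubMed IDs are both purely numeric, so a digit string
    # (and anything else unrecognized) is left undecided.
    return None
```

When the heuristic cannot decide (a bare number could be either a Scopus ID or a PubMed ID), state the type explicitly, e.g. AbstractRetrieval("85068268027", id_type="scopus_id").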

The Abstract Retrieval API offers different levels of information depth via views, some of which are restricted. The ‘META_ABS’ view is the most comprehensive among the unrestricted views, encompassing all information from the other unrestricted views; it is therefore the default view. The view with the most information content is ‘FULL’, which includes all information available with ‘META_ABS’ but is restricted. Generally, you should try to use view='FULL' when downloading an abstract and fall back to the default otherwise.
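The try-FULL-then-fall-back advice can be wrapped in a small helper. This is a sketch under assumptions: retrieve_with_fallback is a hypothetical name, and because the exception raised for a restricted view may differ across pybliometrics versions, the helper catches a configurable tuple of exception types (all exceptions by default), which you should narrow in practice.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def retrieve_with_fallback(
    fetch: Callable[[str], T],
    views: Sequence[str] = ("FULL", "META_ABS"),
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[str, T]:
    """Try each view in order and return (view, result) for the first that works."""
    last_error = None
    for view in views:
        try:
            return view, fetch(view)
        except errors as exc:  # e.g. the view is restricted for your key
            last_error = exc
    raise RuntimeError(f"No view succeeded: {list(views)}") from last_error
```

With pybliometrics, fetch could be lambda v: AbstractRetrieval("2-s2.0-85068268027", view=v); the returned view name tells you which level of detail you actually received.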

Documentation

class pybliometrics.scopus.AbstractRetrieval(identifier=None, refresh=False, view='META_ABS', id_type=None, **kwds)

Interaction with the Abstract Retrieval API.

Parameters:
  • identifier (Union[int, str], optional) – The identifier of a document. Can be the Scopus EID, the Scopus ID, the PII, the PubMed ID or the DOI.

    Default: None

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists. If an int is passed, the cached file will be refreshed if the number of days since its last modification exceeds that value.

    Default: False

  • id_type (str, optional) – The type of the ID used. Allowed values: None, ‘eid’, ‘pii’, ‘scopus_id’, ‘pubmed_id’, ‘doi’. If the value is None, the function tries to infer the ID type itself.

    Default: None

  • view (str, optional) – The view of the file that should be downloaded. Allowed values: META, META_ABS, REF, FULL, where FULL includes all information of META_ABS view and META_ABS includes all information of the META view. For details see https://dev.elsevier.com/sc_abstract_retrieval_views.html.

    Default: 'META_ABS'

  • kwds (str) – Keywords passed on as query parameters. Must contain fields and values listed in the API specification at https://dev.elsevier.com/documentation/AbstractRetrievalAPI.wadl.

Raises:

ValueError – If any of the parameters id_type, refresh or view is not one of the allowed values.

Notes

The directory for cached results is {path}/{view}/{identifier}, where path is specified in your configuration file. In case identifier is a DOI, an underscore replaces the forward slash.
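As an illustration of the documented layout, the cache location can be reproduced with pathlib. This is a minimal sketch: cache_path is a hypothetical helper, "/tmp/scopus" stands in for the path from your configuration file, and the library's own implementation may differ in details.

```python
from pathlib import Path
from typing import Union


def cache_path(base: Union[str, Path], view: str, identifier: Union[int, str]) -> Path:
    """Return {base}/{view}/{identifier}, with '/' in a DOI replaced by '_'."""
    return Path(base) / view / str(identifier).replace("/", "_")


print(cache_path("/tmp/scopus", "FULL", "10.1016/j.softx.2019.100263"))
# /tmp/scopus/FULL/10.1016_j.softx.2019.100263 (on POSIX systems)
```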

property abstract: str | None

The abstract of a document. Note: If this is empty, try the description property instead.

property affiliation: List[NamedTuple] | None

A list of namedtuples representing listed affiliations in the form (id, name, city, country).

property aggregationType: str

Aggregation type of source the document is published in.

property authkeywords: List[str] | None

List of author-provided keywords of the document.

property authorgroup: List[NamedTuple] | None

A list of namedtuples representing the article’s authors organized by affiliation, in the form (affiliation_id, dptid, organization, city, postalcode, addresspart, country, collaboration, auid, orcid, indexed_name, surname, given_name). If given_name is not present, it falls back to the initials. Note: Affiliation information might be missing or misassigned even when it looks correct in the web view. In this case, please request a correction. It is generally missing for collaborations.

property authors: List[NamedTuple] | None

A list of namedtuples representing the article’s authors, in the form (auid, indexed_name, surname, given_name, affiliation). In case multiple affiliation IDs are given, they are joined on “;”. Note: The affiliation referred to here is what Scopus’ algorithm determined as the main affiliation. Property authorgroup provides all affiliations.
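Because multiple affiliation IDs arrive as a single ";"-joined string, it can help to split them before further analysis. A minimal sketch: Author is a stand-in namedtuple with the documented field order (not the library's own class), the two records are copied from the example output further down this page, and affiliation_ids is a hypothetical helper.

```python
from collections import namedtuple
from typing import List

# Stand-in with the documented field order of ab.authors
Author = namedtuple("Author", "auid indexed_name surname given_name affiliation")

authors = [
    Author(57209617104, "Rose M.E.", "Rose", "Michael E.", "60105007"),
    Author(7004212771, "Kitchin J.R.", "Kitchin", "John R.", "60027950"),
]


def affiliation_ids(author) -> List[str]:
    """Split the ';'-joined affiliation field into a list of IDs."""
    if not author.affiliation:
        return []
    return [part.strip() for part in author.affiliation.split(";") if part.strip()]


for author in authors:
    print(author.indexed_name, affiliation_ids(author))
# Rose M.E. ['60105007']
# Kitchin J.R. ['60027950']
```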

property citedby_count: int | None

Number of articles citing the document.

property citedby_link: str

URL to Scopus page listing citing documents.

property chemicals: List[NamedTuple] | None

List of namedtuples representing chemical entities in the form (source, chemical_name, cas_registry_number). In case multiple numbers are given, they are joined on “;”.

property confcode: int | None

Code of the conference the document belongs to.

property confdate: Tuple[Tuple[int, int, int], Tuple[int, int, int]] | None

Date range of the conference the document belongs to represented by two tuples in the form (YYYY, MM, DD).
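The two (YYYY, MM, DD) tuples map directly onto datetime.date, which makes date arithmetic easy. A minimal sketch with a hypothetical helper name; the sample value is the one from the conference proceedings example on this page.

```python
from datetime import date
from typing import Optional, Tuple

ConfDate = Tuple[Tuple[int, int, int], Tuple[int, int, int]]


def conference_dates(confdate: Optional[ConfDate]) -> Optional[Tuple[date, date]]:
    """Convert ((YYYY, MM, DD), (YYYY, MM, DD)) into a (start, end) pair of dates."""
    if confdate is None:
        return None
    start, end = confdate
    return date(*start), date(*end)


start, end = conference_dates(((1995, 12, 13), (1995, 12, 15)))
print(start, end, (end - start).days)  # 1995-12-13 1995-12-15 2
```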

property conflocation: str | None

Location of the conference the document belongs to.

property confname: str | None

Name of the conference the document belongs to.

property confsponsor: List[str] | str | None

Sponsor(s) of the conference the document belongs to.

property contributor_group: List[NamedTuple] | None

List of namedtuples representing contributors compiled by Scopus, in the form (given_name, initials, surname, indexed_name, role).

property copyright: str

The copyright statement of the document.

property copyright_type: str

The copyright holder of the document.

property correspondence: List[NamedTuple] | None

List of namedtuples representing the authors to whom correspondence should be addressed, in the form (surname, initials, organization, country, city_group). Multiple organizations are joined on semicolon.

property coverDate: str

The date of the cover the document is in.

property date_created: Tuple[int, int, int] | None

Return the date_created of a record.

property description: str | None

Return the description of a record. Note: If this is empty, try the abstract property instead.

property doi: str | None

DOI of the document.

property eid: str

EID of the document.

property endingPage: str | None

Ending page. If this is empty, try the pageRange property instead.

property funding: List[NamedTuple] | None

List of namedtuples of parsed funding information in the form (agency, agency_id, string, funding_id, acronym, country).

property funding_text: str | None

The raw text from which Scopus derives funding information.

property isbn: Tuple[str, ...] | None

ISBNs belonging to publicationName, as a tuple of varying length (e.g. ISBN-10 or ISBN-13).

property issn: NamedTuple | None

Namedtuple in the form (print, electronic). Note: If the source has an E-ISSN, the META view will return None. Use the FULL view instead.

property identifier: int

ID of the document (same as EID without “2-s2.0-”).
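Converting between the two forms is plain string handling. A minimal sketch with hypothetical helper names; note that because the identifier property is documented as an int, an ID with leading zeros (such as the one in the EID 2-s2.0-0029486824 used further below) cannot keep them there, so this sketch works with strings throughout.

```python
EID_PREFIX = "2-s2.0-"


def eid_to_scopus_id(eid: str) -> str:
    """Strip the '2-s2.0-' prefix from an EID, keeping the ID as a string."""
    if not eid.startswith(EID_PREFIX):
        raise ValueError(f"Not a Scopus EID: {eid!r}")
    # Keep a string: older IDs such as '0029486824' carry leading zeros
    # that int() would drop.
    return eid[len(EID_PREFIX):]


def scopus_id_to_eid(scopus_id: str) -> str:
    """Prepend the '2-s2.0-' prefix to a Scopus ID given as a string."""
    return EID_PREFIX + str(scopus_id)


print(eid_to_scopus_id("2-s2.0-0029486824"))  # 0029486824
print(scopus_id_to_eid("85068268027"))        # 2-s2.0-85068268027
```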

property idxterms: List[str] | None

List of index terms (these are just one category of those Scopus provides in the web version).

property issueIdentifier: str | None

Number of the issue the document was published in.

property issuetitle: str | None

Title of the issue the document was published in.

property language: str | None

Language of the article.

property openaccess: int | None

The openaccess status encoded in single digits.

property openaccessFlag: bool | None

Whether the document is available via open access or not.

property pageRange: str | None

Page range. If this is empty, try the startingPage and endingPage properties instead.

property pii: str | None

The PII (Publisher Item Identifier) of the document.

property publicationName: str | None

Name of source the document is published in.

property publisher: str | None

Name of the publisher of the document. Note: Information provided in the FULL view of the article might be more complete.

property publisheraddress: str | None

Address of the publisher of the document.

property pubmed_id: int | None

The PubMed ID of the document.

property refcount: int | None

Number of references of an article. Note: Requires either the FULL view or REF view.

property references: List[NamedTuple] | None

List of namedtuples representing references listed in the document, in the form (position, id, doi, title, authors, authors_auid, authors_affiliationid, sourcetitle, publicationyear, coverDate, volume, issue, first, last, citedbycount, type, text, fulltext).

The fields are defined as follows:
  • position – the number at which the reference appears in the document
  • id – the Scopus ID of the referenced document (EID without “2-s2.0-”)
  • authors – a string of the names of the authors in the format “Surname1, Initials1; Surname2, Initials2”
  • authors_auid – a string of the author IDs joined on “; ”
  • authors_affiliationid – a string of the authors’ affiliation IDs joined on “; ”
  • sourcetitle – the name of the source (e.g. the journal)
  • publicationyear – the year of the publication as a string (FULL view only)
  • coverDate – the date of the publication as a string (REF view only)
  • volume, issue – strings referring to the volume and issue
  • first, last – the first and last page of the page range
  • citedbycount – the total number of citations of the cited item (REF view only)
  • type – the parsing status of the reference (resolved or not)
  • text – information on the publication
  • fulltext – the text the authors used for the reference

Note: Requires either the FULL view or REF view. Might be empty even if refcount is positive. Specific fields can be empty. The lists authors and authors_auid may contain duplicates because of the 1:1 pairing with the list authors_affiliationid.

property scopus_link: str

URL to the document page on Scopus.

property self_link: str

URL to Scopus API page of this document.

property sequencebank: List[NamedTuple] | None

List of namedtuples representing biological entities defined or mentioned in the text, in the form (name, sequence_number, type).

property source_id: int | None

Scopus source ID of the document.

property sourcetitle_abbreviation: str | None

Abbreviation of the source the document is published in. Note: Requires the FULL view of the article.

property srctype: str | None

Aggregation type of source the document is published in (short version of aggregationType).

property startingPage: str | None

Starting page. If this is empty, try the pageRange property instead.

property subject_areas: List[NamedTuple] | None

List of namedtuples containing subject areas of the article in the form (area, abbreviation, code). Note: Requires the FULL view of the article.

property subtype: str

Type of the document. Refer to the Scopus Content Coverage Guide for a list of possible values. Short version of subtypedescription.

property subtypedescription: str

Type of the document. Refer to the Scopus Content Coverage Guide for a list of possible values. Long version of subtype.

property title: str | None

Title of the document.

property url: str | None

URL to the API view of the document.

property volume: str | None

Volume for the document.

property website: str

Website of publisher.

get_bibtex()

Bibliographic entry in BibTeX format.

Raises:

ValueError – If the item’s aggregationType is not Journal.

Return type:

str

get_html()

Bibliographic entry in html format.

Return type:

str

get_latex()

Bibliographic entry in LaTeX format.

Return type:

str

get_ris()

Bibliographic entry in RIS (Research Information Systems) format for journal articles.

Raises:

ValueError – If the item’s aggregationType is not Journal.

Return type:

str

get_cache_file_age()

Return the age of the cached file in days.

Return type:

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type:

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type:

str | None

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type:

str | None

Examples

You initialize the class with an ID that Scopus uses, e.g. the EID:

>>> from pybliometrics.scopus import AbstractRetrieval
>>> ab = AbstractRetrieval("2-s2.0-85068268027", view='FULL')

You can obtain basic information just by printing the object:

>>> print(ab)
Michael E. Rose and John R. Kitchin: "pybliometrics: Scriptable bibliometrics using a Python interface to Scopus", SoftwareX, 10, (no pages found)(2019). https://doi.org/10.1016/j.softx.2019.100263.
34 citation(s) as of 2022-04-07
  Affiliation(s):
   Max Planck Institute for Innovation and Competition
   Carnegie Mellon University

There are 52 attributes and 8 methods to interact with. For example, to obtain bibliographic information:

>>> ab.publicationName
'SoftwareX'
>>> ab.aggregationType
'Journal'
>>> ab.coverDate
'2019-07-01'
>>> ab.volume
'10'
>>> print(ab.issueIdentifier)
None
>>> print(ab.pageRange)
None
>>> ab.doi
'10.1016/j.softx.2019.100263'
>>> ab.openaccessFlag
True

The attributes idxterms, subject_areas and authkeywords (if provided) offer insights into the document’s content:

>>> ab.idxterms
['Bibliometrics', 'Python', 'Python interfaces', 'Reproducibilities',
 'Scientometrics', 'Scopus', 'Scopus database', 'User friendly interface']
>>> ab.subject_areas
[Area(area='Software', abbreviation='COMP', code=1712),
 Area(area='Computer Science Applications', abbreviation='COMP', code=1706)]
>>> ab.authkeywords
['Bibliometrics', 'Python', 'Scientometrics', 'Scopus', 'Software']

To obtain the total citation count (at the time the abstract was retrieved and cached):

>>> ab.citedby_count
34

You can retrieve the authors as a list of namedtuples, which pair conveniently with pandas:

>>> ab.authors
[Author(auid=57209617104, indexed_name='Rose M.E.', surname='Rose',
        given_name='Michael E.', affiliation='60105007'),
 Author(auid=7004212771, indexed_name='Kitchin J.R.', surname='Kitchin',
        given_name='John R.', affiliation='60027950')]

>>> import pandas as pd
>>> print(pd.DataFrame(ab.authors))
          auid  indexed_name  surname  given_name affiliation
0  57209617104     Rose M.E.     Rose  Michael E.  60105007
1   7004212771  Kitchin J.R.  Kitchin     John R.  60027950

The same structure applies for the attributes affiliation and authorgroup:

>>> ab.affiliation
[Affiliation(id=60105007, name='Max Planck Institute for Innovation and Competition',
             city='Munich', country='Germany'),
 Affiliation(id=60027950, name='Carnegie Mellon University',
             city='Pittsburgh', country='United States')]

>>> ab.authorgroup
[Author(affiliation_id=60105007, dptid=None,
        organization='Max Planck Institute for Innovation and Competition',
        city=None, postalcode=None, addresspart=None, country='Germany',
        collaboration=None, auid=57209617104, orcid=None,
        indexed_name='Rose M.E.', surname='Rose', given_name='Michael E.'),
 Author(affiliation_id=60027950, dptid=110785688,
        organization='Carnegie Mellon University, Department of Chemical Engineering',
        city=None, postalcode=None, addresspart=None, country='United States',
        collaboration=None, auid=7004212771, orcid=None,
        indexed_name='Kitchin J.R.', surname='Kitchin', given_name='John R.')]

Note that Scopus may not always accurately pair authors with their affiliations as in the original document, even if it looks correct in the web view. In this case, please request a correction from Scopus.

The references of an article (useful to build citation networks) are only available if you downloaded the article with ‘FULL’ (as ab above) or ‘REF’ as the view parameter.

>>> ab.refcount
25
>>> refs = ab.references
>>> refs[0]
Reference(position='1', id='38949137710', doi='10.1007/978-94-007-7618-0_310',
          title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar:
                 strengths and weaknesses',
          authors='Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G.',
          authors_auid=None, authors_affiliationid=None, sourcetitle='FASEB J',
          publicationyear='2007', coverDate=None, volume=None, issue=None,
          first=None, last=None, citedbycount=None, type=None, text=None,
          fulltext='Falagas, M.E., Pitsouni, E.I., Malietzis, G.A., Pappas, G.,
                    Comparison of PubMed, Scopus, Web of Science, and Google
                    Scholar: strengths and weaknesses. FASEB J 22:2 (2007),
                    338–342, 10.1007/978-94-007-7618-0_310.')

>>> df = pd.DataFrame(refs)
>>> df.columns
Index(['position', 'id', 'doi', 'title', 'authors', 'authors_auid',
       'authors_affiliationid', 'sourcetitle', 'publicationyear', 'coverDate',
       'volume', 'issue', 'first', 'last', 'citedbycount', 'type', 'text',
       'fulltext'],
      dtype='object')
>>> df['eid'] = '2-s2.0-' + df['id']
>>> df['eid'].tolist()
['2-s2.0-38949137710', '2-s2.0-84956635108', '2-s2.0-84954384742',
 '2-s2.0-85054706190', '2-s2.0-84978682989', '2-s2.0-85047117387',
 '2-s2.0-85068267813', '2-s2.0-84959420483', '2-s2.0-85041892797',
 '2-s2.0-85019268211', '2-s2.0-85059309053', '2-s2.0-85033499871',
 nan, '2-s2.0-85068268189', '2-s2.0-84958069531', '2-s2.0-84964429621',
 '2-s2.0-84977619412', '2-s2.0-85068262994', nan, '2-s2.0-23744500479',
 '2-s2.0-70349549313', nan, '2-s2.0-85042855814', '2-s2.0-85068258349',
 '2-s2.0-84887264733']

Using view='REF' accesses the REF view of the article, which provides more information on the referenced items (but less on other attributes of the document):

>>> ab_ref = AbstractRetrieval("2-s2.0-85068268027", view='REF')
>>> ab_ref.references[0]
Reference(position='1', id='38949137710', doi='10.1096/fj.07-9492LSF',
          title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar:
                 Strengths and weaknesses',
          authors='Falagas, Matthew E.; Pitsouni, Eleni I.; Malietzis, George A.;
                   Falagas, Matthew E.; Pappas, Georgios',
          authors_auid='7003962139; 16240046300; 43761284000; 7003962139; 7102070422',
          authors_affiliationid='60033272; 60033272; 60033272; 60015849; 60081865',
          sourcetitle='FASEB Journal', publicationyear=None, coverDate='2008-02-01',
          volume='22', issue='2', first='338', last='342', citedbycount='1676',
          type='resolvedReference', text=None, fulltext=None)

The list of authors contains duplicates because of the 1:1 pairing with the authors’ affiliation IDs. In the above example, 7003962139 is affiliated with both 60033272 and 60015849. Authors are therefore grouped by affiliation ID.
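To undo this grouping and get each author's set of affiliations, the two "; "-joined strings can be split and zipped. A minimal sketch: affiliations_by_author is a hypothetical helper, and the input strings are copied from the REF view example above.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def affiliations_by_author(
    authors_auid: Optional[str], authors_affiliationid: Optional[str]
) -> Dict[str, List[str]]:
    """Pair '; '-joined author IDs 1:1 with affiliation IDs and group them."""
    if not authors_auid or not authors_affiliationid:
        return {}  # e.g. both fields are None in the FULL view example
    auids = [a.strip() for a in authors_auid.split(";")]
    affs = [a.strip() for a in authors_affiliationid.split(";")]
    if len(auids) != len(affs):
        raise ValueError("expected a 1:1 pairing of author and affiliation IDs")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for auid, aff in zip(auids, affs):
        if aff not in grouped[auid]:  # keep order, drop repeated pairs
            grouped[auid].append(aff)
    return dict(grouped)


result = affiliations_by_author(
    "7003962139; 16240046300; 43761284000; 7003962139; 7102070422",
    "60033272; 60033272; 60033272; 60015849; 60081865",
)
print(result["7003962139"])  # ['60033272', '60015849']
```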

Scopus also gathers detailed information about conferences for conference proceedings, including:

>>> cp = AbstractRetrieval("2-s2.0-0029486824", view="FULL")
>>> cp.confname
'Proceedings of the 1995 34th IEEE Conference on Decision and Control. Part 1 (of 4)'
>>> cp.confcode
'44367'
>>> cp.confdate
((1995, 12, 13), (1995, 12, 15))
>>> cp.conflocation
'New Orleans, LA, USA'
>>> cp.confsponsor
'IEEE'

Some articles have information on funding, chemicals and genome banks:

>>> ab_fund = AbstractRetrieval("2-s2.0-85053478849", view="FULL")
>>> ab_fund.funding
[Funding(agency=None, string='CNRT “Nickel et son Environnement',
 agency_id=None, funding_id=None, acronym=None, country=None)]
>>> ab_fund.funding_text
'The authors gratefully acknowledge CNRT “Nickel et son Environnement” for
providing the financial support. The results reported in this publication
are gathered from the CNRT report “Ecomine BioTop”.'
>>> ab_fund.chemicals
[Chemical(source='esbd', chemical_name='calcium',
          cas_registry_number='7440-70-2;14092-94-5'),
 Chemical(source='esbd', chemical_name='magnesium',
          cas_registry_number='7439-95-4'),
 Chemical(source='nlm', chemical_name='Fertilizers', cas_registry_number=None),
 Chemical(source='nlm', chemical_name='Sewage', cas_registry_number=None),
 Chemical(source='nlm', chemical_name='Soil', cas_registry_number=None)]
>>> ab_fund.sequencebank
[Sequencebank(name='GENBANK', sequence_number='MH150839:MH150870', type='submitted')]

You can export the bibliographic entry in a variety of formats, including LaTeX, BibTeX, HTML, and RIS. For BibTeX entries, the key consists of the first author’s surname, the year, and the first and last word of the title:
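The key format can be reproduced as follows. This is a sketch reconstructed from the example key only (bibtex_key is a hypothetical name, not pybliometrics' code); the real implementation may treat punctuation or edge cases differently, although the colon does survive in the example.

```python
def bibtex_key(surname: str, year, title: str) -> str:
    """Build a key from surname, year, and first and last word of the title."""
    words = title.split()
    if not words:
        return f"{surname}{year}"

    def capitalize_first(word: str) -> str:
        # Upper-case only the first character and leave the rest untouched
        return word[:1].upper() + word[1:]

    return f"{surname}{year}{capitalize_first(words[0])}{capitalize_first(words[-1])}"


print(bibtex_key(
    "Rose", 2019,
    "pybliometrics: Scriptable bibliometrics using a Python interface to Scopus",
))  # Rose2019Pybliometrics:Scopus
```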

>>> print(ab.get_bibtex())
@article{Rose2019Pybliometrics:Scopus,
  author = {Michael E. Rose and John R. Kitchin},
  title = {{pybliometrics: Scriptable bibliometrics using a Python interface to Scopus}},
  journal = {SoftwareX},
  year = {2019},
  volume = {10},
  number = {None},
  pages = {-},
  doi = {10.1016/j.softx.2019.100263}}
>>> print(ab.get_ris())
TY  - JOUR
TI  - pybliometrics: Scriptable bibliometrics using a Python interface to Scopus
JO  - SoftwareX
VL  - 10
DA  - 2019-07-01
PY  - 2019
SP  - None
AU  - Rose M.E.
AU  - Kitchin J.R.
DO  - 10.1016/j.softx.2019.100263
UR  - https://doi.org/10.1016/j.softx.2019.100263
ER  -

Downloaded results are cached to expedite subsequent analyses, but this information may become outdated. To refresh cached results, set refresh=True, or provide an integer that will be interpreted as the maximum allowed number of days since the last modification. For example, to refresh all cached results older than 100 days, set refresh=100. Use ab.get_cache_file_mdate() to obtain the date of the last modification, and ab.get_cache_file_age() to determine the number of days since then.
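The refresh rule can be expressed as a small decision function. This is a minimal sketch of the documented semantics, not pybliometrics' internal code; needs_refresh is a hypothetical name. One subtlety worth noting: bool is a subclass of int in Python, so True/False must be checked before treating refresh as a number of days.

```python
import os
import time
from typing import Union


def needs_refresh(cache_file: str, refresh: Union[bool, int]) -> bool:
    """Decide whether a cached file should be re-downloaded.

    refresh: True/False, or an int giving the maximum allowed age in days.
    """
    if not os.path.exists(cache_file):
        return True  # nothing cached yet, so a download is needed anyway
    if isinstance(refresh, bool):  # check bool first: bool subclasses int
        return refresh
    age_days = (time.time() - os.path.getmtime(cache_file)) / 86400
    return age_days > refresh
```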