pybliometrics.scopus.CitationOverview

CitationOverview() implements the Citation Overview API. Your API Key needs to be approved by Elsevier manually. Please contact Scopus to do so. Otherwise each request throws a 403 error.

Documentation

class pybliometrics.scopus.CitationOverview(identifier, start, end=2021, id_type='scopus_id', eid=None, refresh=False, citation=None, **kwds)

Interaction with the Citation Overview API.

Parameters
  • identifier (List[Union[str, int]]) – Up to 25 identifiers for which to look up citations. Must be Scopus IDs, DOIs, PIIs or Pubmed IDs.

  • start (Union[int, str]) – The first year for which the citation count should be loaded.

  • end (Union[int, str], optional) – The last year for which the citation count should be loaded. Defaults to the current year.

    Default: 2021

  • id_type (str, optional) – The type of the IDs provided in identifier. Must be one of “scopus_id”, “doi”, “pii”, “pubmed_id”.

    Default: 'scopus_id'

  • eid (Optional[str], optional) – (deprecated) The EID of the abstract. Will be removed in a future release: instead, use scopus_id, obtained by stripping everything up to and including the second hyphen. If you use this parameter, it will be converted to scopus_id.

    Default: None

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists or not. If int is passed, cached file will be refreshed if the number of days since last modification exceeds that value.

    Default: False

  • citation (Optional[str], optional) – Allows for the exclusion of self-citations or those by books. If None, will count all citations. Allowed values: None, exclude-self, exclude-books

    Default: None

  • kwds (str) – Keywords passed on as query parameters. Must contain fields and values mentioned in the API specification at https://dev.elsevier.com/documentation/AbstractCitationAPI.wadl.
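
The conversion described for the deprecated eid parameter can be sketched as follows. This is a minimal illustration, not the library's own code: eid_to_scopus_id is a hypothetical helper name, and the sample value assumes the usual "2-s2.0-" EID prefix.

```python
def eid_to_scopus_id(eid: str) -> str:
    """Hypothetical helper: drop everything up to the second hyphen of an EID."""
    # "2-s2.0-85068268027" splits into ["2", "s2.0", "85068268027"]
    return eid.split("-", 2)[-1]


scopus_id = eid_to_scopus_id("2-s2.0-85068268027")
print(scopus_id)  # 85068268027
```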

Raises
  • ValueError – If parameter identifier contains fewer than 1 or more than 25 elements.

  • ValueError – If any of the parameters citation, id_type or refresh is not one of the allowed values.
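
Because more than 25 identifiers raise a ValueError, longer lists have to be split before they are passed to the class. A minimal sketch of such a split, where batch_identifiers and MAX_IDS are hypothetical names and the identifiers are made-up placeholders:

```python
from typing import List, Sequence

MAX_IDS = 25  # documented upper bound per CitationOverview() call


def batch_identifiers(identifiers: Sequence[str], size: int = MAX_IDS) -> List[List[str]]:
    """Hypothetical helper: split identifiers into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(identifiers[i:i + size]) for i in range(0, len(identifiers), size)]


ids = [str(85000000000 + i) for i in range(60)]  # placeholder IDs, not real documents
batches = batch_identifiers(ids)
print([len(b) for b in batches])  # [25, 25, 10]
```

Each batch could then be passed to CitationOverview() in its own call.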

Return type

None

Notes

The directory for cached results is {path}/STANDARD/{id}-{citation}, where path is specified in your configuration file, and id is the MD5 hash of the string obtained by joining the elements of identifier with underscores.
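
To illustrate how such a cache name could be derived, here is a minimal sketch. It is an assumption-laden illustration rather than the library's implementation: cache_file_stem is a hypothetical name, and the UTF-8 encoding and the way a citation value of None is rendered are guesses.

```python
import hashlib
from typing import Optional, Sequence, Union


def cache_file_stem(identifier: Sequence[Union[str, int]],
                    citation: Optional[str] = None) -> str:
    """Hypothetical helper: build "{id}-{citation}" from a list of identifiers."""
    # Join the identifiers on underscores, as described in the note above.
    joined = "_".join(str(i) for i in identifier)
    # Assumption: the string is encoded as UTF-8 before hashing.
    digest = hashlib.md5(joined.encode("utf8")).hexdigest()
    return f"{digest}-{citation}"


stem = cache_file_stem(["85068268027", "84930616647"], citation="exclude-self")
print(stem)
```

Note that the hash depends on the order of the identifiers, so the same documents in a different order would map to a different name under this sketch.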

Your API Key needs to be augmented by Elsevier’s Scopus Integration Team to access this API.

property authors

A list of lists of namedtuples storing author information, where each namedtuple corresponds to one author and each sub-list to one document. The information in each namedtuple is (name surname initials id url). All entries are strings.

property cc

List of lists of tuples of yearly number of citations for specified years, where each sub-list corresponds to one document.

property citationType_long

Type (long version) of the documents (e.g. article, review).

property citationType_short

Type (short version) of the documents (e.g. ar, re).

property columnTotal

The yearly number of citations for all documents combined.

property doi

Document Object Identifier (DOI) of the documents.

property endingPage

Ending pages of the documents.

property grandTotal

The total number of citations of all documents together.

property h_index

Combined h-index of citations of all the documents.
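
For orientation, the h-index of a set of documents is the largest number h such that h documents have at least h citations each. The sketch below computes it from a plain list of citation counts; h_index here is a standalone illustrative function, and whether the API derives its value from the same totals is an assumption.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


print(h_index([16, 13]))  # 2
```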

property issn

ISSN of the sources of the documents. Note: If the E-ISSN is known to Scopus, this returns both ISSN and E-ISSN in arbitrary order, separated by a space.
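
Since an entry may hold two numbers in one string, it can help to split it before further use. A minimal sketch, where split_issn is a hypothetical helper and the sample values are placeholders rather than real ISSNs:

```python
from typing import List, Optional


def split_issn(value: Optional[str]) -> List[str]:
    """Hypothetical helper: split an issn entry into its individual numbers."""
    if not value:
        return []
    # The order is not guaranteed, so the result does not reveal
    # which number is the ISSN and which the E-ISSN.
    return value.split()


print(split_issn("12345678 87654321"))  # ['12345678', '87654321']
print(split_issn(None))                 # []
```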

property issueIdentifier

Issue numbers of the documents.

property laterColumnTotal

The total number of citations for all years after the end year for all documents combined.

property lcc

Number of citations after the end year of each document.

property pcc

Number of citations before the start year.

property pii

The Publication Item Identifier (PII) of the documents.

property prevColumnTotal

The total number of citations for all years before the start year for all documents combined.

property publicationName

Name of source the documents are published in (e.g. the Journal).

property rangeColumnTotal

The total number of citations for all specified years for all documents combined.

property rangeCount

Number of citations for the specified years for each document.

property rowTotal

Total number of citations (specified and omitted years) for each document.

property scopus_id

The Scopus ID(s) of the documents. Might differ from the ones provided.

property startingPage

Starting pages of the documents.

property title

Titles of each document.

property url

URL(s) to Citation Overview API view of each document.

property volume

Volumes of the documents.

get_cache_file_age()

Return the age of the cached file in days.

Return type

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type

Optional[str]

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type

Optional[str]

Examples

The class can download yearly citation counts for up to 25 documents at once. Simply provide a list of Scopus identifiers, DOIs, PIIs or PubMed IDs and specify the identifier type in id_type. The API needs to know for which years you want to retrieve yearly citation counts. You therefore need to set the first year for which CitationOverview() should return yearly citation counts (e.g., the publication year). If no ending year is given, CitationOverview() will use the current year. Optionally, you can exclude self-citations or citations by books via the citation parameter.

You initialize the class with a list of identifiers:

>>> from pybliometrics.scopus import CitationOverview
>>> identifier = ["85068268027", "84930616647"]
>>> co = CitationOverview(identifier, start=2019, end=2021)

You can obtain basic information just by printing the object:

>>> print(co)
2 document(s) has/have the following total citation count
as of 2021-07-17:
    16; 13

The most important information is stored in attribute cc, which is a list of lists of tuples storing year-wise citations to each document. Each list corresponds to one document, in the order specified when initiating the class:

>>> co.cc
[[(2019, 0), (2020, 6), (2021, 10)],
 [(2019, 2), (2020, 2), (2021, 1)]]

Attributes pcc, rangeCount, lcc and rowTotal give citation summaries by document. pcc is the count of citations before the start year, rangeCount the count of citations for the specified years, and lcc the count of citations after the end year. For the sum (i.e., the total number of citations by document) use rowTotal:

>>> co.pcc
[0, 8]
>>> co.rangeCount
[16, 5]
>>> co.lcc
[0, 0]
>>> co.rowTotal
[16, 13]
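
The relation between these four attributes can be checked by hand. The sketch below uses the values printed above as plain lists, so it runs without API access:

```python
# Values copied from the example output above.
pcc = [0, 8]           # citations before the start year
range_count = [16, 5]  # citations within the specified years
lcc = [0, 0]           # citations after the end year
row_total = [16, 13]   # total citations per document

# Per document, the three partial counts add up to the row total.
computed = [p + r + l for p, r, l in zip(pcc, range_count, lcc)]
print(computed)               # [16, 13]
print(computed == row_total)  # True
```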

Attribute columnTotal gives the number of citations per year for all documents combined, and rangeColumnTotal is the sum of these yearly totals. Finally, grandTotal is the total number of citations for all documents combined.

>>> co.columnTotal
[2, 8, 11]
>>> co.rangeColumnTotal
21
>>> co.grandTotal
29
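
These aggregates follow directly from the per-document values. A minimal sketch that recomputes them from the example output, using plain lists instead of a live CitationOverview object:

```python
# Values copied from the example output above.
cc = [[(2019, 0), (2020, 6), (2021, 10)],
      [(2019, 2), (2020, 2), (2021, 1)]]
pcc = [0, 8]
lcc = [0, 0]

# Sum the counts of all documents year by year.
column_total = [sum(count for _, count in same_year) for same_year in zip(*cc)]
range_column_total = sum(column_total)
# Adding the citations before and after the range gives the grand total.
grand_total = sum(pcc) + range_column_total + sum(lcc)

print(column_total)        # [2, 8, 11]
print(range_column_total)  # 21
print(grand_total)         # 29
```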

Using parameter citation, one can exclude self-citations or citations by books:

>>> co_self = CitationOverview(identifier, start=2019, end=2021,
...                            citation="exclude-self")
>>> print(co_self)
2 document(s) has/have the following total citation count
excluding self-citations as of 2021-07-17:
    14; 11
>>> co_books = CitationOverview(identifier, start=2019, end=2021,
...                             citation="exclude-books")
>>> print(co_books)
2 document(s) has/have the following total citation count
excluding citations from books as of 2021-07-17:
    16; 13

Author information is also available, stored as a list of lists of namedtuples:

>>> co.authors[0]
[Author(name='Rose M.E.', surname='Rose', initials='M.E.', id='57209617104',
        url='https://api.elsevier.com/content/author/author_id/57209617104'),
 Author(name='Kitchin J.R.', surname='Kitchin', initials='J.R.', id='7004212771',
        url='https://api.elsevier.com/content/author/author_id/7004212771')]
>>> co.authors[1]
[Author(name='Kitchin J.R.', surname='Kitchin', initials='J.R.', id='7004212771',
        url='https://api.elsevier.com/content/author/author_id/7004212771')]

For instance, co.authors[0][0].id can be passed to the AuthorRetrieval() class to obtain further author information.

Finally, there is bibliographic information, too:

>>> co.title
['pybliometrics: Scriptable bibliometrics using a Python interface to Scopus',
 'Examples of effective data sharing in scientific publishing']
>>> co.publicationName
['SoftwareX', 'ACS Catalysis']
>>> co.volume
['10', '5']
>>> co.issueIdentifier
[None, '6']
>>> co.citationType_long
['Article', 'Review']

Using pandas, you can turn the citation counts into a DataFrame like so:

>>> import pandas as pd
>>> df = pd.concat([pd.Series(dict(x)) for x in co.cc], axis=1).T
>>> df.index = co.scopus_id
>>> print(df)
             2019  2020  2021
85068268027     0     6    10
84930616647     2     2     1

Downloaded results are cached to speed up subsequent analysis. The cached information may become outdated, and it will not change if you set certain restrictions (e.g. via the citation parameter)! To refresh the cached results if they exist, set refresh=True, or provide an integer that is interpreted as the maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use co.get_cache_file_mdate() to get the date of the last modification, and co.get_cache_file_age() to get the number of days since the last modification.
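
The refresh rule can be summarized in a few lines. This is a sketch of the documented behavior, not the library's code: needs_refresh is a hypothetical name, and the cache age is passed in as a number (such as the value returned by co.get_cache_file_age()).

```python
from typing import Union


def needs_refresh(refresh: Union[bool, int], cache_age_days: int) -> bool:
    """Hypothetical helper: decide whether a cached file should be re-downloaded."""
    # bool is a subclass of int in Python, so check for it first.
    if isinstance(refresh, bool):
        return refresh
    # An integer is the maximum allowed age in days; refresh only if exceeded.
    return cache_age_days > refresh


print(needs_refresh(True, 0))      # True
print(needs_refresh(False, 1000))  # False
print(needs_refresh(100, 101))     # True
print(needs_refresh(100, 100))     # False
```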