pybliometrics.scopus.SerialSearch

SerialSearch() implements the search of the Serial Title API. It performs a search for serial sources (journal, tradejournal, conferenceproceeding, bookseries) by title, ISSN, publisher, subject or source type.

Table of Contents

Documentation

class pybliometrics.scopus.SerialSearch(query, refresh=False, view='ENHANCED')[source]

Interaction with the Serial Title API.

Parameters
  • query (Dict) – Query parameters and corresponding fields. Allowed keys ‘title’, ‘issn’, ‘pub’, ‘subj’, ‘subjCode’, ‘content’, ‘oa’. For examples on possible values, please refer to https://dev.elsevier.com/documentation/SerialTitleAPI.wadl#d1e22.

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists or not. If int is passed, cached file will be refreshed if the number of days since last modification exceeds that value.

    Default: False

  • view (str, optional) – The view of the file that should be downloaded. Allowed values: STANDARD, ENHANCED, CITESCORE. For details see https://dev.elsevier.com/sc_serial_title_views.html.

    Default: 'ENHANCED'

Raises
  • Scopus400Error – If provided value for a query key is invalid or if for non-subscribers the number of search results exceeds 5000.

  • ValueError – If any of the parameters refresh or view is not one of the allowed values.

Return type

None

Notes

The directory for cached results is {path}/{view}/{fname}, where path is specified in your configuration file, and fname is the md5-hashed version of query dict turned into string in format of ‘key=value’ delimited by ‘&’.

property results

A list of OrderedDicts representing results of serial search. The number of keys may vary from one search result to another depending on the length of yearly data.

get_cache_file_age()

Return the age of the cached file in days.

Return type

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type

str

get_key_remaining_quota()

Return number of remaining requests for the current key and the current API (relative on last actual request).

Return type

Optional[str]

get_key_reset_time()

Return time when current key is reset (relative on last actual request).

Return type

Optional[str]

get_results_size()

Return the number of results (works even if download=False).

Return type

int

Examples

The class is initialized with a search query dictionary. Its keys are limited to the following set: “title”, “issn”, “pub”, “subj”, “subjCode”, “content”, and “oa”. No more than 200 results can be returned.

>>> from pybliometrics.scopus import SerialSearch
>>> s = SerialSearch(query={"title": "SoftwareX"})

You can obtain basic information just by printing the object:

>>> print(s)
Search '{'title': 'SoftwareX'}' yielded 1 source as of 2021-07-14:
    SoftwareX

Users can receive the number of results programmatically via .get_results_size():

>>> s.get_results_size()
1

The class’ main attribute results returns a list of OrderedDict. Provided information can differ greatly between results and depending on the view (see below) they can be numerous. (Lists of )OrderedDict can be used neatly with pandas to form DataFrames:

>>> import pandas as pd
>>> df = pd.DataFrame(pd.DataFrame(s.results))
>>> df.shape
(1, 147)
>>> df.columns
Index(['title', 'publisher', 'coverageStartYear', 'coverageEndYear',
       'aggregationType', 'source-id', 'eIssn', 'openaccess',
       'openaccessArticle', 'subject_area_codes',
       ...
       'publicationCount_2019', 'citeCountSCE_2019', 'zeroCitesSCE_2019',
       'zeroCitesPercentSCE_2019', 'revPercent_2019', 'publicationCount_2020',
       'citeCountSCE_2020', 'zeroCitesSCE_2020', 'zeroCitesPercentSCE_2020',
       'revPercent_2020'],
      dtype='object', length=142)
>>> pd.set_option('display.max_columns', None)
>>> df.iloc[:,:16]
   title    publisher coverageStartYear coverageEndYear aggregationType  \
0  SoftwareX  Elsevier BV              2015            2020         journal

     source-id      eIssn openaccess  openaccessArticle subject_area_codes  \
0  21100422153  2352-7110          1               True          1712;1706

  subject_area_abbrevs                      subject_area_names SNIP_2018  \
0                 COMP  Software;Computer Science Applications     4.905

  SJR_2018 citeScoreTracker_2019 citeScoreCurrentMetric_2018
0    4.539                  2.18                       11.56

Information beyond the first 16 columns refer to journal metrics: publication counts, citation counts, not-cited documents, share of not-cited documents, and the share of review article documents, for each year since indexation.

Downloaded results are cached to speed up subsequent analysis. This information may become outdated. To refresh the cached results if they exist, set refresh=True, or provide an integer that will be interpreted as maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use s.get_cache_file_mdate() to get the date of last modification, and s.get_cache_file_age() the number of days since the last modification.

The Serial Title API allows a differing information depth via views. While all views are restricted, view ‘ENHANCED’ is the highest among them. In addition to the information contained in ‘STANDARD’ it contains yearly journal metrics. If you are not interested in this information, or when speed is an issue, choose the ‘STANDARD’ view.