pybliometrics.scopus.SerialSearch

SerialSearch() implements the search of the Serial Title API. This class performs searches for serial sources (journals, trade journals, conference proceedings, book series) based on title, ISSN, publisher, subject, or source type.

Documentation

class pybliometrics.scopus.SerialSearch(query, refresh=False, view='ENHANCED', **kwds)

Interaction with the Serial Title API.

Parameters:
  • query (Dict) – Query parameters and corresponding fields. Allowed keys: ‘title’, ‘issn’, ‘pub’, ‘subj’, ‘subjCode’, ‘content’, ‘oa’. For examples of possible values, please refer to https://dev.elsevier.com/documentation/SerialTitleAPI.wadl#d1e22.

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists. If an int is passed, the cached file will be refreshed if the number of days since its last modification exceeds that value.

    Default: False

  • view (str, optional) – The view of the file that should be downloaded. Allowed values: STANDARD, ENHANCED, CITESCORE. For details see https://dev.elsevier.com/sc_serial_title_views.html.

    Default: 'ENHANCED'

  • kwds (str) – Keywords passed on as query parameters. Must contain fields and values listed in the API specification at https://dev.elsevier.com/documentation/SerialTitleAPI.wadl.

Raises:
  • Scopus400Error – If the provided value for a query key is invalid, or if, for non-subscribers, the number of search results exceeds 5000.

  • ValueError – If any of the parameters refresh or view is not one of the allowed values.
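The allowed query keys can be checked locally before a request is made. The following is a minimal sketch: invalid_query_keys is a hypothetical helper, not part of pybliometrics, and the key set is copied from the description of the query parameter above.

```python
from typing import Dict, List

# Allowed SerialSearch query keys, as listed in the parameter description.
ALLOWED_KEYS = {"title", "issn", "pub", "subj", "subjCode", "content", "oa"}


def invalid_query_keys(query: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return the query keys SerialSearch would not accept."""
    return sorted(key for key in query if key not in ALLOWED_KEYS)


print(invalid_query_keys({"title": "SoftwareX"}))                     # []
print(invalid_query_keys({"name": "SoftwareX", "issn": "2352-7110"}))  # ['name']
```

Only the key names are checked here; whether a value is acceptable is still decided by the API itself.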

Notes

Cached results are stored at {path}/{view}/{fname}, where path is specified in your configuration file, and fname is the MD5 hash of the query dict converted to a string of ‘key=value’ pairs delimited by ‘&’.
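The cache location described above can be approximated with the standard library. This is a sketch under the assumption that the query string is built from the dict's items in insertion order and hashed as UTF-8; pybliometrics may differ in such details, and cache_file_path is a hypothetical name.

```python
import hashlib
from pathlib import Path
from typing import Dict


def cache_file_path(base: Path, view: str, query: Dict[str, str]) -> Path:
    """Hypothetical helper: build {path}/{view}/{fname} as described in the Notes."""
    # Join the query items as 'key=value' pairs delimited by '&'
    # (insertion order is an assumption of this sketch).
    query_string = "&".join(f"{key}={value}" for key, value in query.items())
    # fname is the hexadecimal MD5 digest of that string.
    fname = hashlib.md5(query_string.encode("utf8")).hexdigest()
    return base / view / fname


print(cache_file_path(Path("scopus_cache"), "ENHANCED", {"title": "SoftwareX"}))
```

Because the file name depends only on the query string, repeating the same query with the same view reuses the same cache file.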

property results: List[Dict[str, str]] | None

A list of OrderedDicts representing the results of the serial search. The number of keys may vary from one result to another, depending on how many years of metrics are available.
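Because the number of keys varies between results, code that processes results directly (without pandas) may need the union of all keys. A minimal sketch: all_result_keys is a hypothetical helper, and the two records below are made-up examples rather than real API output.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


def all_result_keys(results: Optional[List[Dict[str, str]]]) -> List[str]:
    """Hypothetical helper: every key found in any result, in first-seen order."""
    seen: Dict[str, None] = {}
    for record in results or []:  # the property may be None, per its type hint
        for key in record:
            seen.setdefault(key, None)  # dicts preserve insertion order
    return list(seen)


# Made-up records: the second source has one more year of data.
results = [
    OrderedDict([("title", "Journal A"), ("publicationCount_2019", "10")]),
    OrderedDict([("title", "Journal B"), ("publicationCount_2019", "5"),
                 ("publicationCount_2020", "7")]),
]
print(all_result_keys(results))
# ['title', 'publicationCount_2019', 'publicationCount_2020']
```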

get_cache_file_age()

Return the age of the cached file in days.

Return type:

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type:

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type:

str | None

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type:

str | None

get_results_size()

Return the number of results (works even if download=False).

Return type:

int

Examples

The class is initialized with a search query dictionary. Its keys are limited to the following set: “title”, “issn”, “pub”, “subj”, “subjCode”, “content”, and “oa”. No more than 200 results can be returned.

>>> from pybliometrics.scopus import SerialSearch
>>> s = SerialSearch(query={"title": "SoftwareX"})

You can obtain basic information just by printing the object:

>>> print(s)
Search '{'title': 'SoftwareX'}' yielded 1 source as of 2021-07-14:
    SoftwareX

Users can determine the number of results programmatically using the .get_results_size() method:

>>> s.get_results_size()
1

The main attribute of the class, results, returns a list of OrderedDict objects. The information provided can differ greatly between results, and depending on the view (see below) the number of fields can be large. Lists of OrderedDict objects can be efficiently converted into DataFrames using pandas:

>>> import pandas as pd
>>> df = pd.DataFrame(s.results)
>>> df.shape
(1, 147)
>>> df.columns
Index(['title', 'publisher', 'coverageStartYear', 'coverageEndYear',
       'aggregationType', 'source-id', 'eIssn', 'openaccess',
       'openaccessArticle', 'subject_area_codes',
       ...
       'publicationCount_2019', 'citeCountSCE_2019', 'zeroCitesSCE_2019',
       'zeroCitesPercentSCE_2019', 'revPercent_2019', 'publicationCount_2020',
       'citeCountSCE_2020', 'zeroCitesSCE_2020', 'zeroCitesPercentSCE_2020',
       'revPercent_2020'],
      dtype='object', length=142)
>>> pd.set_option('display.max_columns', None)
>>> df.iloc[:,:16]
   title    publisher coverageStartYear coverageEndYear aggregationType  \
0  SoftwareX  Elsevier BV              2015            2020         journal

     source-id      eIssn openaccess  openaccessArticle subject_area_codes  \
0  21100422153  2352-7110          1               True          1712;1706

  subject_area_abbrevs                      subject_area_names SNIP_2018  \
0                 COMP  Software;Computer Science Applications     4.905

  SJR_2018 citeScoreTracker_2019 citeScoreCurrentMetric_2018
0    4.539                  2.18                       11.56

The columns beyond the first 16 contain yearly journal metrics: publication counts, citation counts, the number and share of not-cited documents, and the share of review articles, for each year since the source was indexed.
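The yearly metric columns can be separated from the descriptive ones by their year suffix and reshaped into one row per year. This sketch assumes that all yearly columns end in an underscore followed by a four-digit year, as in the column listing above; the DataFrame is built by hand with made-up values, whereas in practice df would come from pd.DataFrame(s.results).

```python
import pandas as pd

# Made-up single-row frame reusing column names from the listing above;
# in practice df would be pd.DataFrame(s.results).
df = pd.DataFrame([{
    "title": "Journal A",
    "publicationCount_2019": "120", "citeCountSCE_2019": "900",
    "publicationCount_2020": "150", "citeCountSCE_2020": "1100",
}])

# Keep only columns whose names end in "_" plus a four-digit year.
yearly = df.filter(regex=r"_\d{4}$")

# One row per original column, then split "metric_year" on the last underscore.
long_df = yearly.melt(var_name="column", value_name="value")
parts = long_df["column"].str.rsplit("_", n=1, expand=True)
long_df["metric"] = parts[0]
long_df["year"] = parts[1].astype(int)
# Result values are typed as strings; unparseable entries become NaN.
long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")

# Years as rows, metrics as columns (assumes a single source in df).
table = long_df.pivot(index="year", columns="metric", values="value")
print(table)
```

Note that the same pattern also matches columns such as SNIP_2018 and SJR_2018 among the first 16 columns, which are yearly metrics as well.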

Downloaded results are cached to speed up subsequent analyses, but cached information may become outdated. To refresh cached results, set refresh=True, or pass an integer that is interpreted as the maximum allowed number of days since the last modification date. For example, to refresh all cached results older than 100 days, set refresh=100. Use s.get_cache_file_mdate() to obtain the date of the last modification, and s.get_cache_file_age() to determine the number of days since then.
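The integer form of refresh amounts to comparing the cache file's age with a threshold. The following is a hypothetical re-implementation for illustration, not pybliometrics' internal code; it assumes age is counted in whole days from the file's modification time and that the threshold must be strictly exceeded. The optional now parameter exists only to make the behaviour easy to test.

```python
import time
from pathlib import Path
from typing import Optional, Union

SECONDS_PER_DAY = 86400


def cache_age_days(path: Path, now: Optional[float] = None) -> int:
    """Whole days since the file at `path` was last modified."""
    now = time.time() if now is None else now
    return int((now - path.stat().st_mtime) // SECONDS_PER_DAY)


def should_refresh(path: Path, refresh: Union[bool, int],
                   now: Optional[float] = None) -> bool:
    """Hypothetical helper mirroring the documented refresh semantics."""
    if not path.exists():
        return True  # nothing cached yet, so a download is needed anyway
    # bool is a subclass of int, so test for it before treating refresh as days.
    if isinstance(refresh, bool):
        return refresh
    return cache_age_days(path, now) > refresh
```

With refresh=100, a cached file that is 150 days old is downloaded again, while one that is 30 days old is reused.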

The Serial Title API offers varying depths of information through views. All views are restricted, and ‘ENHANCED’ is the most comprehensive of them: in addition to the information contained in ‘STANDARD’, it contains yearly journal metrics. If you are not interested in this information, or when speed is an issue, choose the ‘STANDARD’ view.