pybliometrics.scopus.SerialSearch
SerialSearch() implements the search of the Serial Title API. This class performs searches for serial sources (journals, trade journals, conference proceedings, book series) based on title, ISSN, publisher, subject, or source type.
Documentation
- class pybliometrics.scopus.SerialSearch(query, refresh=False, view='ENHANCED', **kwds)
Interaction with the Serial Title API.
- Parameters:
query (Dict) – Query parameters and corresponding fields. Allowed keys: ‘title’, ‘issn’, ‘pub’, ‘subj’, ‘subjCode’, ‘content’, ‘oa’. For examples of possible values, please refer to https://dev.elsevier.com/documentation/SerialTitleAPI.wadl#d1e22.
refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists. If an int is passed, the cached file will be refreshed if the number of days since its last modification exceeds that value. Default: False
view (str, optional) – The view of the file that should be downloaded. Allowed values: STANDARD, ENHANCED, CITESCORE. For details see https://dev.elsevier.com/sc_serial_title_views.html. Default: 'ENHANCED'
kwds (str) – Keywords passed on as query parameters. Must contain fields and values listed in the API specification at https://dev.elsevier.com/documentation/SerialTitleAPI.wadl.
- Raises:
Scopus400Error – If provided value for a query key is invalid or if for non-subscribers the number of search results exceeds 5000.
ValueError – If any of the parameters refresh or view is not one of the allowed values.
Notes
The directory for cached results is {path}/{view}/{fname}, where path is specified in your configuration file, and fname is the md5-hashed version of query dict turned into string in format of ‘key=value’ delimited by ‘&’.
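As a rough illustration of the naming scheme described in the note above, the following sketch builds the ‘key=value’ string and hashes it with MD5. The helper name cache_path_sketch, the example base path, and the ordering and encoding of the parameters are assumptions made for illustration; the library's internal implementation may differ in detail.

```python
import hashlib
from pathlib import Path


def cache_path_sketch(base_path, view, query):
    """Hypothetical helper, not part of pybliometrics."""
    # Turn the query dict into 'key=value' pairs delimited by '&'
    # (using the dict's insertion order is an assumption of this sketch).
    query_string = "&".join(f"{key}={value}" for key, value in query.items())
    # md5-hash the string to obtain the file name
    fname = hashlib.md5(query_string.encode("utf8")).hexdigest()
    # Assemble {path}/{view}/{fname}
    return Path(base_path) / view / fname


# "/tmp/serial_search" is a placeholder for the path in your configuration file
p = cache_path_sketch("/tmp/serial_search", "ENHANCED", {"title": "SoftwareX"})
print(p.parent.name, len(p.name))
```

Because the file name depends only on the query string, and the view selects the folder, repeating a query with the same view maps to the same cached file.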
- property results: List[Dict[str, str]] | None
A list of OrderedDicts representing results of serial search. The number of keys may vary from one search result to another depending on the length of yearly data.
- get_cache_file_age()
Return the age of the cached file in days.
- Return type:
int
- get_cache_file_mdate()
Return the modification date of the cached file.
- Return type:
str
- get_key_remaining_quota()
Return the number of remaining requests for the current key and the current API (relative to the last actual request).
- Return type:
str | None
- get_key_reset_time()
Return the time when the current key is reset (relative to the last actual request).
- Return type:
str | None
- get_results_size()
Return the number of results (works even if download=False).
- Return type:
int
Examples
The class is initialized with a search query dictionary. Its keys are limited to the following set: “title”, “issn”, “pub”, “subj”, “subjCode”, “content”, and “oa”. No more than 200 results can be returned.
>>> from pybliometrics.scopus import SerialSearch
>>> s = SerialSearch(query={"title": "SoftwareX"})
You can obtain basic information just by printing the object:
>>> print(s)
Search '{'title': 'SoftwareX'}' yielded 1 source as of 2021-07-14:
    SoftwareX
Users can determine the number of results programmatically using the .get_results_size() method:
>>> s.get_results_size()
1
The main attribute of the class, results, returns a list of OrderedDict objects. The information provided can differ greatly between results and, depending on the view (see below), can span many fields. A list of OrderedDict objects can be converted efficiently into a DataFrame using pandas:
>>> import pandas as pd
>>> df = pd.DataFrame(s.results)
>>> df.shape
(1, 147)
>>> df.columns
Index(['title', 'publisher', 'coverageStartYear', 'coverageEndYear',
       'aggregationType', 'source-id', 'eIssn', 'openaccess',
       'openaccessArticle', 'subject_area_codes',
       ...
       'publicationCount_2019', 'citeCountSCE_2019', 'zeroCitesSCE_2019',
       'zeroCitesPercentSCE_2019', 'revPercent_2019', 'publicationCount_2020',
       'citeCountSCE_2020', 'zeroCitesSCE_2020', 'zeroCitesPercentSCE_2020',
       'revPercent_2020'],
      dtype='object', length=142)
>>> pd.set_option('display.max_columns', None)
>>> df.iloc[:,:16]
       title    publisher coverageStartYear coverageEndYear aggregationType  \
0  SoftwareX  Elsevier BV              2015            2020         journal

     source-id      eIssn openaccess  openaccessArticle subject_area_codes  \
0  21100422153  2352-7110          1               True          1712;1706

  subject_area_abbrevs                      subject_area_names SNIP_2018  \
0                 COMP  Software;Computer Science Applications     4.905

  SJR_2018 citeScoreTracker_2019 citeScoreCurrentMetric_2018
0    4.539                  2.18                       11.56
The information in columns beyond the first 16 pertains to journal metrics: publication counts, citation counts, not-cited documents, share of not-cited documents, and the share of review article documents, for each year since indexation.
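Since the yearly metrics arrive as keys with a trailing year (such as 'publicationCount_2019'), it can be handy to group them by year. The following is a minimal sketch using only the standard library; split_yearly_metrics is a hypothetical helper, not part of pybliometrics, and the metric values in the sample record are invented placeholders rather than real Scopus data.

```python
from collections import defaultdict


def split_yearly_metrics(result):
    """Hypothetical helper: separate keys ending in '_<4-digit year>'."""
    static, yearly = {}, defaultdict(dict)
    for key, value in result.items():
        # Split on the last underscore only, e.g. 'revPercent_2019'
        name, _, suffix = key.rpartition("_")
        if name and suffix.isdigit() and len(suffix) == 4:
            yearly[int(suffix)][name] = value
        else:
            static[key] = value
    return static, dict(yearly)


# Placeholder record: the key names follow the columns shown above,
# the metric values are invented for illustration.
record = {
    "title": "SoftwareX",
    "source-id": "21100422153",
    "publicationCount_2019": "10",
    "revPercent_2019": "0",
    "publicationCount_2020": "12",
}
static, yearly = split_yearly_metrics(record)
print(static)
print(yearly)
```

Note that under this rule keys such as 'SNIP_2018' or 'SJR_2018' would also be grouped by year, which may or may not be what you want.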
Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set refresh=True, or provide an integer that will be interpreted as the maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use s.get_cache_file_mdate() to obtain the date of last modification, and s.get_cache_file_age() to determine the number of days since the last modification.
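The meaning of the refresh parameter can be summarized in a few lines. This is a sketch of the documented behaviour only; needs_refresh is a hypothetical helper and does not reproduce the library's internal code.

```python
def needs_refresh(refresh, cache_age_days):
    """Hypothetical helper mirroring the documented meaning of `refresh`."""
    # Check bool first, because bool is a subclass of int in Python
    if isinstance(refresh, bool):
        return refresh
    if isinstance(refresh, int):
        # Refresh only if the cached file is older than the allowed number of days
        return cache_age_days > refresh
    raise ValueError("refresh must be a bool or an int")


print(needs_refresh(False, 500))  # False: never refresh
print(needs_refresh(True, 0))     # True: always refresh
print(needs_refresh(100, 101))    # True: older than 100 days
print(needs_refresh(100, 100))    # False: age does not exceed 100 days
```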
The Serial Title API offers varying depths of information through views. While all views are restricted, the ‘ENHANCED’ view is the highest among them. In addition to the information contained in ‘STANDARD’, it contains yearly journal metrics. If you are not interested in this information, or when speed is an issue, choose the ‘STANDARD’ view.