pybliometrics.sciencedirect.SubjectClassifications

SubjectClassifications() implements the ScienceDirect Subject Classifications API. It lets you search All Science Journal Classification (ASJC) subject areas by code, abbreviation, description or detail.

Documentation

class pybliometrics.sciencedirect.SubjectClassifications(query, refresh=False, fields=None, **kwds)

Interaction with the ScienceDirect Subject Classifications API.

Parameters:
  • query (dict) – Query parameters and corresponding fields. Allowed keys: ‘code’, ‘abbrev’, ‘description’, ‘detail’. For more details on search fields please refer to the documentation.

  • refresh (bool | int, optional) – Whether to refresh the cached file if it exists. If an int is passed, the cached file is refreshed when the number of days since its last modification exceeds that value.

    Default: False

  • fields (list[str] | tuple[str, ...] | None, optional) – The fields to return in the search results. Allowed values: ‘code’, ‘abbrev’, ‘description’, ‘detail’. For details see the documentation.

    Default: None

  • kwds (str) – Keywords passed on as query parameters. Must contain fields and values mentioned in the API specification.

Raises:
  • TypeError – If returned fields are not passed in an iterable container.

  • ValueError – If any of the parameters fields, refresh or query is not one of the allowed values.

Notes

Cached results are stored in {path}/{fname}, where path is specified in your configuration file and fname is the md5 hash of the query dict rendered as a string of ‘key=value’ pairs delimited by ‘&’.
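
The file name derivation described in this note can be sketched roughly as follows. This is only an illustration, not the library's actual code: cache_file_name is a hypothetical helper, and the pair ordering and string encoding are assumptions.

```python
import hashlib


def cache_file_name(query: dict) -> str:
    """Hypothetical helper: md5 hex digest of 'key=value' pairs joined by '&'."""
    # Assumption: pairs follow the dict's insertion order and are not URL-encoded.
    stringified = "&".join(f"{key}={value}" for key, value in query.items())
    return hashlib.md5(stringified.encode("utf8")).hexdigest()


name = cache_file_name({"description": "Chemistry"})
print(len(name))  # 32 (an md5 hex digest is always 32 characters long)
```

Because the name depends only on the query, repeating the same query maps to the same cached file.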

get_cache_file_age()

Return the age of the cached file in days.

Return type:

int
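
The age in days is the quantity an integer refresh value is compared against. A minimal sketch of that decision, assuming the semantics given in the parameter description (needs_refresh is a hypothetical helper, not part of pybliometrics):

```python
def needs_refresh(refresh, age_in_days):
    """Hypothetical helper mirroring the documented meaning of `refresh`."""
    # bool is a subclass of int in Python, so check for bool first.
    if isinstance(refresh, bool):
        return refresh
    # An int acts as a maximum age: refresh only if the file is older than that.
    return age_in_days > refresh


print(needs_refresh(False, 100))  # False
print(needs_refresh(True, 0))     # True
print(needs_refresh(30, 45))      # True
print(needs_refresh(30, 10))      # False
```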

get_cache_file_mdate()

Return the modification date of the cached file.

Return type:

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type:

str | None

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type:

str | None

get_results_size()

Return the number of results (works even if download=False).

Return type:

int

property results: list[NamedTuple] | None

A list of namedtuples representing results of subject classifications search in the form (code, description, detail, abbrev).

Examples

You initialize the class with a query dict. Its keys may be “description” (general classification of the subject), “code” (the ASJC code), “detail” (detailed name of the subject), “abbrev” (abbreviation of the general classification of the subject), or a combination of those:

>>> from pybliometrics.sciencedirect import SubjectClassifications, init
>>> init()
>>> # Retrieve subject areas with 'Chemistry' in the description
>>> sc = SubjectClassifications({'description': 'Chemistry'}, refresh=30)
>>> # Access the results
>>> sc.results
[Subject(code='18', description='Biochemistry, Genetics and Molecular Biology', detail='Biochemistry, Genetics and Molecular Biology', abbrev='biochemgenmolbiol'),
Subject(code='399', description='Biochemistry', detail='Biochemistry, Genetics and Molecular Biology::Biochemistry', abbrev='biochem'),
Subject(code='400', description='Biochemistry, Genetics and Molecular Biology (General)', detail='Biochemistry, Genetics and Molecular Biology::Biochemistry, Genetics and Molecular Biology (General)', abbrev='biogen'),
...]

Each result is a namedtuple. Individual fields, such as the ASJC code, can be accessed as follows:

>>> # Access the first result and get the ASJC code
>>> first_result = sc.results[0]
>>> first_result.code
'18'
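
Since each result is a namedtuple, ordinary Python idioms apply. The sketch below runs offline on a stand-in Subject type populated with two of the results shown above. Reading ‘::’ in detail as a separator between a parent area and its sub-area is an assumption based on that output.

```python
from collections import namedtuple

# Stand-in for the result type, with the documented field order.
Subject = namedtuple("Subject", ["code", "description", "detail", "abbrev"])

results = [
    Subject("18", "Biochemistry, Genetics and Molecular Biology",
            "Biochemistry, Genetics and Molecular Biology", "biochemgenmolbiol"),
    Subject("399", "Biochemistry",
            "Biochemistry, Genetics and Molecular Biology::Biochemistry", "biochem"),
]

# Map each ASJC code to its description.
code_to_description = {subject.code: subject.description for subject in results}

# Assumption: '::' in `detail` separates a parent area from its sub-area.
sub_areas = [s.detail.split("::")[-1] for s in results if "::" in s.detail]

print(code_to_description["399"])  # Biochemistry
print(sub_areas)                   # ['Biochemistry']
```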

The results can be cast into a pandas DataFrame:

>>> import pandas as pd
>>> # Cast results to a pandas DataFrame
>>> df = pd.DataFrame(sc.results)
>>> # Display available fields
>>> df.columns
Index(['code', 'description', 'detail', 'abbrev'], dtype='object')
>>> # Get shape of the DataFrame (rows x columns)
>>> df.shape
(16, 4)
>>> # Display the first 5 rows
>>> df.head(5)