pybliometrics.scopus.AuthorSearch
AuthorSearch() implements the Author Search API. It executes a query to search for authors and retrieves the corresponding records.
Documentation
- class pybliometrics.scopus.AuthorSearch(query, refresh=False, verbose=False, download=True, integrity_fields=None, integrity_action='raise', count=200, **kwds)
Interaction with the Author Search API.
- Parameters:
query (str) – A string of the query. For allowed fields and values see https://dev.elsevier.com/sc_author_search_tips.html.
refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists. If an int is passed, the cached file is refreshed if the number of days since its last modification exceeds that value. Default: False
download (bool, optional) – Whether to download results (if they have not been cached). Default: True
verbose (bool, optional) – Whether to print a download progress bar. Default: False
integrity_fields (Union[List[str], Tuple[str, ...]], optional) – Names of fields whose completeness should be checked. AuthorSearch will perform the action specified in integrity_action if elements in these fields are missing. This helps avoid idiosyncratically missing elements that should always be present (e.g., EID or source ID). Default: None
integrity_action (str, optional) – What to do if the integrity of the provided fields cannot be verified. Possible actions: "raise" (raise an AttributeError) or "warn" (issue a UserWarning). Default: 'raise'
count (int, optional) – (deprecated) The number of entries to be displayed at once. A smaller number means more queries, each returning fewer results. Default: 200
kwds (str) – Keywords passed on as query parameters. Must contain fields and values mentioned in the API specification at https://dev.elsevier.com/documentation/AuthorSearchAPI.wadl.
- Raises:
ScopusQueryError – If the number of search results exceeds 5000, which is the maximum number of results the API returns. The error is raised before any download is attempted, which avoids using your API key for it.
ValueError – If either of the parameters integrity_action or refresh is not one of the allowed values.
Notes
The directory for cached results is {path}/STANDARD/{fname}, where path is specified in your configuration file, and fname is the md5-hashed version of query.
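The naming scheme in this note can be sketched as follows. This is an illustration only, not the library's own code: it assumes that the query is UTF-8-encoded before hashing and that the hexadecimal digest is used as the file name. The helper `cache_file_for` and the base path are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_file_for(query, base_path):
    """Build {base_path}/STANDARD/{md5 of query} (hypothetical helper)."""
    # Assumption: UTF-8 encoding and hex digest; the library may differ.
    fname = hashlib.md5(query.encode("utf8")).hexdigest()
    return Path(base_path) / "STANDARD" / fname


# The base path below is a made-up placeholder, not a pybliometrics default.
path = cache_file_for("AUTHLAST(Selten)", "/tmp/pybliometrics/author_search")
print(path.parent.name)  # STANDARD
print(len(path.name))    # 32
```

Because the file name depends only on the query string, two searches with the same query map to the same cached file under this scheme.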
- property authors: List[NamedTuple] | None
A list of namedtuples storing author information, where each namedtuple corresponds to one author. The information in each namedtuple is (eid orcid surname initials givenname documents affiliation affiliation_id city country areas).
All entries are str or None. Areas combines abbreviated subject areas followed by the number of documents in this subject.
- Raises:
ValueError – If the elements provided in integrity_fields do not match the actual field names (listed above).
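Since areas arrives as a single string, it may be convenient to split it into counts per subject area. The following is a minimal sketch, assuming the format "ABBR (n); ABBR (n)" seen in the example output on this page. `parse_areas` is a hypothetical helper and not part of pybliometrics.

```python
import re


def parse_areas(areas):
    """Turn 'ECON (73); MATH (19)' into {'ECON': 73, 'MATH': 19}."""
    if not areas:  # the field may be None
        return {}
    counts = {}
    for part in areas.split(";"):
        match = re.fullmatch(r"\s*(\w+)\s*\((\d+)\)\s*", part)
        if match:  # skip fragments that do not fit the assumed pattern
            counts[match.group(1)] = int(match.group(2))
    return counts


print(parse_areas("ECON (73); MATH (19); BUSI (16)"))
# {'ECON': 73, 'MATH': 19, 'BUSI': 16}
```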
- get_cache_file_age()
Return the age of the cached file in days.
- Return type:
int
- get_cache_file_mdate()
Return the modification date of the cached file.
- Return type:
str
- get_key_remaining_quota()
Return the number of remaining requests for the current key and the current API (relative to the last actual request).
- Return type:
str | None
- get_key_reset_time()
Return the time when the current key is reset (relative to the last actual request).
- Return type:
str | None
- get_results_size()
Return the number of results (works even if download=False).
- Return type:
int
Examples
The class is initialized using a search query, details of which can be found in the Author Search Guide. An invalid search query will result in an error.
>>> from pybliometrics.scopus import AuthorSearch
>>> s = AuthorSearch('AUTHLAST(Selten) and AUTHFIRST(Reinhard)')
You can obtain a search summary just by printing the object:
>>> print(s)
Search 'AUTHLAST(Selten) and AUTHFIRST(Reinhard)' yielded 2 authors as of 2021-11-12:
    Selten, Reinhard; AUTHOR_ID:6602907525 (74 document(s))
    Selten, Reinhard; AUTHOR_ID:57213632570 (1 document(s))
To determine the number of results, use the .get_results_size() method. It works even before you download the results:
>>> other = AuthorSearch("AUTHLAST(Selten)", download=False)
>>> other.get_results_size()
29
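Checking the size first can be combined with the 5000-result limit mentioned under Raises. The sketch below downloads only when the count is within the limit. It assumes that creating the object with download=False does not itself raise for large result sets. The function name and the `search_cls` parameter are hypothetical; the class is passed in so the logic can be tried with a stand-in that makes no API calls.

```python
def search_if_within_limit(query, search_cls, limit=5000):
    """Download results only if the result count does not exceed `limit`.

    `search_cls` is expected to behave like AuthorSearch (hypothetical design).
    """
    probe = search_cls(query, download=False)  # counts results, no download
    if probe.get_results_size() > limit:
        return None  # too many results: refine the query instead
    return search_cls(query)  # download (or read from the cache)
```

With pybliometrics configured, a call might look like `search_if_within_limit("AUTHLAST(Selten)", AuthorSearch)`.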
Primarily, the class provides a list of namedtuples storing author EIDs, which you can use for the AuthorRetrieval class, and corresponding information:
>>> s.authors[0]
Author(eid='9-s2.0-6602907525', orcid=None, surname='Selten', initials='R.', givenname='Reinhard', affiliation='Universität Bonn', documents=74, affiliation_id='60007493', city='Bonn', country='Germany', areas='ECON (73); MATH (19); BUSI (16)')
Working with namedtuples is straightforward: Using pandas, you can quickly convert the results set into a DataFrame:
>>> import pandas as pd
>>> pd.set_option('display.max_columns', None)
>>> print(pd.DataFrame(s.authors))
                  eid orcid surname initials givenname  \
0   9-s2.0-6602907525  None  Selten       R.  Reinhard
1  9-s2.0-57213632570  None  Selten       R.  Reinhard

                     affiliation  documents affiliation_id     city  country  \
0               Universität Bonn         74       60007493     Bonn  Germany
1  Southwest Jiaotong University          1       60010421  Chengdu    China

                             areas
0  ECON (73); MATH (19); BUSI (16)
1                         COMP (3)
Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set refresh=True, or provide an integer that will be interpreted as the maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use s.get_cache_file_mdate() to obtain the date of last modification, and s.get_cache_file_age() to determine the number of days since the last modification.
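The refresh rule described here can be written out as a small decision function. This is a sketch of the documented semantics, not the library's internal code. `needs_refresh` is a hypothetical name, and the age would come from get_cache_file_age().

```python
def needs_refresh(refresh, age_in_days):
    """Decide whether a cached file should be refreshed (illustrative)."""
    # bool is a subclass of int in Python, so test for it first.
    if isinstance(refresh, bool):
        return refresh
    if isinstance(refresh, int):
        return age_in_days > refresh  # refresh only if strictly older
    raise ValueError("refresh must be a bool or an int")


print(needs_refresh(100, 150))    # True
print(needs_refresh(100, 30))     # False
print(needs_refresh(False, 150))  # False
```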
Occasionally, some information that exists in the Scopus database is missing from the returned results. For example, the EID may be missing, even though every element always has an EID. This is not a bug in pybliometrics; it is related to a problem in the download process from the Scopus database. To check specific fields for completeness, use the integrity_fields parameter, which accepts any iterable. The integrity_action parameter selects what happens if the integrity check fails: set integrity_action="warn" to issue a UserWarning, or integrity_action="raise" to raise an AttributeError.
>>> s = AuthorSearch("AUTHLAST(Selten)", integrity_fields=["eid"], integrity_action="warn")
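To make the behaviour concrete, here is a minimal sketch of such a completeness check on a list of namedtuples. It mirrors the documented outcomes (ValueError for unknown field names, AttributeError or UserWarning for missing values) but is not the library's implementation. `check_integrity`, the shortened `Author` tuple, and the sample records are illustrative assumptions.

```python
import warnings
from collections import namedtuple

Author = namedtuple("Author", "eid orcid surname")  # shortened for the sketch


def check_integrity(records, fields, action="raise"):
    """Check that none of `fields` is None in any record (illustrative)."""
    if action not in ("raise", "warn"):
        raise ValueError("action must be 'raise' or 'warn'")
    if not records:
        return
    unknown = [f for f in fields if f not in records[0]._fields]
    if unknown:
        raise ValueError(f"Unknown field(s): {', '.join(unknown)}")
    for field in fields:
        n_missing = sum(getattr(r, field) is None for r in records)
        if n_missing == 0:
            continue
        msg = f"{n_missing} of {len(records)} records lack '{field}'"
        if action == "raise":
            raise AttributeError(msg)
        warnings.warn(msg, UserWarning)


records = [
    Author("9-s2.0-6602907525", None, "Selten"),
    Author(None, None, "Selten"),  # EID missing on purpose
]
check_integrity(records, ["surname"])             # passes silently
check_integrity(records, ["eid"], action="warn")  # emits a UserWarning
```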
When searching for authors by institution, note that searching by affiliation profile ID and searching by affiliation name yield different results. A search by affiliation name, e.g. AFFIL(Max Planck Institute for Innovation and Competition), finds all authors ever affiliated with the Max Planck Institute for Innovation and Competition, whereas a search by affiliation profile ID, e.g. AF-ID(60105007), finds researchers whose latest affiliation includes the Max Planck Institute for Innovation and Competition.
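One way to use this difference is to compare the two result lists: under the semantics described above, authors found by name but not by profile ID would be those whose latest affiliation is elsewhere. The sketch below is only an illustration. `former_affiliates`, the shortened `Author` tuple, and the sample EIDs and surnames are made-up placeholders, and real inputs would be the .authors lists of two searches.

```python
from collections import namedtuple

Author = namedtuple("Author", "eid surname")  # shortened stand-in


def former_affiliates(by_name, by_id):
    """Return authors present in the name search but not the ID search."""
    current = {author.eid for author in by_id}
    return [author for author in by_name if author.eid not in current]


# Placeholder data standing in for two sets of search results.
by_name = [Author("9-s2.0-1", "Alpha"), Author("9-s2.0-2", "Beta")]
by_id = [Author("9-s2.0-2", "Beta")]
print([a.surname for a in former_affiliates(by_name, by_id)])  # ['Alpha']
```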
©2017-2023 Michael E. Rose and John Kitchin.