pybliometrics.scopus.AuthorRetrieval

AuthorRetrieval() implements the Author Retrieval API. It provides a complete author record according to Scopus.

Documentation

class pybliometrics.scopus.AuthorRetrieval(author_id, refresh=False, view='ENHANCED', **kwds)[source]

Interaction with the Author Retrieval API.

Parameters:
  • author_id (Union[int, str]) – The ID or the EID of the author.

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists or not. If int is passed, cached file will be refreshed if the number of days since last modification exceeds that value.

    Default: False

  • view (str, optional) – The view of the file that should be downloaded. Allowed values: METRICS, LIGHT, STANDARD, ENHANCED, where STANDARD includes all information of the LIGHT view and ENHANCED includes all information of any view. For details see https://dev.elsevier.com/sc_author_retrieval_views.html. Note: Neither the BASIC nor the DOCUMENTS view is active, although both are documented.

    Default: 'ENHANCED'

  • kwds (str) – Keywords passed on as query parameters. Must contain fields and values mentioned in the API specification at https://dev.elsevier.com/documentation/AuthorRetrievalAPI.wadl.

Raises:

ValueError – If any of the parameters refresh or view is not one of the allowed values.

Notes

The directory for cached results is {path}/ENHANCED/{author_id}, where path is specified in your configuration file, and author_id is stripped of a leading ‘9-s2.0-’ if present.
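As a rough illustration of this naming rule, the following sketch builds the cache location from a configured path and an author ID. The helper name cache_location and the example path are hypothetical and not part of pybliometrics; only the stripping of a leading ‘9-s2.0-’ and the {path}/ENHANCED/{author_id} layout are taken from the note above.

```python
from pathlib import Path

EID_PREFIX = "9-s2.0-"  # prefix of author EIDs, as named in the note above


def cache_location(path, author_id, view="ENHANCED"):
    """Hypothetical helper that mimics the documented cache layout."""
    author_id = str(author_id)
    # Drop the EID prefix only when it is actually present
    if author_id.startswith(EID_PREFIX):
        author_id = author_id[len(EID_PREFIX):]
    return Path(path) / view / author_id


# The path below is a made-up placeholder for the configured cache directory
print(cache_location("scopus_cache/author_retrieval", "9-s2.0-7004212771"))
```

Under this sketch, passing the integer 7004212771 and the EID '9-s2.0-7004212771' leads to the same location.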

property affiliation_current: List[NamedTuple] | None

A list of namedtuples representing the author’s current affiliation(s), in the form (id parent type relationship afdispname preferred_name parent_preferred_name country_code country address_part city state postal_code org_domain org_URL). Note: Affiliation information might be missing or misassigned even when it looks correct in the web view. In this case, please request a correction.

property affiliation_history: List[NamedTuple] | None

A list of namedtuples representing the author’s historical affiliation(s), in the form (id parent type relationship afdispname preferred_name parent_preferred_name country_code country address_part city state postal_code org_domain org_URL). Note: Affiliation information might be missing or misassigned even when it looks correct in the web view. In this case, please request a correction.

Note: Unlike on their website, Scopus doesn’t provide the periods of affiliation.

property alias: List[str] | None

List of possible new Scopus Author Profile IDs in case the profile has been merged.

property citation_count: int

Total number of citing items.

property cited_by_count: int

Total number of citing authors.

property classificationgroup: List[Tuple[int, int]] | None

List of tuples of the form (subject group ID, number of documents).

property coauthor_count: int | None

Total number of coauthors.

URL to Scopus API search page for coauthors.

property date_created: Tuple[int, int, int] | None

Date the Scopus record was created.

property document_count: int

Number of documents authored (excludes book chapters and notes).

property eid: str | None

The EID of the author. If it differs from the one provided, pybliometrics will throw a warning informing the user about author profile merges.

property given_name: str | None

Author’s preferred given name.

property h_index: str | None

The author’s h-index.

property historical_identifier: List[int] | None

Scopus IDs of previous profiles that now make up this profile.

property identifier: int

The author’s ID. Might differ from the one provided.

property indexed_name: str | None

Author’s name as indexed by Scopus.

property initials: str | None

Author’s preferred initials.

property name_variants: List[NamedTuple] | None

List of named tuples containing variants of the author name with number of documents published with that variant.

property orcid: str | None

The author’s ORCID.

property publication_range: Tuple[int, int] | None

Tuple containing years of first and last publication.

Link to the Scopus web view of the author.

URL to the API page listing documents of the author.

Link to the author’s API page.

property status: str | None

The status of the author profile.

property subject_areas: List[NamedTuple] | None

List of named tuples of the subject areas of the author’s publications, in the form (area, abbreviation, code).

property surname: str | None

Author’s preferred surname.

property url: str | None

URL to the author’s API page.

get_coauthors()[source]

Retrieves basic information about co-authors as a list of namedtuples in the form (surname, given_name, id, areas, affiliation_id, name, city, country), where areas is a list of subject area codes joined by “; “. Note: Method retrieves information via individual queries which will not be cached. The Scopus API returns 160 coauthors at most.

Return type:

List[NamedTuple] | None
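Because areas arrives as a single string joined by “; “, you may want to split it back into a list before analysing it. The sketch below does this on a stand-in namedtuple with the same field names as above; the Coauthor class, the helper split_areas and all values are invented placeholders, not data returned by Scopus.

```python
from collections import namedtuple

# Stand-in with the field names listed above; NOT the class pybliometrics returns
Coauthor = namedtuple(
    "Coauthor",
    "surname given_name id areas affiliation_id name city country",
)

# Placeholder records, invented purely for illustration
coauthors = [
    Coauthor("Doe", "Jane", 1, "AREA_A; AREA_B", 10, "Example University",
             "Example City", "Exampleland"),
    Coauthor("Roe", "Richard", 2, None, 11, "Example Institute",
             "Example Town", "Exampleland"),
]


def split_areas(coauthor):
    """Return the coauthor's areas as a list; empty if none are given."""
    if not coauthor.areas:
        return []
    return [area.strip() for area in coauthor.areas.split(";") if area.strip()]


for person in coauthors:
    print(person.surname, split_areas(person))
```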

get_documents(subtypes=None, *args, **kwds)[source]

Return list of the author’s publications using a ScopusSearch() query, where publications may fit a specified set of document subtypes.

Parameters:
  • subtypes (List[str], optional) – The type of documents that should be returned.

    Default: None

  • args (str) – Parameters to be passed on to ScopusSearch().

  • kwds (str) – Parameters to be passed on to ScopusSearch().

Return type:

List[NamedTuple] | None

Note: To update these results, pass refresh to this method (it is passed on to ScopusSearch()); the class’s refresh parameter is not used here.

get_document_eids(*args, **kwds)[source]

Return list of EIDs of the author’s publications using a ScopusSearch() query.

Parameters:
  • args (str) – Parameters to be passed on to ScopusSearch().

  • kwds (str) – Parameters to be passed on to ScopusSearch().

Return type:

List[str] | None

Note: To update these results, pass refresh to this method (it is passed on to ScopusSearch()); the class’s refresh parameter is not used here.

estimate_uniqueness(query=None, *args, **kwds)[source]

Return the number of Scopus author profiles similar to this profile via calls with AuthorSearch().

Parameters:
  • query (str, optional) – The query string to perform to search for authors. If None, the query is of form “AUTHLAST() AND AUTHFIRST()” with the corresponding information included. Provided queries may include “SUBJAREA()” OR “AF-ID() AND SUBJAREA()”. For details see https://dev.elsevier.com/tips/AuthorSearchTips.htm.

    Default: None

  • args (str) – Parameters to be passed on to AuthorSearch().

  • kwds (str) – Parameters to be passed on to AuthorSearch().

Return type:

int
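To make the query forms mentioned above more concrete, here is a minimal sketch that assembles such a query string. The helper build_author_query is hypothetical, and how pybliometrics formats the names inside AUTHLAST() and AUTHFIRST() is an assumption; only the field names AUTHLAST, AUTHFIRST, AF-ID and SUBJAREA are taken from the description of the query parameter.

```python
def build_author_query(surname, given_name, subject_area=None, affiliation_id=None):
    """Hypothetical helper: assemble an author search query string."""
    parts = [f"AUTHLAST({surname})", f"AUTHFIRST({given_name})"]
    # Optional restrictions, in the combinations described for `query`
    if affiliation_id is not None:
        parts.append(f"AF-ID({affiliation_id})")
    if subject_area is not None:
        parts.append(f"SUBJAREA({subject_area})")
    return " AND ".join(parts)


# Values taken from the examples on this page
print(build_author_query("Kitchin", "John R."))
print(build_author_query("Kitchin", "John R.", subject_area="CENG",
                         affiliation_id=60027950))
```

A string built this way could then be passed as query to estimate_uniqueness().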

get_cache_file_age()

Return the age of the cached file in days.

Return type:

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type:

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type:

str | None

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type:

str | None

Examples

You initiate the class with the author’s Scopus ID, which can be either an integer or a string:

>>> from pybliometrics.scopus import AuthorRetrieval
>>> au = AuthorRetrieval(7004212771)

You can obtain basic information just by printing the object:

>>> print(au)
Kitchin J. from Department of Chemical Engineering in United States,
published 108 document(s) since 1995
which were cited by 11,980 author(s) in 14,861 document(s) as of 2021-07-14

This object provides access to various data about an author, including the number of papers, h-index, current affiliation, etc. When a list of namedtuples is returned, it can neatly be turned into a pandas DataFrame.

Information regarding the author’s names includes:

>>> au.indexed_name
'Kitchin J.'
>>> au.surname
'Kitchin'
>>> au.given_name
'John R.'
>>> au.initials
'J.R.'
>>> au.name_variants
[Variant(indexed_name='Kitchin J.', initials='J.R.', surname='Kitchin',
 given_name='John R.', doc_count=90),
 Variant(indexed_name='Kitchin J.', initials='J.', surname='Kitchin',
 given_name='John', doc_count=11),
 Variant(indexed_name='Kitchin J.', initials='J.R.', surname='Kitchin',
 given_name='J. R.', doc_count=8)]
>>> au.eid
'9-s2.0-7004212771'

Bibliometric information includes:

>>> au.citation_count
14861
>>> au.document_count
108
>>> au.h_index
34
>>> au.orcid
'0000-0003-2625-9232'
>>> au.publication_range
(1995, 2021)
>>> import pandas as pd
>>> areas = pd.DataFrame(au.subject_areas)
>>> areas.shape
(49, 3)
>>> areas.head()
                            area abbreviation  code
0                Safety Research         SOCI  3311
1           Analytical Chemistry         CHEM  1602
2        Modeling and Simulation         MATH  2611
3        Materials Science (all)         MATE  2500
4  Colloid and Surface Chemistry         CENG  1505
>>> au.classificationgroup
[('3311', '4'), ('1602', '1'), ('2611', '5'), ('2500', '11'),
 ('1505', '1'), ('1605', '4'), ('1303', '2'), ('2504', '10'),
 ('1508', '3'), ('1706', '2'), ('1712', '1'), ('2209', '5'),
 ('2105', '1'), ('1504', '2'), ('1500', '26'), ('3309', '1'),
 ('1600', '28'), ('2508', '14'), ('2310', '2'), ('1503', '22'),
 ('2300', '1'), ('2102', '3'), ('3107', '3'), ('1000', '1'),
 ('3110', '9'), ('2213', '7'), ('2505', '6'), ('3100', '9'),
 ('1906', '1'), ('1305', '3'), ('2304', '1'), ('1604', '2'),
 ('1909', '1'), ('2207', '2'), ('2200', '2'), ('1607', '1'),
 ('2103', '3'), ('2308', '2'), ('3104', '21'), ('1311', '1'),
 ('1603', '3'), ('2305', '2'), ('1606', '24'), ('2503', '1'),
 ('2100', '11'), ('2208', '1'), ('1502', '2'), ('2104', '2'),
 ('1710', '5')]
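In the output above, both the subject group ID and the document count are shown as strings. If you want to rank the groups by number of documents, a small sketch like the following may help; it uses only the first four tuples of the output above and casts them to integers first.

```python
# First four entries of the au.classificationgroup output shown above
groups = [('3311', '4'), ('1602', '1'), ('2611', '5'), ('2500', '11')]

# Cast both fields to int so that counts sort numerically, not alphabetically
as_ints = [(int(code), int(n_docs)) for code, n_docs in groups]

# Order by number of documents, largest first
ranked = sorted(as_ints, key=lambda pair: pair[1], reverse=True)
print(ranked)
# [(2500, 11), (2611, 5), (3311, 4), (1602, 1)]
```

Without the cast, sorting the counts as strings would place '11' before '4' and '5'.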

If you request data of a merged author profile, Scopus provides information corresponding to the new, merged profile. The cache file’s name uses the provided, i.e., old, ID. With the property .identifier you can verify the validity of the provided Author ID. When the provided ID belongs to a profile that has been merged, pybliometrics will throw a UserWarning (upon accessing the property .identifier) pointing to the ID of the new main profile.
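If you process many author IDs, you may prefer to record such warnings instead of letting them scroll by. The sketch below shows one way to do this with the standard warnings module. FakeAuthor is a stand-in that only imitates the warning behaviour described above (it is NOT the real AuthorRetrieval class), and the IDs are placeholders; resolve_identifier is a hypothetical helper.

```python
import warnings


class FakeAuthor:
    """Stand-in for an author object; imitates the warning on merged profiles."""

    def __init__(self, provided_id, current_id):
        self._provided_id = provided_id
        self._current_id = current_id

    @property
    def identifier(self):
        if self._provided_id != self._current_id:
            warnings.warn(
                f"Profile {self._provided_id} was merged into {self._current_id}",
                UserWarning,
            )
        return self._current_id


def resolve_identifier(author):
    """Return (identifier, merged); merged is True if a UserWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        identifier = author.identifier
    merged = any(issubclass(w.category, UserWarning) for w in caught)
    return identifier, merged


print(resolve_identifier(FakeAuthor(provided_id=1, current_id=2)))
print(resolve_identifier(FakeAuthor(provided_id=2, current_id=2)))
```

The same helper should work with any object whose identifier property emits a UserWarning on access.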

Detailed information on current and former affiliations is also provided in the form of namedtuples:

>>> au.affiliation_current
[Affiliation(id=110785688, parent=60027950, type='dept', relationship='author',
 afdispname=None, preferred_name='Department of Chemical Engineering',
 parent_preferred_name='Carnegie Mellon University', country_code='usa',
 country='United States', address_part='5000 Forbes Avenue', city='Pittsburgh',
 state='PA', postal_code='15213-3890', org_domain='cmu.edu', org_URL='https://www.cmu.edu/')]
>>> len(au.affiliation_history)
16
>>> au.affiliation_history[10]
Affiliation(id=60008644, parent=None, type='parent', relationship='author',
afdispname=None, preferred_name='Fritz Haber Institute of the Max Planck Society',
parent_preferred_name=None, country_code='deu', country='Germany',
address_part='Faradayweg 4-6', city='Berlin', state=None, postal_code='14195',
org_domain='fhi.mpg.de', org_URL='https://www.fhi.mpg.de/')

The affiliation ID can be used with the AffiliationRetrieval class.

Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set refresh=True, or provide an integer that will be interpreted as the maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use au.get_cache_file_mdate() to obtain the date of last modification, and au.get_cache_file_age() to determine the number of days since the last modification.
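The meaning of refresh can be summarised in a few lines of code. The function needs_refresh below is a hypothetical illustration of the rule as described here, not the library’s actual implementation; cache_age_days stands for the value that get_cache_file_age() would return.

```python
def needs_refresh(refresh, cache_age_days):
    """Hypothetical helper mirroring the documented meaning of `refresh`."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(refresh, bool):
        return refresh
    if isinstance(refresh, int):
        # Refresh only if the cached file is older than the allowed number of days
        return cache_age_days > refresh
    raise ValueError("refresh must be a bool or an int")


print(needs_refresh(False, 1000))  # never refresh
print(needs_refresh(True, 0))      # always refresh
print(needs_refresh(100, 101))     # cached file is older than 100 days
```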

Several getter methods are available for convenience. For example, you can obtain some basic information on co-authors as a list of namedtuples (query will not be cached and is always up-to-date):

>>> coauthors = pd.DataFrame(au.get_coauthors())
>>> coauthors.shape
(160, 8)
>>> coauthors.columns
Index(['surname', 'given_name', 'id', 'areas', 'affiliation_id',
       'name', 'city', 'country'],
      dtype='object')

The get_documents() method is another convenient option for searching the author’s publications via ScopusSearch (information will be cached):

>>> docs = pd.DataFrame(au.get_documents(refresh=10))
>>> docs.shape
(108, 34)
>>> docs.columns
Index(['eid', 'doi', 'pii', 'pubmed_id', 'title', 'subtype',
       'subtypeDescription', 'creator', 'afid', 'affilname',
       'affiliation_city', 'affiliation_country', 'author_count',
       'author_names', 'author_ids', 'author_afids', 'coverDate',
       'coverDisplayDate', 'publicationName', 'issn', 'source_id', 'eIssn',
       'aggregationType', 'volume', 'issueIdentifier', 'article_number',
       'pageRange', 'description', 'authkeywords', 'citedby_count',
       'openaccess', 'fund_acr', 'fund_no', 'fund_sponsor'],
      dtype='object')

With a few additional lines of code, you can determine the journal articles where the author is listed first:

>>> articles = docs[docs['aggregationType'] == 'Journal']
>>> first = articles[articles['author_ids'].str.startswith('7004212771')]
>>> first["eid"].tolist()
['2-s2.0-85048443766', '2-s2.0-85019169906', '2-s2.0-84971324241',
 '2-s2.0-84930349644', '2-s2.0-84930616647', '2-s2.0-84866142469',
 '2-s2.0-67449106405', '2-s2.0-40949100780', '2-s2.0-20544467859',
 '2-s2.0-13444307808', '2-s2.0-2942640180', '2-s2.0-0141924604',
 '2-s2.0-0037368024']
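Note that a plain prefix match would also accept a longer author ID that merely begins with the same digits. A slightly stricter check compares the first entry of author_ids exactly. The sketch below assumes that author_ids is a string of IDs separated by ";" (an assumption about the search results); the helper is_first_author is hypothetical, and the second ID in the calls is a placeholder.

```python
def is_first_author(author_ids, author_id, sep=";"):
    """Return True if `author_id` is exactly the first ID in `author_ids`."""
    # Missing values (None, NaN) are not strings and cannot match
    if not isinstance(author_ids, str):
        return False
    first = author_ids.split(sep)[0].strip()
    return first == str(author_id)


print(is_first_author("7004212771;1234567890", 7004212771))
print(is_first_author("70042127719;7004212771", 7004212771))
```

With the DataFrame from above, such a function could be applied as docs['author_ids'].apply(lambda ids: is_first_author(ids, 7004212771)) to obtain a boolean mask.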

Or you might be interested in the yearly number of publications:

>>> docs['year'] = docs['coverDate'].str[:4]
>>> docs['year'].value_counts().sort_index()
1995     1
2002     1
2003     3
2004     4
2005     3
2006     1
2007     2
2008     7
2009    10
2010     6
2011    10
2012     8
2013     4
2014    10
2015    12
2016     7
2017     8
2018     4
2019     2
2020     2
2021     3
Name: year, dtype: int64

If you’re just interested in the EIDs of the documents, use au.get_document_eids(). This method makes use of the same data available for/through au.get_documents().