pybliometrics.scopus.AuthorRetrieval

AuthorRetrieval() implements the Author Retrieval API. It contains an entire author record as per Scopus.

Documentation

class pybliometrics.scopus.AuthorRetrieval(author_id, refresh=False, view='ENHANCED')

Interaction with the Author Retrieval API.

Parameters
  • author_id (Union[int, str]) – The ID or the EID of the author.

  • refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists or not. If int is passed, cached file will be refreshed if the number of days since last modification exceeds that value.

    Default: False

  • view (str, optional) – The view of the file that should be downloaded. Allowed values: METRICS, LIGHT, STANDARD, ENHANCED, where STANDARD includes all information of LIGHT view and ENHANCED includes all information of any view. For details see https://dev.elsevier.com/sc_author_retrieval_views.html. Note: Neither the BASIC nor the DOCUMENTS view are active, although documented.

    Default: 'ENHANCED'

Raises

ValueError – If any of the parameters refresh or view is not one of the allowed values.

Return type

None

Notes

The directory for cached results is {path}/ENHANCED/{author_id}, where path is specified in your configuration file and author_id is stripped of any leading ‘9-s2.0-’.
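The path layout described in this note can be sketched as follows. The helper `cache_file` and the constant `EID_PREFIX` are hypothetical names used for illustration only; they are not part of pybliometrics, and the library's own path handling may differ in detail.

```python
from pathlib import Path

EID_PREFIX = "9-s2.0-"  # author EID prefix mentioned in the note above

def cache_file(base_path, author_id):
    """Hypothetical helper: build {path}/ENHANCED/{author_id}."""
    stem = str(author_id)
    # Drop the EID prefix if the caller passed an EID instead of a bare ID
    if stem.startswith(EID_PREFIX):
        stem = stem[len(EID_PREFIX):]
    return Path(base_path) / "ENHANCED" / stem

# An integer ID and the matching EID map to the same cache entry
print(cache_file("cache", 7004212771).as_posix())
print(cache_file("cache", "9-s2.0-7004212771").as_posix())
```

Both calls resolve to the same location, which is why an ID and its EID share one cached file under this layout.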

property affiliation_current

A list of namedtuples representing the author’s current affiliation(s), in the form (id parent type relationship afdispname preferred_name parent_preferred_name country_code country address_part city state postal_code org_domain org_URL). Note: Affiliation information might be missing or misassigned even when it looks correct in the web view. In this case please request a correction.

property affiliation_history

A list of namedtuples representing the author’s historical affiliation(s), in the form (id parent type relationship afdispname preferred_name parent_preferred_name country_code country address_part city state postal_code org_domain org_URL). Note: Affiliation information might be missing or misassigned even when it looks correct in the web view. In this case please request a correction.

Note: Unlike on their website, Scopus doesn’t provide the periods of affiliation.

property alias

List of possible new Scopus Author Profile IDs in case the profile has been merged.

property citation_count

Total number of citing items.

property cited_by_count

Total number of citing authors.

property classificationgroup

List with (subject group ID, number of documents)-tuples.

property coauthor_count

Total number of coauthors.

URL to Scopus API search page for coauthors.

property date_created

Date the Scopus record was created.

property document_count

Number of documents authored (excludes book chapters and notes).

property eid

The EID of the author. If it differs from the one provided, pybliometrics will issue a warning informing the user about author profile merges.

property given_name

Author’s preferred given name.

property h_index

The author’s h-index.

property historical_identifier

Scopus IDs of previous profiles now comprising this profile.

property identifier

The author’s ID. Might differ from the one provided.

property indexed_name

Author’s name as indexed by Scopus.

property initials

Author’s preferred initials.

property name_variants

List of named tuples containing variants of the author name with number of documents published with that variant.

property orcid

The author’s ORCID.

property publication_range

Tuple containing years of first and last publication.

Link to the Scopus web view of the author.

URL to the API page listing documents of the author.

Link to the author’s API page.

property status

The status of the author profile.

property subject_areas

List of named tuples of the subject areas of the author’s publications, in the form (area, abbreviation, code).

property surname

Author’s preferred surname.

property url

URL to the author’s API page.

get_coauthors()

Retrieves basic information about co-authors as a list of namedtuples in the form (surname, given_name, id, areas, affiliation_id, name, city, country), where areas is a list of subject area codes joined by “; “. Note: Method retrieves information via individual queries which will not be cached. The Scopus API returns 160 coauthors at most.

Return type

Optional[List[NamedTuple]]
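Because areas arrives as a single string, it usually has to be split before analysis. The following is a minimal sketch using a made-up record; the `Coauthor` namedtuple only mirrors the field names listed above, and the placeholder values (including the format of each area entry) are assumptions, not real Scopus data.

```python
from collections import namedtuple

# Stand-in for the namedtuples returned by get_coauthors(); same field names
Coauthor = namedtuple(
    "Coauthor",
    "surname given_name id areas affiliation_id name city country",
)

# Made-up record for illustration only
sample = Coauthor("Doe", "Jane", 1, "CHEM; MATE", 2,
                  "Example University", "Example City", "Exampleland")

def split_areas(coauthor):
    """Return the areas string as a list, tolerating a missing value."""
    if not coauthor.areas:
        return []
    return [part.strip() for part in coauthor.areas.split(";") if part.strip()]

print(split_areas(sample))
```

Splitting on ";" and stripping whitespace, rather than splitting on "; " exactly, keeps the helper tolerant of small spacing differences.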

get_documents(subtypes=None, *args, **kwds)

Return a list of the author’s publications using a ScopusSearch() query, optionally restricted to a specified set of document subtypes.

Parameters
  • subtypes (Optional[List[str]], optional) – The type of documents that should be returned.

    Default: None

  • args (str) – Parameters to be passed on to ScopusSearch().

  • kwds (str) – Parameters to be passed on to ScopusSearch().

Return type

Optional[List[NamedTuple]]

get_document_eids(*args, **kwds)

Return list of EIDs of the author’s publications using a ScopusSearch() query.

Parameters
  • args (str) – Parameters to be passed on to ScopusSearch().

  • kwds (str) – Parameters to be passed on to ScopusSearch().

Return type

Optional[List[str]]

estimate_uniqueness(query=None, *args, **kwds)

Return the number of Scopus author profiles similar to this profile via calls with AuthorSearch().

Parameters
  • query (Optional[str], optional) – The query string used to search for authors. If None, the query is of the form “AUTHLAST() AND AUTHFIRST()” with the corresponding information included. Provided queries may include “SUBJAREA()” or “AF-ID() AND SUBJAREA()”. For details see https://dev.elsevier.com/tips/AuthorSearchTips.htm.

    Default: None

  • args (str) – Parameters to be passed on to AuthorSearch().

  • kwds (str) – Parameters to be passed on to AuthorSearch().

Return type

int
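The default query form described for estimate_uniqueness() could be assembled roughly like this. The function `default_author_query` is a hypothetical illustration, and which name fields the library actually inserts into AUTHLAST() and AUTHFIRST() is an assumption here.

```python
def default_author_query(surname, given_name):
    """Hypothetical sketch of a query of the form AUTHLAST() AND AUTHFIRST()."""
    # Assumption: surname goes into AUTHLAST, given name into AUTHFIRST
    return f"AUTHLAST({surname}) AND AUTHFIRST({given_name})"

def narrowed_query(surname, given_name, subject_area):
    """Add a SUBJAREA() restriction, one of the forms mentioned above."""
    base = default_author_query(surname, given_name)
    return f"{base} AND SUBJAREA({subject_area})"

print(default_author_query("Kitchin", "John R."))
print(narrowed_query("Kitchin", "John R.", "CENG"))
```

A string built this way could then be passed as the query argument to narrow the set of similar profiles.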

get_cache_file_age()

Return the age of the cached file in days.

Return type

int

get_cache_file_mdate()

Return the modification date of the cached file.

Return type

str

get_key_remaining_quota()

Return the number of remaining requests for the current key and the current API (relative to the last actual request).

Return type

Optional[str]

get_key_reset_time()

Return the time when the current key is reset (relative to the last actual request).

Return type

Optional[str]

Examples

You initiate the class with the author’s Scopus ID, which can be passed as either an integer or a string:

>>> from pybliometrics.scopus import AuthorRetrieval
>>> au = AuthorRetrieval(7004212771)

You can obtain basic information just by printing the object:

>>> print(au)
Kitchin J. from Department of Chemical Engineering in United States,
published 108 document(s) since 1995
which were cited by 11,980 author(s) in 14,861 document(s) as of 2021-07-14

The object can access many bits of data about an author, including the number of papers, h-index, current affiliation, etc. When a list of namedtuples is returned, it can neatly be turned into a pandas DataFrame.

Information on names:

>>> au.indexed_name
'Kitchin J.'
>>> au.surname
'Kitchin'
>>> au.given_name
'John R.'
>>> au.initials
'J.R.'
>>> au.name_variants
[Variant(indexed_name='Kitchin J.', initials='J.R.', surname='Kitchin',
 given_name='John R.', doc_count=90),
 Variant(indexed_name='Kitchin J.', initials='J.', surname='Kitchin',
 given_name='John', doc_count=11),
 Variant(indexed_name='Kitchin J.', initials='J.R.', surname='Kitchin',
 given_name='J. R.', doc_count=8)]
>>> au.eid
'9-s2.0-7004212771'
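As a small worked example, the name variants shown above can be summarised without any extra libraries. The `Variant` namedtuple below is a stand-in that copies the field names and values from the output above, so the snippet runs without an API key.

```python
from collections import namedtuple

# Stand-in mirroring the Variant namedtuples printed above
Variant = namedtuple(
    "Variant", "indexed_name initials surname given_name doc_count"
)

variants = [
    Variant("Kitchin J.", "J.R.", "Kitchin", "John R.", 90),
    Variant("Kitchin J.", "J.", "Kitchin", "John", 11),
    Variant("Kitchin J.", "J.R.", "Kitchin", "J. R.", 8),
]

# Variant under which most documents were published
main = max(variants, key=lambda v: v.doc_count)
# Share of all variant-tagged documents that use this variant
share = main.doc_count / sum(v.doc_count for v in variants)

print(main.given_name, round(share, 2))
```

With a live object, `variants` would simply be `au.name_variants`.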

Bibliometric information:

>>> au.citation_count
14861
>>> au.document_count
108
>>> au.h_index
34
>>> au.orcid
'0000-0003-2625-9232'
>>> au.publication_range
(1995, 2021)
>>> import pandas as pd
>>> areas = pd.DataFrame(au.subject_areas)
>>> areas.shape
(49, 3)
>>> areas.head()
                            area abbreviation  code
0                Safety Research         SOCI  3311
1           Analytical Chemistry         CHEM  1602
2        Modeling and Simulation         MATH  2611
3        Materials Science (all)         MATE  2500
4  Colloid and Surface Chemistry         CENG  1505
>>> au.classificationgroup
[('3311', '4'), ('1602', '1'), ('2611', '5'), ('2500', '11'),
 ('1505', '1'), ('1605', '4'), ('1303', '2'), ('2504', '10'),
 ('1508', '3'), ('1706', '2'), ('1712', '1'), ('2209', '5'),
 ('2105', '1'), ('1504', '2'), ('1500', '26'), ('3309', '1'),
 ('1600', '28'), ('2508', '14'), ('2310', '2'), ('1503', '22'),
 ('2300', '1'), ('2102', '3'), ('3107', '3'), ('1000', '1'),
 ('3110', '9'), ('2213', '7'), ('2505', '6'), ('3100', '9'),
 ('1906', '1'), ('1305', '3'), ('2304', '1'), ('1604', '2'),
 ('1909', '1'), ('2207', '2'), ('2200', '2'), ('1607', '1'),
 ('2103', '3'), ('2308', '2'), ('3104', '21'), ('1311', '1'),
 ('1603', '3'), ('2305', '2'), ('1606', '24'), ('2503', '1'),
 ('2100', '11'), ('2208', '1'), ('1502', '2'), ('2104', '2'),
 ('1710', '5')]
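Note that the document counts in classificationgroup are strings in the output above, so they need converting before sorting; otherwise '5' would rank above '28'. A minimal sketch on an excerpt of the list shown above:

```python
# Excerpt of the (subject group ID, number of documents) pairs shown above
groups = [
    ("3311", "4"), ("1602", "1"), ("2611", "5"),
    ("2500", "11"), ("1500", "26"), ("1600", "28"),
]

# Convert the counts to integers so they sort numerically, not lexically
counts = {code: int(n) for code, n in groups}

# Three subject groups with the most documents in this excerpt
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

The resulting codes can be matched against the code column of the subject_areas DataFrame to look up the area names.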

If you request data of a merged author profile, Scopus returns information belonging to the new profile. pybliometrics, however, caches information using the old ID. With the property .identifier you can verify the validity of the provided Author ID. When the provided ID belongs to a profile that has been merged, pybliometrics will issue a UserWarning (upon accessing the property .identifier) pointing to the ID of the new main profile.
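One way to react to such a warning programmatically is to record it with the standard warnings module. This is only a sketch: `read_with_merge_check` and `fake_identifier` are hypothetical helpers, and `fake_identifier` merely imitates a property access that emits a UserWarning.

```python
import warnings

def read_with_merge_check(getter):
    """Call getter() and report whether it emitted a UserWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        value = getter()
    merged = any(issubclass(w.category, UserWarning) for w in caught)
    return value, merged

def fake_identifier():
    # Stand-in for accessing .identifier on a merged profile
    warnings.warn("profile has been merged", UserWarning)
    return 123

print(read_with_merge_check(fake_identifier))
```

With a live object the call would look like `read_with_merge_check(lambda: au.identifier)`.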

Extensive information on current and former affiliations is provided as namedtuples as well:

>>> au.affiliation_current
[Affiliation(id=110785688, parent=60027950, type='dept', relationship='author',
 afdispname=None, preferred_name='Department of Chemical Engineering',
 parent_preferred_name='Carnegie Mellon University', country_code='usa',
 country='United States', address_part='5000 Forbes Avenue', city='Pittsburgh',
 state='PA', postal_code='15213-3890', org_domain='cmu.edu', org_URL='https://www.cmu.edu/')]
>>> len(au.affiliation_history)
16
>>> au.affiliation_history[10]
Affiliation(id=60008644, parent=None, type='parent', relationship='author',
afdispname=None, preferred_name='Fritz Haber Institute of the Max Planck Society',
parent_preferred_name=None, country_code='deu', country='Germany',
address_part='Faradayweg 4-6', city='Berlin', state=None, postal_code='14195',
org_domain='fhi.mpg.de', org_URL='https://www.fhi.mpg.de/')

The affiliation ID can be used with the AffiliationRetrieval class.

pybliometrics caches results to speed up subsequent analysis. This information eventually becomes outdated. To refresh the cached results if they exist, use the refresh parameter when initiating the class. Set refresh=True or provide an integer that will be interpreted as the maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use au.get_cache_file_mdate() to get the date of last modification, and au.get_cache_file_age() to get the number of days since the last modification.
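The refresh semantics described here can be summarised in a few lines. The function `should_refresh` is a hypothetical illustration of the documented rule, not the library's implementation.

```python
def should_refresh(refresh, age_days):
    """Sketch of the documented rule for the refresh parameter."""
    # bool must be checked first, because bool is a subclass of int
    if isinstance(refresh, bool):
        return refresh
    # An integer means: refresh once the file is older than that many days
    return age_days > refresh

print(should_refresh(True, 0))     # always refresh
print(should_refresh(100, 101))    # older than 100 days
print(should_refresh(100, 100))    # exactly 100 days does not exceed 100
```

In practice `age_days` would correspond to the value returned by au.get_cache_file_age().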

There are a number of getter methods for convenience. For example, you can obtain some basic information on co-authors as a list of namedtuples (query will not be cached and is always up-to-date):

>>> coauthors = pd.DataFrame(au.get_coauthors())
>>> coauthors.shape
(160, 8)
>>> coauthors.columns
Index(['surname', 'given_name', 'id', 'areas', 'affiliation_id',
       'name', 'city', 'country'],
      dtype='object')

Method get_documents() is another convenience method to search for the author’s publications via ScopusSearch (information will be cached):

>>> docs = pd.DataFrame(au.get_documents(refresh=10))
>>> docs.shape
(108, 34)
>>> docs.columns
Index(['eid', 'doi', 'pii', 'pubmed_id', 'title', 'subtype',
       'subtypeDescription', 'creator', 'afid', 'affilname',
       'affiliation_city', 'affiliation_country', 'author_count',
       'author_names', 'author_ids', 'author_afids', 'coverDate',
       'coverDisplayDate', 'publicationName', 'issn', 'source_id', 'eIssn',
       'aggregationType', 'volume', 'issueIdentifier', 'article_number',
       'pageRange', 'description', 'authkeywords', 'citedby_count',
       'openaccess', 'fund_acr', 'fund_no', 'fund_sponsor'],
      dtype='object')

With some additional lines of code you can get the number of journal articles where the author is listed first:

>>> articles = docs[docs['aggregationType'] == 'Journal']
>>> first = articles[articles['author_ids'].str.startswith('7004212771')]
>>> first["eid"].tolist()
['2-s2.0-85048443766', '2-s2.0-85019169906', '2-s2.0-84971324241',
 '2-s2.0-84930349644', '2-s2.0-84930616647', '2-s2.0-84866142469',
 '2-s2.0-67449106405', '2-s2.0-40949100780', '2-s2.0-20544467859',
 '2-s2.0-13444307808', '2-s2.0-2942640180', '2-s2.0-0141924604',
 '2-s2.0-0037368024']
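A prefix match with startswith() would also match a longer ID that merely begins with the same digits. A stricter variant compares the first ID exactly. The sketch below uses made-up records and assumes that author_ids is a ";"-separated string; `first_authored` is a hypothetical helper.

```python
# Made-up records using the column names shown above
records = [
    {"eid": "doc-1", "aggregationType": "Journal",
     "author_ids": "7004212771;111"},
    {"eid": "doc-2", "aggregationType": "Journal",
     "author_ids": "111;7004212771"},
    {"eid": "doc-3", "aggregationType": "Conference Proceeding",
     "author_ids": "7004212771"},
    {"eid": "doc-4", "aggregationType": "Journal",
     "author_ids": "70042127719;111"},  # longer ID sharing the prefix
]

def first_authored(rows, author_id):
    """EIDs of journal items whose first listed author is author_id."""
    author_id = str(author_id)
    return [
        row["eid"] for row in rows
        if row["aggregationType"] == "Journal"
        and (row["author_ids"] or "").split(";")[0].strip() == author_id
    ]

print(first_authored(records, 7004212771))
```

Only doc-1 qualifies: doc-2 lists the author second, doc-3 is not a journal item, and doc-4 belongs to a different ID.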

or you might be interested in the yearly number of publications:

>>> docs['year'] = docs['coverDate'].str[:4]
>>> docs['year'].value_counts().sort_index()
1995     1
2002     1
2003     3
2004     4
2005     3
2006     1
2007     2
2008     7
2009    10
2010     6
2011    10
2012     8
2013     4
2014    10
2015    12
2016     7
2017     8
2018     4
2019     2
2020     2
2021     3
Name: year, dtype: int64
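The counts above skip years without publications (1996 to 2001, for instance). If a gap-free series is needed, the missing years can be filled with zeros. A minimal standard-library sketch on made-up coverDate values:

```python
from collections import Counter

# Made-up coverDate strings in the YYYY-MM-DD form used above
cover_dates = ["1995-05-01", "1997-01-01", "1997-09-30"]

# Count publications per year, using the first four characters as the year
counts = Counter(int(date[:4]) for date in cover_dates)

# Fill every year between the first and last publication, including gaps
per_year = {
    year: counts.get(year, 0)
    for year in range(min(counts), max(counts) + 1)
}
print(per_year)
```

With real data, `cover_dates` would be the coverDate column of the docs DataFrame.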

If you’re just interested in the EIDs of the documents, use au.get_document_eids(). This method uses the same data as au.get_documents().