pybliometrics.scopus.AuthorRetrieval ==================================== `AuthorRetrieval()` implements the `Author Retrieval API `_. It provides a complete author record according to Scopus. .. currentmodule:: pybliometrics.scopus .. contents:: Table of Contents :local: Documentation ------------- .. autoclass:: AuthorRetrieval :members: :inherited-members: Examples -------- You initiate the class with the author's Scopus ID, which can be either an integer or a string: .. code-block:: python >>> from pybliometrics.scopus import AuthorRetrieval >>> au = AuthorRetrieval(7004212771) You can obtain basic information just by printing the object: .. code-block:: python >>> print(au) Kitchin J. from Department of Chemical Engineering in United States, published 108 document(s) since 1995 which were cited by 11,980 author(s) in 14,861 document(s) as of 2021-07-14 This object provides access to various data about an author, including the number of papers, h-index, current affiliation, etc. When a list of `namedtuples `_ is returned, it can neatly be turned into a `pandas `_ DataFrame. Information regarding the author's names includes: .. code-block:: python >>> au.indexed_name 'Kitchin J.' >>> au.surname 'Kitchin' >>> au.given_name 'John R.' >>> au.initials 'J.R.' >>> au.name_variants [Variant(indexed_name='Kitchin J.', initials='J.R.', surname='Kitchin', given_name='John R.', doc_count=90), Variant(indexed_name='Kitchin J.', initials='J.', surname='Kitchin', given_name='John', doc_count=11), Variant(indexed_name='Kitchin J.', initials='J.R.', surname='Kitchin', given_name='J. R.', doc_count=8)] >>> au.eid '9-s2.0-7004212771' Bibliometric information includes: .. code-block:: python >>> au.citation_count 14861 >>> au.document_count 108 >>> au.h_index 34 >>> au.orcid '0000-0003-2625-9232' >>> au.publication_range (1995, 2021) >>> import pandas as pd >>> areas = pd.DataFrame(au.subject_areas) >>> areas.shape (49, 3) >>> areas.head() area abbreviation code 0 Safety Research SOCI 3311 1 Analytical Chemistry CHEM 1602 2 Modeling and Simulation MATH 2611 3 Materials Science (all) MATE 2500 4 Colloid and Surface Chemistry CENG 1505 >>> au.classificationgroup [('3311', '4'), ('1602', '1'), ('2611', '5'), ('2500', '11'), ('1505', '1'), ('1605', '4'), ('1303', '2'), ('2504', '10'), ('1508', '3'), ('1706', '2'), ('1712', '1'), ('2209', '5'), ('2105', '1'), ('1504', '2'), ('1500', '26'), ('3309', '1'), ('1600', '28'), ('2508', '14'), ('2310', '2'), ('1503', '22'), ('2300', '1'), ('2102', '3'), ('3107', '3'), ('1000', '1'), ('3110', '9'), ('2213', '7'), ('2505', '6'), ('3100', '9'), ('1906', '1'), ('1305', '3'), ('2304', '1'), ('1604', '2'), ('1909', '1'), ('2207', '2'), ('2200', '2'), ('1607', '1'), ('2103', '3'), ('2308', '2'), ('3104', '21'), ('1311', '1'), ('1603', '3'), ('2305', '2'), ('1606', '24'), ('2503', '1'), ('2100', '11'), ('2208', '1'), ('1502', '2'), ('2104', '2'), ('1710', '5')] If you request data of a merged author profile, Scopus provides information corresponding to the new, merged profile. The cache file's name uses the provided, i.e., old, ID. With property `.identifer` you can verify the validity of the provided Author ID. When the provided ID belongs to a profile that has been merged, pybliometrics will throw a UserWarning (upon accessing the property `.identifer`) pointing to the ID of the new main profile. Detailed information on current and former affiliations is also provided in the form of namedtuple: .. code-block:: python >>> au.affiliation_current [Affiliation(id=110785688, parent=60027950, type='dept', relationship='author', afdispname=None, preferred_name='Department of Chemical Engineering', parent_preferred_name='Carnegie Mellon University', country_code='usa', country='United States', address_part='5000 Forbes Avenue', city='Pittsburgh', state='PA', postal_code='15213-3890', org_domain='cmu.edu', org_URL='https://www.cmu.edu/')] >>> len(au.affiliation_history) 16 >>> au.affiliation_history[10] Affiliation(id=60008644, parent=None, type='parent', relationship='author', afdispname=None, preferred_name='Fritz Haber Institute of the Max Planck Society', parent_preferred_name=None, country_code='deu', country='Germany', address_part='Faradayweg 4-6', city='Berlin', state=None, postal_code='14195', org_domain='fhi.mpg.de', org_URL='https://www.fhi.mpg.de/') The affiliation ID to be used for the :doc:`AffiliationRetrieval <../classes/AffiliationRetrieval>` class. Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set `refresh=True`, or provide an integer that will be interpreted as maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set `refresh=100`. Use `ab.get_cache_file_mdate()` to obtain the date of last modification, and `ab.get_cache_file_age()` to determine the number of days since the last modification. Several getter methods are available for convenience. For example, you can obtain some basic information on co-authors as a list of namedtuples (query will not be cached and is always up-to-date): .. code-block:: python >>> coauthors = pd.DataFrame(au.get_coauthors()) >>> coauthors.shape (160, 8) >>> coauthors.columns Index(['surname', 'given_name', 'id', 'areas', 'affiliation_id', 'name', 'city', 'country'], dtype='object') The `get_documents()` method is another convenient option for searching the author's publications via :doc:`ScopusSearch <../classes/ScopusSearch>` (information will be cached): .. code-block:: python >>> docs = pd.DataFrame(au.get_documents(refresh=10)) >>> docs.shape (108, 34) >>> docs.columns Index(['eid', 'doi', 'pii', 'pubmed_id', 'title', 'subtype', 'subtypeDescription', 'creator', 'afid', 'affilname', 'affiliation_city', 'affiliation_country', 'author_count', 'author_names', 'author_ids', 'author_afids', 'coverDate', 'coverDisplayDate', 'publicationName', 'issn', 'source_id', 'eIssn', 'aggregationType', 'volume', 'issueIdentifier', 'article_number', 'pageRange', 'description', 'authkeywords', 'citedby_count', 'openaccess', 'fund_acr', 'fund_no', 'fund_sponsor'], dtype='object') WWith a few additional code lines, you can determine the number of journal articles where the author is listed first: .. code-block:: python >>> articles = docs[docs['aggregationType'] == 'Journal'] >>> first = articles[articles['author_ids'].str.startswith('7004212771')] >>> first["eid"].tolist() ['2-s2.0-85048443766', '2-s2.0-85019169906', '2-s2.0-84971324241', '2-s2.0-84930349644', '2-s2.0-84930616647', '2-s2.0-84866142469', '2-s2.0-67449106405', '2-s2.0-40949100780', '2-s2.0-20544467859', '2-s2.0-13444307808', '2-s2.0-2942640180', '2-s2.0-0141924604', '2-s2.0-0037368024'] or you might be interested in the yearly number of publications: .. code-block:: python >>> docs['year'] = docs['coverDate'].str[:4] >>> docs['year'].value_counts().sort_index() 1995 1 2002 1 2003 3 2004 4 2005 3 2006 1 2007 2 2008 7 2009 10 2010 6 2011 10 2012 8 2013 4 2014 10 2015 12 2016 7 2017 8 2018 4 2019 2 2020 2 2021 3 Name: year, dtype: int64 If you're just interested in the EIDs of the documents, use `au.get_document_eids()`. This method makes use of the same data available for/through `au.get_documents()`.