pybliometrics.scopus.AbstractRetrieval
======================================

`AbstractRetrieval()` implements the `Scopus Abstract Retrieval API `_. It accepts any identifier as the main argument. Most commonly this will be a `Scopus EID `_, but a DOI, Scopus ID (the last part of the EID), PubMed identifier or Publisher Item Identifier (PII) works as well. `AbstractRetrieval` tries to infer the ID type itself; to speed this up, you can specify the ID type via `ID_type`.

The Abstract Retrieval API offers differing information depth via `views `_, some of which are restricted. The 'META_ABS' view is the most comprehensive among the unrestricted views, encompassing all information from the other unrestricted views. It is therefore the default view. The view with the most information content is 'FULL', which includes everything available with 'META_ABS', but is restricted. You should generally try to use `view='FULL'` when downloading an abstract and fall back to the default otherwise.

.. currentmodule:: pybliometrics.scopus
.. contents:: Table of Contents
    :local:

Documentation
-------------

.. autoclass:: AbstractRetrieval
    :members:
    :inherited-members:

Examples
--------

You initialize the class with an ID that Scopus uses, e.g. the EID:

.. code-block:: python

    >>> from pybliometrics.scopus import AbstractRetrieval
    >>> ab = AbstractRetrieval("2-s2.0-85068268027", view='FULL')
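If your API key lacks access to the 'FULL' view, the download raises an error. The fallback pattern described above can be sketched as follows; the assumption that a missing entitlement surfaces as a `Scopus403Error` may need adjusting to whatever error your key actually raises:

.. code-block:: python

    >>> from pybliometrics.scopus import AbstractRetrieval
    >>> from pybliometrics.scopus.exception import Scopus403Error
    >>> identifier = "2-s2.0-85068268027"
    >>> try:
    ...     ab = AbstractRetrieval(identifier, view='FULL')
    ... except Scopus403Error:
    ...     # No entitlement for the FULL view: use the default META_ABS view
    ...     ab = AbstractRetrieval(identifier)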
You can obtain basic information just by printing the object:

.. code-block:: python

    >>> print(ab)
    Michael E. Rose and John R. Kitchin: "pybliometrics: Scriptable bibliometrics using a
    Python interface to Scopus", SoftwareX, 10, (no pages found)(2019).
    https://doi.org/10.1016/j.softx.2019.100263.
    34 citation(s) as of 2022-04-07
    Affiliation(s):
       Max Planck Institute for Innovation and Competition
       Carnegie Mellon University

There are 52 attributes and 8 methods to interact with. For example, to obtain bibliographic information:

.. code-block:: python

    >>> ab.publicationName
    'SoftwareX'
    >>> ab.aggregationType
    'Journal'
    >>> ab.coverDate
    '2019-07-01'
    >>> ab.volume
    '10'
    >>> ab.issueIdentifier
    None
    >>> ab.pageRange
    None
    >>> ab.doi
    '10.1016/j.softx.2019.100263'
    >>> ab.openaccessFlag
    True

The attributes `idxterms`, `subject_areas` and `authkeywords` (if provided) offer insights into the document's content:

.. code-block:: python

    >>> ab.idxterms
    ['Bibliometrics', 'Python', 'Python interfaces', 'Reproducibilities',
     'Scientometrics', 'Scopus', 'Scopus database', 'User friendly interface']
    >>> ab.subject_areas
    [Area(area='Software', abbreviation='COMP', code=1712),
     Area(area='Computer Science Applications', abbreviation='COMP', code=1706)]
    >>> ab.authkeywords
    ['Bibliometrics', 'Python', 'Scientometrics', 'Scopus', 'Software']

To obtain the total citation count (at the time the abstract was retrieved and cached):

.. code-block:: python

    >>> ab.citedby_count
    34

You can retrieve the authors as a list of `namedtuples `_, which pair conveniently with `pandas `_:

.. code-block:: python

    >>> ab.authors
    [Author(auid=57209617104, indexed_name='Rose M.E.', surname='Rose',
     given_name='Michael E.', affiliation='60105007'),
     Author(auid=7004212771, indexed_name='Kitchin J.R.', surname='Kitchin',
     given_name='John R.', affiliation='60027950')]
    >>> import pandas as pd
    >>> print(pd.DataFrame(ab.authors))
              auid  indexed_name  surname  given_name affiliation
    0  57209617104     Rose M.E.     Rose  Michael E.    60105007
    1   7004212771  Kitchin J.R.  Kitchin     John R.    60027950

The same structure applies to the attributes `affiliation` and `authorgroup`:

.. code-block:: python

    >>> ab.affiliation
    [Affiliation(id=60105007, name='Max Planck Institute for Innovation and Competition',
     city='Munich', country='Germany'),
     Affiliation(id=60027950, name='Carnegie Mellon University',
     city='Pittsburgh', country='United States')]
    >>> ab.authorgroup
    [Author(affiliation_id=60105007, dptid=None,
     organization='Max Planck Institute for Innovation and Competition',
     city=None, postalcode=None, addresspart=None, country='Germany', collaboration=None,
     auid=57209617104, orcid=None, indexed_name='Rose M.E.', surname='Rose',
     given_name='Michael E.'),
     Author(affiliation_id=60027950, dptid=110785688,
     organization='Carnegie Mellon University, Department of Chemical Engineering',
     city=None, postalcode=None, addresspart=None, country='United States', collaboration=None,
     auid=7004212771, orcid=None, indexed_name='Kitchin J.R.', surname='Kitchin',
     given_name='John R.')]

Note that Scopus does not always pair authors with their affiliations exactly as in the original document, even if the pairing looks correct in the web view. In this case, please request a correction to the Scopus data `here `_.

The references of an article (useful to build citation networks) are only available if you downloaded the article with `view='FULL'`:

.. code-block:: python

    >>> ab.refcount
    25
    >>> refs = ab.references
    >>> refs[0]
    Reference(position='1', id='38949137710', doi='10.1007/978-94-007-7618-0˙310',
     title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses',
     authors='Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G.',
     authors_auid=None, authors_affiliationid=None, sourcetitle='FASEB J',
     publicationyear='2007', coverDate=None, volume=None, issue=None, first=None, last=None,
     citedbycount=None, type=None, text=None,
     fulltext='Falagas, M.E., Pitsouni, E.I., Malietzis, G.A., Pappas, G., Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J 22:2 (2007), 338–342, 10.1007/978-94-007-7618-0˙310.')
    >>> df = pd.DataFrame(refs)
    >>> df.columns
    Index(['position', 'id', 'doi', 'title', 'authors', 'authors_auid',
           'authors_affiliationid', 'sourcetitle', 'publicationyear', 'coverDate',
           'volume', 'issue', 'first', 'last', 'citedbycount', 'type', 'fulltext'],
          dtype='object')
    >>> df['eid'] = '2-s2.0-' + df['id']
    >>> df['eid'].tolist()
    ['2-s2.0-38949137710', '2-s2.0-84956635108', '2-s2.0-84954384742',
     '2-s2.0-85054706190', '2-s2.0-84978682989', '2-s2.0-85047117387',
     '2-s2.0-85068267813', '2-s2.0-84959420483', '2-s2.0-85041892797',
     '2-s2.0-85019268211', '2-s2.0-85059309053', '2-s2.0-85033499871',
     nan, '2-s2.0-85068268189', '2-s2.0-84958069531', '2-s2.0-84964429621',
     '2-s2.0-84977619412', '2-s2.0-85068262994', nan, '2-s2.0-23744500479',
     '2-s2.0-70349549313', nan, '2-s2.0-85042855814', '2-s2.0-85068258349',
     '2-s2.0-84887264733']
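These EIDs lend themselves to building a citation network. As a small illustration (not part of the class itself), you can turn the list into an edge list pointing from the citing document's `eid` to each resolved reference:

.. code-block:: python

    >>> # Pair the citing document's EID with each resolved reference's EID;
    >>> # unresolved references (nan) are dropped
    >>> edges = [(ab.eid, cited) for cited in df['eid'].dropna()]
    >>> edges[:2]
    [('2-s2.0-85068268027', '2-s2.0-38949137710'), ('2-s2.0-85068268027', '2-s2.0-84956635108')]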
Using `view="REF"` accesses the REF view of the document, which provides more information on the referenced items (but less on other attributes of the document):

.. code-block:: python

    >>> ab_ref = AbstractRetrieval("2-s2.0-85068268027", view='REF')
    >>> ab_ref.references[0]
    Reference(position='1', id='38949137710', doi='10.1096/fj.07-9492LSF',
     title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses',
     authors='Falagas, Matthew E.; Pitsouni, Eleni I.; Malietzis, George A.; Falagas, Matthew E.; Pappas, Georgios',
     authors_auid='7003962139; 16240046300; 43761284000; 7003962139; 7102070422',
     authors_affiliationid='60033272; 60033272; 60033272; 60015849; 60081865',
     sourcetitle='FASEB Journal', publicationyear=None, coverDate='2008-02-01',
     volume='22', issue='2', first='338', last='342', citedbycount='1676',
     type='resolvedReference', text=None, fulltext=None)

The list of authors contains duplicates because authors are paired 1:1 with their affiliation IDs, i.e. each author is listed once per affiliation. In the above example, author 7003962139 is affiliated with both 60033272 and 60015849.

For conference proceedings, Scopus also gathers detailed information about the conference:

.. code-block:: python

    >>> cp = AbstractRetrieval("2-s2.0-0029486824", view="FULL")
    >>> cp.confname
    'Proceedings of the 1995 34th IEEE Conference on Decision and Control. Part 1 (of 4)'
    >>> cp.confcode
    '44367'
    >>> cp.confdate
    ((1995, 12, 13), (1995, 12, 15))
    >>> cp.conflocation
    'New Orleans, LA, USA'
    >>> cp.confsponsor
    'IEEE'

Some articles have information on funding, chemicals and genome banks:

.. code-block:: python

    >>> ab_fund = AbstractRetrieval("2-s2.0-85053478849", view="FULL")
    >>> ab_fund.funding
    [Funding(agency=None, string='CNRT “Nickel et son Environnement', agency_id=None,
     funding_id=None, acronym=None, country=None)]
    >>> ab_fund.funding_text
    'The authors gratefully acknowledge CNRT “Nickel et son Environnement” for providing the financial support. The results reported in this publication are gathered from the CNRT report “Ecomine BioTop”.'
    >>> ab_fund.chemicals
    [Chemical(source='esbd', chemical_name='calcium', cas_registry_number='7440-70-2;14092-94-5'),
     Chemical(source='esbd', chemical_name='magnesium', cas_registry_number='7439-95-4'),
     Chemical(source='nlm', chemical_name='Fertilizers', cas_registry_number=None),
     Chemical(source='nlm', chemical_name='Sewage', cas_registry_number=None),
     Chemical(source='nlm', chemical_name='Soil', cas_registry_number=None)]
    >>> ab_fund.sequencebank
    [Sequencebank(name='GENBANK', sequence_number='MH150839:MH150870', type='submitted')]

You can print the abstract in a variety of formats, including LaTeX, bibtex, HTML, and RIS. For bibtex entries, the key is the first author's surname, the year, and the first and last word of the title:

.. code-block:: python

    >>> print(ab.get_bibtex())
    @article{Rose2019Pybliometrics:Scopus,
        author = {Michael E. Rose and John R. Kitchin},
        title = {{pybliometrics: Scriptable bibliometrics using a Python interface to Scopus}},
        journal = {SoftwareX},
        year = {2019},
        volume = {10},
        number = {None},
        pages = {-},
        doi = {10.1016/j.softx.2019.100263}}
    >>> print(ab.get_ris())
    TY - JOUR
    TI - pybliometrics: Scriptable bibliometrics using a Python interface to Scopus
    JO - SoftwareX
    VL - 10
    DA - 2019-07-01
    PY - 2019
    SP - None
    AU - Rose M.E.
    AU - Kitchin J.R.
    DO - 10.1016/j.softx.2019.100263
    UR - https://doi.org/10.1016/j.softx.2019.100263
    ER -
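Either string can be written straight to a file, e.g. to import the record into a reference manager. A minimal sketch (the file name is arbitrary):

.. code-block:: python

    >>> # Save the RIS record to disk for import into a reference manager
    >>> with open("pybliometrics.ris", "w") as ris_file:
    ...     ris_file.write(ab.get_ris())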
Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set `refresh=True`, or provide an integer that will be interpreted as the maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set `refresh=100`. Use `ab.get_cache_file_mdate()` to obtain the date of last modification, and `ab.get_cache_file_age()` to determine the number of days since the last modification.
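A short sketch combining these options; the 100-day threshold is just an example value:

.. code-block:: python

    >>> # Re-download the record if the cached version is older than 100 days
    >>> ab = AbstractRetrieval("2-s2.0-85068268027", view='FULL', refresh=100)
    >>> mdate = ab.get_cache_file_mdate()  # date of last modification
    >>> age = ab.get_cache_file_age()      # days since last modification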