pybliometrics.scopus.AbstractRetrieval¶
AbstractRetrieval() implements the Scopus Abstract Retrieval API.
It accepts any supported identifier as the main argument. Most commonly this will be a Scopus EID, but a DOI, Scopus ID (the last part of the EID), PubMed identifier or Publisher Item Identifier (PII) works as well. AbstractRetrieval tries to infer the identifier type itself; to speed this up, you can state the type via id_type.
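To illustrate why stating id_type saves work, here is a minimal sketch of how an identifier type might be guessed from its shape. The function guess_id_type and all of its rules are hypothetical; they are not pybliometrics' actual inference logic, only an illustration of the kind of heuristic involved.

```python
from typing import Union


def guess_id_type(identifier: Union[int, str]) -> str:
    """Guess the ID type from the identifier's shape (hypothetical rules)."""
    sid = str(identifier)
    if sid.startswith("2-s2.0-"):
        return "eid"  # EIDs carry the "2-s2.0-" prefix
    if "/" in sid:
        return "doi"  # DOIs contain a forward slash
    if sid.isdigit():
        # Assumption: Scopus IDs are long numbers, PubMed IDs are shorter
        return "scopus_id" if len(sid) >= 10 else "pubmed_id"
    return "pii"  # Assumption: everything else is treated as a PII


print(guess_id_type("2-s2.0-85068268027"))
print(guess_id_type("10.1016/j.softx.2019.100263"))
print(guess_id_type(85068268027))
```

Heuristics of this kind can misfire on unusual identifiers, so passing id_type explicitly is the safer choice whenever the type is already known.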
The Abstract Retrieval API offers different levels of detail through views, some of which are restricted. ‘META_ABS’ is the most comprehensive unrestricted view: it encompasses all information from the other unrestricted views and is therefore the default. The ‘FULL’ view contains the most information, including everything available in ‘META_ABS’, but it is restricted. Generally, try view='FULL' first when downloading an abstract and fall back to the default if that fails.
Documentation¶
- class pybliometrics.scopus.AbstractRetrieval(identifier=None, refresh=False, view='META_ABS', id_type=None, **kwds)[source]¶
Interaction with the Abstract Retrieval API.
- Parameters:
identifier (Union[int, str], optional) – The identifier of a document. Can be the Scopus EID, the Scopus ID, the PII, the Pubmed-ID or the DOI. Default: None
refresh (Union[bool, int], optional) – Whether to refresh the cached file if it exists or not. If int is passed, cached file will be refreshed if the number of days since last modification exceeds that value. Default: False
id_type (str, optional) – The type of used ID. Allowed values: None, ‘eid’, ‘pii’, ‘scopus_id’, ‘pubmed_id’, ‘doi’. If the value is None, the function tries to infer the ID type itself. Default: None
view (str, optional) – The view of the file that should be downloaded. Allowed values: META, META_ABS, REF, FULL, where FULL includes all information of META_ABS view and META_ABS includes all information of the META view. For details see https://dev.elsevier.com/sc_abstract_retrieval_views.html. Default: 'META_ABS'
kwds (str) – Keywords passed on as query parameters. Must contain fields and values listed in the API specification at https://dev.elsevier.com/documentation/AbstractRetrievalAPI.wadl.
- Raises:
ValueError – If any of the parameters id_type, refresh or view is not one of the allowed values.
Notes
The directory for cached results is {path}/{view}/{identifier}, where path is specified in your configuration file. In case identifier is a DOI, an underscore replaces the forward slash.
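As a rough illustration of the naming rule in this note, the following sketch builds the cache location for a given view and identifier. The function cache_file_path and the base directory used in the example are made up for illustration; the real base path comes from your configuration file.

```python
from pathlib import PurePosixPath


def cache_file_path(path: str, view: str, identifier: str) -> PurePosixPath:
    """Build {path}/{view}/{identifier}, replacing "/" in DOIs by "_"."""
    safe_identifier = str(identifier).replace("/", "_")
    return PurePosixPath(path) / view / safe_identifier


# Hypothetical base directory, for illustration only
print(cache_file_path("/tmp/cache", "META_ABS", "10.1016/j.softx.2019.100263"))
print(cache_file_path("/tmp/cache", "FULL", "2-s2.0-85068268027"))
```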
- property abstract: str | None¶
The abstract of a document. Note: If this is empty, try description property instead.
- property affiliation: List[NamedTuple] | None¶
A list of namedtuples representing listed affiliations in the form (id, name, city, country).
- property aggregationType: str¶
Aggregation type of source the document is published in.
- property authkeywords: List[str] | None¶
List of author-provided keywords of the document.
- property authorgroup: List[NamedTuple] | None¶
A list of namedtuples representing the article’s authors organized by affiliation, in the form (affiliation_id, dptid, organization, city, postalcode, addresspart, country, collaboration, auid, orcid, indexed_name, surname, given_name). If given_name is not present, fall back to initials. Note: Affiliation information might be missing or mal-assigned even when it looks correct in the web view. In this case please request a correction. It is generally missing for collaborations.
- property authors: List[NamedTuple] | None¶
A list of namedtuples representing the article’s authors, in the form (auid, indexed_name, surname, given_name, affiliation). In case multiple affiliation IDs are given, they are joined on “;”. Note: The affiliation referred to here is what Scopus’ algorithm determined as the main affiliation. Property authorgroup provides all affiliations.
- property citedby_count: int | None¶
Number of articles citing the document.
- property citedby_link: str¶
URL to Scopus page listing citing documents.
- property chemicals: List[NamedTuple] | None¶
List of namedtuples representing chemical entities in the form (source, chemical_name, cas_registry_number). In case multiple numbers are given, they are joined on “;”.
- property confcode: int | None¶
Code of the conference the document belongs to.
- property confdate: Tuple[Tuple[int, int], Tuple[int, int]] | None¶
Date range of the conference the document belongs to represented by two tuples in the form (YYYY, MM, DD).
- property conflocation: str | None¶
Location of the conference the document belongs to.
- property confname: str | None¶
Name of the conference the document belongs to.
- property confsponsor: List[str] | str | None¶
Sponsor(s) of the conference the document belongs to.
- property contributor_group: List[NamedTuple] | None¶
List of namedtuples representing contributors compiled by Scopus, in the form (given_name, initials, surname, indexed_name, role).
- property copyright: str¶
The copyright statement of the document.
- property copyright_type: str¶
The copyright holder of the document.
- property correspondence: List[NamedTuple] | None¶
List of namedtuples representing the authors to whom correspondence should be addressed, in the form (surname, initials, organization, country, city_group). Multiple organizations are joined on semicolon.
- property coverDate: str¶
The date of the cover the document is in.
- property date_created: Tuple[int, int, int] | None¶
Return the date_created of a record.
- property description: str | None¶
Return the description of a record. Note: If this is empty, try abstract property instead.
- property doi: str | None¶
DOI of the document.
- property eid: str¶
EID of the document.
- property endingPage: str | None¶
Ending page. If this is empty, try pageRange property instead.
- property funding: List[NamedTuple] | None¶
List of namedtuples representing parsed funding information in the form (agency, agency_id, string, funding_id, acronym, country).
- property funding_text: str | None¶
The raw text from which Scopus derives funding information.
- property isbn: Tuple[str, ...] | None¶
ISBNs belonging to publicationName as a tuple of varying length (e.g. ISBN-10 or ISBN-13).
- property issn: NamedTuple | None¶
Namedtuple in the form (print, electronic). Note: If the source has an E-ISSN, the META view will return None. Use FULL view instead.
- property identifier: int¶
ID of the document (same as EID without “2-s2.0-“).
- property idxterms: List[str] | None¶
List of index terms (these are just one category of those Scopus provides in the web version).
- property issueIdentifier: str | None¶
Number of the issue the document was published in.
- property issuetitle: str | None¶
Title of the issue the document was published in.
- property language: str | None¶
Language of the article.
- property openaccess: int | None¶
The openaccess status encoded in single digits.
- property openaccessFlag: bool | None¶
Whether the document is available via open access or not.
- property pageRange: str | None¶
Page range. If this is empty, try startingPage and endingPage properties instead.
- property pii: str | None¶
The PII (Publisher Item Identifier) of the document.
- property publicationName: str | None¶
Name of source the document is published in.
- property publisher: str | None¶
Name of the publisher of the document. Note: Information provided in the FULL view of the article might be more complete.
- property publisheraddress: str | None¶
Address of the publisher of the document.
- property pubmed_id: int | None¶
The PubMed ID of the document.
- property refcount: int | None¶
Number of references of an article. Note: Requires either the FULL view or REF view.
- property references: List[NamedTuple] | None¶
List of namedtuples representing references listed in the document, in the form (position, id, doi, title, authors, authors_auid, authors_affiliationid, sourcetitle, publicationyear, coverDate, volume, issue, first, last, citedbycount, type, text, fulltext).
position is the number at which the reference appears in the document, id is the Scopus ID of the referenced document (EID without the “2-s2.0-” prefix), authors is a string of the names of the authors in the format “Surname1, Initials1; Surname2, Initials2”, authors_auid is a string of the author IDs joined on “; ”, authors_affiliationid is a string of the authors’ affiliation IDs joined on “; ”, sourcetitle is the name of the source (e.g. the journal), publicationyear is the year of the publication as string (FULL view only), coverDate is the date of the publication as string (REF view only), volume and issue are strings referring to the volume and issue, first and last refer to the page range, citedbycount is the total number of citations of the cited item (REF view only), type describes the parsing status of the reference (resolved or not), text is information on the publication, fulltext is the text the authors used for the reference.
Note: Requires either the FULL view or REF view. Might be empty even if refcount is positive. Specific fields can be empty. The lists authors and authors_auid may contain duplicates because of the 1:1 pairing with the list authors_affiliationid.
- property scopus_link: str¶
URL to the document page on Scopus.
- property self_link: str¶
URL to Scopus API page of this document.
- property sequencebank: List[NamedTuple] | None¶
List of namedtuples representing biological entities defined or mentioned in the text, in the form (name, sequence_number, type).
- property source_id: int | None¶
Scopus source ID of the document.
- property sourcetitle_abbreviation: str | None¶
Abbreviation of the source the document is published in. Note: Requires the FULL view of the article.
- property srctype: str | None¶
Aggregation type of source the document is published in (short version of aggregationType).
- property startingPage: str | None¶
Starting page. If this is empty, try pageRange property instead.
- property subject_areas: List[NamedTuple] | None¶
List of namedtuples containing subject areas of the article in the form (area, abbreviation, code). Note: Requires the FULL view of the article.
- property subtype: str¶
Type of the document. Refer to the Scopus Content Coverage Guide for a list of possible values. Short version of subtypedescription.
- property subtypedescription: str¶
Type of the document. Refer to the Scopus Content Coverage Guide for a list of possible values. Long version of subtype.
- property title: str | None¶
Title of the document.
- property url: str | None¶
URL to the API view of the document.
- property volume: str | None¶
Volume for the document.
- property website: str¶
Website of publisher.
- get_bibtex()[source]¶
Bibliographic entry in BibTeX format.
- Raises:
ValueError – If the item’s aggregationType is not Journal.
- Return type:
str
- get_ris()[source]¶
Bibliographic entry in RIS (Research Information Systems) format for journal articles.
- Raises:
ValueError – If the item’s aggregationType is not Journal.
- Return type:
str
- get_cache_file_age()¶
Return the age of the cached file in days.
- Return type:
int
- get_cache_file_mdate()¶
Return the modification date of the cached file.
- Return type:
str
- get_key_remaining_quota()¶
Return the number of remaining requests for the current key and the current API (relative to the last actual request).
- Return type:
str | None
- get_key_reset_time()¶
Return the time when the current key is reset (relative to the last actual request).
- Return type:
str | None
Examples¶
You initialize the class with an ID that Scopus uses, e.g. the EID:
>>> from pybliometrics.scopus import AbstractRetrieval
>>> ab = AbstractRetrieval("2-s2.0-85068268027", view='FULL')
You can obtain basic information just by printing the object:
>>> print(ab)
Michael E. Rose and John R. Kitchin: "pybliometrics: Scriptable bibliometrics using a Python interface to Scopus", SoftwareX, 10, (no pages found)(2019). https://doi.org/10.1016/j.softx.2019.100263.
34 citation(s) as of 2022-04-07
  Affiliation(s):
   Max Planck Institute for Innovation and Competition
   Carnegie Mellon University
There are 52 attributes and 8 methods to interact with. For example, to obtain bibliographic information:
>>> ab.publicationName
'SoftwareX'
>>> ab.aggregationType
'Journal'
>>> ab.coverDate
'2019-07-01'
>>> ab.volume
'10'
>>> ab.issueIdentifier
None
>>> ab.pageRange
None
>>> ab.doi
'10.1016/j.softx.2019.100263'
>>> ab.openaccessFlag
True
The attributes idxterms, subject_areas and authkeywords (if provided) offer insights into the document’s content:
>>> ab.idxterms
['Bibliometrics', 'Python', 'Python interfaces', 'Reproducibilities',
 'Scientometrics', 'Scopus', 'Scopus database', 'User friendly interface']
>>> ab.subject_areas
[Area(area='Software', abbreviation='COMP', code=1712),
 Area(area='Computer Science Applications', abbreviation='COMP', code=1706)]
>>> ab.authkeywords
['Bibliometrics', 'Python', 'Scientometrics', 'Scopus', 'Software']
To obtain the total citation count (at the time the abstract was retrieved and cached):
>>> ab.citedby_count
34
You can retrieve the authors as a list of namedtuples, which pair conveniently with pandas:
>>> ab.authors
[Author(auid=57209617104, indexed_name='Rose M.E.', surname='Rose', given_name='Michael E.', affiliation='60105007'),
 Author(auid=7004212771, indexed_name='Kitchin J.R.', surname='Kitchin', given_name='John R.', affiliation='60027950')]
>>> import pandas as pd
>>> print(pd.DataFrame(ab.authors))
          auid  indexed_name  surname  given_name affiliation
0  57209617104     Rose M.E.     Rose  Michael E.    60105007
1   7004212771  Kitchin J.R.  Kitchin     John R.    60027950
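Because multiple affiliation IDs of one author are joined on “;” in the affiliation field, it can help to expand them into one row per author-affiliation link before further analysis. The sketch below uses a stand-in namedtuple with the same field names as ab.authors; the second author and the helper author_affiliation_pairs are hypothetical and serve only to demonstrate the split.

```python
from collections import namedtuple

# Stand-in for the namedtuples returned by ab.authors (same field names)
Author = namedtuple("Author", "auid indexed_name surname given_name affiliation")

authors = [
    Author(57209617104, "Rose M.E.", "Rose", "Michael E.", "60105007"),
    # Hypothetical author with two affiliation IDs joined on ";"
    Author(1, "Doe J.", "Doe", "Jane", "60105007;60027950"),
]


def author_affiliation_pairs(authors):
    """Return one (auid, affiliation_id) pair per author-affiliation link."""
    pairs = []
    for author in authors:
        if author.affiliation is None:
            continue  # no affiliation recorded for this author
        for aff_id in author.affiliation.split(";"):
            pairs.append((author.auid, aff_id.strip()))
    return pairs


print(author_affiliation_pairs(authors))
```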
The same structure applies for the attributes affiliation and authorgroup:
>>> ab.affiliation
[Affiliation(id=60105007, name='Max Planck Institute for Innovation and Competition', city='Munich', country='Germany'),
 Affiliation(id=60027950, name='Carnegie Mellon University', city='Pittsburgh', country='United States')]
>>> ab.authorgroup
[Author(affiliation_id=60105007, dptid=None, organization='Max Planck Institute for Innovation and Competition', city=None, postalcode=None, addresspart=None, country='Germany', collaboration=None, auid=57209617104, orcid=None, indexed_name='Rose M.E.', surname='Rose', given_name='Michael E.'),
 Author(affiliation_id=60027950, dptid=110785688, organization='Carnegie Mellon University, Department of Chemical Engineering', city=None, postalcode=None, addresspart=None, country='United States', collaboration=None, auid=7004212771, orcid=None, indexed_name='Kitchin J.R.', surname='Kitchin', given_name='John R.')]
Note that Scopus may not always pair authors with their affiliations as in the original document, even if the pairing looks correct in the web view. In this case, please request a correction from Scopus.
The references of an article (useful to build citation networks) are only available if you downloaded the article with ‘FULL’ as view parameter.
>>> ab.refcount
25
>>> refs = ab.references
>>> refs[0]
Reference(position='1', id='38949137710', doi='10.1007/978-94-007-7618-0˙310', title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses', authors='Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G.', authors_auid=None, authors_affiliationid=None, sourcetitle='FASEB J', publicationyear='2007', coverDate=None, volume=None, issue=None, first=None, last=None, citedbycount=None, type=None, text=None, fulltext='Falagas, M.E., Pitsouni, E.I., Malietzis, G.A., Pappas, G., Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J 22:2 (2007), 338–342, 10.1007/978-94-007-7618-0˙310.')
>>> df = pd.DataFrame(refs)
>>> df.columns
Index(['position', 'id', 'doi', 'title', 'authors', 'authors_auid',
       'authors_affiliationid', 'sourcetitle', 'publicationyear', 'coverDate',
       'volume', 'issue', 'first', 'last', 'citedbycount', 'type', 'fulltext'],
      dtype='object')
>>> df['eid'] = '2-s2.0-' + df['id']
>>> df['eid'].tolist()
['2-s2.0-38949137710', '2-s2.0-84956635108', '2-s2.0-84954384742',
 '2-s2.0-85054706190', '2-s2.0-84978682989', '2-s2.0-85047117387',
 '2-s2.0-85068267813', '2-s2.0-84959420483', '2-s2.0-85041892797',
 '2-s2.0-85019268211', '2-s2.0-85059309053', '2-s2.0-85033499871',
 nan, '2-s2.0-85068268189', '2-s2.0-84958069531', '2-s2.0-84964429621',
 '2-s2.0-84977619412', '2-s2.0-85068262994', nan, '2-s2.0-23744500479',
 '2-s2.0-70349549313', nan, '2-s2.0-85042855814', '2-s2.0-85068258349',
 '2-s2.0-84887264733']
Using view="REF" accesses the REF view of the article, which provides more information on the referenced items (but less on other attributes of the document):
>>> ab_ref = AbstractRetrieval("2-s2.0-85068268027", view='REF')
>>> ab_ref.references[0]
Reference(position='1', id='38949137710', doi='10.1096/fj.07-9492LSF', title='Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses', authors='Falagas, Matthew E.; Pitsouni, Eleni I.; Malietzis, George A.; Falagas, Matthew E.; Pappas, Georgios', authors_auid='7003962139; 16240046300; 43761284000; 7003962139; 7102070422', authors_affiliationid='60033272; 60033272; 60033272; 60015849; 60081865', sourcetitle='FASEB Journal', publicationyear=None, coverDate='2008-02-01', volume='22', issue='2', first='338', last='342', citedbycount='1676', type='resolvedReference', text=None, fulltext=None)
The list of authors contains duplicates because of the 1:1 pairing with the authors’ affiliation IDs. In the example above, author 7003962139 is affiliated with both 60033272 and 60015849. Authors are therefore grouped by affiliation ID.
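The 1:1 pairing can be collapsed into one entry per author with a few lines of plain Python. This is a minimal sketch: the helper affiliations_by_author is hypothetical (not part of pybliometrics), and the input strings are copied from the REF-view example above.

```python
def affiliations_by_author(authors_auid, authors_affiliationid, sep=";"):
    """Map each author ID to its affiliation IDs, preserving order."""
    if not authors_auid or not authors_affiliationid:
        return {}  # fields may be None, e.g. for unresolved references
    auids = [a.strip() for a in authors_auid.split(sep)]
    affs = [a.strip() for a in authors_affiliationid.split(sep)]
    grouped = {}
    for auid, aff in zip(auids, affs):
        grouped.setdefault(auid, []).append(aff)
    return grouped


grouped = affiliations_by_author(
    "7003962139; 16240046300; 43761284000; 7003962139; 7102070422",
    "60033272; 60033272; 60033272; 60015849; 60081865",
)
print(len(grouped))
print(grouped["7003962139"])
```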
Scopus also gathers detailed information about conferences for conference proceedings, including:
>>> cp = AbstractRetrieval("2-s2.0-0029486824", view="FULL")
>>> cp.confname
'Proceedings of the 1995 34th IEEE Conference on Decision and Control. Part 1 (of 4)'
>>> cp.confcode
'44367'
>>> cp.confdate
((1995, 12, 13), (1995, 12, 15))
>>> cp.conflocation
'New Orleans, LA, USA'
>>> cp.confsponsor
'IEEE'
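The two tuples in confdate convert directly into datetime.date objects, which makes date arithmetic straightforward. The sketch below copies the value shown above and assumes both tuples are complete (YYYY, MM, DD) triples; it does not handle a missing conference date (None).

```python
from datetime import date

# Value of cp.confdate from the example above
confdate = ((1995, 12, 13), (1995, 12, 15))

start, end = (date(*parts) for parts in confdate)
duration_days = (end - start).days + 1  # count both the first and the last day

print(start.isoformat(), end.isoformat(), duration_days)
```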
Some articles have information on funding, chemicals and genome banks:
>>> ab_fund = AbstractRetrieval("2-s2.0-85053478849", view="FULL")
>>> ab_fund.funding
[Funding(agency=None, string='CNRT “Nickel et son Environnement', agency_id=None, funding_id=None, acronym=None, country=None)]
>>> ab_fund.funding_text
'The authors gratefully acknowledge CNRT “Nickel et son Environnement” for providing the financial support. The results reported in this publication are gathered from the CNRT report “Ecomine BioTop”.'
>>> ab_fund.chemicals
[Chemical(source='esbd', chemical_name='calcium', cas_registry_number='7440-70-2;14092-94-5'),
 Chemical(source='esbd', chemical_name='magnesium', cas_registry_number='7439-95-4'),
 Chemical(source='nlm', chemical_name='Fertilizers', cas_registry_number=None),
 Chemical(source='nlm', chemical_name='Sewage', cas_registry_number=None),
 Chemical(source='nlm', chemical_name='Soil', cas_registry_number=None)]
>>> ab_fund.sequencebank
[Sequencebank(name='GENBANK', sequence_number='MH150839:MH150870', type='submitted')]
You can export the bibliographic entry in BibTeX and RIS format. For BibTeX entries, the key consists of the first author’s surname, the year, and the first and last word of the title:
>>> print(ab.get_bibtex())
@article{Rose2019Pybliometrics:Scopus,
  author = {Michael E. Rose and John R. Kitchin},
  title = {{pybliometrics: Scriptable bibliometrics using a Python interface to Scopus}},
  journal = {SoftwareX},
  year = {2019},
  volume = {10},
  number = {None},
  pages = {-},
  doi = {10.1016/j.softx.2019.100263}}
>>> print(ab.get_ris())
TY  - JOUR
TI  - pybliometrics: Scriptable bibliometrics using a Python interface to Scopus
JO  - SoftwareX
VL  - 10
DA  - 2019-07-01
PY  - 2019
SP  - None
AU  - Rose M.E.
AU  - Kitchin J.R.
DO  - 10.1016/j.softx.2019.100263
UR  - https://doi.org/10.1016/j.softx.2019.100263
ER  -
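The composition of the BibTeX key can be reproduced approximately as follows. The function bibtex_key is a hypothetical re-implementation, not the library's code, and the capitalization rule is an assumption inferred from the single key shown above.

```python
def bibtex_key(first_author_surname: str, cover_date: str, title: str) -> str:
    """Combine surname, year, and the first and last word of the title."""
    year = cover_date[:4]  # coverDate is formatted as YYYY-MM-DD
    words = title.split()
    first, last = words[0], words[-1]
    # Assumption: both words are capitalized, as in the example key above
    return f"{first_author_surname}{year}{first.capitalize()}{last.capitalize()}"


key = bibtex_key(
    "Rose",
    "2019-07-01",
    "pybliometrics: Scriptable bibliometrics using a Python interface to Scopus",
)
print(key)
```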
Downloaded results are cached to expedite subsequent analyses. This information may become outdated. To refresh the cached results if they exist, set refresh=True, or provide an integer that will be interpreted as maximum allowed number of days since the last modification date. For example, if you want to refresh all cached results older than 100 days, set refresh=100. Use ab.get_cache_file_mdate() to obtain the date of last modification, and ab.get_cache_file_age() to determine the number of days since the last modification.
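The refresh rule described above boils down to a small decision function. The sketch below is a hypothetical helper (needs_refresh is not part of pybliometrics) that mirrors the documented behaviour: a bool is taken literally, and an integer triggers a refresh only if the cached file's age in days exceeds it.

```python
from typing import Union


def needs_refresh(refresh: Union[bool, int], cache_age_days: int) -> bool:
    """Decide whether a cached file should be downloaded again."""
    if isinstance(refresh, bool):  # check bool first: bool is a subclass of int
        return refresh
    return cache_age_days > refresh  # refresh only if older than the limit


print(needs_refresh(True, 0))
print(needs_refresh(100, 101))
print(needs_refresh(100, 100))
```

In practice, the age passed as cache_age_days would be the value returned by ab.get_cache_file_age().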