Tips

Software support

  • We announce major and minor package updates via Twitter. Follow the hashtag #pybliometrics to be the first to learn of software updates.

  • The best place to ask questions on how to do things is StackOverflow. Simply tag your question with pybliometrics. There’s already a bunch of questions (with answers) out there!

  • In the GitHub repository, you can create issues to report bugs, request features, or, even better, submit your own pull requests. Read more about contributing here.

Configuration

pybliometrics allows you to set some parameters yourself. See Configuration.

Database updates

Scopus is a living database with changes happening constantly. These are not just additions of new items (Articles, Books, …) as they are published or updated citation counts, but also backfills of existing sources and corrections. Corrections include changes of titles, names or abstracts, mergers of duplicate authors, affiliations or even research items. Mergers affect multiple entities at once: for instance, author mergers impact both the authors’ profiles and the associated articles.

  • When author profiles are merged, the old profile(s) forward to the new one for about 6 months. When you instantiate the AuthorRetrieval() class with a merged profile and then access the .identifier property, pybliometrics will raise a warning pointing to the ID of the new main profile.

  • Keep your cached files updated. The refresh parameter, which is implemented in all classes, helps you do so. If you specify a maximum age in number of days when making calls (e.g., AuthorRetrieval(…, refresh=20)), your local cache will always be at most that old.

  • Implement cross-checks, for example to verify that an abstract is also listed as a publication in the author profile.
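The cross-check from the last bullet can be sketched as follows. This is a minimal illustration on plain Python data, not pybliometrics code: the helper name missing_from_profile and the EIDs are made up for the example, and how you collect the EIDs (for instance from your cached abstracts and from the author's document list) is left open.

```python
def missing_from_profile(abstract_eids, profile_eids):
    """Return the EIDs of abstracts that the author profile does not list.

    Both arguments are iterables of EID strings. This helper is hypothetical
    and not part of pybliometrics.
    """
    listed = set(profile_eids)
    # Preserve the input order so the report is easy to compare by eye.
    return [eid for eid in abstract_eids if eid not in listed]


# Made-up EIDs, for illustration only.
abstracts = ["2-s2.0-0000000001", "2-s2.0-0000000002", "2-s2.0-0000000003"]
profile = ["2-s2.0-0000000003", "2-s2.0-0000000001"]

unlisted = missing_from_profile(abstracts, profile)
print(unlisted)
```

A non-empty result suggests that either the local cache is outdated or the profile has changed in Scopus, so refreshing the affected entries is a sensible next step.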

Corrections in the Scopus database can be reported here.

Affiliations

Scopus knows two types of affiliations: Org profiles and Non-Org profiles.

Org profiles are entities that perform or sponsor research, such as a university, research institute, or government organization, which leads to the origination of documents by their members. Affiliations that are Org profiles (OrgID) according to Scopus start with a 6 (6XXXXXXX). Scopus strives to have precise information about the institution, such as type and address.

Non-Org profiles correspond to automatically clustered profiles. In theory, Non-Org profiles should correspond to research networks and virtual institutes, as they have neither a type nor an address. Affiliations that are Non-Org profiles start with a 1 (1XXXXXXXX). Often these are duplicates of Org profiles; in that case you should request a merge here.
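The leading-digit rule can be turned into a small helper. The sketch below relies only on the first digit as described above; the function name and the example IDs are hypothetical, and the check says nothing about whether an ID actually exists in Scopus.

```python
def affiliation_profile_type(affiliation_id):
    """Classify a Scopus affiliation ID by its leading digit.

    Returns "Org" for IDs starting with 6, "Non-Org" for IDs starting
    with 1, and "unknown" otherwise. Hypothetical helper, not part of
    pybliometrics.
    """
    first = str(affiliation_id).strip()[:1]
    if first == "6":
        return "Org"
    if first == "1":
        return "Non-Org"
    return "unknown"


# Made-up IDs, for illustration only.
org_type = affiliation_profile_type("60000000")
non_org_type = affiliation_profile_type(100000000)
print(org_type, non_org_type)
```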

Migration Guide from scopus to pybliometrics

In June 2019 we renamed the package from scopus to pybliometrics. This way we comply with naming rules for Elsevier’s trademark “Scopus”. At the same time we open the package for further development.

Migration is easy:

  1. Install pybliometrics

  2. Uninstall scopus

  3. In your scripts, simply change the import statement: “from pybliometrics.scopus import …” instead of “from scopus import …”
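If you have many scripts, step 3 can be automated. The following is a minimal sketch that rewrites the import statement in a piece of source text; the function name migrate_imports is made up, and the pattern only covers the plain "from scopus import …" form named above, so review the result before saving it.

```python
import re

# Matches "from scopus import" at the start of a line, keeping indentation.
_OLD_IMPORT = re.compile(r"^([ \t]*)from scopus import\b", flags=re.MULTILINE)


def migrate_imports(source):
    """Return source with old-style scopus imports rewritten."""
    return _OLD_IMPORT.sub(r"\1from pybliometrics.scopus import", source)


old_script = "from scopus import ScopusSearch\n\nquery = 'example'\n"
new_script = migrate_imports(old_script)
print(new_script)
```

Lines that already use the new import are left unchanged, so running the function twice is harmless.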

Migration Guide from 0.x to 1.x

The upgrade from scopus (now pybliometrics) 0.x to 1.x saw many changes in pybliometrics’ internal architecture, but also in four classes (see change log): ScopusAbstract(), ScopusAffiliation(), ScopusAuthor() and ScopusSearch().

To avoid too many issues resulting from missing backward-compatibility, new classes were introduced to gradually replace other ones: AbstractRetrieval() (replacing ScopusAbstract()), AuthorRetrieval() (replacing ScopusAuthor()) and ContentAffiliationRetrieval() (replacing ScopusAffiliation()). The corresponding old classes will remain until pybliometrics 2.x, but their maintenance has been suspended immediately. Cached files that were downloaded with the old classes are not usable by the new classes.

ScopusSearch() had to be revamped completely; code that uses ScopusSearch() has to be updated, but not significantly.

Guiding principles

The change to scopus 1.x was guided by five principles:

  1. Use JSON rather than XML for the cached files to reduce overhead and lower maintenance efforts

  2. Align class names, script names, attribute names and names of folders with the names the Scopus API uses

  3. Use properties to return a high share of information provided by Scopus, and get functions to increase user experience

  4. Allow users to set and change configuration via a configuration file

  5. Return namedtuples when Scopus provides combined information to increase interoperability with other Python modules

How to update code

Class AbstractRetrieval() replaces ScopusAbstract(). This class has seen the most changes. The following attributes have been renamed but their return values stay the same (so that simply renaming them will suffice): citationLanguage becomes language, citationType becomes srctype, citingby_url becomes citingby_link, scopus_url becomes scopus_link. Some attributes are now methods: bibtex becomes get_bibtex(), html becomes get_html(), ris becomes get_ris() and latex becomes get_latex(). Properties affiliations (new: affiliation), subjectAreas (new: subject_areas), authkeywords and authors are entirely different now: they return namedtuples. Please see the examples for how to use them. Property nauthors has been removed; use len(AbstractRetrieval(<eid>).authors) instead. Finally, method get_corresponding_author_info() has been removed, as Scopus does not provide this information any more.

Class AuthorRetrieval() replaces ScopusAuthor(). The following properties have been renamed but their values stay the same: author_id becomes identifier, coauthor_url becomes coauthor_link, firstname becomes given_name, hindex becomes h_index, lastname becomes surname, name becomes indexed_name, ncited_by becomes cited_by_count, ncoauthors becomes coauthor_count, ndocuments becomes document_count. Property current_affiliation has been renamed to affiliation_current and now returns the Scopus ID of the affiliation. Property publication_history has been renamed to journal_history and returns a list of namedtuples rather than a list of tuples. Property affiliation_history now returns a list of Scopus IDs instead of a list of ScopusAffiliation() objects. Property subject_areas now returns a list of namedtuples instead of a list of tuples.

Class ContentAffiliationRetrieval() replaces ScopusAffiliation(). It will suffice to replace the class name in your scripts and rename the following attributes: nauthors becomes author_count, ndocuments becomes document_count, name becomes affiliation_name, org_url becomes org_URL, api_url becomes self_link, scopus_id becomes identifier.
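While you update a larger code base, a lookup table of the renames listed above can help. The sketch below is one possible way to do it, not something pybliometrics ships: the helper get_renamed and the stand-in object are hypothetical, and only the old-to-new name pairs come from the list above.

```python
from collections import namedtuple

# Old ScopusAffiliation attribute -> new ContentAffiliationRetrieval attribute.
AFFILIATION_RENAMES = {
    "nauthors": "author_count",
    "ndocuments": "document_count",
    "name": "affiliation_name",
    "org_url": "org_URL",
    "api_url": "self_link",
    "scopus_id": "identifier",
}


def get_renamed(obj, old_name):
    """Read an attribute from obj using its pre-1.x name (hypothetical helper)."""
    new_name = AFFILIATION_RENAMES.get(old_name, old_name)
    return getattr(obj, new_name)


# Stand-in for a retrieved affiliation; the values are made up.
FakeAffiliation = namedtuple("FakeAffiliation", "author_count affiliation_name")
aff = FakeAffiliation(author_count=12, affiliation_name="Example University")

author_count = get_renamed(aff, "nauthors")
print(author_count, get_renamed(aff, "name"))
```

Such a shim is best treated as temporary; renaming the attributes directly in your scripts keeps the code easier to read.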

Class ScopusSearch() remains but was revamped. The search results are now cached under a hex-ed filename to allow for complex queries. Files are now saved in a different folder (by default). results is now the main property, returning a list of namedtuples containing all useful information regarding the search results. For convenience, get_eids() returns just the list of EIDs of the articles. Property EIDS returns the same list but will be removed in a future release.
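To see why a hex-ed filename helps with complex queries, consider the sketch below. It assumes an MD5 hex digest, which is an assumption made for illustration and not something stated above; the function name, the query, and the stand-in namedtuple with its fields are likewise made up. The second part shows how a list of namedtuples can be reduced to a list of EIDs.

```python
from collections import namedtuple
from hashlib import md5


def cache_filename(query):
    """Map a query string to a filesystem-safe hex name (MD5 is assumed)."""
    return md5(query.encode("utf8")).hexdigest()


# Quotes, spaces and ">" would be awkward or invalid in a filename.
query = 'TITLE-ABS-KEY("machine learning") AND PUBYEAR > 2015'
fname = cache_filename(query)
print(fname)

# Stand-in for search results; field names and values are illustrative.
Document = namedtuple("Document", "eid title")
results = [
    Document("2-s2.0-0000000001", "First example"),
    Document("2-s2.0-0000000002", "Second example"),
]
eids = [doc.eid for doc in results]
print(eids)
```

Because the digest depends only on the query string, the same query always maps to the same cache file, regardless of which characters the query contains.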