Edward G. Miner Library

R Publication Statistics: Methods

Learn how the R bibliometrix package can be used to explore the scholarly publishing landscape and make informed decisions to maximize the visibility and influence of your work.


Once the author names were gathered, they were looked up one by one in Scopus to acquire each author ID. Some researchers have multiple entries. After cross-referencing with institutional affiliation or publication titles, it is possible to submit a merge request to Scopus, or simply to treat each unique ID as its own entry in the final search string.

To create the search string, Excel's text join and concatenate functions are useful for putting the IDs in the appropriate format:

For Scopus, the search query format for an author ID is: AU-ID ("Last name, First name" ###########).

Example search string:
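As an alternative to the Excel functions, the same string can be assembled in Python. The snippet below is a minimal sketch, not the code used for this report: the names and 11-digit IDs are invented placeholders, the helper name build_author_query is hypothetical, and joining the clauses with OR is an assumption about how the individual AU-ID terms are combined.

```python
def build_author_query(authors):
    """Join (name, author_id) pairs into one Scopus advanced-search string.

    Each unique author ID becomes its own AU-ID clause, so a researcher
    with several Scopus profiles simply appears more than once.
    """
    clauses = []
    seen_ids = set()
    for name, author_id in authors:
        author_id = str(author_id).strip()
        if author_id in seen_ids:
            continue  # skip an ID that was entered twice
        seen_ids.add(author_id)
        clauses.append(f'AU-ID ("{name}" {author_id})')
    # Assumption: clauses are combined with the boolean operator OR.
    return " OR ".join(clauses)


# Hypothetical researchers and IDs, for illustration only.
authors = [
    ("Doe, Jane", "11111111111"),
    ("Doe, Jane", "22222222222"),  # same person, second Scopus profile
    ("Roe, Richard", "33333333333"),
]
query = build_author_query(authors)
print(query)
```

Keeping each ID as its own clause mirrors the option described above of treating every unique ID as a separate entry when a merge request has not been submitted.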

The search string can then be copied and pasted into Scopus as an advanced search. Selecting all of the documents found and exporting them downloads every field selected. For this report, we selected all information.

When exporting, there are several options for file type, as well as options to export straight to a reference manager or another platform.

In this case, we used CSV and/or BibTeX as the export format. While all formats contain the same basic information for each document, they differ in detail: CSV, for example, retains and connects the cited references within each document, whereas BibTeX does not. There are also size limits: up to 20,000 records can be exported to CSV, whereas up to 2,000 can be exported to other file types.

Clean Up

Once exported, the data needs some cleanup. Because of factors such as authors having multiple author IDs in Scopus, the records need to be deduplicated. This can be done in R for BibTeX files with the bibliometrix package, or in Excel with the Remove Duplicates tool (on the Data tab), matching on title.

Additionally, there may be a few corrupted records that are missing necessary elements such as title or author that can be removed.
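For CSV exports, both cleanup steps can also be scripted. The following is a minimal sketch under stated assumptions rather than the procedure used for this report: it assumes the export has columns named "Title" and "Authors", treats titles that differ only in capitalization or spacing as duplicates, and uses a small inline sample in place of a real export. The function name clean_records is hypothetical.

```python
import csv
import io


def clean_records(rows, title_field="Title", author_field="Authors"):
    """Drop records missing a title or author, then deduplicate on title."""
    kept = []
    seen_titles = set()
    for row in rows:
        title = (row.get(title_field) or "").strip()
        authors = (row.get(author_field) or "").strip()
        if not title or not authors:
            continue  # corrupted record: a necessary element is missing
        # Normalize case and whitespace so near-identical titles match.
        key = " ".join(title.lower().split())
        if key in seen_titles:
            continue  # duplicate, e.g. the same paper found under two author IDs
        seen_titles.add(key)
        kept.append(row)
    return kept


# Invented sample data standing in for a Scopus CSV export.
sample_csv = io.StringIO(
    "Authors,Title,Year\n"
    '"Doe J.",A study of things,2020\n'
    '"Doe J.",A Study of  Things,2020\n'
    ",Orphan record,2019\n"
    '"Roe R.",,2021\n'
    '"Roe R.",Another paper,2021\n'
)
records = list(csv.DictReader(sample_csv))
cleaned = clean_records(records)
print(len(records), "records in,", len(cleaned), "records kept")
```

Matching on title alone can merge distinct documents that share a title, so it is worth spot-checking the removed rows before discarding them.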


In addition to R (version 4.3.1), RStudio, and the bibliometrix package, the following tools were used to clean up the data and visualizations:

  • Microsoft Excel: another means of deduplication, and a way to clean up records that did not export from Scopus correctly
  • Image editor: to modify some elements of the visualizations from Biblioshiny without coding

To begin the process, the names of the researchers need to be gathered. One way to do this is to grab their information from the department website with Python. Example code
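The snippet below is an illustrative sketch of that step, not the original example code. It uses only the Python standard library and assumes, hypothetically, that each researcher's name sits inside an element with the CSS class "faculty-name"; the class name, the NameParser helper, and the sample HTML are all invented, and a real department page will need its own selector.

```python
from html.parser import HTMLParser


class NameParser(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.names = []
        self._tag = None    # tag name of the element being captured
        self._depth = 0     # nesting depth of that same tag name
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._tag, self._depth, self._chunks = tag, 1, []
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the markup.
                name = " ".join("".join(self._chunks).split())
                if name:
                    self.names.append(name)
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._chunks.append(data)


# Invented markup standing in for a department directory page. For a live
# page, the HTML could be fetched first, for example with
# urllib.request.urlopen(url).read().decode("utf-8").
sample_html = """
<ul>
  <li><span class="faculty-name">Jane Doe, PhD</span> Professor</li>
  <li><span class="faculty-name"> Richard   Roe </span> Associate Professor</li>
</ul>
"""
parser = NameParser("faculty-name")
parser.feed(sample_html)
print(parser.names)
```

Before scraping a live site, it is worth checking its terms of use, and the resulting list usually still needs manual review for titles and suffixes before the Scopus lookup.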

Workflow Diagram


File format:
When using the bibliometrix package for Biblioshiny visualizations, the choice of file format can affect the completeness of the bibliographic metadata. Although the .bib (BibTeX) format is tailored to bibliographic information, it may be less versatile for large datasets that mix numerical data, text, and other types of records. This limitation can leave data missing or incomplete in the visualizations, such as the cited references and the number of cited references. If you are working with extensive or complex data, consider whether the .bib format can carry the fields your analysis needs.
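One way to see how complete an export is before loading it into Biblioshiny is to measure how many records have a value in each field of interest. The sketch below is an assumption-laden illustration: it works on a CSV export, assumes the cited references are stored in a column named "References", and uses invented sample rows. The helper name field_completeness is hypothetical.

```python
import csv
import io


def field_completeness(rows, fields):
    """Return the share of records with a non-empty value for each field."""
    total = len(rows)
    result = {}
    for field in fields:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result


# Invented sample standing in for an exported file; the column names
# are assumptions about the export layout.
sample_csv = io.StringIO(
    "Title,References\n"
    "Paper A,Ref 1; Ref 2\n"
    "Paper B,\n"
    "Paper C,Ref 3\n"
    "Paper D,Ref 4\n"
)
rows = list(csv.DictReader(sample_csv))
completeness = field_completeness(rows, ["Title", "References"])
for field, share in completeness.items():
    print(f"{field}: {share:.0%} of records filled")
```

A low share for the cited-references field suggests that analyses built on cited references will be incomplete for that export.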