Author: Fabio Ashtar Telarico, University of Ljubljana, FDV
stopwords is an R package originally developed by Kohei Watanabe of the
Waseda Institute for Advanced Study. It provides easy access to stopwords
in more than 50 languages in the Stopwords ISO library.
The package has not been updated since Dec 22, 2017 and could no longer
be installed from GitHub. This reboot is meant to grant continuity to
the project.
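# Install the released version
install.packages('morestopwords')
# Or install from GitHub (only runs if the 'remotes' package is available)
if (requireNamespace('remotes', quietly = TRUE))
  remotes::install_github('fatelarico/morestopwords')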
The code base has changed since version 0.1.0 (the last one maintained by
Dr. Watanabe). Now, the function stopwords::stopwords() supports not only
two-letter ISO codes, but also three-letter ones. Moreover, it can
identify languages by their ISO name (e.g., German, not Deutsch; Swedish,
not Sverige, etc.).
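
To illustrate what accepting several kinds of identifiers involves, the following base-R sketch resolves a two-letter code, a three-letter code, or an English ISO name to a single two-letter code. The lookup table `iso_table` and the helper `resolve_language()` are hypothetical names made up for this illustration. They are not the package's internals, and the table covers only three languages.

```r
# Hypothetical lookup table (illustration only; not the package's data).
iso_table <- data.frame(
  iso2 = c("de", "sv", "it"),
  iso3 = c("deu", "swe", "ita"),
  name = c("German", "Swedish", "Italian"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map any supported identifier to its two-letter code.
resolve_language <- function(x) {
  key <- tolower(trimws(x))
  hit <- which(
    iso_table$iso2 == key |
      iso_table$iso3 == key |
      tolower(iso_table$name) == key
  )
  if (length(hit) == 0L) {
    stop("Unknown language identifier: ", x)
  }
  iso_table$iso2[hit[1L]]
}

resolve_language("de")      # two-letter code
resolve_language("swe")     # three-letter code
resolve_language("German")  # ISO name
```

In this sketch a native name such as Deutsch is not in the table, so the helper stops with an error instead of guessing.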
The package stopwords is also based on Watanabe’s archived GitHub
repository, which makes it the package most similar to morestopwords.
However, the two packages differ in both design choices and features:
- morestopwords has no dependencies and integrates with the package cld2.
- morestopwords can (if cld2 is installed) identify the language of one
  or more strings automatically.
- morestopwords can remove stop words from one or more strings, either in
  conjunction with language detection or independently.
- morestopwords does not allow the user to choose a list of stop words to
  use. Rather, it tries to provide the most comprehensive list in an
  intuitive way.
- morestopwords’s lists include more stop words than any single list
  included in stopwords.
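
The combination of optional language detection and stop-word removal described in the list above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the package's implementation: `stop_lists` and `remove_stopwords()` are hypothetical names, and the word lists are deliberately tiny. The only external call is `cld2::detect_language()`, which is used only when no language is supplied and cld2 is installed.

```r
# Hypothetical, tiny stop-word lists (illustration only).
stop_lists <- list(
  en = c("the", "a", "of", "and", "is"),
  it = c("il", "la", "di", "e", "un")
)

# Hypothetical helper: drop stop words from each string in `x`.
# If `lang` is NULL, try to guess the language of each string with cld2.
remove_stopwords <- function(x, lang = NULL) {
  if (is.null(lang)) {
    if (!requireNamespace("cld2", quietly = TRUE)) {
      stop("Install 'cld2' or pass `lang` explicitly.")
    }
    # Returns one language code per string, or NA when detection fails.
    lang <- cld2::detect_language(x)
  }
  lang <- rep_len(lang, length(x))
  vapply(seq_along(x), function(i) {
    words <- strsplit(x[i], "[[:space:]]+")[[1]]
    # Unknown or undetected languages leave the string unchanged.
    stops <- if (is.na(lang[i])) character(0) else stop_lists[[lang[i]]]
    paste(words[!tolower(words) %in% stops], collapse = " ")
  }, character(1))
}

# Removal with explicit languages, independent of detection.
remove_stopwords(
  c("The cat and the dog", "Il gatto e il cane"),
  lang = c("en", "it")
)
# returns "cat dog" "gatto cane"
```

Keeping detection behind `requireNamespace()` mirrors the design choice named in the list: cld2 stays optional, and removal still works when the caller supplies the language.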