Word Embedding Research Framework for Psychological Science.
An integrative toolbox of word embedding research that provides:
⚠️ All users should update the package to version ≥ 0.3.2. Old versions may have slow processing speed and other problems.
Han-Wu-Shuang (Bruce) Bao 包寒吴霜
See the information printed by `library(PsychWordVec)` for the APA-7 format of your installed version.

    ## Method 1: Install from CRAN
    install.packages("PsychWordVec")

    ## Method 2: Install from GitHub
    install.packages("devtools")
    devtools::install_github("psychbruce/PsychWordVec", force=TRUE)
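Since versions below 0.3.2 are reported to be slow, it can help to check the installed version after installation. The following is a minimal sketch using only base R utilities; `needs_update()` and `min_version` are hypothetical names introduced here for illustration and are not part of the package.

```r
# Minimum version recommended in this README
min_version <- "0.3.2"

# Hypothetical helper: TRUE if `installed` is older than `required`
needs_update <- function(installed, required = min_version) {
  utils::compareVersion(installed, required) < 0
}

# Only query the version if the package is actually installed
if (requireNamespace("PsychWordVec", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("PsychWordVec"))
  if (needs_update(installed)) {
    message("PsychWordVec ", installed, " is older than ", min_version,
            "; consider reinstalling with install.packages().")
  }
}
```

The comparison is done on version strings rather than on the loaded package, so the helper can also be applied to version numbers read from a lockfile or session log.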
Types of data for `PsychWordVec`:

| | `embed` | `wordvec` |
|---|---|---|
| Basic class | matrix | data.table |
| Row size | vocabulary size | vocabulary size |
| Column size | dimension size | 2 (variables: `word`, `vec`) |
| Advantage | faster (with matrix operation) | easier to inspect and manage |
| Function to get | `as_embed()` | `as_wordvec()` |
| Function to load | `load_embed()` | `load_wordvec()` |

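To make the two layouts in the table concrete, here is a minimal base-R sketch with made-up numbers. It does not call the package: a plain matrix stands in for `embed`, and a base `data.frame` with a list-column stands in for the data.table-based `wordvec`. The object names (`mat`, `wv`, `mat2`) are assumptions for illustration only; in practice `as_embed()` and `as_wordvec()` perform the conversion.

```r
# Toy vectors: 3 words, 4 dimensions (made-up values for illustration)
words <- c("cat", "dog", "car")
mat <- matrix(
  c(0.1, 0.3, 0.5, 0.7,
    0.2, 0.3, 0.4, 0.8,
    0.9, 0.1, 0.0, 0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(words, paste0("dim", 1:4))
)

# "embed"-like layout: one row per word, one column per dimension
dim(mat)

# "wordvec"-like layout: two columns, `word` and a list-column `vec`
# (a base data.frame is used here as a stand-in for data.table)
wv <- data.frame(word = words, stringsAsFactors = FALSE)
wv$vec <- lapply(seq_len(nrow(mat)), function(i) unname(mat[i, ]))

# Converting back: bind the list-column rows into a matrix again
mat2 <- do.call(rbind, wv$vec)
dimnames(mat2) <- list(wv$word, colnames(mat))
```

The matrix form allows whole-vocabulary operations in a single call, while the two-column form keeps each word next to its vector, which is easier to filter and inspect.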
Functions in `PsychWordVec`:

- `as_embed()`: from `wordvec` (data.table) to `embed` (matrix)
- `as_wordvec()`: from `embed` (matrix) to `wordvec` (data.table)
- `load_embed()`: load word embeddings data as `embed` (matrix)
- `load_wordvec()`: load word embeddings data as `wordvec` (data.table)
- `data_transform()`: transform plain text word vectors to `wordvec` or `embed`
- `subset()`: extract a subset of `wordvec` and `embed`
- `normalize()`: normalize all word vectors to the unit length 1
- `get_wordvec()`: extract word vectors
- `sum_wordvec()`: calculate the sum vector of multiple words
- `plot_wordvec()`: visualize word vectors
- `plot_wordvec_tSNE()`: 2D or 3D visualization with t-SNE
- `orth_procrustes()`: Orthogonal Procrustes matrix alignment
- `cosine_similarity()`: `cos_sim()` or `cos_dist()`
- `pair_similarity()`: compute a similarity matrix of word pairs
- `plot_similarity()`: visualize similarities of word pairs
- `tab_similarity()`: tabulate similarities of word pairs
- `most_similar()`: find the Top-N most similar words
- `plot_network()`: visualize a (partial correlation) network graph of words
- `test_WEAT()`: WEAT and SC-WEAT with permutation test of significance
- `test_RND()`: RND with permutation test of significance
- `dict_expand()`: expand a dictionary from the most similar words
- `dict_reliability()`: reliability analysis and PCA of a dictionary
- `tokenize()`: tokenize raw text
- `train_wordvec()`: train static word embeddings
- `text_init()`: set up a Python environment for PLM
- `text_model_download()`: download PLMs from Hugging Face to local “.cache” folder
- `text_model_remove()`: remove PLMs from local “.cache” folder
- `text_to_vec()`: extract contextualized token and text embeddings
- `text_unmask()`: `<deprecated>` `<please use FMAT>` fill in the blank mask(s) in a query

See the documentation (help pages) for their usage and details.
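
As a rough illustration of what the normalization and similarity functions listed above compute, here is a base-R sketch on a toy matrix with made-up values. It is not the package's implementation and does not call it; the helper names (`unit_rows`, `cosine_sim`, `cosine_dist`, `top_n_similar`) are hypothetical and chosen to avoid clashing with the package's own `normalize()`, `cos_sim()`, `cos_dist()`, and `most_similar()`.

```r
# Toy embedding matrix (made-up values; rows = words, columns = dimensions)
emb <- rbind(
  king  = c(0.8, 0.6, 0.1),
  queen = c(0.7, 0.7, 0.2),
  apple = c(0.1, 0.2, 0.9)
)

# Scale every row to unit length (the idea behind normalization)
unit_rows <- function(m) m / sqrt(rowSums(m^2))

# Cosine similarity of two vectors; cosine distance is 1 - similarity
cosine_sim <- function(v1, v2) sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
cosine_dist <- function(v1, v2) 1 - cosine_sim(v1, v2)

# After normalization, all pairwise similarities are one matrix product
emb_n <- unit_rows(emb)
sim <- emb_n %*% t(emb_n)

# Top-N nearest neighbours of a word, excluding the word itself
top_n_similar <- function(sim, word, n = 1) {
  s <- sim[word, colnames(sim) != word]
  head(sort(s, decreasing = TRUE), n)
}

top_n_similar(sim, "king", n = 1)
```

Normalizing once and then using a matrix product is one reason a matrix layout tends to be faster for similarity work than looping over word pairs.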