A lightweight, versatile NLP package for R, focused on search-centric workflows with minimal dependencies and easy data-frame integration. This package provides key functionalities for:
- **Web Search**: Perform search engine queries to retrieve relevant URLs.
- **Web Scraping**: Extract content from URLs, including some relevant metadata.
- **Text Processing & Chunking**: Segment text into meaningful units, e.g., sentences, paragraphs, and larger chunks. Designed to support tasks related to retrieval-augmented generation (RAG).
- **Corpus Search**: Perform keyword, phrase, and pattern-based searches across processed corpora, supporting both traditional in-context search techniques (e.g., KWIC, regex matching) and advanced semantic searches using embeddings.
- **Embedding Generation**: Generate embeddings using the HuggingFace API for enhanced semantic search.
Ideal for users who need a basic, unobtrusive NLP toolkit in R.
```r
devtools::install_github("jaytimm/textpress")
```
```r
library(dplyr)  # select(), sample_n(), mutate(), rename(), left_join()

sterm <- 'AI and education'

yresults <- textpress::web_search(search_term = sterm,
                                  search_engine = "Yahoo News",
                                  num_pages = 5)

yresults |> select(2) |> sample_n(5) |> knitr::kable()
```
```r
arts <- yresults$raw_url |>
  textpress::web_scrape_urls(cores = 4)
```
Text processing pipeline: `nlp_split_paragraphs()` → `nlp_split_sentences()` → `nlp_build_chunks()`.
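To show what the first two steps produce, here is a minimal base-R sketch. It is not the textpress implementation. It splits each document into paragraphs on a newline delimiter (mirroring `paragraph_delim = "\\n+"`), then splits each paragraph into sentences with a naive punctuation regex. The helper names `split_paragraphs_sketch()` and `split_sentences_sketch()` are hypothetical. The naive splitter will also wrongly split after abbreviations such as "Dr.", which a real sentence tokenizer handles.

```r
# Hypothetical helpers: naive stand-ins for nlp_split_paragraphs() and
# nlp_split_sentences(); NOT the textpress implementation.
split_paragraphs_sketch <- function(df, paragraph_delim = "\\n+") {
  pieces <- strsplit(df$text, paragraph_delim)
  out <- do.call(rbind, lapply(seq_along(pieces), function(i) {
    paras <- trimws(pieces[[i]])
    paras <- paras[nzchar(paras)]              # drop empty paragraphs
    data.frame(doc_id = df$doc_id[i],
               paragraph_id = seq_along(paras),
               text = paras,
               stringsAsFactors = FALSE)
  }))
  rownames(out) <- NULL
  out
}

split_sentences_sketch <- function(df) {
  out <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
    # Split on whitespace preceded by ., ! or ? (ignores abbreviations)
    sents <- unlist(strsplit(df$text[i], "(?<=[.!?])\\s+", perl = TRUE))
    data.frame(doc_id = df$doc_id[i],
               paragraph_id = df$paragraph_id[i],
               sentence_id = seq_along(sents),
               text = sents,
               stringsAsFactors = FALSE)
  }))
  rownames(out) <- NULL
  out
}

docs <- data.frame(doc_id = 1,
                   text = "First sentence. Second one?\n\nNew paragraph here!",
                   stringsAsFactors = FALSE)
sents <- split_sentences_sketch(split_paragraphs_sketch(docs))
print(sents)
```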
```r
articles <- arts |>
  mutate(doc_id = row_number()) |>
  textpress::nlp_split_paragraphs(paragraph_delim = "\\n+") |>
  textpress::nlp_split_sentences(text_hierarchy = c('doc_id',
                                                    'paragraph_id')) |>
  textpress::nlp_build_chunks(text_hierarchy = c('doc_id',
                                                 'paragraph_id',
                                                 'sentence_id'),
                              chunk_size = 1,
                              context_size = 1) |>
  # Composite id: document.paragraph.chunk
  mutate(id = paste(doc_id, paragraph_id, chunk_id, sep = '.'))
```
id | chunk | chunk_plus_context |
---|---|---|
1.1.1 | ‘TO AI OR NOT TO AI?’ | ‘TO AI OR NOT TO AI?’ This is one of the most pressing questions that today’s educators and higher education leaders face. |
1.1.2 | This is one of the most pressing questions that today’s educators and higher education leaders face. | ‘TO AI OR NOT TO AI?’ This is one of the most pressing questions that today’s educators and higher education leaders face. While there is no doubt that artificial intelligence (AI) will play an increasingly central role in people’s lives, many in the education sector remain skeptical — with some even deeming it a harbinger of educational doom. |
1.1.3 | While there is no doubt that artificial intelligence (AI) will play an increasingly central role in people’s lives, many in the education sector remain skeptical — with some even deeming it a harbinger of educational doom. | This is one of the most pressing questions that today’s educators and higher education leaders face. While there is no doubt that artificial intelligence (AI) will play an increasingly central role in people’s lives, many in the education sector remain skeptical — with some even deeming it a harbinger of educational doom. In a study conducted by global educational technology or edtech leader Anthology, 30% or three in every 10 university leaders in the Philippines see generative AI as unethical and should be banned from being used in educational settings. |
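The rows above suggest how the windowing works. With `chunk_size = 1` and `context_size = 1`, each single-sentence chunk appears to be paired with one neighbouring sentence on either side; 1.1.1, the first sentence, gets only the sentence after it. A minimal base-R sketch of that windowing follows. `add_context_sketch()` is hypothetical. It also assumes that windows stop at paragraph boundaries, which the table does not confirm.

```r
# Hypothetical sketch of sentence-level context windows (NOT textpress code).
# Expects one row per sentence with doc_id, paragraph_id, sentence_id, text.
add_context_sketch <- function(df, context_size = 1) {
  # Assumption: windows never cross a paragraph boundary
  groups <- split(seq_len(nrow(df)),
                  list(df$doc_id, df$paragraph_id), drop = TRUE)
  df$chunk_plus_context <- NA_character_
  for (idx in groups) {
    idx <- idx[order(df$sentence_id[idx])]
    for (k in seq_along(idx)) {
      lo <- max(1, k - context_size)            # clip at paragraph start
      hi <- min(length(idx), k + context_size)  # clip at paragraph end
      df$chunk_plus_context[idx[k]] <- paste(df$text[idx[lo:hi]], collapse = " ")
    }
  }
  df
}

demo <- data.frame(doc_id = 1, paragraph_id = 1, sentence_id = 1:4,
                   text = c("S1.", "S2.", "S3.", "S4."),
                   stringsAsFactors = FALSE)
ctx <- add_context_sketch(demo)$chunk_plus_context
# returns c("S1. S2.", "S1. S2. S3.", "S2. S3. S4.", "S3. S4.")
```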
```r
sterm2 <- c('\\bhigher education\\b',
            '\\bsecondary education\\b')
            # Broader alternatives:
            # '\\S+ education\\b',
            # '\\b\\w{4,}\\b education\\b'

kwics <- articles |>
  rename(text = chunk) |>
  textpress::sem_search_corpus(search = sterm2,
                               text_hierarchy = c('doc_id',
                                                  'paragraph_id',
                                                  'chunk_id'))

kwics |>
  mutate(id = paste(doc_id, paragraph_id, chunk_id, sep = '.')) |>
  select(id, pattern, text) |>
  sample_n(5) |> knitr::kable()
```
id | pattern | text |
---|---|---|
1.3.1 | higher education | The study conducted across 11 countries including the Philippines involved 5,000 higher education leaders and students. |
1.8.1 | higher education | AI is a game-changer in higher education, bridging gaps in accessibility and quality. |
1.2.3 | higher education | It revealed that university leaders have certain reservations around allowing AI in higher education, perceiving it as being unethical. |
9.2.1 | Higher Education | In 1998, noted technology critic and historian of automation David Noble published his influential article “Digital Diploma Mills: The Automation of Higher Education,” in which he warned about the negative impacts the internet would have on education. |
15.4.2 | secondary education | “It underscores the urgent need to address the looming AI knowledge gap in schools—for both students and teachers—to raise parental awareness and increase their involvement in AI conversations, and push for stronger AI integration in American primary and secondary education.” |
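Conceptually, the pattern search runs each regex over every chunk and reports the matched string alongside the chunk text. Below is a minimal base-R sketch. It assumes case-insensitive matching, which the "Higher Education" hit in the table suggests but textpress documentation does not confirm. It also keeps only the first match per chunk. `regex_search_sketch()` is a hypothetical name, not a textpress function.

```r
# Hypothetical sketch of regex search over a chunk table (NOT textpress code).
# Returns one row per (pattern, chunk) hit, with the matched string.
regex_search_sketch <- function(df, patterns) {
  hits <- lapply(patterns, function(p) {
    # First match per chunk; case-insensitive is an assumption
    m <- regexpr(p, df$text, ignore.case = TRUE, perl = TRUE)
    keep <- m > 0
    if (!any(keep)) return(NULL)
    data.frame(id = df$id[keep],
               pattern = regmatches(df$text, m),  # matched substrings only
               text = df$text[keep],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

chunks <- data.frame(id = c("1.1.1", "1.2.1", "2.1.1"),
                     text = c("AI in higher education is growing.",
                              "Primary and Secondary Education differ.",
                              "Nothing relevant here."),
                     stringsAsFactors = FALSE)
hits <- regex_search_sketch(chunks, c("\\bhigher education\\b",
                                      "\\bsecondary education\\b"))
print(hits)
```

You can widen the net with one of the commented-out alternatives. For example, `'\\S+ education\\b'` matches any word followed by "education".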
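```r
api_url <- "https://api-inference.huggingface.co/models/BAAI/bge-base-en-v1.5"
# Your HuggingFace API token; the environment variable name is illustrative
api_token <- Sys.getenv("HUGGINGFACE_API_TOKEN")

vstore <- articles |>
  rename(text = chunk) |>
  textpress::api_huggingface_embeddings(
    text_hierarchy = c('doc_id',
                       'paragraph_id',
                       'chunk_id'),
    verbose = FALSE,
    api_url = api_url,
    dims = 768,  # must match the model: 768 for bge-base (1024 bge-large, 384 bge-small)
    api_token = api_token)
```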
```r
q <- "How can AI personalize learning experiences for students?"

# Embed the query with the same model and dimensionality as the corpus
query <- textpress::api_huggingface_embeddings(
  query = q,
  api_url = api_url,
  dims = 768,
  api_token = api_token)

# Retrieve the 20 chunks most similar to the query and attach their text
rags <- textpress::sem_nearest_neighbors(
  x = query,
  matrix = vstore,
  n = 20) |>
  left_join(articles, by = c("term2" = "id"))
```
id | cos_sim | chunk_plus_context |
---|---|---|
14.10.2 | 0.844 | 1. Personalized learning: AI can analyze data to understand each student’s learning style, strengths and areas for improvement. For example, an AI-driven platform could identify that a particular student struggles with reading comprehension and then provide tailored exercises that improve the student’s skills. |
7.8.2 | 0.836 | There’s a better way. This is where AI-assisted learning steps in to create personalized lesson plans. In our schools, we’ve transformed the traditional teacher’s role into that of a “guide.” |
22.2.2 | 0.835 | Artificial intelligence has permeated nearly every industry, and higher education is no exception. AI-powered solutions promise to revolutionize learning by providing personalized and adaptive experiences. |
22.7.5 | 0.810 | Consider how new tools integrate with existing platforms and map to the entire learner lifecycle. AI should simplify, not complicate, the student experience. With thoughtful implementation, these intelligent technologies can personalize learning and improve outcomes from start to finish. |
7.10.1 | 0.810 | AI is revolutionizing the role of teachers by excelling at delivering personalized learning experiences. These advanced AI programs can swiftly and accurately pinpoint what a student knows and doesn’t know in each subject, allowing lessons to be designed around their unique aptitudes without any judgment. |
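Judging from the `cos_sim` column, `sem_nearest_neighbors()` ranks chunks by cosine similarity to the query embedding. The base-R sketch below shows that ranking. It assumes the embeddings are a numeric matrix with one row per chunk and chunk ids as row names, which is an assumption about `vstore`'s layout. `cosine_nn_sketch()` is hypothetical and simply mirrors the `term2` and `cos_sim` columns used in the join.

```r
# Hypothetical sketch of cosine-similarity nearest neighbours (NOT textpress code).
# Assumes `mat` has one row per chunk, with chunk ids as row names.
cosine_nn_sketch <- function(query, mat, n = 5) {
  q <- as.numeric(query)
  # cos(a, b) = (a . b) / (|a| |b|), computed for all rows at once
  sims <- as.vector(mat %*% q) / (sqrt(rowSums(mat^2)) * sqrt(sum(q^2)))
  ord <- order(sims, decreasing = TRUE)[seq_len(min(n, nrow(mat)))]
  data.frame(term2 = rownames(mat)[ord],
             cos_sim = round(sims[ord], 3),
             stringsAsFactors = FALSE)
}

mat <- rbind("1.1.1" = c(1, 0, 0),
             "1.2.1" = c(0.8, 0.6, 0),
             "2.1.1" = c(0, 0, 1))
nn <- cosine_nn_sketch(c(1, 0, 0), mat, n = 2)
print(nn)
```

Cosine similarity ignores vector length. You could therefore normalize every row to unit length once, up front, so that each query needs only a single matrix-vector product. This is a common optimization for larger stores.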