I am trying to write a regex pattern that scans a sentence and removes any immediately repeated word or two-word phrase, keeping a single copy.
For example:
# R code below
string_a = "hello hello, how are you you?"
string_b = "goodbye world goodbye world, I am flying to the the moon!"

gsub(pattern, "", string_a)
gsub(pattern, "", string_b)
Desired outputs are
[1] "hello, how are you?"
[2] "goodbye world, I am flying to the moon!"
Answer
Try
gsub("(\\S+(\\s+\\S+)?)\\s+\\1+", "\\1", c(string_a, string_b))
-output
[1] "hello, how are you?"
[2] "goodbye world, I am flying to the moon!"
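One caveat with the pattern above: `\S+` is not anchored to word boundaries, so it can also match a repeat that starts in the middle of a word (the "is" at the end of "this" followed by "is", for instance). Below is a minimal sketch of a stricter variant, assuming `perl = TRUE` is acceptable and that "word" means a run of `\w` characters. The helper name `remove_repeats` is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper (name is an assumption, not from the answer above).
# In R string literals every regex backslash has to be doubled.
remove_repeats <- function(x) {
  # \\b(\\w+(?:\\s+\\w+)?)  one word, optionally followed by a second word
  # (?:\\s+\\1\\b)+         one or more immediate repeats of that same phrase
  gsub("\\b(\\w+(?:\\s+\\w+)?)(?:\\s+\\1\\b)+", "\\1", x, perl = TRUE)
}

string_a <- "hello hello, how are you you?"
string_b <- "goodbye world goodbye world, I am flying to the the moon!"

remove_repeats(c(string_a, string_b))
# [1] "hello, how are you?"
# [2] "goodbye world, I am flying to the moon!"

remove_repeats("this is fine")
# [1] "this is fine"
```

The `\\b` anchors keep a match from starting or ending inside a longer word, and the replacement `"\\1"` keeps one copy of the repeated word or phrase instead of deleting it.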