Regex to collapse a repeated word or a repeated pair of words

I am trying to write a regex pattern that scans a sentence and collapses any immediately repeated word, or immediately repeated pair of words, down to a single copy.

For example:

# R code below
string_a = "hello hello, how are you you?"
string_b = "goodbye world goodbye world, I am flying to the the moon!"

gsub(pattern, "", string_a)
gsub(pattern, "", string_b)

The desired outputs are:

[1] "hello, how are you?"
[2] "goodbye world, I am flying to the moon!"

Answer

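Try the following. The backslashes are doubled because an R string literal needs \\ to pass a single \ through to the regex engine:

 gsub("(\\S+(\\s+\\S+)?)\\s+\\1+", "\\1", c(string_a, string_b))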

Output:

[1] "hello, how are you?"                  
[2] "goodbye world, I am flying to the moon!"
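The pattern in this answer has two limitations that may matter on other input. It has no word boundaries, so it can also fire on the tail of a longer word: in "this is a test", the "is" that ends "this" is followed by the word "is". Also, the + applies to \1 alone, so a word repeated three times such as "you you you" is only reduced to "you you". The block below is a minimal sketch of one possible adjustment, assuming PCRE is acceptable (perl = TRUE). It anchors the match with \b and repeats the whitespace together with the backreference. The helper name collapse_repeats is hypothetical, and \w+ (letters, digits, underscore) is used instead of \S+ as a design choice so that a captured word never includes trailing punctuation.

```r
string_a <- "hello hello, how are you you?"
string_b <- "goodbye world goodbye world, I am flying to the the moon!"

# Hypothetical helper, not part of the original answer. Regex pieces (PCRE):
#   \b ... \b          only match whole words, so "this is" is left alone
#   (\w+(?:\s+\w+)?)   capture one word, or two whitespace-separated words
#   (?:\s+\1)+         the captured run repeated one or more times, each copy
#                      preceded by whitespace, so "you you you" collapses fully
# The replacement "\\1" keeps a single copy of the captured run.
collapse_repeats <- function(x) {
  gsub("\\b(\\w+(?:\\s+\\w+)?)(?:\\s+\\1)+\\b", "\\1", x, perl = TRUE)
}

collapse_repeats(c(string_a, string_b))
# expected: "hello, how are you?" "goodbye world, I am flying to the moon!"

collapse_repeats(c("this is a test", "you you you"))
# expected: "this is a test" "you"
```

Matching is case-sensitive, so a pair such as "The the" is left unchanged by this sketch.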

Answer