
Extract URLs from a paragraph or block of text using a regular expression

I have the following text and script:

var x = "This is an example url http://www.longurl.com/?a=example@gmail.com&x=y1 and this must me a example url";

function getMatch(str) {
  // The pattern is built from a string literal, so every regex backslash must be doubled
  var urlRegex = '(?!mailto:)(?:(?:http|https|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[0-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))|localhost)(?::\\d{2,5})?(?:(/|\\?|#)[^\\s]*)?';
  var reg = new RegExp(urlRegex, 'ig');
  return str.match(reg);
}

console.log(getMatch(x));

The expected output is:

[ http://www.longurl.com/?a=example@gmail.com&x=y1 ] 

but getMatch returns the following instead, dropping &x=y1:

http://www.longurl.com/?a=example@gmail.com

How do I modify the function so that it returns the complete URL?

NOTE: This happens only when an email address is passed as a query argument; once the regex encounters the @ character, the function behaves unexpectedly.
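The behaviour described in the note can be reproduced with a reduced pattern. The sketch below is a simplified stand-in for the question's expression, not the same pattern: it keeps only the optional user-info group `(\S+@)?`, a hostname and an optional path. Because `\S+` is allowed to cross `/` and `?`, the engine backtracks to the `@` inside the query string, takes `www.longurl.com/?a=example@` as credentials and `gmail.com` as the host. The character after the host is `&`, which is not `/`, `?` or `#`, so the optional path group is skipped and the match ends there.

```javascript
// Reduced stand-in for the question's pattern (illustration only):
// scheme, optional user-info ending in "@", a hostname, optional path/query/fragment.
const reduced = /https?:\/\/(\S+@)?([a-z0-9.-]+\.[a-z]{2,})([\/?#]\S*)?/i;

const sampleUrl = "http://www.longurl.com/?a=example@gmail.com&x=y1";
const parts = sampleUrl.match(reduced);

console.log(parts[0]); // http://www.longurl.com/?a=example@gmail.com
console.log(parts[1]); // www.longurl.com/?a=example@ (treated as user-info)
console.log(parts[2]); // gmail.com (treated as the host)
console.log(parts[3]); // undefined, because "&" cannot start the path group
```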


Answer

Why not simplify:

var x = `This is an example url http://www.longurl.com/?a=example@gmail.com&x=y1 and this must me a example url

http://            www.longurl.com/?a=example@gmail.com&x=y1 (with an arbitrary number of spaces between the protocol and the beginning of the url) 
here is a mailto:a@b.c?subject=aaa%20bbb and some more text
So https://www.google.com/search?q=bla or ftp://aaa:bbb@server.com could appear`

function getMatch(str) {
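  // A scheme (mailto:, ftp:// or http(s)://) followed by non-whitespace characters up to the next whitespace
  var urlRegex = /((mailto:|ftp:\/\/|https?:\/\/)\S+?)[^\s]+/ig;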
  return str.match(urlRegex);
}

console.log(getMatch(x));
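If you would rather keep the original, stricter pattern, one minimal change (a sketch, only checked against the examples here) is to stop the user-info part from spanning `/`, `?`, `#` or `@`; RFC 3986 does not allow those characters unencoded in user-info anyway. The expression below is the question's pattern written as a regex literal with only that group changed; `getMatchStrict` is a hypothetical name used for illustration.

```javascript
// Hypothetical helper: the question's pattern with one change. The user-info group
// (?:[^\s\/?#@]+(?::[^\s\/?#@]*)?@)? can no longer span "/", "?", "#" or "@",
// so an "@" inside the query string is not mistaken for the end of credentials.
function getMatchStrict(str) {
  var urlRegex = /(?!mailto:)(?:(?:http|https|ftp):\/\/)(?:[^\s\/?#@]+(?::[^\s\/?#@]*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[0-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::\d{2,5})?(?:(\/|\?|#)[^\s]*)?/ig;
  return str.match(urlRegex);
}

console.log(getMatchStrict("This is an example url http://www.longurl.com/?a=example@gmail.com&x=y1 and this must me a example url"));
console.log(getMatchStrict("So ftp://aaa:bbb@server.com could appear"));
```

Real credentials such as `aaa:bbb@` are still accepted, because they end at the first `@` before any `/`, `?` or `#`.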
User contributions licensed under: CC BY-SA