Removing all script tags from HTML with a JS regular expression

I want to strip script tags out of this HTML at Pastebin:

http://pastebin.com/mdxygM0a

I tried using the below regular expression:

html.replace(/<script.*>.*<\/script>/ims, " ")

But it does not remove all of the script tags in the HTML. It only removes in-line scripts. I'm looking for a regex that can remove all of the script tags (in-line and multi-line). I would appreciate it if the solution were tested on my sample: http://pastebin.com/mdxygM0a

Answer

Attempting to remove HTML markup using a regular expression is problematic: you cannot know what the markup contains inside script content or attribute values. One alternative is to let the browser parse the string by assigning it as the innerHTML of a div, remove any script elements, and return the resulting innerHTML, e.g.

  function stripScripts(s) {
    var div = document.createElement('div');
    div.innerHTML = s;
    var scripts = div.getElementsByTagName('script');
    var i = scripts.length;
    while (i--) {
      scripts[i].parentNode.removeChild(scripts[i]);
    }
    return div.innerHTML;
  }

  // The inner quotes must be escaped, otherwise the string literal ends early
  alert(
    stripScripts('<span><script type="text/javascript">alert(\'foo\');</script></span>')
  );

Note that browsers currently do not execute script elements inserted using the innerHTML property, and likely never will, especially since the div is never added to the document.
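
To illustrate why the regular-expression route is fragile, here is a minimal sketch that runs without a DOM (for example in Node). The `stripScriptsWithRegex` helper and the sample strings are hypothetical, written only for this illustration; the pattern is a commonly suggested one, not the one from the question.

```javascript
// Hypothetical helper: a naive regex-based stripper, for illustration only.
function stripScriptsWithRegex(html) {
  // [\s\S]*? lazily matches any character, including newlines
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}

// Case 1: a plain multi-line script is removed as expected.
var multiLine = stripScriptsWithRegex(
  '<p>a</p><script>\nvar x = 1;\n</script><p>b</p>'
);
console.log(multiLine); // <p>a</p><p>b</p>

// Case 2: text that merely looks like a script inside an attribute value
// is removed too, silently changing a legitimate attribute.
var mangledAttribute = stripScriptsWithRegex(
  '<a title="<script>x</script>">link</a>'
);
console.log(mangledAttribute); // <a title="">link</a>

// Case 3: browsers accept whitespace before the closing ">" of an end tag,
// but the pattern does not, so the script survives untouched.
var missedEndTag = stripScriptsWithRegex('<script>alert(1)</script >');
console.log(missedEndTag); // <script>alert(1)</script >

// Case 4: removing the inner matches assembles a new script element
// out of the fragments that are left behind.
var reassembled = stripScriptsWithRegex(
  '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>'
);
console.log(reassembled); // <script>alert(1)</script>
```

Cases 3 and 4 leave a working script element in the output, which is why letting the browser parse the markup, as stripScripts does above, tends to be the more reliable approach.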

User contributions licensed under: CC BY-SA