Removing all script tags from html with JS Regular Expression

I want to strip script tags out of this HTML at Pastebin:

http://pastebin.com/mdxygM0a

I tried using the below regular expression:

html.replace(/<script.*>.*<\/script>/ims, " ")

But it does not remove all of the script tags in the HTML. It only removes in-line scripts. I'm looking for a regex that removes all of the script tags, both in-line and multi-line. I would appreciate it if any answer were tested against my sample: http://pastebin.com/mdxygM0a

Answer

Attempting to remove HTML markup using a regular expression is problematic: a pattern cannot tell a real script element from text that merely looks like one inside a script body or an attribute value. One alternative is to let the browser parse the markup: insert it as the innerHTML of a div, remove any script elements, and return the resulting innerHTML, e.g.

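  function stripScripts(s) {
    // Parse the markup in a detached div that is never added to the document.
    var div = document.createElement('div');
    div.innerHTML = s;
    // getElementsByTagName returns a live collection, so iterate backwards
    // while removing elements.
    var scripts = div.getElementsByTagName('script');
    var i = scripts.length;
    while (i--) {
      scripts[i].parentNode.removeChild(scripts[i]);
    }
    // Serialize what remains back to an HTML string.
    return div.innerHTML;
  }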

  alert(
    stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
  );

Note that at present, browsers do not execute script elements inserted using the innerHTML property, and likely never will, especially as the div here is never added to the document.
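
To illustrate why a regular expression is fragile for this task, here is a minimal sketch. The pattern `naiveScriptPattern` and the sample markup are hypothetical, chosen for illustration; they are not taken from the question's Pastebin sample. The first `<script>` in the input is only text inside an attribute value, yet the pattern treats it as an opening tag:

```javascript
// Hypothetical pattern for illustration only; a common first attempt at
// matching script elements, not the pattern from the question.
const naiveScriptPattern = /<script\b[^>]*>[\s\S]*?<\/script>/gi;

// The first "<script>" below is plain text inside an attribute value.
// Only the second one opens a real script element.
const html = '<input value="<script>"> keep me <script>alert(1)<\/script>';

// The regex starts matching inside the attribute value and runs to the
// closing tag of the real script, deleting everything in between.
const stripped = html.replace(naiveScriptPattern, '');

console.log(JSON.stringify(stripped));
```

In this sketch the replacement swallows the rest of the input, including the "keep me" text, and leaves a truncated input tag behind. A DOM-based approach such as stripScripts avoids this kind of mismatch because the browser's parser, not a pattern, decides what counts as an element.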