Monday, February 21, 2011

Remove HTML Tags in JavaScript with Regex

I am trying to remove all the HTML tags from a string in JavaScript. Here's what I have, but I can't figure out why it's not working. Does anyone know what I am doing wrong?

<script type="text/javascript">

var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);

</script>
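To see why the code above leaves the string untouched, here is a small sketch comparing three calls to replace (runnable in a browser console or Node; console.log stands in for alert, and the variable names are illustrative). A string pattern is matched literally, slashes and all, and even a real regex literal only replaces the first match unless it carries the g flag:

```javascript
var body = "<p>test</p>";

// A string argument is searched for literally, including the slashes,
// so nothing in body matches and the input comes back unchanged.
var fromString = body.replace("/<(.|\n)*?>/", "");

// A regex literal without the g flag replaces only the first match.
var firstOnly = body.replace(/<(.|\n)*?>/, "");

// With the g flag every match is replaced.
var allTags = body.replace(/<(.|\n)*?>/g, "");

console.log(fromString); // "<p>test</p>"
console.log(firstOnly);  // "test</p>"
console.log(allTags);    // "test"
```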

Thanks a lot!

From Stack Overflow
  • Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:

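    // Match "<", then one or more characters that are not ">", then ">".
    // The g flag removes every tag, not just the first one.
    var regex = /(<([^>]+)>)/ig;
    var body = "<p>test</p>";
    var result = body.replace(regex, "");
    alert(result); // "test"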
    

    If you're willing to use a library such as jQuery, you could simply do this:

    alert($('<p>test</p>').text());
    
    gmcalab : AWESOME, I didn't think about the jQuery option. That is way preferred! Thanks so much!
    brianary : Why are you wrapping the regex in a string? var regex = /(<([^>]+)>)/ig;
    karim79 : @brianary - because I'm an idiot. Corrected.
    Mike Samuel : This won't work. Specifically, it will fail on short tags: http://www.is-thought.co.uk/book/sgml-9.htm#SHORTTAG
  • For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
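The caveat about regular expressions and HTML can be made concrete. This sketch runs the answer's /(<([^>]+)>)/ig pattern on two inputs chosen for illustration (they are not from the thread): a quoted attribute value containing ">", and plain text that happens to contain "<" and ">". Both come out wrong:

```javascript
var regex = /(<([^>]+)>)/ig;

// The ">" inside the quoted title ends the first match early,
// so part of the opening tag is left behind.
var attr = '<a title="x > y">link</a>'.replace(regex, "");

// Ordinary text containing "<" ... ">" is mistaken for a tag.
var text = "a < b and c > d".replace(regex, "");

console.log(attr); // ' y">link'
console.log(text); // 'a  d'
```

For input you don't control, a real parser or a sanitizer such as the one linked above is the safer choice.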
