The code below converts HTML into plain text by removing all HTML tags, including script and style blocks together with their contents. It strips the tags with regular expressions, so no HTML parser is needed, and for simple markup this is typically lighter than running a full parser.
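// Requires: using System.Text.RegularExpressions;
public static string ConvertHTMLtoText(string input)
{
    // Remove <script> and <style> blocks together with their contents (any letter case).
    input = Regex.Replace(input, @"<script.*?>[\s\S]*?</.*?script>", "", RegexOptions.IgnoreCase);
    input = Regex.Replace(input, @"<style.*?>[\s\S]*?</.*?style>", "", RegexOptions.IgnoreCase);
    // Remove simple tags without attributes, such as <b> or </p>.
    input = Regex.Replace(input, @"(<|</)( )*\w*>", "");
    // Remove every remaining tag, including those with attributes.
    return Regex.Replace(input, @"<( )*([^>])*>", "");
}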
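To see the method in action, here is a minimal console sketch. The class name HtmlTextDemo and the sample markup are made up for illustration, and the WebUtility.HtmlDecode call at the end is an assumption on my part (it turns entities such as &amp;amp; back into characters); it is not part of the original method, so drop it if you only want the tags removed.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextDemo
{
    // Same regular expressions as ConvertHTMLtoText above.
    public static string ConvertHTMLtoText(string input)
    {
        // Remove <script> and <style> blocks with their contents, in any letter case.
        input = Regex.Replace(input, @"<script.*?>[\s\S]*?</.*?script>", "", RegexOptions.IgnoreCase);
        input = Regex.Replace(input, @"<style.*?>[\s\S]*?</.*?style>", "", RegexOptions.IgnoreCase);
        // Remove simple tags, then any remaining tag with attributes.
        input = Regex.Replace(input, @"(<|</)( )*\w*>", "");
        input = Regex.Replace(input, @"<( )*([^>])*>", "");
        // Assumption (not in the original post): decode entities such as &amp; and trim whitespace.
        return WebUtility.HtmlDecode(input).Trim();
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string html = "<html><head><style>p { color: red; }</style>" +
                      "<SCRIPT>alert('hi');</SCRIPT></head>" +
                      "<body><p class=\"x\">Tom &amp; Jerry</p></body></html>";

        Console.WriteLine(ConvertHTMLtoText(html)); // prints "Tom & Jerry"
    }
}
```

Keep in mind that this approach only looks for angle brackets, so text that legitimately contains "<" and ">" (for example "a < b and c > d") will lose whatever sits between them.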
Thanks, it's a really great contribution.
It's an efficient solution.