Removing HTML Tags with Regular Expressions in C#
Published: 2019-06-12


using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Text.RegularExpressions;

public partial class Ceshi : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!Page.IsPostBack)
        {
            // Multiline mode: "^" matches at the start of every line, so each line is prefixed with "开始=>" ("start=>").
            string str = Regex.Replace("AAA\nBBB\nCCC<br>", "^", "开始=>", RegexOptions.Multiline | RegexOptions.IgnoreCase);
            Response.Write(str);
            string s = @"<html><title>title\\标题</title><head><script>alert('JS脚本');</script>head头部</head><body><table><tr><td><!--注释的东西-->TD的内容1</td><td>TD的内容2</td></table><div style='width:100px;'>DIV的内容</div><span>span内容1</spaN><spAN>span内容2</SPAN></body></html>";
            Response.Write(ClearHTMLTags(s));
        }
    }
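    // Strips tags and decodes a few common entities: Regexs[i] is replaced with Replaces[i], in order.
    // Regex-based stripping is best-effort and is not a substitute for an HTML parser or sanitizer.
    public static string ClearHTMLTags(string HTML)
    {
        string[] Regexs ={
                        // <script> blocks; only matched when the whole block sits on one line, because "." does not match "\n" here
                        @"<script[^>]*?>.*?</script>",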
                        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
                        @"([\r\n])[\s]+",
                        @"&(quot|#34);",
                        @"&(amp|#38);",
                        @"&(lt|#60);",
                        @"&(gt|#62);",
                        @"&(nbsp|#160);",
                        @"&(iexcl|#161);",
                        @"&(cent|#162);",
                        @"&(pound|#163);",
                        @"&(copy|#169);",
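                        @"&#(\d+);",    // any other numeric entity is dropped
                        @"-->",         // end of a comment becomes a line break so the next pattern can stop there
                        @"<!--.*\n"     // start of a comment through the end of that line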
        };

        string[] Replaces ={

                            "",
                            "",
                            "",
                            "\"",
                            "&",
                            "<",
                            ">",
                            " ",
                            "\xa1", //chr(161),
                            "\xa2", //chr(162),
                            "\xa3", //chr(163),
                            "\xa9", //chr(169),
                            "",
                            "\r\n",
                            ""
        };

        string s = HTML;

        for (int i = 0; i < Regexs.Length; i++)
        {
            s = new Regex(Regexs[i], RegexOptions.Multiline | RegexOptions.IgnoreCase).Replace(s, Replaces[i]);
        }
        // String.Replace returns a new string, so the result must be assigned back.
        s = s.Replace("<", "");
        s = s.Replace(">", "");
        s = s.Replace("\r\n", "");
        return s;
    }
}

Reposted from: https://www.cnblogs.com/wangchuang/archive/2012/05/23/2515278.html
