Removing HTML Tags with Regular Expressions

阿新 • Published: 2019-02-07

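I wrote this because our backend talks to an app over an API. Push messages are composed in a rich-text editor in the admin console, so they carry HTML markup that does not display well when the push content is shown. The rich-text markup therefore has to be stripped.

I tried several approaches found online, but none of them worked very well. The one below covers the basic needs; for the occasional edge case, you can add your own rules.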

// Import the regex classes

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Michael
 */
public class HtmlCleanUtil {
        private static final String regEx_script = "<script[^>]*?>[\\s\\S]*?</script>"; // regex for script blocks
        private static final String regEx_style = "<style[^>]*?>[\\s\\S]*?</style>"; // regex for style blocks
        private static final String regEx_html = "<[^>]+>"; // regex for any HTML tag
        private static final String regEx_space = "\\s+"; // regex for whitespace: spaces, tabs, carriage returns, newlines

        /**
         * Removes HTML tags from a string.
         *
         * @param htmlStr the HTML source to clean
         * @return the text with script blocks, style blocks, tags, and whitespace removed
         */
        public static String delHTMLTag(String htmlStr) {
            Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
            Matcher m_script = p_script.matcher(htmlStr);
            htmlStr = m_script.replaceAll(""); // strip script blocks, including their contents

            Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
            Matcher m_style = p_style.matcher(htmlStr);
            htmlStr = m_style.replaceAll(""); // strip style blocks, including their contents

            Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
            Matcher m_html = p_html.matcher(htmlStr);
            htmlStr = m_html.replaceAll(""); // strip the remaining HTML tags

            Pattern p_space = Pattern.compile(regEx_space, Pattern.CASE_INSENSITIVE);
            Matcher m_space = p_space.matcher(htmlStr);
            htmlStr = m_space.replaceAll(""); // strip spaces, tabs, and line breaks
            return htmlStr.trim(); // return the plain text
        }

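        public static String getTextFromHtml(String htmlStr){
            htmlStr = delHTMLTag(htmlStr);
            // delHTMLTag has already removed ordinary whitespace, so only non-breaking
            // spaces can remain here: the &nbsp; entity or the U+00A0 character.
            // (Assumed intent: replacing a plain space at this point would be a no-op.)
            htmlStr = htmlStr.replace("&nbsp;", "").replace("\u00A0", "");
            return htmlStr;
        }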

        public static void main(String[] args) {
            String str = "<div style='text-align:center;'> 整治“四風清弊除垢<br/><span style='font-size:14px;'>     </span><span style='font-size:18px;'>公司召開黨的群眾路線教育實踐活動動員大會。</span><br/></div>";
            System.out.println(getTextFromHtml(str));
        }
}
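
As an illustration of "adding your own rules", here is a minimal sketch of a variant and not part of the original utility. The class name `HtmlCleanSketch`, the method `toPlainText`, and the extra rules (HTML comments, line-break tags, a handful of entities) are all assumptions chosen for the example. It compiles the patterns once and collapses whitespace to a single space instead of deleting it, which keeps space-separated text such as English readable.

```java
import java.util.regex.Pattern;

// Hypothetical variant for illustration; names and extra rules are assumptions.
public class HtmlCleanSketch {
    // Pattern objects are immutable and thread-safe, so compile them once.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*>[\\s\\S]*?</script\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*>[\\s\\S]*?</style\\s*>", Pattern.CASE_INSENSITIVE);
    // Extra rule: HTML comments.
    private static final Pattern COMMENT = Pattern.compile("<!--[\\s\\S]*?-->");
    // Extra rule: tags that visually break a line become a space, so words do not run together.
    private static final Pattern BREAK =
            Pattern.compile("<br\\s*/?>|</(p|div|li)\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Ordinary whitespace plus the non-breaking space character U+00A0.
    private static final Pattern WHITESPACE = Pattern.compile("[\\s\\u00A0]+");

    public static String toPlainText(String html) {
        if (html == null) {
            return "";
        }
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = COMMENT.matcher(text).replaceAll("");
        text = BREAK.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll("");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" ends up as "&lt;", not "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace to one space instead of deleting them.
        text = WHITESPACE.matcher(text).replaceAll(" ");
        return text.trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello,&nbsp;<b>world</b></p><!-- note -->"
                + "<script>alert(1)</script><p>a &lt; b</p>";
        System.out.println(toPlainText(html)); // prints: Hello, world a < b
    }
}
```

For purely Chinese content, where spaces between words are not wanted, replacing the final `" "` with `""` gives the same behavior as the original utility. Like any regex-based approach, this only handles reasonably well-formed markup; a literal `<` followed later by `>` in the text is treated as a tag.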
