Searching for Similar Images with the LIRE Library

阿新 • Published: 2019-02-02

What is LIRE

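LIRE (Lucene Image REtrieval) provides a simple way to build a Lucene index from image features. With that index you can build a content-based image retrieval (CBIR) system that searches for similar images. The features LIRE supports include the MPEG-7 descriptors ScalableColor, ColorLayout, and EdgeHistogram; the test code in this post uses the CEDD feature. The library also provides a way to search the index.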
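To make the idea of content-based retrieval concrete, here is a minimal sketch that turns an image into a feature vector and compares two vectors by distance. It is not LIRE's CEDD implementation: the class name `HistogramDistanceSketch`, the 64-bin colour histogram, and the L1 distance are all assumptions chosen only for illustration, and the test images are generated in memory.

```java
import java.awt.image.BufferedImage;

/** Hypothetical illustration of feature-based comparison; not how LIRE computes CEDD. */
final class HistogramDistanceSketch
{
    /** Builds a normalized 64-bin RGB histogram (4 levels per channel). */
    static double[] histogram(BufferedImage img)
    {
        double[] bins = new double[64];
        for (int y = 0; y < img.getHeight(); y++)
        {
            for (int x = 0; x < img.getWidth(); x++)
            {
                int rgb = img.getRGB(x, y);
                int r = ((rgb >> 16) & 0xFF) / 64;
                int g = ((rgb >> 8) & 0xFF) / 64;
                int b = (rgb & 0xFF) / 64;
                bins[r * 16 + g * 4 + b]++;
            }
        }
        double total = (double) img.getWidth() * img.getHeight();
        for (int i = 0; i < bins.length; i++)
        {
            bins[i] /= total;
        }
        return bins;
    }

    /** L1 distance between two normalized histograms: 0.0 when they are equal, 2.0 when they share no bin. */
    static double distance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.length; i++)
        {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Creates a 16x16 single-colour image so the sketch needs no files on disk. */
    static BufferedImage solid(int rgb)
    {
        BufferedImage img = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < img.getHeight(); y++)
        {
            for (int x = 0; x < img.getWidth(); x++)
            {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    public static void main(String[] args)
    {
        double[] red = histogram(solid(0xFF0000));
        double[] darkRed = histogram(solid(0xC80000));
        double[] blue = histogram(solid(0x0000FF));

        System.out.println("red vs red:      " + distance(red, red));
        System.out.println("red vs dark red: " + distance(red, darkRed));
        System.out.println("red vs blue:     " + distance(red, blue));
    }
}
```

With these inputs the two reds fall into the same bin and get distance 0.0, while red and blue get 2.0. A real descriptor such as CEDD captures far more than coarse colour, but the retrieval step works the same way: index one vector per image, then rank by distance to the query vector.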

The rest of this post goes straight to the code.

Code structure

The Gradle dependencies are:

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    testCompile group: 'junit', name: 'junit', version: '4.11'

    compile group: 'us.codecraft', name: 'webmagic-core', version: '0.7.3'
    // https://mvnrepository.com/artifact/us.codecraft/webmagic-extension
    compile group: 'us.codecraft', name: 'webmagic-extension', version: '0.7.3'

    compile group: 'commons-io', name: 'commons-io', version: '2.6'

    compile group: 'org.apache.lucene', name: 'lucene-core', version: '6.4.0'
    compile group: 'org.apache.lucene', name: 'lucene-analyzers-common', version: '6.4.0'
    compile group: 'org.apache.lucene', name: 'lucene-queryparser', version: '6.4.0'

    // https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient
    compile group: 'org.apache.httpcomponents', name: 'httpclient', version: '4.5.6'
}

Crawling image samples

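The samples are app icons crawled from the Huawei app store with the WebMagic crawler. For how to use WebMagic, see the post "Crawling App Store Application Information with WebMagic" (《WebMagic爬取應用市場應用資訊》).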

import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.processor.PageProcessor;
import us.codecraft.webmagic.selector.Selectable;

/**
 * @author wzj
 * @create 2018-07-17 22:06
 **/
public class AppStoreProcessor implements PageProcessor
{
    // Part 1: site configuration for the crawl, such as encoding, crawl interval, and retry count
    private Site site = Site.me().setRetryTimes(5).setSleepTime(1000);

    public void process(Page page)
    {
        // Get the app name
        String name = page.getHtml().xpath("//p/span[@class='title']/text()").toString();
        page.putField("appName",name );

        String downloadIconUrl =  page.getHtml().xpath("//img[@class='app-ico']/@src").toString();
        page.putField("downloadIconUrl",downloadIconUrl );

        if (name == null || downloadIconUrl == null)
        {
            //skip this page
            page.setSkip(true);
        }

        // Collect the other links on the page
        Selectable links = page.getHtml().links();
        page.addTargetRequests(links.regex("(http://app.hicloud.com/app/C\\d+)").all());
    }


    public Site getSite()
    {
        return site;
    }

    public static void main(String[] args)
    {
        Spider.create(new AppStoreProcessor())

                .addUrl("http://app.hicloud.com")
                .addPipeline(new MyPipeline())
                .thread(20)
                .run();
    }
}
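The link filter used in process() can be checked in isolation with plain java.util.regex. The sketch below assumes that WebMagic's regex selector finds the pattern in each link and returns the first capture group; the class name `DetailLinkFilterSketch` and the sample URLs are made up for the example. It also escapes the dots in the host name, which the pattern above leaves unescaped, so that "." matches only a literal dot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone version of the detail-page link filter. */
final class DetailLinkFilterSketch
{
    // Same pattern as in process(), with the dots escaped.
    private static final Pattern DETAIL_LINK =
            Pattern.compile("(http://app\\.hicloud\\.com/app/C\\d+)");

    /** Keeps only the links that contain an app detail URL and returns the matched part. */
    static List<String> detailLinks(List<String> links)
    {
        List<String> result = new ArrayList<>();
        for (String link : links)
        {
            Matcher m = DETAIL_LINK.matcher(link);
            if (m.find())
            {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Made-up sample links, not real app IDs.
        List<String> links = Arrays.asList(
                "http://app.hicloud.com/app/C123?from=home",
                "http://app.hicloud.com/soft/list",
                "http://appXhicloud.com/app/C9");
        System.out.println(detailLinks(links));
    }
}
```

Only the first sample link survives the filter, and its query string is dropped because the capture group ends after the digits.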

The code above extracts the icon download URL from each page. A custom Pipeline then saves the app icons, downloading each image with Apache HttpClient.

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import us.codecraft.webmagic.ResultItems;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.pipeline.Pipeline;

import java.io.*;
import java.nio.file.Paths;

/**
 * @author wzj
 * @create 2018-07-17 22:16
 **/
public class MyPipeline implements Pipeline
{
    /**
     * Directory the icons are saved to, inside the resources directory
     */
    private static final String saveDir = MyPipeline.class.getResource("/conf/image").getPath();

    /*
     * Number of icons processed so far
     */
    private int count = 1;


    /**
     * Process extracted results.
     *
     * @param resultItems resultItems
     * @param task        task
     */
    public void process(ResultItems resultItems, Task task)
    {
        String appName = resultItems.get("appName");
        String downloadIconUrl = resultItems.get("downloadIconUrl");

        try
        {
            saveIcon(downloadIconUrl,appName);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

        System.out.println(String.valueOf(count++) + " " + appName);
    }

    public void saveIcon(String downloadUrl,String appName) throws IOException
    {
        File file = Paths.get(saveDir,appName + ".png").toFile();

        // try-with-resources closes the client, the response and both streams,
        // even when the download fails part-way through
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(new HttpGet(downloadUrl));
             InputStream input = new BufferedInputStream(response.getEntity().getContent());
             OutputStream output = new FileOutputStream(file))
        {
            byte[] imgByte = new byte[1024 * 2];
            int len;
            while ((len = input.read(imgByte, 0, imgByte.length)) != -1)
            {
                output.write(imgByte, 0, len);
            }
        }
    }
}
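One thing to watch in saveIcon is that the app name is used directly as the file name, so a name containing a character such as "/" or ":" may fail to save. The helper below is a hypothetical sketch of one way to handle that; `IconFileNameSketch`, `safeFileName`, and the "unnamed" fallback are names invented here and are not part of the original code.

```java
/** Hypothetical helper for turning an app name into a usable file name. */
final class IconFileNameSketch
{
    /**
     * Replaces the characters \ / : * ? " < > | with an underscore
     * and falls back to "unnamed" when nothing is left.
     */
    static String safeFileName(String appName)
    {
        String cleaned = appName.trim().replaceAll("[\\\\/:*?\"<>|]", "_");
        return cleaned.isEmpty() ? "unnamed" : cleaned;
    }

    public static void main(String[] args)
    {
        System.out.println(safeFileName("Music/Video: Player"));
        System.out.println(safeFileName("   "));
    }
}
```

In the pipeline this would be applied where the file is created, for example `Paths.get(saveDir, IconFileNameSketch.safeFileName(appName) + ".png")`. Two apps whose names differ only in the replaced characters would then overwrite each other's icon, so a real crawler might prefer the app ID from the URL as the file name.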

Note: the Huawei app store may have an anti-crawler mechanism; each run only managed to crawl about 1,000 icons.

LIRE test code

Note: in the class, IMAGE_PATH is the directory that holds the images and INDEX_PATH is where the index is stored. Change both paths after copying the code.

The indexImages method builds the index. The searchSimilarityImage method finds the most similar images and prints their similarity scores.

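The first argument of the GenericFastImageSearcher constructor is the number of top matches to return. I set it to 5, so the search returns the 5 most similar images.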

ImageSearcher searcher = new GenericFastImageSearcher(5, CEDD.class);
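As a rough illustration of what "top 5" means, the sketch below keeps the N smallest distances out of a list of candidates with a bounded heap. It says nothing about how LIRE does this internally; the class name `TopNSketch` and the distance values are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N selection by distance; not LIRE's implementation. */
final class TopNSketch
{
    /** Returns the indexes of the n smallest distances, closest first. */
    static List<Integer> topN(double[] distances, int n)
    {
        // Max-heap on distance: the head is always the worst hit currently kept.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Integer i) -> distances[i]).reversed());
        for (int i = 0; i < distances.length; i++)
        {
            heap.add(i);
            if (heap.size() > n)
            {
                heap.poll(); // drop the farthest candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Integer i) -> distances[i]));
        return result;
    }

    public static void main(String[] args)
    {
        // Made-up distances for seven indexed images; index 1 is an exact match.
        double[] distances = {0.5, 0.0, 9.0, 2.0, 1.0, 7.0, 3.0};
        System.out.println(topN(distances, 5));
    }
}
```

For the sample distances this prints [1, 0, 4, 3, 6]: the exact match at index 1 comes first, and the two farthest images (indexes 5 and 2) are left out.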

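The score printed for each hit is a distance, so the more similar an image is, the smaller the value. The query image itself, if it is in the index, should come back with 0.0. The complete code follows.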

import net.semanticmetadata.lire.builders.DocumentBuilder;
import net.semanticmetadata.lire.builders.GlobalDocumentBuilder;
import net.semanticmetadata.lire.imageanalysis.features.global.CEDD;
import net.semanticmetadata.lire.searchers.GenericFastImageSearcher;
import net.semanticmetadata.lire.searchers.ImageSearchHits;
import net.semanticmetadata.lire.searchers.ImageSearcher;
import net.semanticmetadata.lire.utils.FileUtils;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.List;


/**
 * @author wzj
 * @create 2018-07-22 11:16
 **/
public class ImageSimilarityTest
{
    /**
     * Directory that holds the images
     */
    private static final String IMAGE_PATH = "H:\\JAVA\\ImageSim\\conf\\image";

    /**
     * Directory the index is stored in
     */
    private static final String INDEX_PATH = "H:\\JAVA\\ImageSim\\conf\\index";


    public static void main(String[] args) throws IOException
    {
        //indexImages();
        searchSimilarityImage();
    }

    private static void indexImages() throws IOException
    {
        List<String> images = FileUtils.getAllImages(Paths.get(IMAGE_PATH).toFile(), true);

        GlobalDocumentBuilder globalDocumentBuilder = new GlobalDocumentBuilder(false, false);
        globalDocumentBuilder.addExtractor(CEDD.class);

        IndexWriterConfig conf = new IndexWriterConfig(new WhitespaceAnalyzer());
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(Paths.get(INDEX_PATH)), conf);

        for (Iterator<String> it = images.iterator(); it.hasNext(); )
        {
            String imageFilePath = it.next();
            System.out.println("Indexing " + imageFilePath);

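            BufferedImage img;
            try (FileInputStream in = new FileInputStream(imageFilePath))
            {
                // ImageIO.read does not close the stream it is given, so close it here
                img = ImageIO.read(in);
            }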
            Document document = globalDocumentBuilder.createDocument(img, imageFilePath);
            indexWriter.addDocument(document);
        }

        indexWriter.close();

        System.out.println("Create index image successful.");
    }

    private static void searchSimilarityImage() throws IOException
    {
        IndexReader ir = DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_PATH)));
        ImageSearcher searcher = new GenericFastImageSearcher(5, CEDD.class);

        String inputImagePath = "H:\\JAVA\\ImageSim\\conf\\image\\5.png";
        BufferedImage img = ImageIO.read(Paths.get(inputImagePath).toFile());

        ImageSearchHits hits = searcher.search(img, ir);


        for (int i = 0; i < hits.length(); i++)
        {
            String fileName = ir.document(hits.documentID(i)).getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0];
            System.out.println(hits.score(i) + ": \t" + fileName);
        }
    }
}

The test results are as follows:

Source code download
