Previewing PDFs on the Front End: PDFObject and PDF.js

阿新 • Published: 2019-01-14

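Over the past couple of days I needed to display a PDF file on a web page. The <object>, <embed>, and <iframe> tags can each preview a PDF without any JavaScript. Looking around online, I also found quite a few third-party JavaScript libraries for PDF preview, such as jQuery Document Viewer, jquery.media.js, PDFObject, and PDF.js. I took a quick look at PDFObject and PDF.js. The former is not a PDF rendering tool; it displays the PDF through an <embed> tag. The latter parses the content of the PDF file and can render the PDF to a Canvas.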

<iframe>

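All browsers support the <iframe> tag; setting src to the PDF file is enough to preview it. You can also put fallback text between <iframe> and </iframe> to handle browsers that do not understand iframes. For example, the following code provides a download link for the PDF: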

<iframe src="/index.pdf" width="100%" height="100%">

This browser does not support PDFs. Please download the PDF to view it: <a href="/index.pdf">Download PDF</a>

</iframe>

<embed>

The <embed> tag defines embedded content, such as a plugin. In HTML5 this tag has four attributes:

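Attribute	Value	Description
height	pixels	Height of the embedded content.
width	pixels	Width of the embedded content.
type	MIME type	Type of the embedded content.
src	URL	URL of the embedded content.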

Note, however, that this tag cannot provide a fallback. Unlike <iframe></iframe>, it is a self-closing tag, so if the browser does not support embedded PDFs, nothing is displayed at all. Usage:

<embed src="/index.pdf" type="application/pdf" width="100%" height="100%">

<object>

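The <object> tag defines an embedded object and is used to add multimedia to a page. It lets you specify the data and parameters of the object inserted into the HTML document, as well as the code used to display and manipulate that data. It can contain objects such as images, audio, video, Java applets, ActiveX, PDF, and Flash. Almost all mainstream browsers support the <object> tag at least partially. Its usage here is very similar to <iframe>, and it also supports a fallback: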

<object data="/index.pdf" type="application/pdf" width="100%" height="100%">

This browser does not support PDFs. Please download the PDF to view it: <a href="/index.pdf">Download PDF</a>

</object>

Combining <object> and <iframe> gives an even more robust fallback:

<object data="/index.pdf" type="application/pdf" width="100%" height="100%">

<iframe src="/index.pdf" width="100%" height="100%" style="border: none;">

This browser does not support PDFs. Please download the PDF to view it: <a href="/index.pdf">Download PDF</a>

</iframe>

</object>
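
If the same nested fallback has to be written out for many different files, the markup can be generated instead of typed by hand. The following is only a sketch in JavaScript (it could run in a build script or on the server); buildPdfFallbackMarkup and escapeHtml are names invented for this illustration and are not part of any library.

```javascript
// Escape the characters that are special inside HTML text and attribute values.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Build the nested <object> / <iframe> / download-link markup for one PDF URL.
// Hypothetical helper: it simply reproduces the structure shown above.
function buildPdfFallbackMarkup(pdfUrl) {
    const url = escapeHtml(pdfUrl);
    return [
        `<object data="${url}" type="application/pdf" width="100%" height="100%">`,
        `  <iframe src="${url}" width="100%" height="100%" style="border: none;">`,
        `    This browser does not support PDFs. Please download the PDF to view it: <a href="${url}">Download PDF</a>`,
        `  </iframe>`,
        `</object>`
    ].join("\n");
}

console.log(buildPdfFallbackMarkup("/index.pdf"));
```

Escaping the URL matters as soon as the file name comes from user input, because it ends up inside three attribute values.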

The three tags above offer a way to preview PDFs without any JavaScript. PDFObject and PDF.js, covered next, are both JavaScript libraries.

PDFObject

According to its official site, PDFObject is not a PDF rendering tool; it too previews the PDF through an <embed> tag:

PDFObject is not a rendering engine. PDFObject just writes an <embed> element to the page, and relies on the browser or browser plugins to render the PDF. If the browser does not support embedded PDFs, PDFObject is not capable of forcing the browser to render the PDF.

PDFObject provides a PDFObject.supportsPDFs property for checking whether the browser can display PDFs inline, and therefore whether PDFObject can be used:

if(PDFObject.supportsPDFs){
   console.log("Yay, this browser supports inline PDFs.");
} else {
   console.log("Boo, inline PDFs are not supported by this browser");
}

PDFObject as a whole is very simple to use. Complete code:

<!DOCTYPE html>
<html>
<head>
    <title>Show PDF</title>
    <meta charset="utf-8" />
    <script type="text/javascript" src='pdfobject.min.js'></script>
    <style type="text/css">
        html,body,#pdf_viewer{
            width: 100%;
            height: 100%;
            margin: 0;
            padding: 0;
        }
    </style>
</head>
<body>
    <div id="pdf_viewer"></div>
    <script type="text/javascript">
        if (PDFObject.supportsPDFs) {
            // Embed the PDF into the page.
            PDFObject.embed("index.pdf", "#pdf_viewer");
        } else {
            // Inline PDFs are not supported: go to the fallback page instead.
            location.href = "/canvas";
        }
    </script>
</body>
</html>


PDF.js

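PDF.js lets you view PDF documents directly in an HTML page. It is an open-source library for reading and parsing PDF documents, and it is powerful enough to render a PDF file to a Canvas. PDF.js mainly consists of two files: pdf.js, which provides the API, and pdf.worker.js, which does the core parsing.
First include the pdf.js file: <script type="text/javascript" src='pdf.js'></script>
Most of the PDF.js API is Promise-based. PDFJS.getDocument(url), for example, returns a Promise (this is the API of the PDF.js version used in this post, which exposes a global PDFJS object):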

    PDFJS.getDocument('../index.pdf').then(pdf=>{
        var numPages = pdf.numPages;
        var start = 1;
        renderPageAsync(pdf, numPages, start);
    });

The pdf that the Promise resolves with is a PDFDocumentProxy object. The official API documentation describes it as:

Proxy to a PDFDocument in the worker thread. Also, contains commonly used properties that can be read synchronously.

The parsing of each page is triggered through pdf.getPage(page), which also returns a Promise, so an async/await function can parse the PDF page by page:

    async function renderPageAsync(pdf, numPages, current){
        for(let i=1; i<=numPages; i++){
            // Parse page i.
            let page = await pdf.getPage(i);
            // Render it.
            // ...
        }
    }

The resulting page is a PDFPageProxy object, i.e. a "Proxy to a PDFPage in the worker thread". It holds the parsing result for that page of the PDF. Here are some of the methods it provides:

Method	Returns
getAnnotations	A promise that is resolved with an {Array} of the annotation objects.
getTextContent	A promise that is resolved with a TextContent object that represents the page text content.
getViewport	Contains 'width' and 'height' properties along with transforms required for rendering.
render	An object that contains the promise, which is resolved when the page finishes rendering.
Let's try calling getTextContent and printing its result:

page.getTextContent().then(v=>console.log('page', v));

Part of the result for the first page:

{
    "items": [
        {
            "str": "小冊子標題",
            "dir": "ltr",
            "width": 240,
            "height": 2304,
            "transform": [
                48,
                0,
                0,
                48,
                45.32495,
                679.04
            ],
            "fontName": "g_d0_f1"
        },
        {
            "str": " ",
            "dir": "ltr",
            "width": 9.600000000000001,
            "height": 2304,
            "transform": [
                48,
                0,
                0,
                48,
                285.325,
                679.04
            ],
            "fontName": "g_d0_f2"
        }
      ],
    "styles": {
        "g_d0_f1": {
            "fontFamily": "monospace",
            "ascent": 1.05810546875,
            "descent": -0.26171875,
            "vertical": false
        },
        "g_d0_f2": {
            "fontFamily": "sans-serif",
            "ascent": 0.74365234375,
            "descent": -0.25634765625
        }
    }
 }

As we can see, PDF.js extracts the strings, positions, and fonts of the text on each page, which is quite impressive.
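
As a small illustration of working with that structure, the sketch below joins the text runs of a page and looks up the font family of each run. It relies only on the field names visible in the output above (items, str, fontName, styles, fontFamily); summarizeTextContent is a name made up for this example, and the sample object is a trimmed copy of that output.

```javascript
// Summarise an object shaped like the result of page.getTextContent().
// Hypothetical helper, written against the fields shown in the sample output.
function summarizeTextContent(textContent) {
    const runs = textContent.items.map(item => {
        // Each item names its font; the details live in the "styles" map.
        const style = textContent.styles[item.fontName] || {};
        return {
            text: item.str,
            fontFamily: style.fontFamily || "unknown"
        };
    });
    return {
        // Concatenate the runs in the order they were reported.
        text: runs.map(run => run.text).join(""),
        runs: runs
    };
}

// Trimmed copy of the output above.
const sampleTextContent = {
    items: [
        { str: "小冊子標題", fontName: "g_d0_f1" },
        { str: " ", fontName: "g_d0_f2" }
    ],
    styles: {
        g_d0_f1: { fontFamily: "monospace" },
        g_d0_f2: { fontFamily: "sans-serif" }
    }
};

const textSummary = summarizeTextContent(sampleTextContent);
console.log(JSON.stringify(textSummary.text));
console.log(textSummary.runs);
```

In a page, the same function could be fed the real result, for example page.getTextContent().then(summarizeTextContent).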

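The official site has a demo that also uses viewer.js, which the site mentions (my understanding is that it post-processes what PDF.js renders): http://mozilla.github.io/pdf.js/web/viewer.html. I looked at its HTML structure. The bottom layer is a Canvas whose content matches the PDF (obtainable with the page.render method described below). On top of it sits a textLayer. My guess is that this layer takes the text positions and styles returned by page.getTextContent() and overlays them on the Canvas.
That makes it possible to select text in the previewed document (at first I wondered how text could still be selectable after being rendered to a Canvas).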
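
To make that guess a bit more concrete, here is a rough sketch of how a text run could be positioned over the Canvas. It is not taken from viewer.js. It assumes that each item's transform is a PDF-style matrix [a, b, c, d, e, f] in page coordinates, and that an unrotated viewport simply scales the page and flips the y axis. The function names and the page height of 792 are made up for the example.

```javascript
// Multiply two PDF-style matrices [a, b, c, d, e, f]; m2 is applied first, then m1.
function multiplyTransforms(m1, m2) {
    return [
        m1[0] * m2[0] + m1[2] * m2[1],
        m1[1] * m2[0] + m1[3] * m2[1],
        m1[0] * m2[2] + m1[2] * m2[3],
        m1[1] * m2[2] + m1[3] * m2[3],
        m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
        m1[1] * m2[4] + m1[3] * m2[5] + m1[5]
    ];
}

// Assumption: an unrotated page. PDF coordinates grow upwards, canvas and CSS
// coordinates grow downwards, so the y axis is flipped and shifted by the page height.
function simpleViewportTransform(scale, pageHeight) {
    return [scale, 0, 0, -scale, 0, pageHeight * scale];
}

// Compute where a text run would sit on top of the canvas, in canvas pixels.
function placeTextRun(item, viewportTransform) {
    const tx = multiplyTransforms(viewportTransform, item.transform);
    return {
        text: item.str,
        left: tx[4],                          // x of the start of the run
        baseline: tx[5],                      // y of the text baseline
        fontSize: Math.hypot(tx[2], tx[3])    // scaled glyph height
    };
}

// First item of the sample output; 792 is a hypothetical page height.
const placedRun = placeTextRun(
    { str: "小冊子標題", transform: [48, 0, 0, 48, 45.32495, 679.04] },
    simpleViewportTransform(1.5, 792)
);
console.log(placedRun);
```

A text layer built this way would create one absolutely positioned element per run, using left and the baseline (moved up by the font's ascent) as its coordinates, with transparent text so that only the selection highlight is visible.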

Rendering a page to a Canvas is done with the render method:

    async function renderPageAsync(pdf, numPages, current){
        console.log("renderPage async");
        for(let i=1; i<=numPages; i++){
            // Load page i.
            let page = await pdf.getPage(i);

            let scale = 1.5;
            let viewport = page.getViewport(scale);
            // Prepare canvas using PDF page dimensions.
            let canvas = document.createElement("canvas");
            let context = canvas.getContext('2d');
            document.body.appendChild(canvas);

            canvas.height = viewport.height;
            canvas.width = viewport.width;

            // Render PDF page into canvas context.
            let renderContext = {
                canvasContext: context,
                viewport: viewport
            };
            page.render(renderContext);
        }
    }
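
The loop above starts each render without waiting for it to finish. If the pages should be drawn strictly one after another, the promise held by the object that render returns (see the method table earlier) can be awaited. The following is only a sketch: renderPagesSequentially and createCanvas are names invented here, getViewport(scale) mirrors the call used in this post, and the fake objects at the bottom merely imitate the few members the function touches so that it can run outside a browser.

```javascript
// Render pages one at a time, waiting for each render task to finish.
// `createCanvas` is a hypothetical callback; in a browser it could create a
// <canvas> with document.createElement and append it to document.body.
async function renderPagesSequentially(pdf, createCanvas, scale = 1.5) {
    const canvases = [];
    for (let i = 1; i <= pdf.numPages; i++) {
        const page = await pdf.getPage(i);
        const viewport = page.getViewport(scale);
        const canvas = createCanvas();
        canvas.width = viewport.width;
        canvas.height = viewport.height;
        // Wait until this page has been drawn before moving on.
        await page.render({
            canvasContext: canvas.getContext("2d"),
            viewport: viewport
        }).promise;
        canvases.push(canvas);
    }
    return canvases;
}

// Stand-ins for demonstration only; these are not the real PDF.js classes.
function makeFakePdf(numPages, log) {
    return {
        numPages: numPages,
        getPage: async pageNumber => ({
            getViewport: s => ({ width: 100 * s, height: 200 * s }),
            render: () => ({
                promise: Promise.resolve().then(() => { log.push(pageNumber); })
            })
        })
    };
}
const makeFakeCanvas = () => ({ width: 0, height: 0, getContext: () => ({}) });

const renderLog = [];
renderPagesSequentially(makeFakePdf(3, renderLog), makeFakeCanvas)
    .then(canvases => console.log(canvases.length, renderLog));
```

Rendering sequentially keeps memory use predictable for long documents, at the cost of the last pages appearing later than with the fire-and-forget loop.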

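PDF.js comes out of Mozilla Labs, and it feels really powerful!
I have a demo on Gitee (碼雲) that combines PDFObject and PDF.js. PDFObject uses the <embed> tag, which displays the PDF file directly and is very fast, but many mobile browsers do not support it, such as the WeChat browser and the Xiaomi browser. For those I use PDF.js to render the PDF to a Canvas, which is much slower than PDFObject but at least readable. -_-||
Demo: https://git.oschina.net/liuyaqi/PDFViewer.git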
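
The idea behind combining the two libraries can be sketched as a single decision function. This is not the code of the demo. previewPdf and renderWithCanvas are hypothetical names, the only PDFObject members used are supportsPDFs and embed as shown earlier, and the stubs at the bottom exist only so that the sketch runs without a browser.

```javascript
// Prefer the fast native embed through PDFObject; otherwise fall back to a
// canvas renderer. `renderWithCanvas` is a hypothetical callback, for example
// a function built on PDF.js that draws every page into the target element.
function previewPdf(url, selector, pdfObject, renderWithCanvas) {
    if (pdfObject && pdfObject.supportsPDFs) {
        pdfObject.embed(url, selector);
        return "embed";
    }
    renderWithCanvas(url, selector);
    return "canvas";
}

// Stubs for demonstration: one browser with inline PDF support, one without.
const calls = [];
const supportingStub = {
    supportsPDFs: true,
    embed: (url, selector) => calls.push(["embed", url, selector])
};
const nonSupportingStub = {
    supportsPDFs: false,
    embed: () => { throw new Error("embed should not be called"); }
};
const canvasStub = (url, selector) => calls.push(["canvas", url, selector]);

console.log(previewPdf("index.pdf", "#pdf_viewer", supportingStub, canvasStub));
console.log(previewPdf("index.pdf", "#pdf_viewer", nonSupportingStub, canvasStub));
console.log(calls);
```

Passing the library object and the fallback in as arguments keeps the decision testable and leaves open whether the fallback renders in place or redirects to a separate page.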
