[Web Development] wkhtmltox Tutorial: Converting HTML to PDF and Images

admin

July 18, 2025, 14:33

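1. Introduction and Learning Goals

1.1 Overview

This article explains how to use the wkhtmltox toolkit to convert HTML content into PDF files and images in several formats.

1.2 Objectives

Understand how wkhtmltox works and its core concepts
Learn how to install and configure wkhtmltox
Become fluent in converting documents from the command line and from Python
Learn the advanced options and performance tuning techniques
Be able to resolve common conversion problems, including garbled text

1.3 Knowledge Map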

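2. wkhtmltox Basics

2.1 What Is wkhtmltox?

wkhtmltox is an open-source command-line toolkit built on the Qt WebKit engine. It consists of two programs: wkhtmltopdf, which converts HTML documents to PDF, and wkhtmltoimage, which converts them to image formats such as PNG and JPEG. It runs on Windows, Linux, and macOS, and wrapper libraries make it callable from Python, Java, PHP, and other languages.

Key features:

Renders with the WebKit engine, so output is close to what a WebKit-based browser displays
Supports CSS, JavaScript, and complex page layouts
Offers a rich set of command-line options for fine-grained control
Lightweight, with no dependency on a full browser installation

2.2 How It Works

2.3 Use Cases and Limitations

Typical use cases:

Report generation
E-book production
Web page archiving
Automated document processing
System monitoring screenshots

Limitations:

Limited support for newer CSS3 and HTML5 features
Slow conversion of pages that contain many images
Some JavaScript effects may not render correctly
Chinese and other non-Latin text may come out garbled unless fonts and encoding are configured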

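3. Installation and Configuration

3.1 Installing on Windows

Download the installer:

Official download page: https://wkhtmltopdf.org/downloads.html
Recommended version: 0.12.6 (the latest stable release at the time of writing)

Installation steps:

3. Verify the installation

wkhtmltopdf -V
# Expected output: wkhtmltopdf 0.12.6 (with patched qt)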
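
The same check can be scripted, which is handy before a batch job starts. The following is a minimal sketch, assuming the version string has the `major.minor.patch` form shown above; the helper names `parse_version` and `installed_version` are illustrative and not part of wkhtmltox or pdfkit.

```python
import re
import shutil
import subprocess

def parse_version(output):
    """Extract (major, minor, patch) from text such as 'wkhtmltopdf 0.12.6 (with patched qt)'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None

def installed_version(binary="wkhtmltopdf"):
    """Return the installed version tuple, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # -V prints the version string and exits
    result = subprocess.run([path, "-V"], capture_output=True, text=True, check=False)
    return parse_version(result.stdout)
```

Calling `installed_version()` returns `None` when the executable cannot be found, which usually means PATH still needs to be updated.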

3.2 Installing on Linux

Ubuntu/Debian:

sudo apt-get install xvfb libfontconfig wkhtmltopdf

CentOS/RHEL:

sudo yum install xorg-x11-fonts-75dpi xorg-x11-fonts-Type1 wkhtmltopdf

3.3 Preparing the Python Environment

pdfkit is the recommended Python wrapper:

pip install pdfkit

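3.4 Environment Variables

Table 1: Key wkhtmltox installation paths

Operating system	Default install path	Notes
Windows	C:\Program Files\wkhtmltopdf\bin	Must be added to PATH
Linux	/usr/local/bin	Usually configured automatically
macOS	/usr/local/bin/wkhtmltopdf	May need to be symlinked manually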
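
When PATH cannot be relied on, a script can fall back to the default locations in Table 1. This is a sketch under assumptions: the executable file names (for example `wkhtmltopdf.exe` on Windows) are guesses layered on the table's directories, and `locate_wkhtmltopdf` is a hypothetical helper, not a pdfkit function.

```python
import os
import platform
import shutil

# Default locations based on Table 1; the file names are assumptions
# and a given machine may use different paths.
DEFAULT_PATHS = {
    "Windows": r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
    "Linux": "/usr/local/bin/wkhtmltopdf",
    "Darwin": "/usr/local/bin/wkhtmltopdf",
}

def locate_wkhtmltopdf(system=None, which=shutil.which, exists=os.path.isfile):
    """Prefer the binary found on PATH, then fall back to the default path for this OS."""
    on_path = which("wkhtmltopdf")
    if on_path:
        return on_path
    candidate = DEFAULT_PATHS.get(system or platform.system())
    return candidate if candidate and exists(candidate) else None
```

The returned path, when not `None`, can be handed to `pdfkit.configuration(wkhtmltopdf=...)`.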

4. Basic Usage

4.1 Command-Line Basics

4.1.1 HTML to PDF

Basic syntax:

wkhtmltopdf [options] <input file/URL> <output PDF file>

Examples:

# Convert a web page to PDF
wkhtmltopdf https://example.com output.pdf

# Convert a local HTML file to PDF
wkhtmltopdf input.html output.pdf

4.1.2 HTML to Image

Basic syntax:

wkhtmltoimage [options] <input file/URL> <output image file>

Examples:

# Convert a web page to a PNG image
wkhtmltoimage https://example.com output.png

# Set the image quality
wkhtmltoimage --quality 85 input.html output.jpg
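
For scripted use, the wkhtmltoimage command line can be assembled as an argument list and passed to `subprocess.run`. The sketch below only builds the list; `build_image_command` is an illustrative name, and the 0 to 100 range check on quality is an assumption about sensible values rather than documented behavior.

```python
def build_image_command(source, output, quality=None, width=None, image_format=None):
    """Build a wkhtmltoimage argument list: options first, then input and output."""
    cmd = ["wkhtmltoimage"]
    if image_format is not None:
        cmd += ["--format", image_format]
    if quality is not None:
        # Assumed valid range for JPEG-style quality settings
        if not 0 <= quality <= 100:
            raise ValueError("quality must be between 0 and 100")
        cmd += ["--quality", str(quality)]
    if width is not None:
        cmd += ["--width", str(width)]
    cmd += [source, output]
    return cmd
```

Passing a list rather than a single shell string avoids quoting problems when file names contain spaces.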

4.2 Using wkhtmltox from Python

4.2.1 With the subprocess Module

import subprocess

def html_to_pdf(html_path, pdf_path):
    """Call wkhtmltopdf through subprocess."""
    try:
        subprocess.run(['wkhtmltopdf', html_path, pdf_path], check=True)
        print(f"PDF generated: {pdf_path}")
    except subprocess.CalledProcessError as e:
        print(f"PDF generation failed: {e}")

# Usage example
html_to_pdf('input.html', 'output.pdf')
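
In unattended jobs it may help to capture the tool's error output and to guard against a conversion that never finishes. The following variant is a sketch, not the only reasonable design; `run_converter` and the 60-second default timeout are assumptions chosen for illustration.

```python
import subprocess

def run_converter(cmd, timeout=60):
    """Run a conversion command and return (ok, message) instead of raising."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        # The executable is not installed or not on PATH
        return False, f"executable not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    if result.returncode != 0:
        # Non-zero exit: report what the tool wrote to stderr
        return False, result.stderr.strip()
    return True, ""
```

A call such as `run_converter(['wkhtmltopdf', 'input.html', 'output.pdf'])` then yields a status flag and a message that can be logged.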

4.2.2 With the pdfkit Wrapper

import pdfkit

# Basic usage
pdfkit.from_url('http://example.com', 'output.pdf')
pdfkit.from_file('input.html', 'output.pdf')
pdfkit.from_string('<h1>Hello world!</h1>', 'output.pdf')

# With configuration options
options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None
}

pdfkit.from_url('http://example.com', 'output.pdf', options=options)

4.3 Common Options

Table 2: Common wkhtmltopdf options by category

Category	Option	Description	Example
Page setup	--page-size	Paper size (A3, A4, Letter, etc.)	--page-size A4
	--orientation	Orientation (Portrait or Landscape)	--orientation Landscape
Margins	--margin-top	Top margin	--margin-top 15mm
	--margin-bottom	Bottom margin	--margin-bottom 20mm
Headers and footers	--header-left	Left-aligned header text	--header-left "[title]"
	--footer-center	Centered footer text	--footer-center "Page [page]"
Content control	--no-images	Do not load images	--no-images
	--disable-javascript	Disable JavaScript	--disable-javascript
Advanced	--encoding	Set the default text encoding	--encoding "UTF-8"
	--user-style-sheet	Apply a custom CSS file	--user-style-sheet style.css
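
The options dictionaries used with pdfkit correspond to these command-line flags: each key becomes `--key`, and a value of `None` marks a flag that takes no argument. The sketch below illustrates that mapping; it is a simplified stand-in written for this article, not pdfkit's actual implementation, and `options_to_args` is a hypothetical name.

```python
def options_to_args(options):
    """Turn {'page-size': 'A4', 'no-images': None} into a flat list of CLI arguments."""
    args = []
    for name, value in options.items():
        # Accept keys with or without the leading dashes
        flag = name if name.startswith("--") else f"--{name}"
        args.append(flag)
        # None or an empty string means the flag takes no value
        if value is not None and value != "":
            args.append(str(value))
    return args
```

The resulting list can be spliced between the executable name and the input and output paths when calling wkhtmltopdf through `subprocess`.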

5. Advanced Features

5.1 Headers and Footers

5.1.1 Plain-Text Headers and Footers

options = {
    'header-right': '[date]',
    'footer-center': 'Page [page] of [topage]',
    'footer-font-size': '8',
    'header-font-size': '8'
}

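5.1.2 Custom HTML Headers and Footers

Create header.html. Placeholders such as [date] are not substituted inside an HTML header; wkhtmltopdf passes the values as URL query parameters instead, so a small script reads them:

<!DOCTYPE html>
<html><head><meta charset="UTF-8"><script>
function subst() { var m = /[?&]date=([^&]*)/.exec(location.search); if (m) document.getElementById('date').textContent = decodeURIComponent(m[1]); }
</script></head><body onload="subst()">
<div style="text-align: right; font-size: 10px;">Report date: <span id="date" style="font-weight: bold;"></span></div>
</body></html>

Python code: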

options = {
    'header-html': 'header.html',
    'footer-html': 'footer.html',
    'margin-top': '25mm'
}

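5.2 Generating a Table of Contents

# In pdfkit, table-of-contents settings go in the separate toc argument, not in options
toc = {
    'toc-header-text': 'Table of Contents',  # TOC heading
    'toc-level-indentation': '2em',  # Indentation per level
    'toc-text-size-shrink': '0.8'  # Font shrink factor per level
}
pdfkit.from_file('input.html', 'output.pdf', toc=toc)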

5.3 Working with Multiple Pages

5.3.1 Merging Several HTML Files into One PDF

pdfkit.from_file(['page1.html', 'page2.html'], 'combined.pdf')

# Or from the command line
# wkhtmltopdf page1.html page2.html combined.pdf

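5.3.2 Adding a Cover Page

# The cover page and the table of contents are separate pdfkit arguments, not options entries
pdfkit.from_file(
    'input.html', 'output.pdf',
    cover='cover.html',  # Cover page
    toc={'toc-header-text': 'Table of Contents'},  # Table of contents
    cover_first=True  # Place the cover before the table of contents
)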

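5.4 Performance Optimization

Disable content you do not need:

options = {
    'no-images': None,  # Do not load images
    'disable-javascript': None,  # Disable JavaScript
    'disable-smart-shrinking': None  # Disable smart shrinking
}

Run under Xvfb on a headless Linux server:

xvfb-run -a wkhtmltopdf input.html output.pdf

Adjust the JavaScript delay:

options = {
    'javascript-delay': '1000'  # Wait 1 second for JavaScript to finish
}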

6. Deeper Python Integration

6.1 The pdfkit Module in Detail

6.1.1 API Prototypes

pdfkit.from_url(url, output_path, options=None, configuration=None)
pdfkit.from_file(input, output_path, options=None, configuration=None)
pdfkit.from_string(input, output_path, options=None, configuration=None)

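6.1.2 Configuration

import pdfkit

html = '<h1>Hello world!</h1>'

# Point pdfkit at a specific wkhtmltopdf binary
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')

# Use the configuration
pdfkit.from_string(html, 'output.pdf', configuration=config)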

6.2 Application Examples

6.2.1 Generating Reports Dynamically

from jinja2 import Template
import pdfkit

# Prepare the data
report_data = {
    'title': 'Sales Report',
    'date': '2023-11-15',
    'items': [
        {'name': 'Product A', 'sales': 1200},
        {'name': 'Product B', 'sales': 1800}
    ]
}

# Load the template
with open('report_template.html', encoding='utf-8') as f:
    template = Template(f.read())

# Render the HTML
html_content = template.render(report_data)

# Generate the PDF
pdfkit.from_string(html_content, 'sales_report.pdf', options={
    'encoding': "UTF-8",
    'margin-top': '0.5in',
    'margin-bottom': '0.5in'
})
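
The example loads report_template.html, whose contents are not shown. One possible template that matches the fields in report_data is sketched below and written to disk with the standard library; the markup is an assumption about what such a template might look like, and `write_template` is a hypothetical helper.

```python
from pathlib import Path

# A hypothetical Jinja2 template using the title, date, and items fields
TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>{{ title }}</title>
</head>
<body>
  <h1>{{ title }}</h1>
  <p>Date: {{ date }}</p>
  <table border="1">
    <tr><th>Product</th><th>Sales</th></tr>
    {% for item in items %}
    <tr><td>{{ item.name }}</td><td>{{ item.sales }}</td></tr>
    {% endfor %}
  </table>
</body>
</html>
"""

def write_template(path="report_template.html"):
    """Write the sample template to disk as UTF-8 and return its path."""
    target = Path(path)
    target.write_text(TEMPLATE, encoding="utf-8")
    return target
```

Declaring the charset inside the template keeps non-Latin report titles readable in the generated PDF.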

6.2.2 Asynchronous Batch Processing

import asyncio
import aiofiles
import pdfkit
from concurrent.futures import ThreadPoolExecutor

async def generate_pdf(html_path, pdf_path):
    # Read the HTML without blocking the event loop
    async with aiofiles.open(html_path, 'r', encoding='utf-8') as f:
        html = await f.read()

    # pdfkit blocks while wkhtmltopdf runs, so hand it to a worker thread
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        await loop.run_in_executor(
            pool,
            lambda: pdfkit.from_string(html, pdf_path)
        )
    print(f"Generated: {pdf_path}")

async def main():
    tasks = [
        generate_pdf(f'input_{i}.html', f'output_{i}.pdf')
        for i in range(1, 6)
    ]
    await asyncio.gather(*tasks)

asyncio.run(main())
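
Each conversion starts a separate wkhtmltopdf process, so with large batches it may be worth capping how many run at once. The sketch below shows one way to do that with `asyncio.Semaphore` and `asyncio.to_thread` (Python 3.9 or later); `convert_all` and its `limit` parameter are illustrative names, and the converter is passed in so that any blocking function taking a source and destination can be used.

```python
import asyncio

async def convert_all(jobs, convert, limit=3):
    """Run convert(src, dst) for every job in worker threads, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(src, dst):
        # Wait for a free slot before starting another blocking conversion
        async with semaphore:
            await asyncio.to_thread(convert, src, dst)
        return dst

    # gather returns results in the same order as the jobs
    return await asyncio.gather(*(run_one(src, dst) for src, dst in jobs))
```

For example, `asyncio.run(convert_all(jobs, pdfkit.from_file))` would convert a list of `(html_path, pdf_path)` pairs three at a time.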

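7. Common Problems

7.1 Garbled Chinese Text

Solutions:

Make sure a Chinese font is installed on the system

# Ubuntu
sudo apt-get install fonts-wqy-microhei

Declare UTF-8 encoding in the HTML

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Pass the encoding to wkhtmltopdf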

options = {
    'encoding': "UTF-8",
    'user-style-sheet': '/path/to/stylesheet.css'
}
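
When the HTML is generated by code, the charset declaration is easy to forget. The helper below is a small sketch that adds one when it is missing; `ensure_utf8_meta` is a hypothetical name, and the regular expressions are a simplification that assumes reasonably well-formed markup. It does not change a charset that is already declared.

```python
import re

META = '<meta charset="UTF-8">'

def ensure_utf8_meta(html):
    """Return html with a charset declaration, adding a UTF-8 one if none is present."""
    # Leave the document alone if any meta tag already declares a charset
    if re.search(r"<meta[^>]+charset", html, flags=re.IGNORECASE):
        return html
    # Insert right after the opening <head> tag when there is one
    head = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head:
        return html[:head.end()] + META + html[head.end():]
    # Otherwise prepend it to the fragment
    return META + html
```

The result can be passed straight to `pdfkit.from_string` together with the `encoding` option.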

7.2 Truncated Pages

Solution:

options = {
    'disable-smart-shrinking': None,  # Disable smart shrinking
    'viewport-size': '1280x1024',  # Set the viewport size
    'dpi': 300,  # Raise the DPI
    'zoom': 0.8  # Scale the page down slightly
}

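7.3 Performance Optimization

Table 3: Performance troubleshooting

Symptom	Likely cause	Solution
Slow conversion	Complex page or many resources	Disable images and JavaScript; simplify the layout
High memory usage	Very large page or memory leak	Split the document into parts; upgrade wkhtmltopdf
Output file too large	High-resolution images	Lower the quality with --image-quality
Missing content	Rendering not finished	Increase --javascript-delay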

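8. Extended Applications

8.1 Integrating with Web Frameworks

8.1.1 Django Example

# views.py
from django.http import HttpResponse
import pdfkit

def generate_pdf(request):
    # Fetch or build the HTML
    html = "<h1>Django PDF Report</h1>"
    
    # Generate the PDF; passing False as the output path returns the PDF as bytes
    pdf = pdfkit.from_string(html, False, options={
        'encoding': "UTF-8"
    })
    
    # Build the response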
    response = HttpResponse(pdf, content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename="report.pdf"'
    return response

8.1.2 Flask Example

from flask import Flask, make_response, render_template
import pdfkit

app = Flask(__name__)

@app.route('/report')
def generate_report():
    html = render_template('report.html')
    pdf = pdfkit.from_string(html, False)
    
    response = make_response(pdf)
    response.headers['Content-Type'] = 'application/pdf'
    response.headers['Content-Disposition'] = 'inline; filename=report.pdf'
    return response

8.2 Automated Reporting System

import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
import pdfkit

def send_report_by_email(recipient, html_content):
    # Generate the PDF in memory
    pdf = pdfkit.from_string(html_content, False)
    
    # Build the message
    msg = MIMEMultipart()
    msg['Subject'] = 'Daily Report'
    msg['From'] = 'reports@company.com'
    msg['To'] = recipient
    
    # Attach the PDF
    part = MIMEApplication(pdf, Name='report.pdf')
    part['Content-Disposition'] = 'attachment; filename="report.pdf"'
    msg.attach(part)
    
    # Send the message
    with smtplib.SMTP('smtp.company.com') as server:
        server.send_message(msg)
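
Building the message separately from sending it makes the attachment logic testable without an SMTP server. The sketch below uses the standard library's `email.message.EmailMessage` as an alternative to the MIME classes above; `build_report_message` is a hypothetical helper, the body text is made up for illustration, and the sender address is simply the one used in the example.

```python
from email.message import EmailMessage

def build_report_message(recipient, pdf_bytes, sender="reports@company.com"):
    """Build an email with a short text body and the PDF attached as report.pdf."""
    msg = EmailMessage()
    msg["Subject"] = "Daily Report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The daily report is attached as a PDF.")
    # add_attachment converts the message to multipart and sets the headers
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename="report.pdf")
    return msg
```

The returned object can be sent with `server.send_message(msg)` exactly as in the function above.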

8.3 Combining with Data Analysis Tools

import pandas as pd
import pdfkit

# Generate a data analysis report
def generate_analysis_report(data_path):
    # Read the data
    df = pd.read_csv(data_path)
    
    # Build the HTML
    html = """
    <h1>Data Analysis Report</h1>
    <h2>Data Overview</h2>
    {}
    <h2>Descriptive Statistics</h2>
    {}
    """.format(
        df.head().to_html(),
        df.describe().to_html()
    )
    
    # Generate the PDF
    pdfkit.from_string(html, 'analysis_report.pdf', options={
        'encoding': 'UTF-8',
        'margin-top': '0.5in'
    })