Tutorial: Dynamically Detecting Encodings in Python with chardet

Scripting column, 2026/8/3, Anonymous


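Preface

Every page on the internet is served in some encoding, but with so many encodings in use, how can our code tell which one a given page uses? chardet handles this problem well.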

1. chardet

chardet is a library from the Python community that lets code detect, at run time, the encoding of a page or file. Its interface is simple and easy to use.

Project home page: https://github.com/chardet/chardet

Local download: http://xiazai.jb51.net/201707/yuanma/chardet(jb51.net).rar

Documentation: http://chardet.readthedocs.io/en/latest/usage.html

2. Usage examples

Note: the author uses Python 3.5+.

Case 1: Detecting the encoding of a specific page

import chardet
import urllib.request

# Download the raw bytes of the page, then let chardet guess the encoding.
test_data = urllib.request.urlopen('http://www.baidu.com/').read()
print(chardet.detect(test_data))

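Output:

{'confidence': 0.99, 'encoding': 'utf-8'}

Interpretation: chardet reports the encoding as utf-8 with a confidence of 0.99.

Usage note: detect() is the key function. It takes bytes and returns a dictionary with the guessed encoding and a confidence value.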
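
The result of detect() is usually consumed by a decoding step. The following is a minimal sketch of that step, not part of chardet itself: decode_bytes, min_confidence, fallback, and the detect hook are hypothetical names introduced here for illustration. It assumes the detector returns a dictionary with 'encoding' and 'confidence' keys, as in the output above.

```python
def decode_bytes(data, detect=None, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes with the detected encoding, or a fallback when unsure.

    `detect` is a hypothetical hook: any callable that maps bytes to a dict
    with 'encoding' and 'confidence' keys. It defaults to chardet.detect.
    """
    if detect is None:
        import chardet  # third-party package: pip install chardet
        detect = chardet.detect

    result = detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # The guess may be missing (encoding is None) or weak; use the fallback then.
    if not encoding or confidence < min_confidence:
        encoding = fallback

    # errors="replace" avoids an exception if the guess turns out to be wrong.
    return data.decode(encoding, errors="replace")
```

With chardet installed, decode_bytes(urllib.request.urlopen(url).read()) would return the page as a str. The 0.5 threshold is an arbitrary assumption and should be tuned for the data at hand.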

Case 2: Incremental encoding detection

import urllib.request
from chardet.universaldetector import UniversalDetector

usock = urllib.request.urlopen('http://yahoo.co.jp/')
detector = UniversalDetector()
for line in usock.readlines():
    # Feed the response line by line; stop once the detector is confident.
    detector.feed(line)
    if detector.done:
        break
detector.close()
usock.close()
print(detector.result)

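Output:

{'confidence': 0.99, 'encoding': 'utf-8'}

Note: to improve accuracy, detector.feed() is called repeatedly to keep supplying data. Once the detector has seen enough (detector.done becomes true), feeding stops, and detector.close() finalizes the prediction, which is then available in detector.result.

To reuse the same detector object for another input, call detector.reset() first.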
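
The reset() call matters most when one detector is used for several inputs. The sketch below applies the same feed/done/close pattern to a list of local files and resets the detector before each one. detect_many, chunk_size, and the detector parameter are hypothetical names introduced for this illustration; only reset(), feed(), done, close(), and result come from the example above.

```python
def detect_many(paths, detector=None, chunk_size=4096):
    """Return {path: result} using a single detector, reset before each file.

    `detector` is a hypothetical hook; it defaults to a UniversalDetector.
    """
    if detector is None:
        # third-party package: pip install chardet
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()

    results = {}
    for path in paths:
        detector.reset()  # clear state left over from the previous file
        with open(path, "rb") as fh:
            # Read fixed-size chunks until EOF (read() returns b"" at the end).
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                detector.feed(chunk)
                if detector.done:  # confident enough, stop reading early
                    break
        detector.close()
        results[path] = dict(detector.result)  # copy, since the detector is reused
    return results
```

Reading in binary mode is essential here: the detector works on raw bytes, and opening the file in text mode would already force a decoding.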

Case 3: After installing chardet, detect file encodings from the command line

% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0

At the system level, file encodings can be checked directly from the command line, which is simple and convenient.
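
When the command-line output is consumed by a script, each line has to be split back into its parts. The following sketch parses only the line format shown above ("name: encoding with confidence value"); parse_chardetect_line is a hypothetical helper, and any line in a different shape is treated as unparseable rather than guessed at.

```python
import re

# Matches lines such as "somefile: windows-1252 with confidence 0.5".
_LINE = re.compile(
    r"^(?P<name>.+): (?P<encoding>\S+) with confidence (?P<confidence>[0-9.]+)$"
)


def parse_chardetect_line(line):
    """Return (filename, encoding, confidence), or None if the line does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("name"),
        match.group("encoding"),
        float(match.group("confidence")),
    )


print(parse_chardetect_line("somefile: windows-1252 with confidence 0.5"))
# ('somefile', 'windows-1252', 0.5)
```

In most scripts it is simpler to call chardet.detect() directly; parsing the text output is mainly useful when the command runs on another machine or inside a shell pipeline.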

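3. Summary

chardet is an easy-to-use and capable Python package, and anyone who works with web content is likely to need it sooner or later. If you run into problems, feedback is welcome.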

