Beautiful Soup

**Beautiful Soup**
Original author	Leonard Richardson
Stable release	4.12.3 (17 January 2024)[1]
Repository	code.launchpad.net/beautifulsoup/
Written in	Python
Type	HTML parsing library, web scraping
License	Python Software Foundation License (Beautiful Soup 3 and earlier); MIT License (Beautiful Soup 4 and later)[2]
Website	www.crummy.com/software/BeautifulSoup/

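Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (often called tag soup). It builds a parse tree for each parsed page, from which data can be extracted, which makes it useful for web scraping.[2]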
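
As a rough illustration of the tag-soup handling described above, the following minimal sketch parses a fragment whose list elements are never closed and then queries the resulting tree. The fragment and the variable names (tag_soup, hrefs) are made up for this example, and 'html.parser' (the parser bundled with Python) is only one of several parsers Beautiful Soup can use; other parsers may repair the same markup differently.

```python
from bs4 import BeautifulSoup

# Hypothetical tag soup: neither the <li> elements nor the <ul> is ever closed.
tag_soup = "<ul><li><a href='/one'>One</a><li><a href='/two'>Two</a>"

# Build the parse tree with the parser from Python's standard library.
soup = BeautifulSoup(tag_soup, "html.parser")

# Search the tree for every anchor and collect its link target.
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)  # ['/one', '/two']

# Serializing the tree shows the closing tags that were supplied for us.
print(soup.prettify())
```

Both links are found even though the surrounding markup is incomplete, because the query runs against the repaired tree rather than the raw text.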

In 2021, Beautiful Soup retired its support for Python 2.7; release 4.9.3 was the last version to support Python 2.7.[3]

Example code

#!/usr/bin/env python3
# Anchor extraction from HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))

See also

Comparison of HTML parsers

References

[1] https://git.launchpad.net/beautifulsoup/tree/CHANGELOG. Retrieved 18 January 2024.
[2] crummy.com. Retrieved 18 April 2012 (archived from the original on 2017-02-03). "Beautiful Soup is licensed under the same terms as Python itself"
[3] Richardson, Leonard. beautifulsoup. Google Groups. 7 September 2021. Retrieved 27 September 2022 (archived from the original on 2022-09-29).

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
