Beautiful Soup and extracting a div and its contents by ID

soup.find("tagName", { "id" : "articlebody" })

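Why doesn't this return the <div id="articlebody">…</div> tags and everything in between? It returns nothing. I know the element exists because I am staring right at it in the output of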

soup.prettify()

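soup.find("div", {"id": "articlebody"}) does not work either.

(Edit: I found that BeautifulSoup was not parsing my page correctly, which probably means the page I was trying to parse is not properly formatted as SGML or whatever.)

Current answer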

from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = 'your_url'
session = HTMLSession()
resp = session.get(url)

# Render JavaScript only if the element with id "articlebody" is generated
# dynamically; for static pages this call can be skipped.
resp.html.render()

# Parse the (possibly rendered) HTML and look the div up by its id.
soup = BeautifulSoup(resp.html.html, "lxml")
soup.find("div", {"id": "articlebody"})

2020-08-23 06:34:50

Other answers

Here is a code snippet:

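from bs4 import BeautifulSoup

# Assumption: "index.html" is a local file holding the page to parse.
with open("index.html") as f:
    soup = BeautifulSoup(f, "html.parser")
titleList = soup.findAll('title')
divList = soup.findAll('div', attrs={ "class" : "article story"})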

As you can see, I find all the <title> tags, and then all the <div> tags with class="article story".

2010-01-25 23:03:03

Have you tried soup.findAll("div", {"id": "articlebody"})?

It sounds crazy, but if you are scraping pages from the wild, you can't rule out multiple divs with the same id...
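
A minimal sketch of that check, assuming a made-up page in which the same id appears twice (the markup below is hypothetical, not the asker's page). It uses the built-in "html.parser" backend and find_all, the current spelling of findAll:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a duplicated id, as can happen on real-world pages.
html = """
<div id="articlebody">first</div>
<div id="other">ignored</div>
<div id="articlebody">second</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag, whereas find stops at the first one.
matches = soup.find_all("div", {"id": "articlebody"})
texts = [div.get_text() for div in matches]

print(len(matches), texts)
```

If the list comes back empty on the real page, the problem is more likely in how the page was fetched or parsed than in the lookup itself.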

2010-01-25 23:00:55

soup.find("tagName",attrs={ "id" : "articlebody" })

2020-10-31 11:03:51

In the BeautifulSoup source code, this line allows divs to be nested within divs, so the concern you raised in your comment to Lukas is unfounded.

NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']

I think what you need to do is specify the attrs you want, for example:

source.find('div', attrs={'id':'articlebody'})
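
As a rough illustration of the attrs lookup on nested divs, here is a small sketch. The markup and the helper name extract_div are invented for the example, and "html.parser" is only one possible parser choice:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the target div contains another div.
html = (
    '<html><body><div id="articlebody">'
    '<div class="inner"><p>Hello</p></div>'
    '</div></body></html>'
)


def extract_div(markup, element_id):
    """Return (outer_html, inner_html) for the div with the given id, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find("div", attrs={"id": element_id})
    if tag is None:
        # Nothing matched: the id is absent or the markup was not parsed as expected.
        return None
    # str(tag) includes the div itself; decode_contents() gives only what is inside it.
    return str(tag), tag.decode_contents()


result = extract_div(html, "articlebody")
missing = extract_div(html, "no-such-id")
print(result)
```

Returning None for a missing id keeps the "found nothing" case explicit instead of failing later with an AttributeError.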

2010-01-25 23:05:25
