How can we extract the 'src' attribute from an img tag in Python?

One way to do it is with BeautifulSoup, a Python library for web scraping.

From Webpage URLs

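from urllib.request import urlopen
from bs4 import BeautifulSoup as BSHTML  # pip install beautifulsoup4

# download the page and parse it
page = urlopen('http://www.youtube.com/')
soup = BSHTML(page, 'html.parser')
images = soup.find_all('img')
for image in images:
    # print image source (None if the tag has no src)
    print(image.get('src'))
    # print alternate text (None if the tag has no alt)
    print(image.get('alt'))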
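
One thing to watch for when reading 'src' from a live page: the value is often relative (for example /img/logo.png) rather than a full URL. Here is a minimal sketch of turning it into an absolute URL with the standard library's urljoin. The helper name absolute_src and the sample paths are made up for illustration; only the page URL comes from the example above.

```python
from urllib.parse import urljoin

# The page the HTML was downloaded from (same URL as in the example above).
page_url = 'http://www.youtube.com/'


def absolute_src(page_url, src):
    """Resolve a src value against the URL of the page it was found on."""
    return urljoin(page_url, src)


# Hypothetical src values of the kinds commonly seen in img tags.
print(absolute_src(page_url, '/img/logo.png'))              # root-relative path
print(absolute_src(page_url, '//cdn.example.com/a.png'))    # protocol-relative URL
print(absolute_src(page_url, 'https://example.com/b.png'))  # already absolute
```

Values that are already absolute pass through unchanged, so the helper can be applied to every src without checking first.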

From Text

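from bs4 import BeautifulSoup as BSHTML

htmlText = """ """  # put the HTML to parse between the triple quotes
soup = BSHTML(htmlText, 'html.parser')
images = soup.find_all('img')
for image in images:
    # print image source (None if the tag has no src)
    print(image.get('src'))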

Other HTML/XML parsing libraries in Python can do the job as well. BeautifulSoup is widely used, has a good number of tutorials, and has a community of users who support it, which makes it a good choice for a scraper/parser.
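
As an example of one such alternative, here is a rough sketch using html.parser from the standard library, so nothing needs to be installed. The class name ImgSrcCollector, the function extract_img_sources and the sample HTML are made up for illustration; this is just one possible way to do it, not a replacement for the BeautifulSoup approach.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every img tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names,
        # so <IMG SRC="..."> is matched as well.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src is not None:
                self.sources.append(src)


def extract_img_sources(html_text):
    """Return the src values of all img tags in html_text, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


# Hypothetical input: two images with a src, one without.
sample_html = '<p><img src="a.png" alt="A"><IMG SRC="b.jpg"/><img alt="no source"></p>'
print(extract_img_sources(sample_html))  # ['a.png', 'b.jpg']
```

Tags without a src are skipped rather than raising an error, which is usually what you want when scraping pages you do not control.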

By Morena
