
How can we extract the 'src' attribute from an 'img' tag in Python?

One way is to use BeautifulSoup, a Python library for web scraping.

From Webpage URLs

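    # Python 3 with the beautifulsoup4 package (pip install beautifulsoup4)
    from urllib.request import urlopen
    from bs4 import BeautifulSoup as BSHTML

    page = urlopen('http://www.youtube.com/')
    soup = BSHTML(page, 'html.parser')
    # src=True keeps only <img> tags that actually have a src attribute
    images = soup.find_all('img', src=True)
    for image in images:
        # print image source
        print(image['src'])
        # print alternate text (None if the tag has no alt attribute)
        print(image.get('alt'))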
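The src values printed by the loop above are often relative (for example /img/logo.png, or protocol-relative //host/path.png), so they cannot be fetched on their own. A minimal sketch, assuming you want absolute URLs, resolves each value against the page URL with the standard library's urllib.parse.urljoin. The helper name absolute_srcs and the sample paths are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin


def absolute_srcs(page_url, srcs):
    """Hypothetical helper: resolve each src value against the page it came from.

    Absolute URLs pass through unchanged; relative and protocol-relative
    values are joined onto page_url.
    """
    return [urljoin(page_url, src) for src in srcs]


if __name__ == "__main__":
    # Sample src values, made up for illustration.
    srcs = ["/img/logo.png", "//cdn.example.com/a.png", "https://example.org/b.png"]
    for url in absolute_srcs("https://www.youtube.com/", srcs):
        print(url)
    # https://www.youtube.com/img/logo.png
    # https://cdn.example.com/a.png
    # https://example.org/b.png
```

With BeautifulSoup you could pass it something like [image['src'] for image in images] collected from the loop above, using the same URL you opened.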

From Text

    from bs4 import BeautifulSoup as BSHTML

    htmlText = """<img src="src1.com" />"""
    soup = BSHTML(htmlText, 'html.parser')
    images = soup.find_all('img')
    for image in images:
        print(image['src'])

There are other HTML/XML parsing libraries in Python that could help as well, such as lxml or the standard library's html.parser. BeautifulSoup is widely used and has plenty of tutorials and a supportive user community, which makes it a good choice for a scraper/parser.
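For comparison, here is a minimal sketch that uses only the standard library's html.parser module (nothing to install) to collect every img src from a string. The class and function names (ImgSrcParser, extract_img_srcs) are hypothetical; this is one possible approach, not a drop-in replacement for a full scraper.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            # Skip <img> tags with no src, or a bare "src" with no value.
            if src is not None:
                self.srcs.append(src)

    # Self-closing <img ... /> is reported via handle_startendtag, whose
    # default implementation calls handle_starttag, so it is covered too.


def extract_img_srcs(html_text):
    """Return the src values of all <img> tags in html_text, in order."""
    parser = ImgSrcParser()
    parser.feed(html_text)
    parser.close()
    return parser.srcs


if __name__ == "__main__":
    sample = '<p><img src="src1.com" alt="one"><IMG SRC="src2.com" /><img alt="no src"></p>'
    print(extract_img_srcs(sample))  # ['src1.com', 'src2.com']
```

html.parser lower-cases tag and attribute names, which is why the upper-case IMG/SRC in the sample is still matched.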

By Morena
