Sunday, February 12, 2012

How to turn an HTML nested list into a Python one - Stack Overflow

With BeautifulSoup, I'd do something like this:

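from BeautifulSoup import BeautifulSoup
from pprint import pprint

def parseList(tag):
    if tag.name == 'ul':
        # A <ul> becomes a list with one entry per direct <li> child.
        return [parseList(item)
                for item in tag.findAll('li', recursive=False)]
    elif tag.name == 'li':
        if tag.ul is None:
            # Leaf item: return only its text.
            return tag.text
        else:
            # Item with a nested <ul>: (own text, parsed sublist).
            return (tag.contents[0].string.strip(), parseList(tag.ul))

# lista holds the HTML markup to parse (defined elsewhere).
soup = BeautifulSoup(lista)
pprint(parseList(soup.ul))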

Example output:

[(u'Arts & Entertainment',
  [u'Celebrities & Entertainment News',
   (u'Comics & Animation',
    [u'Anime & Manga', u'Cartoons', u'Comics'])])]
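The input markup is not shown in the answer, so here is a rough, self-contained sketch for experimenting with the same idea. It is not the answer's method: it swaps BeautifulSoup for the standard library's html.parser and targets Python 3. The sample HTML and the NestedListParser class are assumptions made up for illustration, and the parser expects well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser


class NestedListParser(HTMLParser):
    """Hypothetical helper: build nested lists/tuples from <ul>/<li> markup."""

    def __init__(self):
        super().__init__()  # character references such as &amp; are decoded
        self.root = None    # the outermost list, once a <ul> has been seen
        self.lists = []     # stack of lists for the currently open <ul> tags
        self.items = []     # stack of [text_parts, sublist_or_None] per open <li>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            new_list = []
            if self.items:
                # Nested <ul>: attach it to the enclosing <li>.
                self.items[-1][1] = new_list
            elif self.root is None:
                self.root = new_list
            self.lists.append(new_list)
        elif tag == "li":
            self.items.append([[], None])

    def handle_data(self, data):
        # Keep only text that sits directly in an <li>, before any nested <ul>.
        if self.items and self.items[-1][1] is None:
            self.items[-1][0].append(data)

    def handle_endtag(self, tag):
        if tag == "ul" and self.lists:
            self.lists.pop()
        elif tag == "li" and self.items and self.lists:
            parts, sublist = self.items.pop()
            text = "".join(parts).strip()
            # Leaf items become strings, items with a sublist become tuples.
            self.lists[-1].append(text if sublist is None else (text, sublist))


# Assumed sample input, reconstructed from the example output.
sample_html = """
<ul>
  <li>Arts &amp; Entertainment
    <ul>
      <li>Celebrities &amp; Entertainment News</li>
      <li>Comics &amp; Animation
        <ul>
          <li>Anime &amp; Manga</li>
          <li>Cartoons</li>
          <li>Comics</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
"""

parser = NestedListParser()
parser.feed(sample_html)
parser.close()
result = parser.root
print(result)
```

On this sample the result has the same shape as the example output, with plain str values instead of u'' literals.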

Note that a list item containing a nested unordered list (<ul>) is returned as a tuple: the first element is the item's own text, and the second is a list holding the parsed contents of the nested list. Items without a nested list are returned as plain strings.
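To make the string-versus-tuple distinction concrete, here is a minimal sketch of how such a structure might be consumed. The iter_paths function is a hypothetical helper, not part of the answer, and the code is written for Python 3, so the data uses plain strings rather than u'' literals.

```python
def iter_paths(items, prefix=()):
    """Yield one tuple of labels per list item, from the top level down."""
    for item in items:
        if isinstance(item, tuple):
            # Item with a sublist: (label, children).
            label, children = item
            path = prefix + (label,)
            yield path
            yield from iter_paths(children, path)
        else:
            # Leaf item: a plain string.
            yield prefix + (item,)


# Same structure as the example output.
nested = [
    ("Arts & Entertainment", [
        "Celebrities & Entertainment News",
        ("Comics & Animation", ["Anime & Manga", "Cartoons", "Comics"]),
    ]),
]

for path in iter_paths(nested):
    print(" > ".join(path))
```

Checking isinstance(item, tuple) is enough to tell the two cases apart, because leaf items are always strings.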

Source: http://stackoverflow.com/questions/9249151/how-to-turn-a-html-nested-list-into-a-pythons-one
