With BeautifulSoup, I'd do something like this:
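```python
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)
from pprint import pprint

def parseList(tag):
    if tag.name == 'ul':
        # Recurse into the direct <li> children only, not nested ones
        return [parseList(item)
                for item in tag.findAll('li', recursive=False)]
    elif tag.name == 'li':
        if tag.ul is None:
            # Leaf item: return its text
            return tag.text
        else:
            # Item with a nested list: (label, parsed sublist)
            return (tag.contents[0].string.strip(), parseList(tag.ul))

# lista is the HTML string containing the nested list
soup = BeautifulSoup(lista)
pprint(parseList(soup.ul))
```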
Example output:
[(u'Arts & Entertainment', [u'Celebrities & Entertainment News', (u'Comics & Animation', [u'Anime & Manga', u'Cartoons', u'Comics'])])]
Note that for a list item that contains a nested unordered list, a tuple is returned: its first element is the item's own text, and its second element is a list with the parsed contents of the nested list.
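BeautifulSoup 3 only runs on Python 2. As a rough sketch for Python 3, the same recursion can be written with the standard library's `xml.etree.ElementTree` instead, assuming the list markup is well-formed XHTML. The sample markup `SAMPLE` (reconstructed from the example output above) and the names `parse_list` and `iter_paths` are hypothetical, made up for illustration. `iter_paths` shows one way to consume the tuple-vs-string structure, yielding the label path to every leaf.

```python
import xml.etree.ElementTree as ET
from pprint import pprint

# Hypothetical sample markup; assumed well-formed so ElementTree can parse it.
SAMPLE = """
<ul>
  <li>Arts &amp; Entertainment
    <ul>
      <li>Celebrities &amp; Entertainment News</li>
      <li>Comics &amp; Animation
        <ul>
          <li>Anime &amp; Manga</li>
          <li>Cartoons</li>
          <li>Comics</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
""".strip()

def parse_list(elem):
    """Mirror parseList: a <ul> becomes a list, a leaf <li> a string,
    and an <li> holding a nested <ul> a (label, sublist) tuple."""
    if elem.tag == 'ul':
        # findall('li') matches direct children only, like recursive=False
        return [parse_list(li) for li in elem.findall('li')]
    sub = elem.find('ul')
    if sub is None:
        # Leaf item: join all of its text
        return ''.join(elem.itertext()).strip()
    # elem.text is the text before the first child element (the label)
    return ((elem.text or '').strip(), parse_list(sub))

def iter_paths(items, prefix=()):
    """Hypothetical helper: yield the tuple of labels leading to each leaf."""
    for item in items:
        if isinstance(item, tuple):
            label, children = item
            yield from iter_paths(children, prefix + (label,))
        else:
            yield prefix + (item,)

tree = parse_list(ET.fromstring(SAMPLE))
pprint(tree)
for path in iter_paths(tree):
    print(' > '.join(path))
```

ElementTree raises `ParseError` on markup that is not well-formed, so for messy real-world HTML an HTML-tolerant parser such as BeautifulSoup 4 (`bs4`) is the safer choice.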
Source: http://stackoverflow.com/questions/9249151/how-to-turn-a-html-nested-list-into-a-pythons-one