Moving a div inside a p to the body element

I receive malformed HTML in which divs are nested inside elements that cannot contain them, as in the html_string variable in the code below. Because this is not valid HTML, HTML parsers give me unexpected results and the parsed tree can't be used reliably.

I am trying to fix this HTML by first parsing it as XML with BeautifulSoup and then moving the div up to the body element, which is where it belongs in my case.
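To illustrate why parsing as XML helps, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree rather than BeautifulSoup. The `snippet` variable is a trimmed stand-in for the real input, and the names `paragraph` and `page_break` are made up for this illustration. It only shows that a strict XML parser accepts the input and leaves the div nested inside the p; it does not perform the move.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the malformed input: a <div> nested inside a <p>.
snippet = """
<html>
<body>
  <p align="center">
  This is before.
  <div style="page-break-after:always">
  </div>
  This is after.
  </p>
</body>
</html>
"""

# Well-formed XML only requires properly nested and closed tags,
# so the XML parser accepts this input without rearranging it.
root = ET.fromstring(snippet)

paragraph = root.find("body/p")
page_break = paragraph.find("div")

# The div is still a child of the paragraph, and the text around it is kept:
# the text before the div is paragraph.text, the text after it is the div's tail.
print(page_break.get("style"))    # page-break-after:always
print(paragraph.text.strip())     # This is before.
print(page_break.tail.strip())    # This is after.
```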

Understanding that this input is valid XML but invalid HTML is one of the keys to this solution. Can anyone review this code?

The code is used to fix page-break divs that come from one particular source, which does not produce regular HTML.

import bs4

html_string = """
<html>
<head>
  <title></title>
</head>
<body>
  <p align="center">
  This is before.
  <div style="page-break-after:always">
  </div>
  This is after.
  </p>
</body>
</html>
"""

html_element = bs4.BeautifulSoup(html_string, features="xml")

style = {'style': 'page-break-after:always'}

page_break_elements = html_element.findAll('div', style)

for page_break_element in page_break_elements:
    current = page_break_element
    while True:
        parent = current.parent
        if parent is None:
            break
        if parent.name == 'body':
            current.insert_before(page_break_element)
            break
        current = parent
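One possible variation, offered as a sketch rather than a definitive rewrite: the same walk up the tree wrapped in a function, with an extra guard for a div that is already a direct child of body. The function name `move_page_breaks_to_body` and the constant `PAGE_BREAK_STYLE` are hypothetical names introduced here. The guard assumes that `insert_before` raises a ValueError when an element is asked to insert itself before itself, which is how I understand Beautiful Soup to behave. Like the code above, it assumes lxml is installed so that features="xml" is available.

```python
import bs4

# Same attribute filter as in the question (hypothetical constant name).
PAGE_BREAK_STYLE = {"style": "page-break-after:always"}


def move_page_breaks_to_body(soup):
    """Move each page-break div up so it becomes a direct child of <body>."""
    for page_break in soup.find_all("div", PAGE_BREAK_STYLE):
        # Walk up until `current` is the ancestor sitting directly under <body>.
        current = page_break
        while current.parent is not None and current.parent.name != "body":
            current = current.parent
        if current.parent is None:
            continue  # no <body> ancestor found: leave the div where it is
        if current is page_break:
            continue  # already a direct child of <body>: nothing to move
        # Re-insert the div just before the top-level ancestor that held it.
        current.insert_before(page_break)
    return soup


html_string = """
<html>
<body>
  <p align="center">
  This is before.
  <div style="page-break-after:always">
  </div>
  This is after.
  </p>
</body>
</html>
"""

soup = move_page_breaks_to_body(bs4.BeautifulSoup(html_string, features="xml"))
print(soup.find("div").parent.name)
```

As in the original loop, the div ends up immediately before the paragraph that contained it, and the text on both sides of it stays inside that paragraph.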

Category: python Time: 2016-07-30 Views: 3
