A cleanup tool for HTML.

Removes unwanted tags and content. See the `Cleaner` class for details.

Imported names: absolute_import, urlsplit, unquote_plus, etree, defs, fromstring, XHTML_NAMESPACE, xhtml_to_html, _transform_result

Exported names: clean_html, clean, Cleaner, autolink, autolink_html, word_break, word_break_html

Regular expressions:
    expression\s*\(.*?\)
    @\s*import

XPath expressions (prefix x is bound to XHTML_NAMESPACE):
    descendant-or-self::*[@style]
    descendant-or-self::a [normalize-space(@href) and substring(normalize-space(@href),1,1) != '#'] |descendant-or-self::x:a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']

class Cleaner

    Instances clean the document of each of the possible offending
    elements. The cleaning is controlled by attributes; you can
    override attributes in a subclass, or set them in the constructor.

    ``scripts``: Removes any ``