"""A cleanup tool for HTML.

Removes unwanted tags and content.  See the `Cleaner` class for
details.
"""

import copy
import re
from urllib.parse import urlsplit, unquote_plus

from lxml import etree
from lxml.html import defs
from lxml.html import fromstring, XHTML_NAMESPACE
from lxml.html import xhtml_to_html, _transform_result

__all__ = ['clean_html', 'clean', 'Cleaner', 'autolink', 'autolink_html',
           'word_break', 'word_break_html']

# IE-only CSS ``expression(...)`` can execute script inside a stylesheet.
_replace_css_javascript = re.compile(
    r'expression\s*\(.*?\)', re.S | re.I).sub

# ``@import`` can pull additional stylesheets into the document.
_replace_css_import = re.compile(
    r'@\s*import', re.I).sub

# ...

_find_styled_elements = etree.XPath(
    "descendant-or-self::*[@style]")

_find_external_links = etree.XPath(
    ("descendant-or-self::a [normalize-space(@href) and substring(normalize-space(@href),1,1) != '#'] |"
     "descendant-or-self::x:a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']"),
    namespaces={'x': XHTML_NAMESPACE})


class Cleaner:
    """
    Instances clean the document of each of the possible offending
    elements.  The cleaning is controlled by attributes; you can
    override attributes in a subclass, or set them in the
    constructor.

    ``scripts``:
        Removes any ``
    """
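

# Illustrative sketch, not part of this module's public API. The helpers
# below are hypothetical and only demonstrate two ideas from this module,
# under stated assumptions:
#   1. Stripping script-capable CSS: the ``expression\s*\(.*?\)`` and
#      ``@\s*import`` pattern strings this module compiles are applied to
#      a style string, replacing every match with an empty string.
#   2. The configuration style the ``Cleaner`` docstring describes: class
#      attributes hold defaults, which a subclass or constructor keyword
#      arguments can override. The attribute names used here and raising
#      ``TypeError`` for an unknown keyword are assumptions of this sketch.
import re

# Same pattern strings as this module's CSS regexes.
_SKETCH_CSS_EXPRESSION = re.compile(r'expression\s*\(.*?\)', re.S | re.I)
_SKETCH_CSS_IMPORT = re.compile(r'@\s*import', re.I)


def _sketch_strip_css(style):
    """Hypothetical helper: remove ``expression(...)`` calls and
    ``@import`` keywords from a CSS string."""
    # Non-greedy match: stops at the first ')' after 'expression('.
    style = _SKETCH_CSS_EXPRESSION.sub('', style)
    return _SKETCH_CSS_IMPORT.sub('', style)


class _SketchCleaner:
    """Hypothetical stand-in for attribute-driven configuration."""

    # Class-level defaults (illustrative names only).
    scripts = True
    style = False

    def __init__(self, **kw):
        for name, value in kw.items():
            # Only names that already exist on the class may be overridden.
            if not hasattr(type(self), name):
                raise TypeError('Unknown parameter: %s=%r' % (name, value))
            setattr(self, name, value)


class _NoScriptsSketch(_SketchCleaner):
    # Overriding a default in a subclass instead of in the constructor.
    scripts = False


# Usage:
#   _sketch_strip_css('width: expression(document.body.clientWidth); color: red')
#   returns 'width: ; color: red'
#   _SketchCleaner(style=True).style is True
#   _NoScriptsSketch().scripts is False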