bg!dZddlZddlZddlmZddlmZddlm Z ddl m Z m Z m Z  eZn#e$reefZYnwxYw ddlmZn#e$r ddlmZYnwxYw ddlmZn#e$r ddlmZYnwxYwGd d eZ dd lmZGd d eZeZn #e$rYnwxYwdZddZ ddZ ddZddZ ddZ!dZ"eZ#dS)z? An interface to html5lib that mimics the lxml.html interface. N) HTMLParser) TreeBuilder)etree)ElementXHTML_NAMESPACE_contains_block_level_tag)urlopen)urlparseceZdZdZddZdS)rz*An html5lib HTML parser with lxml as tree.Fc :tj|f|td|dSN)stricttree) _HTMLParser__init__rselfrkwargss h/builddir/build/BUILD/cloudlinux-venv-1.0.7/venv/lib64/python3.11/site-packages/lxml/html/html5parser.pyrzHTMLParser.__init__s(TM&{MMfMMMMMNF__name__ __module__ __qualname____doc__rrrrrs444NNNNNNrr) XHTMLParserceZdZdZddZdS)rz+An html5lib XHTML Parser with lxml as tree.Fc :tj|f|td|dSr ) _XHTMLParserrrrs rrzXHTMLParser.__init__*s(  !$ RvK R R6 R R R R RrNrrrrrrr's499 S S S S S Srrct||}||S|dtd|S)N{})findr)rtagelems r _find_tagr(0s; 99S>>D  999##6 7 77rct|tstd|t}i}|t|trd}|||d<|j|fi|S)z Parse a whole document into a string. If `guess_charset` is true, or if the input is not Unicode but a byte string, the `chardet` library will perform charset guessing on the string. string requiredNT useChardet) isinstance_strings TypeError html_parserbytesparsegetroot)html guess_charsetparseroptionss rdocument_fromstringr77s dH % %+)*** ~GD%!8!8  -  6< ( ( ( ( 0 0 2 22rFctt|tstd|t}i}|t|trd}|||d<|j|dfi|}|rWt|dtr<|r:|drtjd|dz|d=|S)a`Parses several HTML elements, returning a list of elements. The first item in the list may be a string. If no_leading_text is true, then it will be an error if there is leading text, and it will always be a list of only elements. If `guess_charset` is true, the `chardet` library will perform charset guessing on the string. r*NFr+divrzThere is leading text: %r) r,r-r.r/r0 parseFragmentstripr ParserError)r3no_leading_textr4r5r6childrens rfragments_fromstringr?Os dH % %+)*** ~GD%!8!8  - #v#D%;;7;;HJx{H55  {  "" 5'(C(0 )4555 Orc|t|tstdt|}t |||| }|rjt|tsd}t |}|r@t|dtr|d|_|d=|||S|stj dt|dkrtj d|d}|j r5|j rtj d|j zd |_ |S) aParses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element. If 'create_parent' is true (or is a tag name) then a parent node will be created to encapsulate the HTML in a single element. In this case, leading or trailing text is allowed. If `guess_charset` is true, the `chardet` library will perform charset guessing on the string. r*)r4r5r=r9rzNo elements foundzMultiple elements foundzElement followed by text: %rN) r,r-r.boolr?rtextextendrr<lentailr;)r3 create_parentr4r5accept_leading_textelementsnew_rootresults rfragment_fromstringrLqsU dH % %+)***}--# M&//111H -22 "!M=))  &(1+x00 (  QK OOH % % % 5 3444 8}}q 9::: a[F {Nv{((**N > LMMMFK Mrct|tstdt|||}|dd}t|tr|dd}|}|ds|dr|St|d }t|r|St|d }t|d krT|j r|j s4|d j r|d j s|d St|rd|_nd|_|S)aParse the html, returning a single element/document. This tries to minimally parse the chunk of text, without knowing if it is a fragment or a document. 'base_url' will set the document's base_url attribute (and the tree's docinfo.URL) If `guess_charset` is true, or if the input is not Unicode but a byte string, the `chardet` library will perform charset guessing on the string. r*)r5r4N2asciireplacezrus ......888888IIIIIIIIIIHHs|HHH''''&&&&&&&&'&!!!!!!!&&&%%%%%%%%&NNNNNNNN !444444SSSSSlSSS ;==LL   D 888333300548D-237))))X3333l!'!'!'!'H   jll sA+ 77A AAA A)(A);BB"!B"