h fS@sldZddlZddlZddlZddlmZdgZejdZejdZ ejdZ ejdZ ejd Z ejd Z ejd Zejd Zejd ZejdZejdZejdejZejdejZejd ZejdZGdddeZeZGdddejZdS)zA parser for HTML and XHTML.N)unescape HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]>z--\s*>z(([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*z$([a-zA-Z][^ />]*)(?:\s|/(?!>))*zJ\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*(\'[^\']*\'|"[^"]*"|[^\s"\'=<>`]*))?z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*a <[a-zA-Z][-.a-zA-Z0-9:_]* # tag name (?:\s+ # whitespace before attribute name (?:[a-zA-Z_][-.:a-zA-Z0-9_]* # attribute name (?:\s*=\s* # value indicator (?:'[^']*' # LITA-enclosed value |\"[^\"]*\" # LIT-enclosed value |[^'\">\s]+ # bare value ) )? ) )* \s* # trailing whitespace aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) (?:\s*,)* # possibly followed by a comma )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#c@s1eZdZdZdddZddZdS)HTMLParseErrorz&Exception raised for all parse errors.NcCs3|s t||_|d|_|d|_dS)Nr)AssertionErrormsglinenooffset)selfrZpositionr 0/opt/alt/python34/lib64/python3.4/html/parser.py__init__Us   zHTMLParseError.__init__cCsW|j}|jdk r,|d|j}n|jdk rS|d|jd}n|S)Nz , at line %dz , column %dr)rr r )r resultr r r __str__[s  zHTMLParseError.__str__)NN)__name__ __module__ __qualname____doc__rrr r r r rRs rc@sfeZdZdZd;ZededdZddZd d Zd d Z d dZ dZ ddZ ddZ ddZddZddZdddZddZdd Zd!d"Zd#d$Zd%d&Zd'd(Zd)d*Zd+d,Zd-d.Zd/d0Zd1d2Zd3d4Zd5d6Zd7d8Zd9d:Z dS)'.)_HTMLParser__starttag_text)r r r r get_starttag_textszHTMLParser.get_starttag_textcCs2|j|_tjd|jtj|_dS)Nz )lowerr%recompileIr$)r elemr r r set_cdata_modeszHTMLParser.set_cdata_modecCst|_d|_dS)N)r#r$r%)r r r r clear_cdata_modes zHTMLParser.clear_cdata_modec Cs5|j}d}t|}x||kr|jr|j r|jd|}|dkr|jdt||d}|dkrtjdj || rPn|}qn=|j j ||}|r|j }n|jrPn|}||krH|jr.|j r.|j t |||qH|j |||n|j||}||krjPn|j}|d|r_tj||r|j|} n|d|r|j|} n|d|r|j|} n|d|r |j|} ng|d |rE|jr3|j|} qp|j|} n+|d |kro|j d|d } nP| dkrJ|sPn|jr|jd n|jd |d } | dkr|jd|d } | dkr|d } qn | d 7} |jr0|j r0|j t ||| qJ|j ||| n|j|| }q|d |r;tj||}|r|jdd} |j| |j} |d| d s| d } n|j|| }qqd||dkr7|j |||d|j||d}nPq|d|rtj||}|r|jd } |j| |j} |d| d s| d } n|j|| }qnt j||}|rS|rO|j||dkrO|jr|jdqO|j} | |kr6|} n|j||d }nPq|d |kr|j d|j||d }qPqdst!dqW|r||kr|j r|jr|j r|j t |||n|j ||||j||}n||d|_dS)Nr<&"z[\s;]z handle_declparse_bogus_comment)r rTr!gtposr r r rLCs &    z!HTMLParser.parse_html_declarationrcCs|j}|||ddks/td|jd|d}|dkrUd S|ry|j||d|n|dS) Nrhandle_comment)r rTZreportr!posr r r r]Xs & zHTMLParser.parse_bogus_commentcCs|j}|||ddks/tdtj||d}|sOdS|j}|j||d||j}|S)Nrz z junk characters in start tag: %rr<r<r<)rrg)r/check_for_whole_start_tagr!rtagfindrGtagfind_tolerantrrPrNr1r"attrfindattrfind_tolerantrappendstripr,countr=r?r.rCendswithhandle_startendtaghandle_starttagCDATA_CONTENT_ELEMENTSr6)r rTendposr!attrsrGrWtagmZattrnamerestZ attrvaluerPr r r r r rHps`       00    "zHTMLParser.parse_starttagcCsk|j}|jr'tj||}ntj||}|r[|j}|||d}|dkrs|dS|dkr|jd|r|dS|jd|rd S|jr|j||d|jdn||kr|S|dSn|dkrd S|dkrd S|jr@|j|||jd n||krP|S|dSnt d dS)Nrr/z/>rzmalformed empty start tagr z6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzmalformed start tagzwe should not get here!r<r<r<) r!rlocatestarttagendrGlocatestarttagend_tolerantrPrErDr.r)r rTr!ryrVnextr r r rjs>             z$HTMLParser.check_for_whole_start_tagcCs|j}|||ddks/tdtj||d}|sOd S|j}tj||}|sW|jdk r|j||||S|j r|j d|||fnt j||d}|s|||ddkr|dS|j |Sn|j dj}|jd|j}|j||dS|j dj}|jdk r||jkr|j||||Sn|j|j|j|S) Nrzrr<)r!r endendtagrArP endtagfindrGr%rCrr.rlr]rNr1r> handle_endtagr7)r rTr!rGr^Z namematchZtagnamer5r r r rIs< &  !  zHTMLParser.parse_endtagcCs!|j|||j|dS)N)rtr)r rxrwr r r rsszHTMLParser.handle_startendtagcCsdS)Nr )r rxrwr r r rtszHTMLParser.handle_starttagcCsdS)Nr )r rxr r r r szHTMLParser.handle_endtagcCsdS)Nr )r rXr r r rOszHTMLParser.handle_charrefcCsdS)Nr )r rXr r r rRszHTMLParser.handle_entityrefcCsdS)Nr )r r)r r r rCszHTMLParser.handle_datacCsdS)Nr )r r)r r r raszHTMLParser.handle_commentcCsdS)Nr )r Zdeclr r r r\szHTMLParser.handle_declcCsdS)Nr )r r)r r r rd"szHTMLParser.handle_picCs$|jr |jd|fndS)Nzunknown declaration: %r)rr.)r r)r r r unknown_decl%s zHTMLParser.unknown_declcCs tjdtddt|S)NzZThe unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.rr)rrrr)r sr r r r*s  zHTMLParser.unescape)rr)!rrrrrurrrr*r+r.r/r0r6r7r(rLr]rKrHrjrIrsrtrrOrRrCrar\rdrrr r r r rfs<         < + *          )rr2rr&Zhtmlr__all__r3r#rSrQrMrFrcZ commentcloserkrlrmrnVERBOSEr|r}rr Exceptionrobjectrr'rr r r r s6