bg:dZddlZddlZddlZddlmZmZmZddlm Z ddl m Z ddl m Z mZmZddlmZdd lmZdd lmZdd lmZdd lmZdd lmZddlmZGddZdS)a Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco N)ListOptionalUnion)CharSetGroupProber) CharSetProber) InputStateLanguageFilter ProbingState)EscCharSetProber) Latin1Prober)MacRomanProber)MBCSGroupProber) ResultDict)SBCSGroupProber) UTF1632Proberc XeZdZdZdZejdZejdZejdZ dddd d d d d dZ dddd ddddZ e j dfde deddfdZedefdZedefdZedeefdZd!dZdeeefddfdZdefd ZdS)"UniversalDetectoraq The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g?s[-]s(|~{)s[-]z Windows-1252z Windows-1250z Windows-1251z Windows-1256z Windows-1253z Windows-1255z Windows-1254z Windows-1257) iso-8859-1z iso-8859-2z iso-8859-5z iso-8859-6z iso-8859-7z iso-8859-8 iso-8859-9z iso-8859-13z ISO-8859-11GB18030CP949UTF-16)asciirztis-620rgb2312zeuc-krzutf-16leF lang_filtershould_rename_legacyreturnNc d|_d|_g|_dddd|_d|_d|_t j|_d|_ ||_ tj t|_d|_||_|dS)Nencoding confidencelanguageF)_esc_charset_prober_utf1632_prober_charset_probersresultdone _got_datar PURE_ASCII _input_state _last_charrlogging getLogger__name__logger_has_win_bytesrreset)selfrrs P/opt/cloudlinux/venv/lib64/python3.11/site-packages/chardet/universaldetector.py__init__zUniversalDetector.__init__ds @D 8<57# #   &1&'11 #$8! r%c|jSN)r-r5s r6 input_statezUniversalDetector.input_state{s   r%c|jSr9)r3r:s r6 has_win_byteszUniversalDetector.has_win_bytess ""r%c|jSr9)r(r:s r6charset_probersz!UniversalDetector.charset_proberss $$r%c2dddd|_d|_d|_d|_tj|_d|_|jr|j |j r|j |j D]}| dS)z Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. Nr r!Fr%) r)r*r+r3r r,r-r.r&r4r'r()r5probers r6r4zUniversalDetector.resets $(sMM  #&1  # -  $ * * , , ,   )  & & ( ( (+  F LLNNNN  r%byte_strcp|jrdS|sdSt|tst|}|js|t jr dddd|_n|t jt j fr dddd|_nx|dr dddd|_nW|d r d ddd|_n6|t j t j fr d ddd|_d |_|jd  d |_dS|j tjkrt|j|rtj|_ nH|j tjkr3|j|j|zrtj|_ |dd|_|jst-|_|jjt0jkr]|j|t0jkr5|jj|jdd|_d |_dS|j tjkr|jst?|j |_|j|t0jkr?|jj|j|jj!d|_d |_dSdS|j tjkr'|j"stG|j g|_"|j tHj%zr&|j"&tO|j"&tQ|j"&tS|j"D]U}||t0jkr0|j||j!d|_d |_nV|j*|r d |_+dSdSdS)a Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. Nz UTF-8-SIG?r!zUTF-32szX-ISO-10646-UCS-4-3412szX-ISO-10646-UCS-4-2143rTr"),r* isinstance bytearrayr+ startswithcodecsBOM_UTF8r) BOM_UTF32_LE BOM_UTF32_BEBOM_LEBOM_BEr-r r,HIGH_BYTE_DETECTORsearch HIGH_BYTE ESC_DETECTORr. ESC_ASCIIr'rstater DETECTINGfeedFOUND_IT charset_nameget_confidencer&r rr$r(rr NON_CJKappendrr rWIN_BYTE_DETECTORr3)r5rBrAs r6rWzUniversalDetector.feeds] 9  F  F(I.. + **H~% ""6?33 X!,"% " $$f&96;N%OPP X,43TVWW $$%899 X!9"% "  $$%899 X!9"% "  $$fmV]%CDD X,43TVWW !DN{:&2     5 5 5&--h77 9$.$8!!!Z%:::%,,T_x-GHH;%/$8!"233-# 3#0??D   %)? ? ?#((22l6KKK $ 4 A"&"6"E"E"G"G " !    4 4 4+ N+;D>>/  ;;x((L,AAA$*$7&,&;&;&=&=$*O##DK !%DIEB%,,X66 +&*###%7 6" + +r%c |jr|jSd|_|js|jdn%|jt jkr dddd|_n|jt jkrd}d}d}|j D]#}|s| }||kr|}|}$|r||j kr|j }|J| }| }|d r"|jr|j||}|jr/|j|pd |}|||jd|_|jt,jkr|jd |jd |j D]}|st1|t2rD|jD];}|jd |j |j| <^|jd |j |j| |jS) z Stop analyzing the current document and come up with a final prediction. :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`, `confidence`, and `language`. Tzno data received!rrDrEr!Nr ziso-8859r"z no probers hit minimum thresholdz%s %s confidence = %s)r*r)r+r2debugr-r r,rRr(rZMINIMUM_THRESHOLDrYlowerrIr3 ISO_WIN_MAPgetr LEGACY_MAPr$getEffectiveLevelr/DEBUGrGrprobers) r5prober_confidencemax_prober_confidence max_proberrArYlower_charset_namer# group_probers r6closezUniversalDetector.closes 9 ;  ~(  K  1 2 2 2 2 *"7 7 7'.crRRDKK *"6 6 6 $ $' !J/ ( ($*$9$9$;$;!$'<<<,=)!'J 4t7MMM)6 #///%1%7%7%9%9"'6688 &00<<*'+'7';';. (( ,#'?#6#6%+2244l$$L!-", * 3  ; ( ( * *gm ; ;{:&. !!"DEEE$($9L'! !,0BCC&2&:F K-- 7 & 3 & & 5 5 7 7  ))3(5(1(7799  {r%)rN)r1 __module__ __qualname____doc__r`recompilerPrSr]rbrdr ALLboolr7propertyintr;r=rrr?r4rbytesrHrWrrmr%r6rr8s #N332:l++L" >22$$$$$$$%  K $ $J'5&8%*##  .!S!!!X!#t###X#%m!4%%%X%&A+U5)#34A+A+A+A+A+FMzMMMMMMr%r)rprJr/rqtypingrrrcharsetgroupproberr charsetproberrenumsr r r escproberr latin1proberr macromanproberrmbcsgroupproberr resultdictrsbcsgroupproberr utf1632proberrrrxr%r6rsF8  ((((((((((222222((((((;;;;;;;;;;''''''&&&&&&******,,,,,,"""""",,,,,,((((((rrrrrrrrrrr%