Re0dZddlZddlZddlZddlmZddlmZmZm Z ddl m Z ddl m Z ddlmZdd lmZGd d eZdS) a Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco N)CharSetGroupProber) InputStateLanguageFilter ProbingState)EscCharSetProber) Latin1Prober)MBCSGroupProber)SBCSGroupProberc eZdZdZdZejdZejdZejdZ dddd d d d d dZ e j fdZ dZdZdZdS)UniversalDetectoraq The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g?s[-]s(|~{)s[-]z Windows-1252z Windows-1250z Windows-1251z Windows-1256z Windows-1253z Windows-1255z Windows-1254z Windows-1257)z iso-8859-1z iso-8859-2z iso-8859-5z iso-8859-6z iso-8859-7z iso-8859-8z iso-8859-9z iso-8859-13cd|_g|_d|_d|_d|_d|_d|_||_tj t|_ d|_ | dS)N)_esc_charset_prober_charset_probersresultdone _got_data _input_state _last_char lang_filterlogging getLogger__name__logger_has_win_bytesreset)selfrs /builddir/build/BUILDROOT/alt-python311-pip-21.3.1-3.el8.x86_64/opt/alt/python311/lib/python3.11/site-packages/pip/_vendor/chardet/universaldetector.py__init__zUniversalDetector.__init__Qsi#' "   &'11 " cdddd|_d|_d|_d|_tj|_d|_|jr|j |j D]}| dS)z Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. Nencoding confidencelanguageFr ) rrrrr PURE_ASCIIrrrrr)rprobers rrzUniversalDetector.reset^s $(sMM  #&1  # -  $ * * , , ,+  F LLNNNN  r c|jrdSt|sdSt|tst|}|js|t jr dddd|_n|t j t j fr dddd|_nx|dr dddd|_nW|d r d ddd|_n6|t j t j fr d ddd|_d |_|jd  d |_dS|j tjkrt|j|rtj|_ nH|j tjkr3|j|j|zrtj|_ |dd|_|j tjkr|jst/|j|_|j|t4jkr?|jj|j|jjd|_d |_dSdS|j tjkr|jsztA|jg|_|jtBj"zr&|j#tI|j#tK|jD]U}||t4jkr0|j||jd|_d |_nV|j&|r d |_'dSdSdS)a Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. Nz UTF-8-SIG?r#zUTF-32szX-ISO-10646-UCS-4-3412szX-ISO-10646-UCS-4-2143zUTF-16Tr$)(rlen isinstance bytearrayr startswithcodecsBOM_UTF8r BOM_UTF32_LE BOM_UTF32_BEBOM_LEBOM_BErrr'HIGH_BYTE_DETECTORsearch HIGH_BYTE ESC_DETECTORr ESC_ASCIIrrrfeedrFOUND_IT charset_nameget_confidencer&rr rNON_CJKappendr r WIN_BYTE_DETECTORr)rbyte_strr(s rr<zUniversalDetector.feedos 9  F8}}  F(I.. + **H~" ""6?33 /+6-0+-// $$f&9&,&9&;<< /,4-0+-// $$%899 /+C-0+-// $$%899 /+C-0+-// $$fmV]%CDD /,4-0+-// "DN{:&2     5 5 5&--h77 9$.$8!!"j&;;;%,,T_x-GHH<$.$8!"233-   4 4 4+ N+;Dlowerr0r ISO_WIN_MAPgetr&getEffectiveLevelrDEBUGr.rprobers) rprober_confidencemax_prober_confidence max_proberr(r>lower_charset_namer% group_probers rclosezUniversalDetector.closes 9 ;  ~! @ K  1 2 2 2 2 *"7 7 7'.),')++DKK  *"6 6 6 $ $' !J/ ( ($*$9$9$;$;!$'<<<,=)!'J @4t7MMM)6 %/%<%B%B%D%D"'6688 &00<<J*J'+'7';';@@ ; ( ( * *gm ; ;{:&. !!"DEEE$($9 I IL'! !,0BCC I&2&:GGF K--.E.4.A.4o.4.C.C.E.EGGGGG  ))*A*6*C*6*?*6*E*E*G*GIIII{r N)r __module__ __qualname____doc__rGrecompiler7r:rBrIrALLrrr<rSr rr r 3s #N332:l++L" >22!/!/!/!/!/!/!/"022K$2#5    "k+k+k+ZBBBBBr r )rVr1rrWcharsetgroupproberrenumsrrr escproberr latin1proberr mbcsgroupproberr sbcsgroupproberr objectr rZr rrbs8  222222;;;;;;;;;;''''''&&&&&&,,,,,,,,,,,,kkkkkkkkkkr