abc@sdZddlZddlZddlZddlmZddlmZmZm Z ddl m Z ddl m Z ddlmZdd lmZd efd YZdS( s Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco iNi(tCharSetGroupProber(t InputStatetLanguageFiltert ProbingState(tEscCharSetProber(t Latin1Prober(tMBCSGroupProber(tSBCSGroupProbertUniversalDetectorcBseZdZdZejdZejdZejdZidd6dd6d d 6d d 6d d6dd6dd6dd6Z e j dZ dZ dZdZRS(sq The ``UniversalDetector`` class underlies the ``chardet.detect`` function and coordinates all of the different charset probers. To get a ``dict`` containing an encoding and its confidence, you can simply run: .. code:: u = UniversalDetector() u.feed(some_bytes) u.close() detected = u.result g?s[-]s(|~{)s[-]s Windows-1252s iso-8859-1s Windows-1250s iso-8859-2s Windows-1251s iso-8859-5s Windows-1256s iso-8859-6s Windows-1253s iso-8859-7s Windows-1255s iso-8859-8s Windows-1254s iso-8859-9s Windows-1257s iso-8859-13cCsqd|_g|_d|_d|_d|_d|_d|_||_t j t |_ d|_ |jdS(N(tNonet_esc_charset_probert_charset_proberstresulttdonet _got_datat _input_statet _last_chart lang_filtertloggingt getLoggert__name__tloggert_has_win_bytestreset(tselfR((sI/usr/lib/python2.7/site-packages/pip/_vendor/chardet/universaldetector.pyt__init__Qs         cCsidd6dd6dd6|_t|_t|_t|_tj|_d|_ |j rg|j j nx|j D]}|j qqWdS(s Reset the UniversalDetector and all of its probers back to their initial states. This is called by ``__init__``, so you only need to call this directly in between analyses of different documents. tencodinggt confidencetlanguagetN( R R tFalseR RRRt PURE_ASCIIRRR RR (Rtprober((sI/usr/lib/python2.7/site-packages/pip/_vendor/chardet/universaldetector.pyR^s      cCsy|jr dSt|sdSt|ts;t|}n|js{|jtjrwidd6dd6dd6|_n|jtj tj fridd6dd6dd6|_n|jd rid d6dd6dd6|_nl|jd rid d6dd6dd6|_n<|jtj tj frOid d6dd6dd6|_nt |_|jddk r{t |_dSn|jtjkr|jj|rtj|_q|jtjkr|jj|j|rtj|_qn|d|_|jtjkr|js(t|j|_n|jj|tjkrui|jjd6|jjd6|jj d6|_t |_qun|jtjkru|j!st"|jg|_!|jt#j$@r|j!j%t&n|j!j%t'nx`|j!D]U}|j|tjkri|jd6|jd6|j d6|_t |_PqqW|j(j|rut |_)qundS(s Takes a chunk of a document and feeds it through all of the relevant charset probers. After calling ``feed``, you can check the value of the ``done`` attribute to see if you need to continue feeding the ``UniversalDetector`` more data, or if it has made a prediction (in the ``result`` attribute). .. note:: You should always call ``close`` when you're done feeding in your document if ``done`` is not already ``True``. Ns UTF-8-SIGRg?RRRsUTF-32ssX-ISO-10646-UCS-4-3412ssX-ISO-10646-UCS-4-2143sUTF-16i(*R tlent isinstancet bytearrayRt startswithtcodecstBOM_UTF8R t BOM_UTF32_LEt BOM_UTF32_BEtBOM_LEtBOM_BEtTrueR RRRtHIGH_BYTE_DETECTORtsearcht HIGH_BYTEt ESC_DETECTORRt ESC_ASCIIR RRtfeedRtFOUND_ITt charset_nametget_confidenceRR RRtNON_CJKtappendRRtWIN_BYTE_DETECTORR(Rtbyte_strR ((sI/usr/lib/python2.7/site-packages/pip/_vendor/chardet/universaldetector.pyR1os~                  c Cs>|jr|jSt|_|js5|jjdn1|jtjkrhidd6dd6dd6|_n|jtj krfd }d}d }xD|j D]9}|sqn|j }||kr|}|}qqW|rf||j krf|j}|jj}|j }|jd r?|jr?|jj||}q?ni|d6|d6|jd6|_qfn|jjtjkr7|jdd kr7|jjd x|j D]}|sqnt|trx^|jD]+}|jjd |j|j|j qWq|jjd |j|j|j qWq7n|jS( s Stop analyzing the current document and come up with a final prediction. :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`, `confidence`, and `language`. sno data received!tasciiRg?RRRgsiso-8859s no probers hit minimum thresholds%s %s confidence = %sN(R R R+RRtdebugRRRR.R R R4tMINIMUM_THRESHOLDR3tlowerR$Rt ISO_WIN_MAPtgetRtgetEffectiveLevelRtDEBUGR"Rtprobers( Rtprober_confidencetmax_prober_confidencet max_proberR R3tlower_charset_nameRt group_prober((sI/usr/lib/python2.7/site-packages/pip/_vendor/chardet/universaldetector.pytcloses`              (Rt __module__t__doc__R;tretcompileR,R/R7R=RtALLRRR1RG(((sI/usr/lib/python2.7/site-packages/pip/_vendor/chardet/universaldetector.pyR3s"    m(RIR%RRJtcharsetgroupproberRtenumsRRRt escproberRt latin1proberRtmbcsgroupproberRtsbcsgroupproberRtobjectR(((sI/usr/lib/python2.7/site-packages/pip/_vendor/chardet/universaldetector.pyt$s