ó •abc@sBddlZddlZddlmZdefd„ƒYZdS(i˙˙˙˙Ni(t ProbingStatet CharSetProbercBs€eZdZd d„Zd„Zed„ƒZd„Zed„ƒZ d„Z e d„ƒZ e d„ƒZ e d „ƒZRS( gffffffî?cCs(d|_||_tjtƒ|_dS(N(tNonet_statet lang_filtertloggingt getLoggert__name__tlogger(tselfR((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pyt__init__'s  cCstj|_dS(N(Rt DETECTINGR(R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytreset,scCsdS(N(R(R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pyt charset_name/scCsdS(N((R tbuf((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytfeed3scCs|jS(N(R(R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytstate6scCsdS(Ng((R ((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytget_confidence:scCstjdd|ƒ}|S(Ns([-])+t (tretsub(R((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytfilter_high_byte_only=scCsztƒ}tjd|ƒ}xX|D]P}|j|d ƒ|d}|jƒ re|dkred}n|j|ƒq"W|S(s5 We define three types of bytes: alphabet: english alphabets [a-zA-Z] international: international characters [€-˙] marker: everything else [^a-zA-Z€-˙] The input buffer can be thought to contain a series of words delimited by markers. This function works to filter all words that contain at least one international character. All contiguous sequences of markers are replaced by a single space ascii character. This filter applies to all scripts which do not use English characters. s%[a-zA-Z]*[€-˙]+[a-zA-Z]*[^a-zA-Z€-˙]?i˙˙˙˙s€R(t bytearrayRtfindalltextendtisalpha(Rtfilteredtwordstwordt last_char((sE/usr/lib/python2.7/site-packages/pip/_vendor/chardet/charsetprober.pytfilter_international_wordsBs      cCsčtƒ}t}d}x˛tt|ƒƒD]ž}|||d!}|dkrTt}n|dkrit}n|dkr(|jƒ r(||krš| rš|j|||!ƒ|jdƒn|d}q(q(W|sä|j||ƒn|S(sČ Returns a copy of ``buf`` that retains only the sequences of English alphabet and high byte characters that are not between <> characters. Also retains English alphabet and high byte characters immediately before occurrences of >. This filter can be applied to all scripts which contain both English characters and extended ASCII characters, but is currently only used by ``Latin1Prober``. iit>ts