# pip/_vendor/chardet/charsetprober.py

import logging
import re

from .enums import ProbingState


class CharSetProber(object):

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter=None):
        self._state = None
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self):
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self):
        return None

    def feed(self, buf):
        pass

    @property
    def state(self):
        return self._state

    def get_confidence(self):
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf):
        # Collapse every run of ASCII bytes into a single space.
        buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
        return buf

    @staticmethod
    def filter_international_words(buf):
        """
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.

        This filter applies to all scripts which do not use English characters.
        """
        filtered = bytearray()

        # Match words containing at least one international character,
        # optionally followed by one trailing marker byte.
        words = re.findall(b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?',
                           buf)

        for word in words:
            filtered.extend(word[:-1])

            # A trailing marker is replaced by a space.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b'\x80':
                last_char = b' '
            filtered.extend(last_char)

        return filtered

    @staticmethod
    def filter_with_english_letters(buf):
        """
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        Also retains English alphabet and high byte characters immediately
        before occurrences of >.

        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
        """
        filtered = bytearray()
        in_tag = False
        prev = 0

        for curr in range(len(buf)):
            # Slice to get a length-1 bytes object rather than an int.
            buf_char = buf[curr:curr + 1]
            if buf_char == b'>':
                in_tag = False
            elif buf_char == b'<':
                in_tag = True

            # The current byte is neither extended ASCII nor alphabetic.
            if buf_char < b'\x80' and not buf_char.isalpha():
                if curr > prev and not in_tag:
                    # Keep the stretch since the last such byte, then a space.
                    filtered.extend(buf[prev:curr])
                    filtered.extend(b' ')
                prev = curr + 1

        if not in_tag:
            # Keep whatever follows the last delimiter.
            filtered.extend(buf[prev:])

        return filtered