U e@sddlmZddlZddlZddlZddlZddlZddlZddl Z ddl Z ddl m Z ddl mZddl mZddlmZee jZdZdZd Zd Zd Zd Zd ZdZeeeeefZeeefZeeefZeeeZej ej!"ej ej#"ej ej$"ej ej%"eeZ&dddddZ'dddddZ(dddddZ)dddddZ*dddddZ+dd d!d"d#Z,dd$d$d%d&d'Z-dd$dd(d)Z.dd$dd*d+Z/e0d,ej1ej2BZ3e0d-Z4ddd.d/d0Z5dd1d2d3d4Z6dS)5) annotationsN)IO) extensions) interpreters)licenses directorysymlinksocketfile executableznon-executabletextbinarystrzset[str])pathreturnc Cs>zt|}Wn&ttfk r4t|dYnX|j}t|rLthSt|r\t hSt |rlt hSt h}t |tj}|r|tn |tttj|}t|dkr||n*|rt|}t|dkr|t|dt|@st|r |tn |tt|@s(t|t|@s:t||S)N does not exist.r)oslstatOSError ValueErrorst_modestatS_ISDIR DIRECTORYS_ISLNKSYMLINKS_ISSOCKSOCKETFILEaccessX_OKadd EXECUTABLENON_EXECUTABLEtags_from_filenamerbasenamelenupdateparse_shebang_from_filetags_from_interpreter ENCODING_TAGS file_is_textTEXTBINARYAssertionError MODE_TAGS)rsrmodetagsr tshebangr5?/opt/hc_python/lib/python3.8/site-packages/identify/identify.pytags_from_path(s<            r7cCstj|\}}tj|\}}t}|g|dD]"}|tjkr6|tj|qZq6t|dkr|dd }|tj kr|tj |n|tj kr|tj ||S)N.r) rrsplitsplitextsetrNAMESr'r&lower EXTENSIONSEXTENSIONS_NEED_BINARY_CHECK)r_filenameextretpartr5r5r6r$Vs    r$) interpreterrcCs@|d\}}}|r:|tjkr(tj|S|d\}}}qtS)N/r8) rpartitionr INTERPRETERSr<)rFrAr5r5r6r)ls   r)z IO[bytes]bool)bytesiorc CsLtddddddddgttd d ttd d }t|d d| S)zReturn whether the first KB of contents seems to be binary. This is roughly based on libmagic's binary/text detection: https://github.com/file/file/blob/df74b09b9027676088c797528edcaae5a9ce9ad0/src/encoding.c#L203-L228   iN) bytearrayrangerJread translate)rKZ text_charsr5r5r6is_textys  r\c CsDtj|st|dt|d}t|W5QRSQRXdS)Nrrb)rrlexistsropenr\)rfr5r5r6r+s  r+z list[str])linercCs.z t|WStk r(|YSXdS)N)shlexr:r)rar5r5r6_shebang_splits rcztuple[str, ...])rKcmdrcCs|ddkr|}z|d}Wntk r<|YSX|D]}|tkrB|SqBtt|}t|ddD] \}}|dkrqx||df}qxq|S)N#!UTF-8z-ir9) rZreadlinedecodeUnicodeDecodeError printabletuplercstrip enumerate)rKrdZ next_line_b next_linecZ line_tokensitokenr5r5r6_parse_nix_shebangs  rtcCs|ddkrdS|}z|d}Wntk r>YdSX|D]}|tkrDdSqDtt|}|r|ddkr|ddkr|dd }n |dd }|d krt||S|S) z8Parse the shebang from a file opened for reading binary.rerfr5rgrz /usr/bin/envr9z-SN)z nix-shell) rZrirjrkrlrmrcrnrt)rKZ first_line_b first_linerqrdr5r5r6 parse_shebangs$   rvc Cstj|st|dt|tjs,dSz,t|d}t|W5QRWSQRXWn:tk r}z|j t j krWY dSW5d}~XYnXdS)z$Parse the shebang given a file path.rr5r]N) rrr^rrr r_rvrerrnoEINVAL)rr`er5r5r6r(s     r(z^\s*(Copyright|\(C\)) .*$z\s+)srcCs td|}td|}|S)N ) COPYRIGHT_REsubWS_RErn)rzr5r5r6 _norm_licenses  rz str | None)rBrc Csddl}t|dd}|}W5QRXt|}tj}d}tdt|}t j D]l\}} t| } || krr|S|rt t|t| t|dkrqR| || |} | |krR| |krR| }|}qR|r||kr|SdSdS)aReturn the spdx id for the license contained in `filename`. If no license is detected, returns `None`. spdx: https://spdx.org/licenses/ licenses from choosealicense.com: https://github.com/choosealicense.com Approximate algorithm: 1. strip copyright line 2. normalize whitespace (replace all whitespace with a single space) 3. check exact text match with existing licenses 4. failing that use edit distance rNrg)encodingr{g?) ukkonenr_rZrsysmaxsizemathceilr&rZLICENSESabsZdistance) rBrr`contentsZnormZ min_edit_distZmin_edit_dist_spdxcutoffZspdxr Z norm_licenseZ edit_distr5r5r6 license_ids($ r)7 __future__rrwros.pathrrerbrstringrtypingridentifyrrZidentify.vendorr frozensetrlrrrrr"r#r,r-Z TYPE_TAGSr/r*Z _ALL_TAGSr'r?valuesr@r=rIZALL_TAGSr7r$r)r\r+rcrtrvr(compileI MULTILINEr}rrrr5r5r5r6sV         .