""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://www.robotstxt.org/norobots-rfc.txt
"""

import collections
import urllib.parse
import urllib.request

__all__ = ["RobotFileParser"]

RequestRate = collections.namedtuple("RequestRate", "requests seconds")


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.sitemaps = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urllib.parse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400 and err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last
            if self.default_entry is None:
                # the first default entry wins
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """Parse the input lines from a robots.txt file.

        We allow that a user-agent: line is not preceded by
        one or more blank lines.
        """
        # states:
        #   0: start state
        #   1: saw user-agent line
        #   2: saw an allow or disallow line
        state = 0
        entry = Entry()

        self.modified()
        for line in lines:
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.parse.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
                elif line[0] == "crawl-delay":
                    if state != 0:
                        # before trying to convert to int we need to make
                        # sure that robots.txt has valid syntax otherwise
                        # it will crash
                        if line[1].strip().isdigit():
                            entry.delay = int(line[1])
                        state = 2
                elif line[0] == "request-rate":
                    if state != 0:
                        numbers = line[1].split('/')
                        # check if all values are sane
                        if (len(numbers) == 2 and numbers[0].strip().isdigit()
                                and numbers[1].strip().isdigit()):
                            entry.req_rate = RequestRate(int(numbers[0]),
                                                         int(numbers[1]))
                        state = 2
                elif line[0] == "sitemap":
                    # According to http://www.sitemaps.org/protocol.html
                    # "This directive is independent of the user-agent line,
                    #  so it doesn't matter where you place it in your file."
                    # Therefore we do not change the state of the parser.
                    self.sitemaps.append(line[1])
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # Until the robots.txt file has been read or found not
        # to exist, we must assume that no agent can be allowed.
        if not self.last_checked:
            return False
        # search for given user agent matches
        # the first match counts
        parsed_url = urllib.parse.urlparse(urllib.parse.unquote(url))
        url = urllib.parse.urlunparse(('', '', parsed_url.path,
                                       parsed_url.params, parsed_url.query,
                                       parsed_url.fragment))
        url = urllib.parse.quote(url)
        if not url:
            url = "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def crawl_delay(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.delay
        if self.default_entry:
            return self.default_entry.delay
        return None

    def request_rate(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.req_rate
        if self.default_entry:
            return self.default_entry.req_rate
        return None

    def site_maps(self):
        if not self.sitemaps:
            return None
        return self.sitemaps

    def __str__(self):
        entries = self.entries
        if self.default_entry is not None:
            entries = entries + [self.default_entry]
        return '\n\n'.join(map(str, entries))


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""
    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
        self.path = urllib.parse.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return ("Allow" if self.allowance else "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""
    def __init__(self):
        self.useragents = []
        self.rulelines = []
        self.delay = None
        self.req_rate = None

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.append(f"User-agent: {agent}")
        if self.delay is not None:
            ret.append(f"Crawl-delay: {self.delay}")
        if self.req_rate is not None:
            rate = self.req_rate
            ret.append(f"Request-rate: {rate.requests}/{rate.seconds}")
        ret.extend(map(str, self.rulelines))
        return '\n'.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True
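

# Minimal usage sketch: drives the public API defined above by feeding an
# in-memory robots.txt through parse() instead of fetching one with read(),
# so no network access is needed.  The rules, URLs, and the "ExampleBot"
# user agent below are placeholder values, not anything taken from this
# module.
if __name__ == '__main__':
    robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
Request-rate: 3/5
Sitemap: https://example.com/sitemap.xml
"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    print(rp.can_fetch("ExampleBot", "https://example.com/index.html"))  # True
    print(rp.can_fetch("ExampleBot", "https://example.com/private/x"))   # False
    print(rp.crawl_delay("ExampleBot"))    # 2
    print(rp.request_rate("ExampleBot"))   # RequestRate(requests=3, seconds=5)
    print(rp.site_maps())                  # ['https://example.com/sitemap.xml']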