""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://www.robotstxt.org/norobots-rfc.txt
"""
import urlparse
import urllib

__all__ = ["RobotFileParser"]


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urlparse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        opener = URLopener()
        f = opener.open(self.url)
        lines = [line.strip() for line in f]
        f.close()
        self.errcode = opener.errcode
        if self.errcode in (401, 403):
            self.disallow_all = True
        elif self.errcode >= 400 and self.errcode < 500:
            self.allow_all = True
        elif self.errcode == 200 and lines:
            self.parse(lines)

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last
            if self.default_entry is None:
                # the first default entry wins
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """parse the input lines from a robots.txt file.
           We allow that a user-agent: line is not preceded by
           one or more blank lines."""
        # states:
        #   0: start state
        #   1: saw user-agent line
        #   2: saw an allow or disallow line
        state = 0
        linenumber = 0
        entry = Entry()

        self.modified()
        for line in lines:
            linenumber += 1
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True

        # Until the robots.txt file has been read or found not
        # to exist, we must assume that no url is allowable.
        # This prevents false positives when a user erroneously
        # calls can_fetch() before calling read().
        if not self.last_checked:
            return False

        # search for given user agent matches
        # the first match counts
        parsed_url = urlparse.urlparse(urllib.unquote(url))
        url = urlparse.urlunparse(('', '', parsed_url.path,
            parsed_url.params, parsed_url.query, parsed_url.fragment))
        url = urllib.quote(url)
        if not url:
            url = "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def __str__(self):
        return ''.join([str(entry) + "\n" for entry in self.entries])


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""
    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        path = urlparse.urlunparse(urlparse.urlparse(path))
        self.path = urllib.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return (self.allowance and "Allow" or "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""
    def __init__(self):
        self.useragents = []
        self.rulelines = []

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.extend(["User-agent: ", agent, "\n"])
        for line in self.rulelines:
            ret.extend([str(line), "\n"])
        return ''.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True


class URLopener(urllib.FancyURLopener):
    def __init__(self, *args):
        urllib.FancyURLopener.__init__(self, *args)
        self.errcode = 200

    def prompt_user_passwd(self, host, realm):
        ## If robots.txt file is accessible, ignore user/password
        return None, None

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        self.errcode = errcode
        return urllib.FancyURLopener.http_error_default(self, url, fp,
                                                        errcode, errmsg,
                                                        headers)
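A quick usage sketch of the parser. Feeding rules to `parse()` directly avoids any network fetch; the import below assumes Python 3, where this module lives at `urllib.robotparser` (under Python 2 it is simply `import robotparser`). The agent name `MyBot/1.0` and the example URLs are illustrative only.

```python
# Usage sketch (assumes Python 3's urllib.robotparser; under Python 2
# the equivalent import is `import robotparser`).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Supply robots.txt lines directly, so no HTTP request is made.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Paths outside /private/ are allowed; anything under it is blocked.
# can_fetch() lower-cases the agent token and, with no specific entry
# matching, falls back to the "*" default entry.
print(rp.can_fetch("MyBot/1.0", "http://example.com/index.html"))  # True
print(rp.can_fetch("MyBot/1.0", "http://example.com/private/x"))   # False
```

Note the first-match-wins rule evaluation: a more specific `Allow:` line only takes effect if it appears before the `Disallow:` line that would otherwise shadow it.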