
[tor-dev] Tor-stem in python script - HTTP requests issue



Hello guys,

I have a problem that I haven't been able to solve, so here I am ;) I have a Python script that uses python-stem to create and manage a Tor instance on a defined port. It retrieves a web page with an HTTP GET and submits information with HTTP POST requests.
Basically, I use Tor because I need to test this server from different IP addresses with several requests in parallel. I also keep track of cookies. Here's a sample of the code I use, based on the example on the stem website, https://stem.torproject.org/tutorials/to_russia_with_love.html (to run more requests in parallel, I launch the script several times with a different SOCKS_PORT value):
----------------------------
import socket, socks, stem.process
import mechanize, cookielib

SOCKS_PORT = 9000
DATA_DIRECTORY = "TOR_%s" % SOCKS_PORT
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', SOCKS_PORT)
socket.socket = socks.socksocket

tor_process = stem.process.launch_tor_with_config(
          config = {
            'SocksPort': str(SOCKS_PORT),
            'ControlPort': str(SOCKS_PORT+1),
            'DataDirectory': DATA_DIRECTORY,
            'ExitNodes': '{it}',
          },
        )

# Initialize mechanize with a cookie jar (urllib2, urllib3, etc. behave the same way; already tried)
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
...

for number in num_list:
  req = br.open_novisit("http://example.com") #_1_
  res = req.read()
  print res
  req.close()
  req2 = br.open("http://example.com/post_to_me", data_to_post) #_2_
  res2 = req2.read()
  req2.close()
--------------------------------

And that's it. The problem occurs on the lines I marked as _1_ and _2_: after roughly 200 requests, the call seems to block indefinitely, waiting for a response that never comes. Of course, Wireshark doesn't help because the traffic is encrypted. The same code, without Tor, works perfectly. So why does it get stuck at about 200 requests? I tried:

1. Telnetting to the control port and forcing new circuits with SIGNAL NEWNYM
2. Instantiating mechanize (urllib2, urllib3, whatever) inside the loop
3. ...I don't remember what else
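
For reference, the telnet session from item 1 can be scripted. The following is only a minimal sketch: the function name send_newnym is made up for illustration, port 9001 assumes the ControlPort of SOCKS_PORT+1 from the sample above, and the empty password assumes the control port has no authentication configured (a password containing quotes or backslashes would need escaping, which this sketch does not do). It should run as a separate script, because in the sample script socket.socket is replaced by the SOCKS wrapper and this connection would likely be routed through the proxy.

```python
import socket


def send_newnym(host="127.0.0.1", port=9001, password="", timeout=10.0):
    """Send AUTHENTICATE, SIGNAL NEWNYM and QUIT; return the reply lines."""
    commands = [
        'AUTHENTICATE "%s"' % password,  # assumption: no auth or a simple password
        "SIGNAL NEWNYM",                 # ask Tor to use new circuits
        "QUIT",
    ]
    replies = []
    conn = socket.create_connection((host, port), timeout)
    reader = conn.makefile("rb")
    try:
        for command in commands:
            conn.sendall((command + "\r\n").encode("ascii"))
            line = reader.readline().decode("ascii").strip()
            replies.append(line)
            if not line.startswith("250"):
                break  # stop at the first reply that is not a success code
    finally:
        reader.close()
        conn.close()
    return replies
```

Each reply should start with 250 when the command is accepted. stem exposes the same signal through controller.signal(Signal.NEWNYM) on a stem.control.Controller, which is probably the more natural choice inside a stem-based script.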

I thought it could be a local socket connection limit: without Tor, I see in Wireshark that the source port changes every time a request is performed. But I don't know whether the problem is that the same source port is used every time (I don't think so); if it is, should I close the current socket and open a new one? Should I kill the Tor process? I can't explain to myself why this happens...
The only thing I know is: *when the script gets stuck, if I kill the Python process (Ctrl+C) and relaunch it, it starts working again.* I've seen that it's possible to set the value of TrackHostExitsExpire; is it useful in my case?
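
Since the script only recovers when it is killed by hand, one diagnostic is to give every request a deadline, so that a stalled read raises an exception instead of blocking forever. The sketch below shows the idea with the standard library only; request_with_deadline is a hypothetical helper, not part of stem or mechanize. In the real script the equivalent would be a timeout on the HTTP library call or socket.setdefaulttimeout(), and whether the SOCKS wrapper honours that timeout is an assumption that would need checking.

```python
import socket


def request_with_deadline(host, port, payload, timeout=30.0):
    """Send payload and return the whole reply.

    Raises socket.timeout if the peer stays silent for `timeout` seconds.
    """
    conn = socket.create_connection((host, port), timeout)
    try:
        conn.sendall(payload)
        chunks = []
        while True:
            chunk = conn.recv(4096)  # inherits the timeout set at connect time
            if not chunk:            # empty read: the peer closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        conn.close()
```

Catching socket.timeout in the request loop would make it possible to log which request number stalled and to retry it, instead of restarting the whole process.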

Thanks in advance to whoever can help me!!
Ed
