On 2015-01-20 15:21, Seth wrote:
On Tue, 20 Jan 2015 13:15:43 -0800, Greg Norcie <gnorcie@xxxxxxxxxxxx> wrote:

I (and a few other friends) have noticed Padmapper's results seem less complete than usual lately.


I would think that an organization like Padmapper would have the technical and financial wherewithal to build out their own network of scraper nodes apart from the Tor network.

This would be almost impossible to block, especially if they stood up the infrastructure on a large cloud provider, where instances could be re-provisioned with new IP addresses in numerous cities all over the globe in a matter of minutes with the click of a button.

I'm not so sure that would work particularly well. Humans rarely live in datacenters, and it's tough to make cloud IPs look and act the same as residential IPs, especially when other IPs in the same /24 (or larger) are owned by different customers. User behaviour would also be quite different, and it would probably be difficult to mimic typical human usage patterns while scraping enough information to be worthwhile before Craigslist pulls the plug.
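
To illustrate the "same /24" point: a minimal sketch, assuming Python's standard ipaddress module. The helper name same_block and the sample addresses (taken from the reserved documentation ranges) are made up for illustration, and nothing here reflects how Craigslist actually groups or blocks addresses.

```python
import ipaddress

def same_block(a: str, b: str, prefix: int = 24) -> bool:
    """Return True if IPv4 addresses a and b fall in the same /prefix network.

    Hypothetical helper: a blocker that bans by /24 would treat every
    address for which this returns True as one unit, regardless of which
    cloud customer actually holds each address.
    """
    # strict=False lets us derive the enclosing network from a host address.
    net = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in net

# Two addresses in 203.0.113.0/24, then one from a different block.
print(same_block("203.0.113.10", "203.0.113.200"))
print(same_block("203.0.113.10", "198.51.100.7"))
```

Passing a smaller prefix (for example prefix=16) models the "or larger" case, where a single ban covers far more neighbouring customers.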

Tor exit nodes, on the other hand, also have a lot of human shields using them, which makes it much harder to narrow down a specific "bad actor" without also hitting actual users.

So while Tor isn't necessarily an ideal choice here, it has some advantages over dynamically allocating and dropping cloud IPs.

I'm curious why Craigslist doesn't just sell their listing data via API access to companies like Padmapper. That would be a win-win.

Because they're actively hostile to creating a better user experience. Don't get me wrong, the fact that their website doesn't look like someone from marketing took a dump all over it is part of what is awesome about it, but still...

Dave Warren

