
Multi-threaded Python Web Crawler Got Stuck

I'm writing a Python web crawler and I want to make it multi-threaded. I have finished the basic part; below is what it does: a thread gets a URL from the queue; the thread ex…

Solution 1:

Your crawl function has an infinite while loop with no possible exit path. The condition True always evaluates to True, so the loop keeps running and, as you say, the program ends up "not exiting properly".

Modify the crawl function's while loop to use a real exit condition. For instance, exit the loop once the number of links saved to the CSV file exceeds a certain minimum.

i.e.,

def crawl():
    # Stop once enough links have been collected instead of looping forever.
    while len(exist) <= min_links:
        ...
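
To show how that condition might fit into the threaded setup, here is a minimal runnable sketch. Since your full code isn't shown, the names exist, min_links, and the queue/lock setup below are assumptions, not your actual variables:

import threading
import queue

url_queue = queue.Queue()          # URLs waiting to be crawled (assumed structure)
exist = set()                      # links already saved (name borrowed from the snippet above)
exist_lock = threading.Lock()      # protect shared state across worker threads
min_links = 100                    # exit condition: stop after this many links

url_queue.put("https://example.com")   # placeholder seed URL

def crawl():
    # Exit once enough links have been collected instead of looping forever.
    while len(exist) <= min_links:
        try:
            url = url_queue.get(timeout=5)   # give up if the queue stays empty
        except queue.Empty:
            break
        # ... fetch the page, parse new links, append rows to the CSV here ...
        with exist_lock:
            exist.add(url)
        url_queue.task_done()

threads = [threading.Thread(target=crawl) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

The timeout on queue.get also gives each worker a second way out, so threads don't block forever waiting on an empty queue after the crawl winds down.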
