How To Download Google Image Search Results In Python

April 14, 2024

This question has been asked numerous times before, but all the answers are at least a couple of years old and rely on the ajax.googleapis.com API, which is no longer supported.

Solution 1:

Use Google Custom Search for what you want to achieve. See @i08in's answer to "Python - Download Images from google Image search?", which has a good description, script samples, and library references.
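As a rough illustration of the Custom Search route, the sketch below builds a request URL for the Custom Search JSON API restricted to image results and pulls the image links out of a decoded response. The function names are hypothetical, and the API key and search engine ID are placeholders you would have to create yourself; treat it as a starting point rather than the method from the linked answer.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(query, api_key, engine_id, start=1, num=10):
    """Return a Custom Search JSON API URL restricted to image results.

    api_key and engine_id are placeholders supplied by the caller.
    """
    params = {
        "key": api_key,          # your API key (placeholder)
        "cx": engine_id,         # your search engine ID (placeholder)
        "q": query,
        "searchType": "image",   # ask for image results only
        "start": start,          # 1-based index of the first result
        "num": num,              # the API returns at most 10 results per request
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_image_links(response_json):
    """Collect the direct image URLs from a decoded API response."""
    return [item["link"] for item in response_json.get("items", [])]
```

To fetch results you would open the URL (for example with urllib.request.urlopen), decode the body with json.load, and pass the resulting dict to extract_image_links. Because each request returns at most 10 results, larger downloads mean paging with the start parameter.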

Solution 2:

To download any number of images from Google image search using Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
import os
import json
import urllib.request
import sys
import time

# adding path to geckodriver to the OS environment variable
# assuming that it is stored at the same path as this script
os.environ["PATH"] += os.pathsep + os.getcwd()
download_path = "dataset/"

def main():
    searchtext = sys.argv[1] # the search query
    num_requested = int(sys.argv[2]) # number of images to download
    # number_of_scrolls * 400 images will be opened in the browser
    number_of_scrolls = num_requested // 400 + 1

    if not os.path.exists(download_path + searchtext.replace(" ", "_")):
        os.makedirs(download_path + searchtext.replace(" ", "_"))

    url = "https://www.google.co.in/search?q="+searchtext+"&source=lnms&tbm=isch"
    driver = webdriver.Firefox()
    driver.get(url)

    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    extensions = {"jpg", "jpeg", "png", "gif"}
    img_count = 0
    downloaded_img_count = 0

    for _ in range(number_of_scrolls):
        for __ in range(10):
            # multiple scrolls needed to show all 400 images
            driver.execute_script("window.scrollBy(0, 1000000)")
            time.sleep(0.2)
        # to load next 400 images
        time.sleep(0.5)
        try:
            driver.find_element(By.XPATH, "//input[@value='Show more results']").click()
        except Exception as e:
            print("Less images found:", e)
            break

    # imges = driver.find_elements(By.XPATH, '//div[@class="rg_meta"]') # not working anymore
    imges = driver.find_elements(By.XPATH, '//div[contains(@class,"rg_meta")]')
    print("Total images:", len(imges), "\n")
    for img in imges:
        img_count += 1
        # each rg_meta div holds a JSON blob: "ou" is the image URL, "ity" its type
        meta = json.loads(img.get_attribute('innerHTML'))
        img_url = meta["ou"]
        img_type = meta["ity"]
        print("Downloading image", img_count, ": ", img_url)
        try:
            if img_type not in extensions:
                img_type = "jpg"
            req = urllib.request.Request(img_url, headers=headers)
            raw_img = urllib.request.urlopen(req).read()
            # the with-block closes the file even if the write fails
            with open(download_path+searchtext.replace(" ", "_")+"/"+str(downloaded_img_count)+"."+img_type, "wb") as f:
                f.write(raw_img)
            downloaded_img_count += 1
        except Exception as e:
            print("Download failed:", e)
        finally:
            print()
        if downloaded_img_count >= num_requested:
            break

    print("Total downloaded: ", downloaded_img_count, "/", img_count)
    driver.quit()

if __name__ == "__main__":
    main()

Full code is here.
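The script above builds the search URL by plain string concatenation, so a query containing characters such as "&" or "#" could produce a malformed URL. The sketch below shows one way to percent-encode the query and to isolate the scroll-count and folder-name logic so it can be checked without a browser. The helper names are hypothetical and not part of the original answer; the 400-images-per-round figure is taken from the script's own comments.

```python
import os
from urllib.parse import urlencode


def build_search_url(searchtext):
    """Build the image-search URL with the query safely percent-encoded."""
    params = {"q": searchtext, "source": "lnms", "tbm": "isch"}
    return "https://www.google.co.in/search?" + urlencode(params)


def scrolls_needed(num_requested, per_round=400):
    """Number of scroll rounds, using the same formula as the script."""
    return num_requested // per_round + 1


def target_dir(download_path, searchtext):
    """Folder for one query: spaces in the query become underscores."""
    return os.path.join(download_path, searchtext.replace(" ", "_"))
```

For example, build_search_url("cats & dogs") encodes the ampersand as %26 instead of letting it split the query string.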

Solution 3:

Make sure you install the icrawler library first:

pip install icrawler

from icrawler.builtin import GoogleImageCrawler
google_Crawler = GoogleImageCrawler(storage = {'root_dir': r'write the name of the directory you want to save to here'})
google_Crawler.crawl(keyword = 'sad human faces', max_num = 800)
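If you need images for several keywords, one possible extension is to run a separate crawler per keyword and store each result set in its own folder. This is a minimal sketch under that assumption: keyword_dir and crawl_keywords are hypothetical helper names, and the thread count of 4 is an arbitrary choice. The icrawler import is deferred into the function so the path helper can be used on its own.

```python
import os


def keyword_dir(root_dir, keyword):
    """Hypothetical helper: one sub-folder per keyword, spaces replaced."""
    return os.path.join(root_dir, keyword.strip().replace(" ", "_"))


def crawl_keywords(keywords, root_dir, max_num=100):
    """Run one GoogleImageCrawler per keyword, each into its own folder."""
    # Imported here so keyword_dir works even without icrawler installed.
    from icrawler.builtin import GoogleImageCrawler

    for keyword in keywords:
        crawler = GoogleImageCrawler(
            downloader_threads=4,  # arbitrary; more threads download in parallel
            storage={"root_dir": keyword_dir(root_dir, keyword)},
        )
        crawler.crawl(keyword=keyword, max_num=max_num)
```

Calling crawl_keywords(["sad human faces", "cat"], "images", max_num=50) would then try to fill images/sad_human_faces and images/cat with up to 50 files each. How many images actually arrive depends on what the search returns.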

Solution 4:

Improving a bit on Ravi Hirani's answer, the simplest way is this:

from icrawler.builtin import GoogleImageCrawler

google_crawler = GoogleImageCrawler(storage={'root_dir': 'D:\\projects\\data core\\helmet detection\\images'})
google_crawler.crawl(keyword='cat', max_num=100)

Source: https://pypi.org/project/icrawler/

Solution 5:

How about this one?

https://github.com/hardikvasa/google-images-download

It allows you to download hundreds of images and has many filters you can use to customize your search.

If you want to download more than 100 images per keyword, you will need to install Selenium along with chromedriver.

If you installed the library with pip or ran the setup.py file, Selenium will already have been installed on your machine automatically. You will also need the Chrome browser on your machine. For chromedriver:

Download the correct chromedriver based on your operating system.

On Windows or macOS, if chromedriver gives you trouble for some reason, download it into the current directory and run the command.

On Windows, however, the path to chromedriver has to be given in the following format:

C:\complete\path\to\chromedriver.exe

On Linux, if you are having issues installing the Google Chrome browser, refer to the CentOS or Amazon Linux guide or the Ubuntu guide.

On all operating systems, you will have to use the '--chromedriver' or '-cd' argument to specify the path of the chromedriver that you have downloaded to your machine.
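The library can also be driven from Python instead of the command line. The sketch below assumes the argument dictionary accepts a "chromedriver" key mirroring the '--chromedriver' flag described above, and it encodes the 100-image rule in a small hypothetical helper (build_arguments). The library import is deferred so the helper can be checked without the package installed.

```python
def build_arguments(keywords, limit, chromedriver_path=None):
    """Hypothetical helper: assemble the argument dict for a download.

    A chromedriver path is required once more than 100 images per
    keyword are requested, as described in the text above.
    """
    arguments = {"keywords": keywords, "limit": limit}
    if limit > 100:
        if chromedriver_path is None:
            raise ValueError("more than 100 images per keyword requires chromedriver")
        # Assumed to mirror the --chromedriver / -cd command-line flag.
        arguments["chromedriver"] = chromedriver_path
    return arguments


def download_images(keywords, limit, chromedriver_path=None):
    """Download images and return whatever the library's download() returns."""
    # Imported here so build_arguments works without the package installed.
    from google_images_download import google_images_download

    downloader = google_images_download.googleimagesdownload()
    return downloader.download(build_arguments(keywords, limit, chromedriver_path))
```

On Windows the path would be passed in the full form shown earlier, for example as a raw string: download_images("cat", 200, r"C:\complete\path\to\chromedriver.exe").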
