
How To Produce Custom Json Output From Scrapy?

I am working on a Scrapy script which should make output like: { 'state': 'FL', 'date': '2017-11-03T14:52:26.007Z', 'games': [ { 'name':'Game1' }, { '

Solution 1:

Ref. https://stackoverflow.com/a/43698923/8964297

You could try to write your own pipeline like this:

Put this into your pipelines.py file:

import json


class JsonWriterPipeline(object):
    def open_spider(self, spider):
        # Your scraped items will be saved in the file 'scraped_items.json'.
        # You can change the filename to whatever you want.
        self.file = open('scraped_items.json', 'w')
        self.file.write("[")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first, so the
        # closing "]" is never preceded by a trailing comma (invalid JSON).
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        )
        self.file.write(line)
        return item

Then modify your settings.py to include the following:

ITEM_PIPELINES = {
    'YourSpiderName.pipelines.JsonWriterPipeline': 300,
}

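Change YourSpiderName to the name of your Scrapy project package (the directory that contains pipelines.py). This is not necessarily the same as the name of your spider.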

Note that the file gets written directly by the pipeline, so you don't have to specify file and format with the -o and -t command line parameters.
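If you need a single wrapping object with 'state', 'date' and a 'games' list, as in the question, rather than a flat list of items, one option is to collect the items in memory and write everything once the spider closes. The following is only a rough sketch under some assumptions: the class name WrappedJsonPipeline, the output_path and state attributes, and the hard-coded 'FL' value are all hypothetical, and every scraped item is assumed to be one entry of 'games'.

```python
import json
from datetime import datetime, timezone


class WrappedJsonPipeline(object):
    # Hypothetical settings for this sketch; adjust them to your project.
    output_path = 'scraped_items.json'
    state = 'FL'

    def open_spider(self, spider):
        # Collect items in memory; fine for small crawls, costly for huge ones.
        self.games = []

    def process_item(self, item, spider):
        self.games.append(dict(item))
        return item

    def close_spider(self, spider):
        result = {
            'state': self.state,
            # ISO 8601 UTC timestamp; it ends in '+00:00' rather than 'Z'.
            'date': datetime.now(timezone.utc).isoformat(),
            'games': self.games,
        }
        with open(self.output_path, 'w') as f:
            json.dump(result, f, indent=4)
```

You would register it in ITEM_PIPELINES the same way as the pipeline above. Because the whole list is kept in memory until the end, nothing is written if the process gets killed before the spider closes.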

Hope this gets you closer to what you need.
