
SQLAlchemy: Dynamically Create Table From Scrapy Item

I'm working with SQLAlchemy 1.1 and Scrapy. I'm currently using a pipeline to store extracted data in an SQLite table via SQLAlchemy. I'd like to dynamically create a table to a

Solution 1:

There's actually a package out there that can help you out with this.

Check out: dataset: databases for lazy people

Here's an excerpt from the page:

Features

Automatic schema:

If a table or column is written that does not exist in the database, it will be created automatically.

Upserts:

Records are either created or updated, depending on whether an existing version can be found.

Query helpers:

Simple queries such as all rows in a table or all distinct values across a set of columns.

Compatibility:

Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.

Scripted exports:

Data can be exported based on a scripted configuration, making the process easy and replicable.
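To make the "automatic schema" feature concrete, here is a minimal standard-library sketch of what dataset does under the hood: before an insert, it creates the table if it is missing and adds any columns the row needs. The function name and table layout here are illustrative, not dataset's actual implementation.

```python
import sqlite3

def insert_with_auto_schema(conn, table, row):
    # Create the table on first use (dataset does this transparently)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY)')
    # Add any columns the incoming row has that the table lacks
    existing = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    for key in row:
        if key not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{key}"')
    # Insert the row now that the schema can hold it
    cols = ", ".join(f'"{k}"' for k in row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})',
                 list(row.values()))

conn = sqlite3.connect(":memory:")
insert_with_auto_schema(conn, "quotes", {"author": "Alex", "text": "hi"})
# A row with a field the table has never seen: the column is added on the fly
insert_with_auto_schema(conn, "quotes", {"author": "Bob", "tag": "new"})
```

With dataset itself, all of this collapses to `db["quotes"].insert(row)`.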


Solution 2:

Here's what I came up with based on Alex's dataset recommendation above:

import os
import traceback

import dataset
from sqlalchemy.exc import IntegrityError

import settings  # the project's settings module, which defines SETTINGS_PATH


class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "target" parameter
        table_name = getattr(crawler.spider, "target")
        return cls(table_name)

    def __init__(self, table_name):
        try:
            db_path = "sqlite:///" + os.path.join(settings.SETTINGS_PATH, "data.db")
            db = dataset.connect(db_path)
            # dataset creates the table (and any missing columns) on demand
            self.my_table = db[table_name]
        except Exception:
            traceback.print_exc()

    def process_item(self, item, spider):
        try:
            self.my_table.insert(dict(item))
        except IntegrityError:
            print('THIS IS A DUP')
        return item

Hope it helps.
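For completeness, the pipeline still has to be registered in the project's settings.py before Scrapy will call it. The module path and the SETTINGS_PATH value below are assumptions about the project layout, not something from the question:

```python
# settings.py (hypothetical project layout)
SETTINGS_PATH = "/path/to/project"  # directory the pipeline uses to locate data.db

ITEM_PIPELINES = {
    # Priority 300 is arbitrary; lower numbers run earlier
    "myproject.pipelines.DynamicSQLlitePipeline": 300,
}
```

The spider then only needs a `target` attribute (e.g. set via `-a target=my_table` on the command line) for `from_crawler` to pick up.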


Solution 3:

This is not a direct answer to the question, but an alternative way to approach the problem.

How can I modify the pipeline above to dynamically create and insert the filtered item's values instead of having these hard coded in like they are now?

What I hear is that you don't want a pre-defined table schema and want your database to adjust to the fields you are scraping. Well, this sounds a lot like you need a schemaless database.

Think about switching to MongoDB or other NoSQL schemaless storages. The Scrapy documentation even provides an example of a Python+MongoDB pipeline which would insert a scraped item into a MongoDB collection ("table" in SQL terms) as a JSON document:

def process_item(self, item, spider):
    self.db[self.collection_name].insert(dict(item))
    return item

And, importantly, it does not matter what the item fields are: there is no pre-defined structure for your collection document.

This is just a thought - I don't know much about your project requirements and possible constraints.
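If switching databases is not an option, the same schemaless idea can be approximated in SQLite by storing each item as a single JSON document. This is only a standard-library sketch of the pattern, not code from the question:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One TEXT column holds the whole item; no per-field schema to maintain
conn.execute("CREATE TABLE IF NOT EXISTS items (doc TEXT)")

def process_item(item):
    # Serialize the entire item, whatever fields it happens to have
    conn.execute("INSERT INTO items (doc) VALUES (?)", (json.dumps(dict(item)),))
    return item

process_item({"author": "Alex"})
process_item({"title": "hi", "tags": ["a", "b"]})  # different fields, same table
```

The trade-off is that querying individual fields now requires JSON functions (or deserializing in Python) instead of plain SQL columns.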

