Sqlalchemy : Dynamically Create Table From Scrapy Item
Solution 1:
There's actually a package out there that can help you out with this.
Check out: dataset: databases for lazy people
Here's an excerpt from the page:
Features
Automatic schema:
If a table or column is written that does not exist in the database, it will be created automatically.
Upserts:
Records are either created or updated, depending on whether an existing version can be found.
Query helpers:
Simple queries such as all rows in a table or all distinct values across a set of columns.
Compatibility:
Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.
Scripted exports:
Data can be exported based on a scripted configuration, making the process easy and replicable.
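To make the automatic-schema behaviour concrete, here is a minimal sketch of how dataset is used; the database file, table, and column names below are purely illustrative:
import dataset

# Connect to (or create) a local SQLite database file.
db = dataset.connect('sqlite:///example.db')

# The table and its columns are created automatically on first insert.
table = db['books']
table.insert({'title': 'Dune', 'author': 'Frank Herbert', 'year': 1965})

# upsert() updates the matching row if one exists, otherwise inserts a new one.
table.upsert({'title': 'Dune', 'year': 1966}, ['title'])

# Query helpers: iterate over all rows, or fetch distinct values for a column.
for row in table.all():
    print(row['title'], row['year'])
print([r['author'] for r in table.distinct('author')])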
Solution 2:
Here's what I came up with based on Alex's dataset recommendation above:
import traceback

import dataset
from sqlalchemy.exc import IntegrityError

# "settings" is the project's settings module; it is assumed to define
# SETTINGS_PATH, the directory where the SQLite file should live.
from myproject import settings


class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "target" parameter
        table_name = getattr(crawler.spider, "target")
        return cls(table_name)

    def __init__(self, table_name):
        try:
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            # dataset creates the table (and any missing columns) on first write
            self.my_table = db[table_name]
        except Exception:
            traceback.print_exc()

    def process_item(self, item, spider):
        try:
            self.my_table.insert(dict(item))
        except IntegrityError:
            print('THIS IS A DUP')
        return item
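To use it, enable the pipeline in the project settings and pass the table name as a spider argument; the module path and spider name below are hypothetical:
# settings.py (adjust the module path to your project)
ITEM_PIPELINES = {
    'myproject.pipelines.DynamicSQLlitePipeline': 300,
}

# Run the spider with the table name as the "target" argument, e.g.:
#   scrapy crawl myspider -a target=scraped_books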
Hope it helps.
Solution 3:
This is not a direct answer to the question, but an alternative way to approach the problem.
How can I modify the pipeline above to dynamically create and insert the filtered item's values instead of having these hard coded in like they are now?
What I hear is that you don't want to have a pre-defined table schema and want your database to adjust to the fields you are scraping. Well, this sounds a lot like you need a schemaless database.
Think about switching to MongoDB or another schemaless NoSQL store. The Scrapy documentation even provides an example of a Python+MongoDB pipeline which inserts a scraped item into a MongoDB collection ("table" in SQL terms) as a JSON document:
def process_item(self, item, spider):
    self.db[self.collection_name].insert(dict(item))
    return item
And, importantly, it does not matter what the item fields are - there is no pre-defined structure for your collection documents.
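For context, a fuller version of that pipeline, roughly following the shape of the MongoDB example in the Scrapy docs (the MONGO_URI and MONGO_DATABASE setting names come from that example), could look like this:
import pymongo


class MongoPipeline(object):
    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # insert_one() is the current pymongo API; older examples used insert().
        self.db[self.collection_name].insert_one(dict(item))
        return item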
This is just a thought - I don't know much about your project requirements and possible constraints.