Caching Sitemaps In Django
Solution 1:
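The 50,000-URL limit is not hard-coded; it is the default value of the limit attribute on Django's Sitemap class.
You can subclass django.contrib.sitemaps.GenericSitemap and set a lower limit instead: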
    from django.contrib.sitemaps import GenericSitemap

    class LimitGenericSitemap(GenericSitemap):
        limit = 2000  # maximum number of URLs per sitemap page
Solution 2:
Assuming you don't need all those URLs in a single sitemap file, reducing the limit as described in the previous answer will get the file size down and work fine.
If you do want a very large sitemap and do want to use Memcached, you could split the content into multiple chunks, store them under individual keys, and put them back together on output. To make this more efficient, Memcached can fetch multiple keys in a single request, and Django's cache API exposes this as cache.get_many() (with cache.set_many() for writes).
For reference, the 1 MB limit is a feature of Memcached to do with how it stores data: http://code.google.com/p/memcached/wiki/FAQ#What_is_the_maximum_data_size_you_can_store?_(1_megabyte)
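The chunking idea can be sketched roughly as follows. This is only an illustration under stated assumptions: set_chunked, get_chunked, the key naming scheme, and the 900,000-byte chunk size are all made up for the example, and DictCache is an in-memory stand-in so the snippet runs without a Memcached server. In a Django project you would pass django.core.cache.cache instead, which provides get(), get_many() and set_many().

```python
CHUNK_SIZE = 900_000  # bytes; an assumed safety margin under Memcached's 1 MB item limit


def set_chunked(cache, key, data, timeout=300, chunk_size=CHUNK_SIZE):
    """Split `data` (bytes) into chunks and store them under numbered keys."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    entries = {f"{key}:{n}": chunk for n, chunk in enumerate(chunks)}
    entries[f"{key}:count"] = len(chunks)  # remember how many chunks to fetch
    cache.set_many(entries, timeout)


def get_chunked(cache, key):
    """Reassemble the value, or return None if any part is missing."""
    count = cache.get(f"{key}:count")
    if count is None:
        return None
    keys = [f"{key}:{n}" for n in range(count)]
    found = cache.get_many(keys)  # one round trip for all chunks
    if len(found) != count:
        return None  # a chunk was evicted; treat the whole value as a miss
    return b"".join(found[k] for k in keys)


class DictCache:
    """Hypothetical in-memory stand-in for a cache client (ignores timeouts)."""

    def __init__(self):
        self._store = {}

    def set_many(self, data, timeout=None):
        self._store.update(data)

    def get(self, key, default=None):
        return self._store.get(key, default)

    def get_many(self, keys):
        return {k: self._store[k] for k in keys if k in self._store}


cache = DictCache()
xml = b"<urlset>" + b"<url/>" * 10 + b"</urlset>"
set_chunked(cache, "sitemap:1", xml, chunk_size=16)
assert get_chunked(cache, "sitemap:1") == xml
```

Because Memcached may evict any single key independently, the read side treats a missing chunk as a full cache miss rather than returning a truncated sitemap.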
Solution 3:
I have about 200,000 pages on my site, so I had to have the index no matter what. I ended up doing the hack, limiting the sitemap to 250 links, and also implementing a file-based cache.
The basic algorithm is this:
- Try to load the sitemap from a file on disk.
- If that fails, generate the sitemap.
- If the generated sitemap contains 250 links (the number set above), meaning the page is full, save it to disk and then return it; otherwise return it without saving.
The end result is that the first time a sitemap is requested, if it's complete, it's generated and saved to disk. The next time it's requested, it's simply served from disk. Since my content never changes, this works very well. However, if I do want to change a sitemap, it's as simple as deleting the file(s) from disk, and waiting for the crawlers to come regenerate things.
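A minimal sketch of that algorithm, not the code from my project: cached_sitemap, the file naming, and the generate callable (which is assumed to return the XML text and the number of links in it) are hypothetical names chosen for this example.

```python
import tempfile
from pathlib import Path

LINKS_PER_PAGE = 250  # the per-sitemap limit described above


def cached_sitemap(page, generate, cache_dir):
    """Return sitemap XML for `page`, served from disk when a file exists.

    `generate` is a hypothetical callable: page -> (xml_text, link_count).
    """
    path = Path(cache_dir) / f"sitemap-{page}.xml"
    try:
        return path.read_text(encoding="utf-8")  # step 1: try the file
    except FileNotFoundError:
        pass

    xml, link_count = generate(page)  # step 2: build it
    if link_count == LINKS_PER_PAGE:  # step 3: only full pages are saved
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(xml, encoding="utf-8")
    return xml


# Usage with a fake generator that records how often it runs.
calls = []


def fake_generate(page):
    calls.append(page)
    links = "".join(
        f"<url><loc>https://example.com/{page}/{i}</loc></url>"
        for i in range(LINKS_PER_PAGE)
    )
    return f"<urlset>{links}</urlset>", LINKS_PER_PAGE


with tempfile.TemporaryDirectory() as tmp:
    first = cached_sitemap(1, fake_generate, tmp)   # generated, then saved
    second = cached_sitemap(1, fake_generate, tmp)  # read back from disk
    assert first == second and calls == [1]
```

Partial pages are never written, so the last sitemap keeps being regenerated until it fills up; deleting a file forces that page to be rebuilt on the next request.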
The code for the whole thing is here, if you're interested: http://bitbucket.org/mlissner/legal-current-awareness/src/tip/alert/alertSystem/sitemap.py
Maybe this will be a good solution for you too.