blizzard [Fri, 28 Nov 2008 02:46:38 +0000 (02:46 +0000)]
2008-11-27 Christopher Blizzard <blizzard@0xdeadbeef.com>
* tests/twisted/network/test_feedrefresh.py (TestFeedRefresh.confirmEntityHit):
Check to make sure we changed last_poll when we hit an etag or
last_modified.
* services/command/siterefresh.py (RefreshSiteError.handleError):
Update the last_poll field in the site table with the date if we
hit the etag or last_modified value.
* tests/twisted/network/test_feedrefresh.py
(TestFeedRefresh.test_RefreshSiteManagerEntityProperties): This
test makes sure that we set entity properties in the site table
after we hit a site that includes them.
(TestFeedRefresh.test_RefreshSiteManagerEntityHit): This test
makes sure that we return early and don't parse when we send a
matching etag or last-modified along with a request.
* services/command/siterefresh.py (RefreshSiteDone.srDone): Save
etag, last-modified and entity_url info in the site if we have it.
(RefreshSiteDone.done): When returning the data to the master
process add http_entity_hit=0 to the dict so we know we did a
download. (For future use.)
(RefreshSiteError.handleError): Handle the DownloadCommand
throwing a NotModifiedError, which means that we don't have to do
any parsing or updating of information. Short-circuit and exit.
The return value includes http_entity_hit=1 for future use. We
also set the error field to http_not_modified when we hit this
condition. Also update the error field in the SiteRefresh table
when there's a real error.
* services/command/controller.py (RefreshManager.__init__): Use
new DownloadResourceSaveState after a download as part of a
refresh.
* services/command/newsite.py (NewSiteTryURL.doCommand): When
calling the download command pass in the url as part of a
dictionary.
(NewSiteTryURL.downloadDone): More args["filename"] changes.
(NewSiteTryURL.startSecondDownload): Same.
(NewSiteTryURL.secondDownloadDone): Same.
(NewSiteTryURL.tryFeed): Same.
* services/command/download.py (DownloadResourceSaveState): Shim
command that takes the download data and saves it into the state
for later commands.
(DownloadCommand.doCommand): New code to handle etag,
last_modified and entity_url info as arguments to this command.
(DownloadCommand.downloadDone): Data is now returned as a hash
that includes filename, etag, last_modified and the url stack of
downloads.
* services/command/feedparse.py (FeedRefreshSetup.gotNewSite):
Gets the etag, last_modified and entity_url out of the database
when setting up for a feed refresh.
(FeedRefreshSetup.gotFeed): When returning with a setup refresh
the next command is the download so set up everything the download
needs to send an etag + last-modified header if we can.
(FeedParseCommand.doCommand): Convert to use args["filename"]
instead of just filename since the DownloadCommand now returns
more than just the filename.
* services/command/linkedin.py (LinkedInScrapeCommand.doCommand):
Convert linkedin code to use a hash["filename"] instead of just
the filename.
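A minimal sketch of the refresh short-circuit described in this entry, written synchronously with try/except instead of Twisted errbacks. The `site` dict stands in for a site table row, and `refresh_site`, the `url` key and the injected `download` callable are hypothetical; only the field names (etag, last_modified, entity_url, last_poll, error, http_entity_hit) and the http_not_modified value come from the entry above.

```python
from datetime import datetime, timezone


class NotModifiedError(Exception):
    """Raised by the (hypothetical) download callable when the server answers 304."""


def refresh_site(site, download, now=None):
    """Refresh one site; `site` is a plain dict standing in for a site table row."""
    now = now or datetime.now(timezone.utc)
    try:
        # Hand over the saved validators so the download can make a conditional GET.
        result = download(
            site["url"],
            etag=site.get("etag"),
            last_modified=site.get("last_modified"),
            entity_url=site.get("entity_url"),
        )
    except NotModifiedError:
        # Nothing changed: skip parsing, just record the poll and the reason.
        site["last_poll"] = now
        site["error"] = "http_not_modified"
        return {"http_entity_hit": 1}
    # A real download happened: keep whatever validators the server sent.
    for key in ("etag", "last_modified", "entity_url"):
        if result.get(key):
            site[key] = result[key]
    return {"http_entity_hit": 0, "filename": result["filename"]}
```

Callers only need to look at http_entity_hit to tell a cache hit from a real download; the parsing step is never reached on a 304.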
blizzard [Wed, 19 Nov 2008 01:42:09 +0000 (01:42 +0000)]
2008-11-18 Christopher Blizzard <blizzard@0xdeadbeef.com>
* test-ws.cfg: Set base_url_filter.base_url to localhost:9090
for testing so the tests generate proper redirects.
* whoisi/test_controller.py (TestController): Lots of new methods
that generate redirects to test url handling.
* tests/twisted/network/test_download.py: Lots of changes to
support the new entity_url argument for the download command.
(TestDownload.test_redirect): Test that tests working redirects.
(TestDownload.test_redirect_too_many): Test that will make sure we
throw an exception if there are too many redirects for a resource.
* services/command/exceptions.py (TooManyRedirectsError): New
exception when a resource has too many redirects.
* services/command/download.py: More ongoing work to handle etag
and last-modified headers. Also first steps to improve support
for redirects. doCommand has been modified to require etag,
last_modified and the entity_url to which the etag and
last_modified apply. (The original url might generate redirects
and might not have the tags applied to it.) There is also a url
stack that is saved as urls are followed. This can be used to
match many possible urls to a single resource (e.g. what
feedburner does). This will also top out at 5 redirects for a
single resource and then throw a TooManyRedirectsError.
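The redirect handling above can be sketched as follows, assuming a hypothetical `fetch(url)` callable that returns a (status, Location) pair. It keeps the url stack, resolves relative Location headers, and raises TooManyRedirectsError once a resource would need more than 5 redirects. `follow_redirects` and the set of redirect status codes are illustrative choices, not the code in download.py.

```python
from urllib.parse import urljoin

MAX_REDIRECTS = 5  # downloads top out at 5 redirects per resource
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class TooManyRedirectsError(Exception):
    """Raised when a resource redirects more than MAX_REDIRECTS times."""


def follow_redirects(url, fetch, max_redirects=MAX_REDIRECTS):
    """Follow redirects starting at `url`.

    `fetch(url)` is a hypothetical callable returning (status, location).
    Returns (final_status, url_stack); url_stack[-1] is the URL the etag
    and last_modified validators actually belong to (the entity_url).
    """
    url_stack = [url]
    while True:
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES:
            return status, url_stack
        # url_stack holds the start URL plus one entry per redirect followed.
        if len(url_stack) > max_redirects:
            raise TooManyRedirectsError(url_stack)
        url = urljoin(url, location)  # Location headers may be relative
        url_stack.append(url)
```

Every URL in the returned stack can then be mapped to the same resource, which is what makes feedburner-style indirection manageable.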
blizzard [Wed, 12 Nov 2008 23:44:34 +0000 (23:44 +0000)]
2008-11-12 Christopher Blizzard <blizzard@0xdeadbeef.com>
* whoisi/test_controller.py: New test controller that is used by
the etag support handling.
* whoisi/controllers.py (Root): Add a test/ controller.
* tests/twisted/network/test_download.py: New tests for etag and
last-modified.
* services/command/exceptions.py (NotModifiedError): New
NotModifiedError that's thrown when a download gets a 304.
* services/command/download.py: Add support for setting and
handling etag changes in the http support code. Not quite right
yet, especially with handling redirects and etag/last-modified,
but getting close. You can set etag and last-modified via the
DownloadCommand now. If it gets a 304, it will throw a
NotModifiedError exception you can handle in your command error
handler. This is a much cleaner approach than the old
monkeypatching method.
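For reference, a conditional GET of this kind can be sketched with the standard library's urllib rather than the Twisted client the service actually uses. The saved etag and last-modified values go out as If-None-Match and If-Modified-Since; urllib reports a 304 as an HTTPError, which the sketch turns into NotModifiedError. The `download` function, its `opener` parameter (injected so it can be exercised offline) and the shape of the returned dict are assumptions.

```python
import urllib.error
import urllib.request


class NotModifiedError(Exception):
    """Raised when a conditional GET comes back 304 Not Modified."""


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers for a conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def download(url, etag=None, last_modified=None, opener=urllib.request.urlopen):
    """Fetch `url`, sending validators if we have them."""
    req = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        resp = opener(req)
    except urllib.error.HTTPError as err:
        # urllib's default handlers report a 304 as an HTTPError, not a response.
        if err.code == 304:
            raise NotModifiedError(url) from err
        raise
    try:
        return {
            "body": resp.read(),
            # Save these so the next request for this url can be conditional.
            "etag": resp.headers.get("ETag"),
            "last_modified": resp.headers.get("Last-Modified"),
        }
    finally:
        resp.close()
```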
blizzard [Sun, 19 Oct 2008 02:28:05 +0000 (02:28 +0000)]
2008-10-18 Christopher Blizzard <blizzard@0xdeadbeef.com>
* services/command/feedparse.py (FeedUpdateDatabaseCommand.stupidEntryAlreadyThere):
Since all feeds now generate display_cache data we can use it for
stupid entries. Like, say, for old vimeo entries before they had
a guid.
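One way to implement this check, assuming display_cache holds a JSON object per entry: compare a canonical serialization of the new entry's display_cache with those already stored for the site. `display_key` and `entry_already_there` are hypothetical names; the real stupidEntryAlreadyThere may compare differently.

```python
import json


def display_key(display_cache):
    """Canonical string for a display_cache value (dict or JSON text)."""
    if isinstance(display_cache, str):
        display_cache = json.loads(display_cache)
    # sort_keys makes the key independent of dict insertion order.
    return json.dumps(display_cache, sort_keys=True)


def entry_already_there(new_cache, existing_caches):
    """True if an entry with the same rendered content is already stored."""
    key = display_key(new_cache)
    return any(display_key(c) == key for c in existing_caches)
```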
blizzard [Sun, 19 Oct 2008 00:48:23 +0000 (00:48 +0000)]
2008-10-18 Christopher Blizzard <blizzard@0xdeadbeef.com>
* whoisi/templates/vimeo-widget.mak: Widget to render vimeo
content in person and follow pages.
* whoisi/templates/follow.mak: Add vimeo to the types we know how
to render.
* whoisi/templates/unseen.mak: Add vimeo to the types we know how
to render.
* whoisi/templates/person-widget.mak: Add vimeo to the types we
know how to render.
* whoisi/utils/sites.py (site_value): Put vimeo before youtube in
the order in which we render sites.
* whoisi/controllers.py (Root.getDisplayDepth): Add vimeo display
depth so we render the right number of items in various contexts.
(Root.rendersite): If the site type is vimeo, use the vimeo
template for rendering.
* whoisi/static/images/sites/vimeo-favicon.png: Icon for vimeo
items.
* tests/nose/test_newsite.py (TestNewSite.test_vimeo): Code to
test the vimeo url detection code.
* services/command/vimeo.py (Vimeo): Class for vimeo url detection
and selecting a preferred feed.
* services/command/newsite.py (NewSiteTryURL.getPreferredFeed): If
the url is a vimeo url, pick the preferred feed from the list of
feeds left over from scraping the HTML.
(NewSiteTryURL.getFeedType): If the url is a vimeo url set the
site type to "vimeo."
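A rough sketch of the vimeo detection and feed selection, assuming the site is identified by its host name and that the per-user video feed is the scraped feed whose path ends in /videos/rss (an assumption about vimeo's URL layout, not something stated here). The function names mirror the entry but are hypothetical, and get_feed_type returns None to mean "not vimeo, fall through to other checks".

```python
from urllib.parse import urlparse

VIMEO_HOSTS = {"vimeo.com", "www.vimeo.com"}


def is_vimeo_url(url):
    # urlparse lower-cases the hostname; it is None for scheme-less input.
    return urlparse(url).hostname in VIMEO_HOSTS


def get_preferred_feed(url, feeds):
    """Pick the preferred feed from those scraped out of the page's HTML."""
    if not is_vimeo_url(url) or not feeds:
        return None
    for feed in feeds:
        # Assumption: the per-user video feed path ends in /videos/rss.
        if urlparse(feed).path.rstrip("/").endswith("/videos/rss"):
            return feed
    return feeds[0]  # fall back to the first scraped feed


def get_feed_type(url):
    return "vimeo" if is_vimeo_url(url) else None
```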
blizzard [Sat, 18 Oct 2008 03:51:30 +0000 (03:51 +0000)]
2008-10-17 Christopher Blizzard <blizzard@0xdeadbeef.com>
* whoisi/utils/sites.py (site_value): Add youtube to the ordering
of sites in a profile.
* whoisi/utils/youtube.py (youtube_get_user): Returns a user for a
standard videos.rss-style youtube feed.
* whoisi/templates/unseen.mak: Add support for youtube.
* whoisi/templates/follow.mak: Add support for youtube.
* whoisi/templates/person-widget.mak: Add support for youtube.
* whoisi/templates/picasa-widget.mak: Support the thumb/JSON
format in display_cache.
* whoisi/templates/youtube-widget.mak: Widget for
displaying youtube.
* whoisi/controllers.py (Root.getDisplayDepth): Add support for
youtube.
(Root.rendersite): Add support for youtube.
* whoisi/static/images/sites/youtube-favicon.png: Icon for youtube.
* utils/convert-display-cache.py: Convert display cache for
picasa, not flickr.
* utils/convert-flickr-feeds.py: Script that converts flickr feeds
from atom to rss2 in the db.
* feed-parse-service (FeedParseProtocol.runCommand): Set the thumb
property in the display_cache if media_thumbnail is set.
* lib/feedparser.py: Patch to detect thumbnails.
* tests/nose/test_newsite.py (TestNewSite.test_youtube): New tests
for detecting youtube urls. Also somewhat future-proofed for
eventual user detection.
* smoketest.txt: Test a youtube url.
* patches/README: Readme for new thumbnail patch.
* patches/feedparser-thumbnail.patch: Patch that adds support for
the media:thumbnail property to feedparser. Taken from an
upstream bug.
* services/command/controller.py (PreviewSiteManager.__init__):
Don't call FlickrPreviewThumbnails anymore - we get it directly
from the feed now.
* services/command/newsite.py (NewSiteTryURL.getFeedType): If it's
a youtube url, set the type.
* services/command/flickr.py (Flickr.getPreferredFeed): We now use
the rss2 feed instead of the atom feed - it contains a thumbnail
url.
* services/command/picasa.py (Picasa.photoFeedForUser): Make
picasa work like flickr - set a "thumb" object as a JSON object in
the database row instead of just a raw url.
* services/command/youtube.py (Youtube): Class that supports the
current site model for detecting youtube feeds. It's also
future-proofed to support urls and usernames at some point.
* services/master/database.py (DatabaseManager.__init__): Disable
flickr scan on startup. New support for images doesn't require us
to do it anymore. Yay!
* services/master/newsite.py (NewSite.normalize): Placeholder for
eventual normalization for youtube. Doesn't do anything right
now.
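A sketch of storing the thumbnail as a JSON object in display_cache rather than as a raw url, assuming feedparser exposes media:thumbnail as entry["media_thumbnail"], a list of dicts with a "url" key (as current feedparser releases do; the 2008 code relied on the patch listed above). The title and link keys and both function names are hypothetical.

```python
import json


def thumb_from_entry(entry):
    """Return a {"url", "width", "height"} dict from media:thumbnail, or None."""
    thumbs = entry.get("media_thumbnail") or []
    if not thumbs or not thumbs[0].get("url"):
        return None
    first = thumbs[0]
    thumb = {"url": first["url"]}
    for dim in ("width", "height"):
        if dim in first:
            thumb[dim] = first[dim]
    return thumb


def build_display_cache(entry):
    """Serialize what a widget needs; "thumb" is a JSON object, not a bare url."""
    cache = {"title": entry.get("title"), "link": entry.get("link")}
    thumb = thumb_from_entry(entry)
    if thumb is not None:
        cache["thumb"] = thumb
    return json.dumps(cache)
```

Keeping the thumbnail as an object lets the picasa, flickr and youtube widgets read the same "thumb" shape.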