Even geekier portion of this post:
When I gathered all these feeds together there were a lot of duplicate feeds, and figuring those out was a pain in the ass because a lot of people had customized the names and descriptions and so forth. What to do?
It helps to know that an OPML file is just text, and that NetNewsWire will fetch descriptions for you. So here's a line of text representing the feed for Super Colossal (which is great but sadly not recently updated):
<outline text="Super Colossal" description="" title="Super Colossal" type="rss" version="RSS" htmlUrl="http://supercolossal.ch" xmlUrl="http://supercolossal.ch/feed/"/>
Pretty readable: there's text, an empty description, a title, and then some things that look less irrelevant. So, if we have a second line in the file that says
<outline text="Also Super Colossal" description="It's got stuff about building things" title="Also Super Colossal" type="rss" version="RSS" htmlUrl="http://supercolossal.ch" xmlUrl="http://supercolossal.ch/feed/"/>
it's going to appear out of order on the list and eliminating a duplicate is tough (time-consuming really). TextWrangler to the rescue, one of the ultimate Mac freebies for geeks. TextWrangler does grep, a very powerful and dangerous kind of search if you don't know what you're doing. I frequently don't know what I'm doing, but I just don't care, so on we went. Since we know NetNewsWire will fetch text and title and description, all we have to do is wipe those for the 2800 feeds in the file. Easy! So here's a search that more or less means look for anything that says 'title="anything"' and replace it with an emptied version:
And there's the result:
So you just do that a couple more times for the other fields - I guess you could do
text=".*?" description=".*?" title=".*?"
and get it all done at once but I wasn't smart enough to think of it at the time and grep and ambition don't match well with n00bs - and then the fields are wiped. TextWrangler can then put selected lines in order and then process them in various ways, like removing duplicate lines:
And there we go, a whole bunch of feeds eliminated. DKW would do it in Excel, but I swear TextWrangler is faster. Down to around 2100.
Empty out NetNewsWire, reimport the cleaned up OPML and it'll acquire the right text and description and title on its own. And if the feeds vary slightly - one's RSS and one's Atom - it's pretty likely that when NetNewsWire gathers information about the feed it's gonna show up next to its not-quite-duplicated partner anyway, and you can get rid of that pretty easily.
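For anyone who'd rather script it than click through TextWrangler, here's a rough Python sketch of the same wipe, sort and de-duplicate routine. It is not what I actually ran. The function names (wipe_fields, dedupe_outlines) are made up for illustration, and it assumes you feed it only the feed lines, one <outline .../> per line, not the OPML header or folder wrappers.

```python
import re

# Same idea as the grep above: match text="...", description="..." or
# title="..." with a non-greedy body. The \b keeps the match on whole
# attribute names.
FIELD = re.compile(r'\b(text|description|title)=".*?"')


def wipe_fields(line: str) -> str:
    """Empty the three customizable attributes, leaving the URLs untouched."""
    return FIELD.sub(lambda m: f'{m.group(1)}=""', line)


def dedupe_outlines(lines):
    """Wipe each line, drop exact duplicates, and return the lines sorted.

    Assumes every input line is a single <outline .../> feed entry
    (hypothetical input shape, not a full OPML parser).
    """
    wiped = {wipe_fields(line.strip()) for line in lines if line.strip()}
    return sorted(wiped)
```

Run the two Super Colossal lines from above through dedupe_outlines and they collapse into one line with empty text, description and title, which is the state you want before reimporting. Like the grep version, it only catches feeds whose remaining attributes match exactly, so the RSS versus Atom near-duplicates still need a look by eye.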
10 comments:
I don't use it, either.
And if the mustache of understanding wants my opinion on something, he can leave a comment on my blog just like anybody else.
~
Busy. Only 47000 unread items to go.
I use Google Reader pretty much exclusively.
I've got a thousand or so feeds in there, but I spend the majority of my time in the "everyday" folder - less than 100. I go in and clean it up every now and then, but that really just means creating a new folder and starting with a core set of feeds. Eventually it grows into something unwieldy and annoying so I do it again. But yeah, I bet there are some feeds that are duplicated in a number of folders. I should just delete most of them and start over, but I'm not getting any performance problems, so I reckon there's no compelling reason to do so...
S McG, is there a way to convert those items into digital music?
~
Yes, in a bunch of different ways. One of my favourites was for the old Mac OS, and it would take pictures and interpret position, colour and luminance of pixels and make sound out of them. Simple geometrical pictures worked best for organized noise, but the mush a landscape would produce was quite listenable.
Screenshots of the text of this, particularly from TextWrangler using a monospace font, would make something I'd listen to.
There's also a current Mac screensaver that uses RSS feeds: not hard to make those trigger different noises in Quartz Composer. Maybe I should play with that.
I spend the majority of my time in the "everyday" folder
My everyday folder is pretty much the blogroll. The RSS feeds are the world outside that.
Blogroll needs an update.
Thanks, but I don't think it does the blog-o-sphere any good if we're both getting the same feeds.
Only 322 unread.
I know you're not getting the same feeds as me. I don't think you'd put up with all the programming bullshit for instance.
Speaking of which this collection fills me full of love.