Elmcity: Reviewing the first meeting

I was the only new member of the Elmcity team who couldn’t make the first meeting, so I thought I’d catch up by posting my thoughts on the transcript.

They discussed two projects for civic-minded student programmers: a scraper for calendar-like web pages, and a system for finding implicit recurring events in online plain text (like “meets every Thursday…”).

We’ll all be tackling the scraper problem first.

Event scraper specs:

These are very preliminary and informal.

Users can flag a MySpace site and get back an iCal feed of events, optionally keyword-filtered.

Flagging means bookmarking the page to a Delicious account with tags monitored by elmcity.

The .ics feed will be found on the elmcity site (for now, a simple local dummy version).
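To make the spec concrete, here is a minimal sketch of the feed-generation half of the service, assuming the scraping step has already produced event data. The function name, event fields, and sample data are all my own placeholders, not anything we agreed on at the meeting:

```python
from datetime import datetime

def events_to_ics(events, keyword=None):
    """Render scraped events as a minimal iCalendar (.ics) feed.

    `events` is a list of dicts with 'title' and 'start' (a datetime);
    `keyword`, if given, keeps only events whose title contains it.
    """
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//elmcity-sketch//EN"]
    for ev in events:
        if keyword and keyword.lower() not in ev["title"].lower():
            continue
        lines += [
            "BEGIN:VEVENT",
            "SUMMARY:" + ev["title"],
            "DTSTART:" + ev["start"].strftime("%Y%m%dT%H%M%S"),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines)

# Hypothetical events, as if scraped from a MySpace page.
sample = [
    {"title": "Open mic night", "start": datetime(2009, 9, 3, 20, 0)},
    {"title": "Poetry reading", "start": datetime(2009, 9, 10, 19, 0)},
]
print(events_to_ics(sample, keyword="poetry"))
```

A real feed would need more of the iCalendar spec (UID, DTSTAMP, time zones), but the keyword filter and the flag-to-feed round trip are the interesting parts for the test run.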

The plan now

The plan is to start off by implementing scrapers for MySpace and LibraryThing. They’ll both use the same service component (referred to above as “elmcity”). We’ll all be working as a team on this test run, but since there was no discussion of a shared repository or another collaboration location, I believe we’ll each start our own version of this in close communication with each other.

Other things discussed

The implicit recurring event finder would need significant user intervention to verify generated events. Jon mentioned Amazon’s Mechanical Turk service as a model.
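To give a feel for what “generated events” means here, this is my own rough illustration (not something discussed at the meeting) of the kind of pattern matching involved: pulling “every &lt;weekday&gt;” phrases out of plain text as candidates that a human reviewer would then confirm or reject.

```python
import re

WEEKDAYS = "monday|tuesday|wednesday|thursday|friday|saturday|sunday"

# Candidate phrases like "meets every Thursday" or "every other Friday".
PATTERN = re.compile(
    r"\b(?:meets\s+)?every\s+(?:other\s+)?(" + WEEKDAYS + r")\b",
    re.IGNORECASE,
)

def find_recurring_candidates(text):
    """Return the weekdays mentioned in 'every <weekday>' phrases.

    These are only candidates; a person would still need to verify
    each one before it became a real calendar event.
    """
    return [m.group(1).capitalize() for m in PATTERN.finditer(text)]

print(find_recurring_candidates(
    "The book club meets every Thursday; yoga is every other Monday."
))
# → ['Thursday', 'Monday']
```

Even this toy version shows why verification matters: it has no idea whether “every Thursday” is still true, which Thursday the series starts on, or what time the event happens.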

The best solution would be to obviate the need for our scraper service by convincing content curators to release feeds themselves. Jon has tried to persuade the MySpace folks to do so, without success.

My thoughts

Having a small project to work on individually to get acquainted with the problem seems like a good idea. I’d be a little happier if I had some dated milestones, particularly about when we’ll try to merge our code to start on the group project in earnest, but that’ll come.

What are the barriers to releasing feeds for calendar-like pages?

  • An iCal feed logo could raise awareness: maybe the RSS logo with a date in the corner, something very obvious and visual. People already use podcast and RSS links, so we should perhaps aim for a Firefox plugin (?) that makes the scraper fit into that established workflow of “go to content page -> click logo -> autorun/copy-paste feed into program”.
  • Presumably for-profit content curators don’t want to lose pageviews by releasing feeds. If we find room in the feed for both a link back to the content source and the site’s name and slogan, we might be able to convince curators like the free weeklies that having their tiny ad show up on everyone’s calendar several times a day cements their place as the canonical source for all events in their niche.

I should ask how free our work will be. The scraper idea is based on a now-defunct company called FuseCal. It would be frustrating if our project ended up as another venture that could die off and need to be recreated because its source is unavailable.


