Archive for September, 2009

Academic Link for Elmcity

September 29, 2009

The elmcity team had our three-day code sprint and face-to-face meeting this weekend, and we’ve got a little demo up.

It’s a framework that lets users submit URLs pointing to websites that contain calendar data and get back .ics files ready to import into their calendar apps. The heart of the program will be a generalized web scraper if possible, and if not, a large set of site-specific scrapers: small plugins, each encoding the rules for gleaning meaningful calendar data from one website’s particular formatting.
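As a sketch of what one of those site-specific plugins might look like, here's a minimal regex-based scraper. The `Event` shape, the class name, and the `scrape` interface are all my own guesses for illustration, not an agreed elmcity API:

```python
# Hypothetical sketch of a site-specific scraper plugin.
# The names (Event, ExampleVenuePlugin, scrape) are illustrative only.
import re
from dataclasses import dataclass

@dataclass
class Event:
    title: str
    start: str       # iCalendar-style timestamp, e.g. "20091003T000000"
    location: str = ""

class ExampleVenuePlugin:
    """Rules for one imagined site's particular HTML formatting."""
    # Matches markup like: <li class="event">Open Mic - 2009-10-03 - Cafe X</li>
    EVENT_RE = re.compile(
        r'<li class="event">(?P<title>[^-<]+) - '
        r'(?P<date>\d{4}-\d{2}-\d{2}) - (?P<loc>[^<]+)</li>')

    def scrape(self, html: str) -> list:
        events = []
        for m in self.EVENT_RE.finditer(html):
            date = m.group("date").replace("-", "")
            events.append(Event(m.group("title").strip(),
                                date + "T000000",
                                m.group("loc").strip()))
        return events
```

Each plugin stays small because it only has to know one site's formatting; the framework would handle fetching pages and serializing the events to .ics.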

We hope to offer elmcity plugin generation as a ready-made, real-life project for first- and second-year programming courses.

Each plugin is just about the right size for a small early-university project. I had a similar assignment in CSC207 at UofT last year, one intended to exercise regular-expression skills and demonstrate the idea of markup.

What students get

  • A highly motivating real-life project
  • Specific success criteria and a useful end product (the feed, perhaps for a sports team they’re on)
  • A website where prospective employers can see their plugin in action

What professors get

  • A ready-made, motivating assignment with no existing solutions out on the web
  • One with very specific success criteria, including automated testing of plugins
  • A flexible assignment: you could require regular expressions or let students try a page parser, and you could mandate a variety of coding styles. We could even call and accept input from plugins written in any language by setting standards for the strings used to communicate calendar data.
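One way the cross-language string standard mentioned in the last bullet could work, purely as an assumption on my part: each plugin, in whatever language, writes one event per line to stdout as tab-separated fields, and the host parses those lines. The field order and separator here are illustrative, not anything we've agreed on:

```python
# Sketch of one possible cross-language string convention (an assumption,
# not a settled elmcity standard): a plugin subprocess emits one event per
# line as tab-separated fields: start, title, location.
def parse_plugin_output(text: str) -> list:
    """Parse tab-separated event lines emitted by a plugin in any language."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between events
        start, title, location = line.split("\t")
        events.append({"start": start, "title": title, "location": location})
    return events
```

With a convention this simple, a Java or Ruby plugin is just a program that prints lines; the host never needs to link against it.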

What elmcity gets

  • Real life users give us specific motivation to code this project well:
    • Make sure that testing the code and running the server locally is a one-liner, so students can easily try their work
    • Include clear, concise documentation with illuminating examples
    • Make a clean and easy to understand plugin interface
    • Make a professional-looking, easy to browse project website
  • Longer legs for the project by ensuring there’s motivation to continually improve the set of plugins available, and by introducing the project to a new group of potential developers each semester.
  • It also means the UCOSP group would be partially freed from repetitive plugin writing, so we can concentrate on the core features and the standards mentioned above.

Other possible sources of web scrapers:

  • Users who can code a bit. This would work much like the academic source, but it would require a more formal human-review process, because the server would be executing code from out in the wild without TAs in the middle to ward off malicious bits. We’d also have to be awesome enough to inspire the user loyalty that convinces people to lend a hand.
  • If we have a lot of extra time and expertise on our hands, we could investigate a browser plugin that lets non-technical users mark up a page to generate a scraper. I’m envisioning turning it on, highlighting or circling the time/date, name, and location of two events on the page, and marking them as such. The plugin would find the corresponding text in the source and generate rules for where it sits in the document tree, and the user would get back immediate results for the remaining events, which they could then check. I haven’t thought this through and have no idea whether it’s at all feasible, but it would be cool.

Like it? Hate it? Need a whole lot of questions answered before you have an opinion? Please leave comments if you have any thoughts about the idea. Thanks!


September 25, 2009

Diane Tam points out a great, simple Mercurial tutorial. Jory noted that the instructions here neglected to update the local repository after pulling changes. Thanks to their hard work, we’re up and running.

Thanks, guys!

(Broken instructions removed so as not to be a trap.)

Elmcity: Reviewing the first meeting

September 14, 2009

I was the only new member of the Elmcity team who couldn’t make the first meeting, so I thought I’d catch up by posting my thoughts on the transcript.

They discussed two projects for civic-minded student programmers: a scraper for calendar-like web pages, and a system for finding implicit recurring events in online plain text (like “meets every Thursday…”).

We’ll all be tackling the scraper problem first.

Event scraper specs:

These are very preliminary and informal.

Users can flag a MySpace page and get back an iCal feed of its events, optionally keyword-filtered.

Flagging means bookmarking the page to a Delicious account with tags monitored by elmcity.

The .ics feed will be found on the elmcity site (for now, a simple local dummy version).
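A minimal sketch of the feed end of that spec: keyword-filter the scraped events and emit iCalendar text. The event fields and the `to_ics` helper are assumptions of mine; only the VCALENDAR/VEVENT skeleton comes from the iCalendar spec (RFC 2445):

```python
# Sketch of the dummy feed side: filter scraped events by keyword and
# serialize them as iCalendar text. Event dict fields are an assumption.
def to_ics(events, keyword=None):
    if keyword:
        events = [e for e in events if keyword.lower() in e["title"].lower()]
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//elmcity demo//EN"]
    for e in events:
        lines += ["BEGIN:VEVENT",
                  f"SUMMARY:{e['title']}",
                  f"DTSTART:{e['start']}",
                  "END:VEVENT"]
    lines.append("END:VCALENDAR")
    # RFC 2445 specifies CRLF line endings
    return "\r\n".join(lines) + "\r\n"
```

Serving the returned string with an `.ics` filename is enough for most calendar apps to import it, which is all the local dummy version needs.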

The plan now

The plan is to start off by implementing scrapers for MySpace and LibraryThing. They’ll both use the same service component (referred to above as “elmcity”). We’ll all be working as a team on this test run, but since there was no discussion of a shared repository or another collaboration location, I believe we’ll each start our own version of this in close communication with each other.

Other things discussed

The implicit recurring event finder would need significant user intervention to verify generated events. Jon mentioned Amazon’s Mechanical Turk service as a model.

The best solution would be to obviate the need for our scraper service by convincing content curators to release feeds themselves. Jon tried to convince the MySpace folks to do so, without success.

My thoughts

Having a small project to work on individually to get acquainted with the problem seems like a good idea. I’d be a little happier if I had some dated milestones, particularly about when we’ll try to merge our code to start on the group project in earnest, but that’ll come.

What are the barriers to releasing feeds for calendar-like pages?

  • An iCal feed logo could raise awareness: maybe the RSS logo with a date in the corner, something very obvious and visual. People already use podcast and RSS links, so we should perhaps aim for a Firefox plugin that fits the scraper into that established workflow of “go to content page -> click logo -> autorun/copy-paste feed into program”.
  • Presumably for-profit content curators don’t want to lose pageviews by releasing feeds. If we find a place in the feed for both a link to the content source and the site’s name and slogan, we might be able to convince curators like the free weeklies that having their tiny ad show up on everyone’s calendar several times a day cements their place as the canonical source for all events in their niche.

I should ask how free our work will be. The scraper idea is based on a now-defunct company called FuseCal. It would be frustrating if our end product turned out to be another venture that could die off and need to be recreated because the source is unavailable.


September 10, 2009

I’m Sarah Strong, a third year undergraduate student at the University of Toronto, and I’ll be working on the Elmcity project this semester with UCOSP.

Free astronaut shirt!

I spent the summer of 2008 working at The Centre for Global E-Health Innovation on their remote patient monitoring system. It’s a Rails app that listens for home-recorded medical readings and sends out alerts to both doctor and patient if something’s wrong. It was a fantastic crash course in agile methods and the practical side of software development.

The next summer, I worked on TracSNAP, the Trac Social Network Analysis Plugin. It’s written in Python and Flare, and it’s designed to help developers on large, disorganized code bases connect with colleagues. I learned about program design and the challenges of making a novel app useful and easy to use.

In the past, I’ve worked as an anti-homophobia workshop facilitator, a conversational English teacher in China, a corporate ghostwriter, a copy editor, and a screen printer, and I’ve run my own business helping people find positive-expectation bonuses at online casinos.