
This is the outline for the CodeCon presentation on FeedParser I'm giving in February.

Outline

  • Introduction
    • Based on the NewsMonster parser infrastructure (XSLT)
    • Designed for use within Rojo (an online RSS aggregator)
    • Event based not DOM based
    • Jakarta Commons
    • Apache 2.0 Open Source License
  • Challenges with building a feed parser
    • Too many standards
      • RSS (0.9, 0.91, 0.92, 1.0, 2.0)
      • Atom (0.3-0.5 and all draft specs (IETF work in progress))
      • OPML
      • FOAF
      • Changes.xml
      • RDF
      • XFN
      • HTML (link parsing, relations, nofollow, meta tags, generators, etc)
      • Modules (dc, aggregation, content, etc)
    • Semantic confusion:
      • rss:item vs atom:entry
      • title issues across specifications (dc, rss, atom, etc)
    • Encoding issues
      • Invalid entity references
      • XML prefix prior to <?xml?> (usually XML comments)
    • Date encoding issues:
      • RFC822 (RSS 2.0)
      • ISO8601 (RSS 1.0 and Atom)
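
To make the date encoding issue concrete, here is a minimal sketch of a parser that accepts both date styles listed above. The class name `FeedDates` is hypothetical and not part of FeedParser; it uses only `java.time` from the JDK. It assumes four-digit years and `GMT` or numeric offsets, so real feeds with two-digit years or named zones such as `EST` would need extra handling.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration only; not part of FeedParser.
final class FeedDates {
    private FeedDates() {}

    /**
     * Tries the RFC 822/1123 style used by RSS 2.0 first, then falls back
     * to ISO 8601 with an offset, as used by RSS 1.0 (dc:date) and Atom.
     * Throws DateTimeParseException if neither format matches.
     */
    static Instant parse(String text) {
        String s = text.trim();
        try {
            return ZonedDateTime.parse(s, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
        } catch (DateTimeParseException notRfc822) {
            return OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
        }
    }
}
```

Normalizing both styles to a single `Instant` lets the rest of the code ignore which specification a feed happened to follow.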
  • Feed Event Model
    • SAX model
    • DOM on top (in the future)
    • SAX is about 12x faster than DOM
    • FeedParserListener:
      • init()
      • onChannel( state, title, link, description ): void
      • onItem( state, title, link, description ): void
      • onItemEnd(): void
    • General API not wire API
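
As a rough illustration of the event model, the sketch below layers feed-level events on top of SAX. The interface `SketchFeedListener` and the class `SketchRssHandler` are hypothetical names modeled on the method list above; the real FeedParserListener signatures may differ, and the type of `state` is an assumption (passed as `null` here). It only handles plain RSS 2.0 and ignores nesting, so a channel `<image>` with its own `<title>` would overwrite the channel title.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical listener modeled on the outline; not the real FeedParser API.
interface SketchFeedListener {
    void init();
    void onChannel(Object state, String title, String link, String description);
    void onItem(Object state, String title, String link, String description);
    void onItemEnd();
}

// Minimal RSS 2.0-only handler that turns SAX events into feed events.
final class SketchRssHandler extends DefaultHandler {
    private final SketchFeedListener listener;
    private final StringBuilder text = new StringBuilder();
    private boolean channelSent;
    private String title, link, description;

    SketchRssHandler(SketchFeedListener listener) { this.listener = listener; }

    @Override public void startDocument() { listener.init(); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
        if (qName.equals("item")) {
            sendChannelOnce(); // channel fields are complete before the first item
            title = link = description = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) { text.append(ch, start, length); }

    @Override
    public void endElement(String uri, String local, String qName) {
        switch (qName) {
            case "title": title = text.toString().trim(); break;
            case "link": link = text.toString().trim(); break;
            case "description": description = text.toString().trim(); break;
            case "item":
                listener.onItem(null, title, link, description);
                listener.onItemEnd();
                break;
            case "channel": sendChannelOnce(); break; // covers feeds with no items
            default: break;
        }
    }

    private void sendChannelOnce() {
        if (!channelSent) {
            channelSent = true;
            listener.onChannel(null, title, link, description);
        }
    }

    /** Parses an RSS 2.0 document held in a string and reports events to the listener. */
    static void parse(String xml, SketchFeedListener listener) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SketchRssHandler(listener));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("feed could not be parsed", e);
        }
    }
}
```

Because the listener sees only titles, links, and descriptions, the same callbacks could be driven by an Atom or RSS 1.0 handler, which is the point of a general API rather than a wire API.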
  • HTTP issues (network API):
    • Timeouts
    • ETags (If-None-Match)
    • If-Modified-Since
    • UserAgent
    • Correct string decoding via the Content-Type charset
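
The HTTP points above can be sketched with the JDK's `java.net.http` API (Java 11+). This is not FeedParser's network layer; the class `ConditionalFetch`, the 30 second timeout, the User-Agent string, and the UTF-8 fallback are all assumptions chosen for illustration. A server that honors the validators would answer `304 Not Modified` when the feed has not changed.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper for illustration only; not part of FeedParser.
final class ConditionalFetch {
    private ConditionalFetch() {}

    /** Builds a GET with a timeout, a User-Agent, and any cached validators. */
    static HttpRequest build(String url, String etag, String lastModified) {
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(30)) // assumed value
            .header("User-Agent", "ExampleFeedFetcher/0.1 (+http://example.org/bot)")
            .GET();
        if (etag != null) {
            b.header("If-None-Match", etag); // ETag from the previous response
        }
        if (lastModified != null) {
            b.header("If-Modified-Since", lastModified); // Last-Modified from the previous response
        }
        return b.build();
    }

    /** Extracts the charset parameter of a Content-Type value, defaulting to UTF-8. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException unknownOrIllegalName) {
                        break; // fall through to the default
                    }
                }
            }
        }
        return StandardCharsets.UTF_8; // assumed default for this sketch
    }
}
```

For XML feeds the encoding in the XML declaration also matters, so a fuller implementation would reconcile it with the header value rather than trust either one blindly.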
  • Problems with DOM models:
    • Namespace matching doesn't line up correctly.
    • Doesn't (easily) support ad-hoc schema updates with extensions
    • Plugin API to pass events with vendor specific interfaces.
  • Autodiscovery
    • FeedLocator API
    • Atom + RSS autodiscovery support
    • Feed location via href
    • URL fishing (disabled by default)
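
A minimal sketch of autodiscovery via `<link rel="alternate">` tags follows. `SketchFeedLocator` is a hypothetical stand-in, not the FeedLocator API itself. It scans HTML with regular expressions, which is an assumption made for brevity; it only handles quoted attribute values and a single-valued `rel`, and it does no URL fishing.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the real FeedLocator.
final class SketchFeedLocator {
    private static final Pattern LINK_TAG =
        Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // name="value" or name='value'; unquoted values are not handled
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    private SketchFeedLocator() {}

    /** Returns absolute feed URLs advertised by link tags in the given HTML. */
    static List<String> locate(String html, String baseUrl) {
        List<String> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String rel = null, type = null, href = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                switch (attr.group(1).toLowerCase(Locale.ROOT)) {
                    case "rel": rel = value; break;
                    case "type": type = value; break;
                    case "href": href = value; break;
                    default: break;
                }
            }
            if (href != null && "alternate".equalsIgnoreCase(rel) && isFeedType(type)) {
                // resolve relative hrefs against the page URL
                feeds.add(URI.create(baseUrl).resolve(href).toString());
            }
        }
        return feeds;
    }

    private static boolean isFeedType(String type) {
        return "application/rss+xml".equalsIgnoreCase(type)
            || "application/atom+xml".equalsIgnoreCase(type);
    }
}
```

Sites with invalid autodiscovery markup would slip past a scanner this simple, which is one motivation for per-site handling.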
  • Blog Profiles
    • Flickr doesn't support HEAD
    • Invalid autodiscovery implementations
    • Avoid URL fishing
    • Profile discovery support
  • Feed Creation
    • Same API can be used to create RSS feeds
  • API
    • Content Parsing
    • Tag Parsing
  • Thanks
    • Brad Neuberg
    • Rojo Team!