
Some SEO Tricks

The Project

I'd like to take a recent project as an example to guide you through this entry.
The project is a small one that doesn't require any user input: it's called isport.eu and aggregates football news (for the Americans out there: soccer) from various RSS feeds. A slightly more detailed description can be found in the corresponding blog post.

Using Nice Looking URLs

Using nice URLs that describe the content of the page will not only improve your ranking but also look more appealing to users on a SERP (search engine result page). In our case a common URL looks like this:

http://isport.eu/en/premier-league/liverpool-fc/news-19-2008-0.html

where /en is the language, /premier-league is the currently selected league, /liverpool-fc is the club the page is about, /news is the content type, 19-2008 is the week and 0 is the current page number.
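To make the scheme concrete, here is a small self-contained sketch that decodes such a path back into its logical segments. The class and method names are made up for illustration; they are not part of the isport.eu code.

```java
import java.util.Arrays;
import java.util.List;

public class NicePathDecoder {
    // Decode a path like /en/premier-league/liverpool-fc/news-19-2008-0.html
    // into its logical parts: language, league, club, and the news-week-year-page tail.
    public static List<String> decode(String path) {
        // strip the leading slash and the .html suffix, then split on "/"
        String trimmed = path.replaceFirst("^/", "").replaceFirst("\\.html$", "");
        return Arrays.asList(trimmed.split("/"));
    }

    public static void main(String[] args) {
        System.out.println(decode("/en/premier-league/liverpool-fc/news-19-2008-0.html"));
        // → [en, premier-league, liverpool-fc, news-19-2008-0]
    }
}
```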

Removing The jsessionid

As discussed on the mailing list, jsessionids in the URL are generally considered a bad thing for search engines, so the easiest way out is to remove the jsessionid for bots.
In the mentioned thread, Dan Kaplan and Arthur W. had some nice approaches; I used the following combination of both to check whether the User-Agent is a bot:

private static final String[] botAgents = {
		"googlebot", "msnbot", "slurp", "jeeves"
		/*
		 * "appie", "architext", "jeeves", "bjaaland", "ferret", "gulliver",
		 * "harvest", "htdig", "linkwalker", "lycos_", "moget", "muscatferret",
		 * "myweb", "nomad", "scooter", "yahoo!\\sslurp\\schina", "slurp",
		 * "weblayers", "antibot", "bruinbot", "digout4u", "echo!", "ia_archiver",
		 * "jennybot", "mercator", "netcraft", "msnbot", "petersnews",
		 * "unlost_web_crawler", "voila", "webbase", "webcollage", "cfetch",
		 * "zyborg", "wisenutbot", "robot", "crawl", "spider"
		 */
};

/**
 * Checks whether the given User-Agent header belongs to a known bot.
 */
public static boolean isAgent(final String agent) {
	if (agent != null) {
		final String lowerAgent = agent.toLowerCase();
		for (final String bot : botAgents) {
			if (lowerAgent.contains(bot)) {
				return true;
			}
		}
	}
	return false;
}

and then strip the session id in your custom web response.
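For reference, the check above matches typical crawler User-Agent strings while letting browsers through. Here is a self-contained version with sample inputs; the User-Agent strings below are illustrative, not taken from the original post.

```java
public class BotCheckExample {
    // same bot list and check as shown above, packaged so it can run standalone
    private static final String[] botAgents = { "googlebot", "msnbot", "slurp", "jeeves" };

    public static boolean isAgent(final String agent) {
        if (agent != null) {
            final String lowerAgent = agent.toLowerCase();
            for (final String bot : botAgents) {
                if (lowerAgent.contains(bot)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAgent("Mozilla/5.0 (compatible; Googlebot/2.1)")); // true
        System.out.println(isAgent("Mozilla/5.0 (Windows NT 10.0) Firefox/100.0")); // false
        System.out.println(isAgent(null)); // false — a missing header is not a bot
    }
}
```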

In your application class add the following:

Wicket 1.5
@Override
protected WebResponse newWebResponse(final WebRequest webRequest, final HttpServletResponse httpServletResponse) {
    return new ServletWebResponse((ServletWebRequest) webRequest, httpServletResponse) {

        @Override
        public String encodeURL(CharSequence url) {
            return isRobot(webRequest) ? url.toString() : super.encodeURL(url);
        }

        @Override
        public String encodeRedirectURL(CharSequence url) {
            return isRobot(webRequest) ? url.toString() : super.encodeRedirectURL(url);
        }

        private boolean isRobot(WebRequest request) {
            final String agent = request.getHeader("User-Agent");
            // isAgent() is the bot check shown above
            return isAgent(agent);
        }
    };
}
Wicket 1.4
@Override
protected WebResponse newWebResponse(final HttpServletResponse servletResponse) {
	return new BufferedWebResponse(servletResponse) {
		@Override
		public CharSequence encodeURL(final CharSequence url) {
			final String agent = ((WebRequest) RequestCycle.get().getRequest()).getHttpServletRequest().getHeader("User-Agent");

			return isAgent(agent) ? url : super.encodeURL(url);
		}
	};
}

With this in place, the jsessionid is never encoded into URLs served to the listed User-Agents. Note that a new session is still created each time such a bot visits the page, because the bot never sends a session id back.

Making Paging Stateless

As paging in Wicket is normally bound to the session (the page state is saved in the page map, which in turn lives in the session), we have to do something to make our pages pageable for a visitor without a session, like our bot.
The trick is to put the desired page number into the PageParameters (the "0" in the URL above) and to implement a custom PagingNavigator. The navigator has to return BookmarkablePageLinks instead of the usual stateful links, so the following methods need to be overridden:

@Override
protected Link newPagingNavigationIncrementLink(final String id, final IPageable pageable, final int increment) {
	// return your bookmarkable increment (and decrement) link here
}

@Override
protected Link newPagingNavigationLink(final String id, final IPageable pageable, final int pageNumber) {
	// return your bookmarkable link here
}

@Override
protected PagingNavigation newNavigation(final IPageable pageable, final IPagingLabelProvider labelProvider) {
	return new PagingNavigation("navigation", pageable, labelProvider) {

		private static final long serialVersionUID = 1L;

		@Override
		protected Link newPagingNavigationLink(final String id, final IPageable pageable, final int pageIndex) {
			// return your bookmarkable link here
		}
	};
}
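The idea behind those overrides can be illustrated without any Wicket classes: each page number is encoded into a bookmarkable URL instead of being kept in the session, so every navigation link works without server-side state. A minimal sketch, with a made-up class name, following the isport.eu URL pattern:

```java
public class StatelessPagingExample {
    // Build a bookmarkable URL for a given page index, following the
    // /news-<week>-<year>-<page>.html pattern from the example above.
    public static String pageUrl(String base, int week, int year, int pageIndex) {
        return base + "/news-" + week + "-" + year + "-" + pageIndex + ".html";
    }

    public static void main(String[] args) {
        String base = "http://isport.eu/en/premier-league/liverpool-fc";
        // each paging link points at a plain URL, so no session is needed
        for (int page = 0; page < 3; page++) {
            System.out.println(pageUrl(base, 19, 2008, page));
        }
    }
}
```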

A Descriptive Page Title, Description and Keywords For Your Pages

As you certainly know, the <title> tag of your page is the one that gets shown on the SERPs, so it should be different for each of your pages. If you are using a BasePage that holds the layout, together with markup inheritance for your subpages, then adding the following markup and abstract methods to the BasePage

<title wicket:id="title"></title>
<META wicket:id="description" NAME="DESCRIPTION" CONTENT=""/>
<META wicket:id="keywords" NAME="KEYWORDS" CONTENT=""/>
public abstract IModel getPageTitle();
public abstract IModel getDescription();
public abstract IModel getKeywords();

is an easy way to force all of your pages to provide a page title, description and keywords. Most commonly the page title depends on the model objects that are displayed on the page, e.g. Liverpool FC in the example above. As these objects aren't available in the constructor of the superclass, be sure to call getPageTitle() in the onBeforeRender() method, and be sure to use addOrReplace() instead of add(), e.g.

@Override
protected void onBeforeRender() {
	// add the <title> tag
	addOrReplace(new Label("title", getPageTitle()));

	final Label desc = new Label("description", "");
	desc.add(new AttributeAppender("CONTENT", getDescription(), " "));
	addOrReplace(desc);

	final Label keywords = new Label("keywords", "");
	keywords.add(new AttributeAppender("CONTENT", getKeywords(), " "));
	addOrReplace(keywords);

	super.onBeforeRender();
}
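Stripped of the Wicket classes, the pattern is just an abstract accessor that the base class calls and every concrete page must implement. A self-contained sketch with made-up class names (BasePage here is a plain class, not Wicket's):

```java
public class PageTitleExample {
    // Base class forces every page to provide its own title,
    // mirroring the abstract getPageTitle() on the Wicket BasePage.
    static abstract class BasePage {
        abstract String getPageTitle();

        String renderTitle() {
            return "<title>" + getPageTitle() + "</title>";
        }
    }

    // A concrete page derives its title from its model object (the club).
    static class ClubPage extends BasePage {
        private final String club;

        ClubPage(String club) {
            this.club = club;
        }

        @Override
        String getPageTitle() {
            return club + " - News";
        }
    }

    public static void main(String[] args) {
        System.out.println(new ClubPage("Liverpool FC").renderTitle());
        // → <title>Liverpool FC - News</title>
    }
}
```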

Creating a Dynamic Sitemap

It's generally recommended to provide a sitemap for search engines. With Wicket it's very easy to create one. The example XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url wicket:id="urlList">
      <loc wicket:id="locNode">http://www.example.com/</loc>
      <lastmod wicket:id="lastmodNode">2005-01-01</lastmod>
      <changefreq wicket:id="changefreqNode">monthly</changefreq>
      <priority wicket:id="priorityNode">0.8</priority>
   </url>
</urlset> 

Then use a ListView with a LoadableDetachableModel (or a DataView with an IDataProvider) to create the list. Just be sure to let getMarkupType() return "xml" in your page:

@Override
public String getMarkupType() {
	return "xml";
}
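The ListView simply repeats the <url> entry for every page of the site. As a plain-Java sketch of the output it produces, here is a method (the name is made up) that renders a minimal <urlset> for a list of URLs, with only the mandatory <loc> element:

```java
import java.util.List;

public class SitemapExample {
    // Render a minimal sitemap <urlset>, one <url>/<loc> entry per page,
    // analogous to what the ListView repeats over the urlList markup.
    public static String render(List<String> locations) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String loc : locations) {
            xml.append("   <url><loc>").append(loc).append("</loc></url>\n");
        }
        xml.append("</urlset>");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(List.of(
                "http://isport.eu/en/premier-league/liverpool-fc/news-19-2008-0.html")));
    }
}
```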

Finally mount your page e.g.

mountBookmarkablePage("/sitemap.xml", SitemapPage.class);

Of course the sitemap will contain a lot of data once you have a certain number of pages, so it might be desirable to cache it. I'm using ehcache's SimplePageCachingFilter to cache the page. Just add

<filter>
    <filter-name>SimplePageCachingFilter</filter-name>
    <filter-class>net.sf.ehcache.constructs.web.filter.SimplePageCachingFilter</filter-class>
</filter>

to your web.xml, together with the filter mapping

<filter-mapping>
    <filter-name>SimplePageCachingFilter</filter-name>
    <url-pattern>/sitemap.xml</url-pattern>
</filter-mapping>

Then configure your cache in the ehcache.xml:

<cache name="SimplePageCachingFilter" overflowToDisk="true" maxElementsInMemory="0" timeToLiveSeconds="43200" timeToIdleSeconds="43200"/>

With this configuration the Sitemap will be cached for 12 hours after creation (and won't be cached in memory).
