*** As of June 2017, the latest Solr Ref Guide is located at https://lucene.apache.org/solr/guide ***

This page is for checking the documentation for links that have expired or moved.

Offline link checking

Until/unless we get the macro below working, here is one method that can be used to check for bad links between pages of the ref guide:

  • Install the W3C checklink tool (the w3c-linkchecker package in Debian)
  • Generate an HTML export of this space
    • make sure to exclude the internal-only pages by unchecking the box
    • make sure to exclude comments by unchecking the box
    • save the resulting solr-12345-678-90.zip file to /tmp/
  • Unzip the HTML export; this should create a /tmp/solr directory containing all of the wiki pages as HTML files
  • Run the following command: checklink --html --summary --recursive --exclude 'http.*' /tmp/solr > /tmp/solr-wiki-linkcheck.html
  • Open the resulting file in your browser: /tmp/solr-wiki-linkcheck.html
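
The steps above could be scripted roughly as follows. This is only a sketch: the function name run_linkcheck and its arguments are hypothetical, and it assumes unzip and checklink are on the PATH and that the export unpacks to a solr directory as described above.

```shell
#!/bin/sh
# Sketch only: run_linkcheck is a hypothetical helper, not part of checklink.
# Usage: run_linkcheck /tmp/solr-12345-678-90.zip [work_dir]
run_linkcheck() {
    export_zip="$1"
    work_dir="${2:-/tmp}"
    report="$work_dir/solr-wiki-linkcheck.html"

    # Unpack the HTML export; per the steps above this should create $work_dir/solr.
    unzip -q -o "$export_zip" -d "$work_dir" || return 1

    if [ ! -d "$work_dir/solr" ]; then
        echo "expected $work_dir/solr after unzip" >&2
        return 1
    fi

    # Same command as in the steps above: local links only, HTML summary report.
    checklink --html --summary --recursive --exclude 'http.*' \
        "$work_dir/solr" > "$report"

    # Print the report path so the caller can open it in a browser.
    echo "$report"
}
```

Calling run_linkcheck with the path to the downloaded zip prints the location of the report, which can then be opened in a browser as in the last step.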

Even with the --summary option, the report will be pretty verbose. You'll want to skim it looking for anything in red, which indicates some type of problem. This may be:

  • an attempt at linking to a page that doesn't exist
    • fix the link, page was probably renamed
  • linking to an anchor that doesn't exist on a page that does
    • fix the link, anchor was probably renamed
    • sometimes can be a false positive if the anchor has markup in it: evidently the link checker gets confused, but browsers following the online links seem to work fine, and the intra-doc links may work fine in the PDF, so sanity check that way if you can't see an obvious problem.
    • can easily be a false positive if the anchor is "#main": these anchors exist online, but not in the HTML export. The macro that makes the links to these anchors doesn't get used in the PDF at all, so they can be ignored.
  • or a page that includes the same anchor more than once.
    • probably due to how Confluence generates anchors based on header text. There's not a lot you can do about it except rename a header. It doesn't really hurt anything unless something is trying to link to one of these duplicate anchors from another page.
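
To see which anchors are duplicated on a given exported page, something like the following may help. It is a rough sketch: list_dup_anchors is a hypothetical helper, and it assumes anchors appear as id="..." attributes in the exported HTML, which may not hold for every page.

```shell
#!/bin/sh
# Sketch only: list_dup_anchors is a hypothetical helper.
# Assumes anchors show up as id="..." attributes in the exported HTML.
# Usage: list_dup_anchors /tmp/solr/SomePage.html
list_dup_anchors() {
    # Pull out every id="..." attribute, then keep only values seen more than once.
    grep -o 'id="[^"]*"' "$1" | sort | uniq -d
}
```

Each line of output is an id value that occurs more than once in the file; no output means no duplicates were found under that assumption.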

Note: as specified above, the checklink command will ignore any "http" URLs and only test the local links. You could modify it to also check the links from the ref guide out to external sites, but be careful not to hammer remote sites or recursively crawl them by mistake. Read the checklink docs carefully, and do a test run without the --html --summary options and without redirecting to a file (so you can quickly monitor every URL being hit) to ensure you aren't doing a runaway crawl of the whole internet.
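
A test run of the kind described in the note might look like the sketch below. The wrapper name linkcheck_test_run and its optional second argument are hypothetical; the checklink options are the same ones used earlier on this page, minus --html and --summary.

```shell
#!/bin/sh
# Sketch only: linkcheck_test_run is a hypothetical wrapper.
# No --html, no --summary, and no redirect, so output goes straight to the
# terminal and the URLs being checked can be watched as the run proceeds.
# Usage: linkcheck_test_run /tmp/solr ['http.*']
linkcheck_test_run() {
    # Pass an exclude pattern as $2 to keep the run local (e.g. 'http.*');
    # omit it to let checklink check external links too.
    if [ -n "$2" ]; then
        checklink --recursive --exclude "$2" "$1"
    else
        checklink --recursive "$1"
    fi
}
```

If the output starts listing URLs that are clearly outside the ref guide's own links, interrupt the run (Ctrl-C) and tighten the options before trying again.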

Confluence Report Macro

 

This is super slow and frequently the page won't even load: Internal - Link Check Report - SLOW