Using uploaded corpora with an independent mass-check

The NewUploadedCorporaUser page describes setting up a ruleQA user so that an uploaded corpus is mass-checked by the mass-check client/server (C/S) setup. However, because of a bug whose cause is still unclear, C/S mode does not work on one of the machines, so to make use of that resource, some of the uploaded corpora are instead scanned on it separately, in traditional single-machine (non-distributed) mode. The commands below, intended for PMC members, set up a new uid on that machine.

First, log into that machine. (You'll probably need to have an account created for you first.)

Set some variables:


Create a uid:

sudo useradd -c "Nightly mass-check jm" "$BBUSERNAME"
# When prompted, give the new account a random password; this is needed for cron to work!
sudo passwd "$BBUSERNAME"
sudo mkdir -p /export/home/"$BBUSERNAME"
sudo chown "$BBUSERNAME" /export/home/"$BBUSERNAME"
sudo -H -u "$BBUSERNAME" bash
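If you'd rather not make up the random password by hand, something like the following can generate one. This is a minimal sketch, assuming a Linux box with GNU coreutils `base64` and the shadow-utils `chpasswd` tool; `random_password` is a hypothetical helper name, not something in the SpamAssassin tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SpamAssassin tree): print a random
# password built from 18 bytes of /dev/urandom, base64-encoded to exactly
# 24 characters. The base64 alphabet never contains ':', so the result is
# safe to feed to chpasswd, which reads "user:password" lines.
random_password() {
  head -c 18 /dev/urandom | base64
}
```

Run from your own sudo-capable account, it could stand in for the interactive `sudo passwd` step: `echo "$BBUSERNAME:$(random_password)" | sudo chpasswd`. The password itself shouldn't be needed again here, as this page switches into the account with `sudo -u` rather than logging in as it.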

You are now running as the new uid. Follow instructions similar to :

cd $HOME
mkdir tmp
echo '' > .forward
svn co svn

When Subversion asks about the server's SSL certificate, accept it (p)ermanently.

cp svn/masses/rule-qa/corpus.example ~/.corpus
vi ~/.corpus
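Before going further, it can help to confirm that everything the steps so far should have created is actually in place. This is a rough pre-flight sketch, assuming the layout used on this page (`tmp`, `.forward`, the checkout in `svn/` and `~/.corpus`); `check_masscheck_home` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the SpamAssassin tree):
# report any file or directory from the setup steps that is missing under
# the given home directory (default: $HOME). Returns 0 only if all exist.
check_masscheck_home() {
  local home="${1:-$HOME}" missing=0 path
  for path in tmp .forward svn/masses/rule-qa/corpus-nightly .corpus; do
    if [ ! -e "$home/$path" ]; then
      echo "missing: $home/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_masscheck_home` with no argument checks the current user's `$HOME`; every missing path is listed on stderr, not just the first.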

Use something like this:

opts_weekly="--net -j 8 --reuse --cache --cachedir=/tmp/aicache_nightly --restart=500 ham:detect:/export/home/bbmass/uploadedcorpora/jm/ham/* --after="-15552000" --tail=40000 --scanprob=0.3 spam:detect:/export/ho
opts_nightly="--reuse --cache --cachedir=/tmp/aicache_nightly --restart=500 ham:detect:/export/home/bbmass/uploadedcorpora/jm/ham/* --after="-15552000" --tail=40000 --scanprob=0.3 spam:detect:/export/home/bbmass/
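The `--after="-15552000"` value in both option strings is a time window in seconds: 15,552,000 s is exactly 180 days (180 × 24 × 60 × 60). Assuming mass-check treats a negative `--after` as an offset back from the current time (check the usage text in masses/mass-check on your checkout), that limits the scan to roughly the last six months of mail. To pick a different window, a hypothetical helper like this does the arithmetic:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an age in days into the negative offset in
# seconds used for --after in the ~/.corpus option strings above.
days_to_after_offset() {
  local days="$1"
  echo "-$(( days * 24 * 60 * 60 ))"
}

days_to_after_offset 180   # prints -15552000
```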

Replace BBUSERNAME with the value of $BBUSERNAME, and RSYNC_PASSWORD with the correct password for that rsync user.
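If you'd rather script that substitution than do it in vi, the sketch below is one way. It assumes the two placeholders appear literally in `~/.corpus`; `fill_corpus_placeholders` is a hypothetical name. It uses bash's own pattern substitution instead of sed so that characters such as `/` or `&` in the password are copied through literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every BBUSERNAME and RSYNC_PASSWORD
# placeholder in FILE with USER and PASS. Quoting the replacement keeps
# '&' literal even with bash 5.2's patsub_replacement option. The rewritten
# file gets mktemp's 0600 permissions, which suits a file holding a password.
fill_corpus_placeholders() {
  local file="$1" user="$2" pass="$3" line tmp
  tmp=$(mktemp) || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line//BBUSERNAME/"$user"}
    line=${line//RSYNC_PASSWORD/"$pass"}
    printf '%s\n' "$line"
  done < "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example: `fill_corpus_placeholders ~/.corpus "$BBUSERNAME" 'the-rsync-password'` (the password shown is a placeholder).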

Then run the mass-check once to see whether it works (feel free to press Ctrl-C once you're happy):

bash $HOME/svn/masses/rule-qa/corpus-nightly
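If you'd rather not sit and wait to press Ctrl-C, GNU coreutils `timeout` can stop the trial run for you. A minimal sketch, assuming `timeout` is installed; `trial_run` is a hypothetical wrapper. `timeout` exits with status 124 when it had to stop the command, which for a smoke test simply means the script was still going.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run SCRIPT under bash for at most SECS seconds.
# A timeout (status 124) counts as success for a smoke test; any other
# non-zero exit status from the script is passed back to the caller.
trial_run() {
  local script="$1" secs="$2" rc=0
  timeout "$secs" bash "$script" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "stopped after ${secs}s (still running, fine for a smoke test)"
    return 0
  fi
  return "$rc"
}
```

For example, `trial_run "$HOME/svn/masses/rule-qa/corpus-nightly" 600` gives the script ten minutes before stopping it.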

Then set up the cron job using 'EDITOR=vi crontab -e' and add this line:

0 9 * * * bash svn/masses/rule-qa/corpus-nightly
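The entry `0 9 * * *` runs the script at 09:00 every day, in the machine's local time; cron normally starts jobs in the owner's home directory, which is why the relative `svn/...` path works. To install the line without an interactive editor, and without adding it twice if you re-run the step, something like this sketch could be used. It assumes a cron (such as Vixie cron or cronie) whose `crontab` supports `-l` and reading a new table from stdin via `-`; `merge_cron_entry` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CURRENT crontab text followed by ENTRY,
# but only append ENTRY if an identical line is not already present.
merge_cron_entry() {
  local current="$1" entry="$2"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}
```

Usage, as the new uid: `merge_cron_entry "$(crontab -l 2>/dev/null)" '0 9 * * * bash svn/masses/rule-qa/corpus-nightly' | crontab -`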

Hopefully that should do it. ;)
