Child pages
  • Merging JIRAs
Skip to end of metadata
Go to start of metadata

This article describes how to consolidate two JIRA instances by copying all contents from one JIRA instance to another, using JIRA's Project Import feature for the bulk of the work, and extensive script and SQL hackery to work around the Project Importer's limitations. I'll discuss how to:

  • use PostgreSQL schemas to enable SQL queries spanning both old and new JIRA tables.
  • identify non-identical username pairs on old and new JIRAs that belong to the same person.
  • rename JIRA users (change usernames) in the non-identical pairs identified above, to avoid creating duplicate usernames in the new system.
  • use PostgreSQL's PL/pgSQL procedural language to migrate user accounts between JIRAs, including saved searches and dashboards.
  • preserve the old JIRA's URLs with Apache HTTPd redirects, including redirecting search requests where the ID has changed.

Problem Background

The Apache Software Foundation has a JIRA instance at https://issues.apache.org/jira. Apache often becomes home to projects that have started life elsewhere. Sometimes these external projects have their own JIRA instance too, and when they join the ASF we need to merge their JIRA contents into our JIRA. Such was the case for Struts, ActiveMQ, JRoller, Cayenne and Click.

For a long time there was no sane way of merging JIRAs, and all we could do was create new JIRA instances. After a while we had quite a collection, and the memory footprint of all these Java processes was getting out of hand.

Project Importer capabilities and limitations

In 2008 Atlassian, the company behind JIRA, implemented a Project Import feature, and we could begin consolidating JIRAs.

JIRA's Project Importer is very thorough and good at what it does, but it is strictly a "project" importer, and a JIRA instance contains more than just projects. JIRA content may be roughly divided into three categories:

  • "project-related data" including project issues
  • "user-related data" including users, saved searches and dashboards
  • "configuration data" permission schemes, notification schemes, workflows, custom fields and all the other things noted in the documentation

So the Project Importer will handle (most of) the project and issue data, but we still need to somehow get across the configuration and user-related data.

Merge process

The merge process is broadly as follows:

  1. Recreate necessary configuration data in the target JIRA. (§1)
  2. Rename source JIRA usernames to match target JIRA usernames as far as possible, to avoid creating duplicates. (§2)
  3. Perform the Project Import. (§3)
  4. Copy over user-related data. (§4)
  5. Tidying up (establishing redirects). §5)

Tools, or Why the PostgreSQL obsession?


Merging JIRAs is essentially a very big data manipulation and transfer job. I initially thought this would best be done with scripting languages (Python, Ruby and shell), by constructing some in-memory data structure from the old JIRA data, then 'serializing' it into the new JIRA. After various experiments, though, I concluded that this was madness. The overhead of translating between relational and object models is too high. Better to keep the data in relational form, and perform what little procedural logic there is with PL/pgSQL, PostgreSQL's default procedural language. I did use a scripting language in one place, to parse and transform search request XML.

This approach of using schemas to enable cross-database queries and PL/pgSQL to copy data across worked out very nicely. As you'll see, I developed a library of functions that will copy data from one JIRA database to another, merging with existing data where required, not creating duplicate copies when run twice (ie. idempotent). After a few days writing functions to copy everything user-related, I was very tempted to begin functions to copy project-related data, and make the Project Importer obsolete. I highly recommend PL/pgSQL + schemas for any copy-data-between-databases work.

If you are faced with the task of consolidating JIRAs, the chances are you aren't using PostgreSQL, and are cursing my PostgreSQL-specificity here. If you are using a database that supports cross-database joins (faked via schemas or otherwise), and has a PL language, and you like to tinker, try adapting these techniques to your database (and let me know!). Otherwise, don't be afraid of setting up PostgreSQL yourself and using it just for the migration. It is not hard to set up two JIRA Standalone instances pointing at PostgreSQL, import an XML dump of your production data, do the merge as described, then generate an XML dump of the merged result and import it into production.

Source code

The SQL and scripts in this article are available from from github.

1. Recreating configuration data

The Project Importer works as follows: you give it a JIRA XML export file and pick a project in it to import. The importer then checks if all non-project entities (groups, custom fields, resolutions, statuses, etc) referenced by that project exist in the target JIRA. If anything is missing, the importer tells you, eg.:

You need to manually create these missing entities (eg. create the needed custom fields and groups) and click 'Refresh Validations' to try again. Once all referenced entities are found, no more errors display and a 'Next' button appears, permitting you to do the actual import.

As the first step, you should go through this process for each project in the XML export, verifying/recreating referenced entities, but stopping short of doing the actual import.

JIRA's Project Importer does not overwrite project definitions in JIRA. This handy fact means you can create projects in the target JIRA for each to-be-imported project, hook the new project up to schemes and get everything set up just as you want, before running the Project Importer to import actual data.

2. Preventing duplicate user accounts

The worst effect of having multiple systems is that leaves you with multiple sets of user accounts, partly overlapping. A user might have registered 'john' on one JIRA, then forgotten about it and registered 'jsmith' on the other JIRA. If we do a straighforward JIRA import, our unified JIRA will have both 'john' and 'jsmith' accounts, and whichever is used next, our user will be left wondering "where are the issues I raised?"

To prevent duplicate accounts for equivalent but non-identical usernames, we want to identify equivalent usernames, and rename the source JIRA username to the name of the the target JIRA username, so that the Project Importer treats them as identical. Ie. if 'john' in the source JIRA is equivalent to 'jsmith' in the target JIRA, then rename 'john' to 'jsmith' in the source JIRA. Then the project importer will not create a duplicate username.

An outline of how we will proceed with this:

2.1 Cross-database queries

In pseudo-SQL, what we want to do is "select the Struts usernames and main JIRA usernames whose email addresses match, but whose usernames don't". This requires a query across two databases, something PostgreSQL doesn't support. However PostgreSQL does support schemas, a namespacing mechanism which effectively allows us to store two different JIRA 'databases' within one actual PostgreSQL database (one per schema).

First let's create a working copies of the primary JIRA database and the database we're going to merge into it:

postgres@brutus:~$ createdb jira_merge_3135
postgres@brutus:~$ pg_dump jira_3135 | psql -q jira_merge_3135
postgres@brutus:~$ createdb jira_merge_struts_3135
postgres@brutus:~$ pg_dump jira_struts_3135 | psql -q jira_merge_struts_3135

Now rename the schema of the Struts JIRA copy from 'public' (the default) to 'struts', and create a new empty 'public' schema:

jira_merge_struts_3135=# ALTER SCHEMA public RENAME TO struts;
ALTER SCHEMA
jira_merge_struts_3135=# CREATE SCHEMA public;
CREATE SCHEMA

and merge the renamed schema into the copy of the primary JIRA database:

postgres@brutus:~$ pg_dump jira_merge_struts_3135 | psql -q jira_merge_3135

We're now in the happy situation of being able to query and compare our sets of data:

jturner@brutus:~$ export PGDATABASE=jira_merge_3135
jturner@brutus:~$ psql
....
jira_merge_3135=> SELECT count(*) FROM struts.userbase;
 count
-------
  8873
(1 row)

jira_merge_3135=> SELECT count(*) FROM public.userbase;
 count
-------
 44742
(1 row)

jira_merge_3135=> -- Example cross-db query: check for username overlap
jira_merge_3135=> SELECT count(*) FROM public.userbase p, struts.userbase s WHERE p.username=s.username;
 count
-------
  1399
(1 row)

2.2. Seeing user details

In JIRA, each user has an entry in the userbase table:

jira_merge_3135=> SELECT * FROM public.userbase WHERE username='jefft';
    id    | username |                                      password_hash
----------+----------+------------------------------------------------------------------------------------------
 12310020 | jefft    | 5OPUalT64Cq3zJfnf/GcLQ8E036PAmLCZepRJhz2MEIo5Zj/yekyerXCW6G32RyEvcA8l6VEgbiaDf/swGoPwQ==
(1 row)

and user details (email, full name, etc) are stored in the propertyentry/propertystring table rows. We would like to get a unified view of username, full name and email address, most importantly email address, since that's what will be comparing across databases.

For this we create a View, userdetails:

CREATE OR REPLACE VIEW userdetails AS
	SELECT userbase.id, userbase.username, ps1.propertyvalue AS email, ps2.propertyvalue AS fullname, userbase.password_hash
	FROM userbase, propertyentry pe1, propertystring ps1, propertyentry pe2, propertystring ps2
	WHERE pe1.id = ps1.id AND pe2.id = ps2.id AND pe1.entity_name::text = 'OSUser'::text
		AND pe2.entity_name::text = 'OSUser'::text AND pe1.entity_id = userbase.id AND pe2.entity_id = userbase.id
		AND pe1.property_key::text = 'email'::text AND pe2.property_key::text = 'fullName'::text;

We want to create this view in both struts and public schemas. We can do this by setting the {{search_path|http://www.postgresql.org/docs/8.3/static/ddl-schemas.html#DDL-SCHEMAS-PATH] just before running the create view DDL:

jefft@brutus:~$ psql
jira_merge_3135=> SET search_path = public;
SET
jira_merge_3135=> \i userdetails.sql
CREATE VIEW
jira_merge_3135=> SET search_path = struts;
SET
jira_merge_3135=> \i userdetails.sql
CREATE VIEW
jira_merge_3135=> SELECT * FROM public.userdetails WHERE username='jefft';
    id    | username |      email       |  fullname   |                                      password_hash
----------+----------+------------------+-------------+------------------------------------------------------------------------------------------
 12310020 | jefft    | jefft@apache.org | Jeff Turner | 5OPUalT6ECq3zJfnf/GcLQ8E036PAmLCZepRJh2MEIo5Zj/yekyerXCW6G32RsEvcA8l6VEgbiaDf/swGoPwQ==
(1 row)

jira_merge_3135=>

We now have a userdetails view in each schema that shows the user's username, email and full name. We can now construct a SQL query across userdetails views for old and new JIRAs, to discover equivalent usernames.

2.3 Finding users to rename

We will consider an account's email address to be its primary identifying feature. So for instance, if 'jsmith' in one JIRA has the same email address as 'john' in another JIRA, we'll consider them identical, and a candidate for renaming.

However it's not all simple. There may be more than one username with a particular email address in the old JIRA, and more than one in the new JIRA. Here are the various situations we might encounter:

Use case

Username(s) in old JIRA

Username(s) in new JIRA

Desired action

1

jsmith

john.smith

rename 'jsmith' to 'john.smith'

2

jsmith, john.smith

john.smith

rename 'jsmith' to 'john.smith', ie. merge accounts

3

jsmith

jsmith, john.smith

do NOT rename

4

jsmith

john.smith, jsmith@xyz.com

Pick most recently created new username to rename to.

5

jsmith, john.smith

jsmith, john.smith

do NOT rename

Now to formulate a SQL query identifying equivalent old and new username pairs.

For the first case, this query would do:

SELECT s.username as oldusername, p.username as newusername
	FROM struts.userdetails s, public.userdetails p
	WHERE s.email = p.email AND s.username != p.username;

Use case

Username(s) in old JIRA

Username(s) in new JIRA

Desired action

Query result

Correct?

1

jsmith

john.smith

rename 'jsmith' to 'john.smith'

oldusername newusername
jsmith john.smith

(tick)

2

jsmith, john.smith

john.smith

rename 'jsmith' to 'john.smith', ie. merge accounts

oldusername newusername
jsmith john.smith

(tick)

3

jsmith

jsmith, john.smith

do NOT rename

oldusername newusername
jsmith john.smith

(error) Incorrect - we want the query to return no results. If 'jsmith' already exists in the new JIRA we want to keep it.

4

jsmith

john.smith, jsmith@xyz.com

Pick most recently created new username to rename to.

oldusername newusername
jsmith john.smith
jsmith jsmith@xyz.com

(error) Incorrect. If there is no matching username, we should pick one of the new usernames and use that.

5

jsmith, john.smith

jsmith, john.smith

do NOT rename

oldusername newusername
jsmith john.smith
john.smith jsmith

(error) Incorrect.

The query must be fixed as follows:

  • to fix cases 3 and 5: if the old username exists (with the same email address) in the new JIRA, then don't rename it.
  • to fix case 4: for new usernames with the right emails (eg. 'john.smith' and 'jsmith@xyz.com', pick just one,
    according to some heuristic.

Here is the result, when our 'case 4' heuristic for picking between usernames is 'pick the most recently created', ie.
that with the largest user id:

jira_merge_3135=> CREATE OR REPLACE VIEW struts.renameusers_view AS
	SELECT s.username as oldusername, p.username as newusername, p.email, p.fullname
	FROM struts.userdetails s, public.userdetails p
	WHERE s.email = p.email AND s.username != p.username
		and not exists (SELECT * FROM public.userdetails WHERE email=s.email and username=s.username)	--
		and p.id=(SELECT max(id) FROM public.userdetails WHERE email=p.email);

This query produces a viable but non-ideal set of usernames to rename. I made two further refinements.

2.3.1 Preferring similar usernames

As mentioned above, in the case where there are multiple target usernames ('john.smith' and 'jsmith@xyz.com'), none of which exactly match the source username ('jsmith'), we can pick whichever one we like. The query above picks the 'most recently created' username. In my case, "most recently created" username wasn't always ideal, because sometimes the most recently created account was a role account (eg. one admin created 'jirareporter' for doing SOAP queries). Rather than 'most recently created', I picked the 'most similar' username (by levenshtein distance), falling back on greatest ID only for 'equally similar' usernames:

jturner@brutus:~$ sudo apt-get install postgresql-contrib-8.4
jira_merge_3135=# \i /usr/share/postgresql/8.4/contrib/fuzzystrmatch.sql
jira_merge_3135=# CREATE OR REPLACE VIEW struts.renameusers_view AS
	SELECT s.username as oldusername, p.username as newusername, p.email, p.fullname
	FROM struts.userdetails s, public.userdetails p
	WHERE s.email = p.email AND s.username \!= p.username
		AND NOT EXISTS (SELECT id FROM public.userdetails WHERE email=s.email and username=s.username)
		AND p.id = (SELECT id FROM public.userdetails WHERE email=p.email ORDER BY levenshtein(username, s.username) ASC, id DESC LIMIT 1);

In the 'jsmith' example, the view now says to rename 'jsmith' to 'john.smith', because it is more similar than 'jsmith@xyz.com':

jira_merge_3135=> SELECT levenshtein('jsmith', 'john.smith');
levenshtein
-------------
4
(1 row)

jira_merge_3135=> SELECT levenshtein('jsmith', 'jsmith@xyz.com');
levenshtein
-------------
8
(1 row)

2.3.2 Avoiding renaming highly used Struts usernames

If someone has created 100 issues in Struts as 'jsmith', and added 1 comment on the main JIRA as 'jsmith@xyz.com', then technically we could rename 'jsmith' to 'jsmith@xyz.com' to ensure there is just one username, but it is likely that the user prefers 'jsmith', and has perhaps even forgotten about 'jsmith@xyz.com'. Thus when deciding whether to rename an account, we would like to take into account the degree of usage of old and new usernames.

userusages.sql defines a view which counts up occurrences of each type of JIRA data for each user:

jira_merge_3135=> set search_path=public; \i userusages.sql
SET
CREATE VIEW
jira_merge_3135=> set search_path=struts; \i userusages.sql
SET
CREATE VIEW
jira_merge_3135=> set search_path=public;
jira_merge_3135=> SELECT * FROM userusages WHERE username='jefft';
id    | username | assignees | reporters | comments | changes | searchrequests | columnlayouts | filtersubscriptions | portalpages | roletypeactors | favouriteportalpages | favouritesearchrequests
    ----------+----------+-----------+-----------+----------+---------+----------------+---------------+---------------------+-------------+----------------+----------------------+-------------------------

12310020 | jefft    |        73 |        74 |      326 |    6733 |              9 |             1 |                   1 |           1 |              2 |                    1 |                       9
(1 row)

We now incorporate this userusages table into our (levenshtein-enhanced) query:

jira_merge_3135=> CREATE OR REPLACE VIEW struts.renameusers_view AS
    SELECT s.username as oldusername, p.username as newusername, p.email, p.fullname,
		su.assignees+su.reporters+su.comments as oldactivity, pu.assignees+pu.reporters+pu.comments as newactivity
    FROM struts.userdetails s, struts.userusages su, public.userdetails p, public.userusages pu
    WHERE s.email = p.email AND s.username != p.username
        AND s.username=su.username AND p.username=pu.username
        and not exists (SELECT id FROM public.userdetails WHERE email=s.email and username=s.username)
        and p.id = (SELECT id FROM public.userdetails WHERE email=p.email ORDER BY levenshtein(username, s.username) ASC, id DESC LIMIT 1);

CREATE VIEW
jira_merge_3135=> SELECT * FROM struts.renameusers_view ORDER BY oldactivity DESC LIMIT 10;
       oldusername        |    newusername     |         email         |     fullname      | oldactivity | newactivity
--------------------------+--------------------+-----------------------+-------------------+-------------+-------------
 plightbo                 | plightbo@gmail.com | plightbo@gmail.com    | Patrick Lightbody |        1209 |           7
 craig.mcclanahan@sun.com | craigmcc           | craigmcc@apache.org   | Craig McClanahan  |        1018 |          53
 niallp@apache.org        | niallp             | niallp@apache.org     | Niall Pemberton   |         676 |        1393
 wesw                     | wesw@wantii.com    | wesw@wantii.com       | Wes Wannemacher   |         427 |           1
 rleland@apache.org       | rleland            | rleland@apache.org    | Robert Leland     |         173 |          64
 mbogaert                 | pathos             | m.bogaert@memenco.com | Mathias Bogaert   |         153 |           7
 matt@smallleap.com       |  smallleap         | matt@smallleap.com    | matt baldree      |         123 |           0
 wsmoak@apache.org        | wsmoak             | wsmoak@apache.org     | Wendy Smoak       |          96 |         134
 hermod@opstvedt.com      | hermod             | hermod@opstvedt.com   | Hermod Opstvedt   |          87 |           1
 manklu@web.de            | manklu             | manklu@web.de         | Manfred Klug      |          86 |           7
(10 rows)

jira_merge_3135=>

There, for instance, we can see that 'plightbo' has 1209 Struts issues, comments and assignments, and 'plightbo@gmail.com' has only 7, so we should probably import 'plightbo' as-is. Even more so with 'wesw' (427 changes) vs. 'wesw@wantii.com' (1 change).


To nail down exactly which users we want to rename, let's create a materialized copy of our renameusers_view view in the renameusers table. This will allow us to see what was renamed after the rename operation changes the tables underlying the view. Regarding the activity stats, I decided to rename all users except those with distinctly more (7 items) Struts activity than main JIRA activity.

jira_merge_3135=> SELECT * INTO struts.renameusers FROM struts.renameusers_view WHERE oldactivity-newactivity <= 7;
SELECT
jira_merge_3135=> SELECT oldusername,newusername,oldactivity,newactivity FROM renameusers ORDER BY newactivity-oldactivity desc;
           oldusername            |           newusername            | oldactivity | newactivity
----------------------------------+----------------------------------+-------------+-------------
 olegk@apache.org                 | olegk                            |           1 |        3127
 ammulder@alumni.princeton.edu    | ammulder                         |           1 |        1476
 rdonkin@apache.org               | robertburrelldonkin              |           1 |        1470
 jcompagner@j-com.nl              | jcompagner                       |          11 |         916
 mkienenb@gmail.com               | mkienenb                         |           6 |         909
 niallp@apache.org                | niallp                           |         676 |        1393
 joes@apache.org                  | joes                             |           0 |         576
...
(342 rows)
jira_merge_3135=>

2.4 Renaming users

It isn't actually possible to rename users in JIRA, because the database is not normalized, and usernames are scattered throughout the data model. This is a subject of much righteous ire and voting on the JIRA feature request. Of the various attempts at SQL scripts attached to that issue, I picked this one, but 1) without the Oracle-specific way of updating changeitems, 2) deleting possible duplicate rows if user X is renamed to Y, but Y already exists, c) wrapping everything in a PL/pgSQL function, for ease of use and to make it transactional. The end result is renameusers_3.13.5.sql.

If you look at renameusers_3.13.5.sql, you'll see it uses the renameusers table as the list of users to rename. Our renameusers table was created in the step above, so all we have to do is run renameusers_3.13.5.sql, setting search_path to have struts first so that we're renaming Struts users:

jira_merge_3135=> set search_path=struts,public;
SET
jira_merge_3135=> SELECT count(*) FROM renameusers;
 count 
-------
342
(1 row)
jira_merge_3135=> \i renameusers_3.13.5.sql
CREATE FUNCTION
jira_merge_3135=> SELECT renameusers();
 renameusers
-------------

(1 row)

That's it. The renameusers() function works within a transaction (as do all PL/pgSQL functions), so if it fails halfway your database is not left in an inconsistent state.

Only run renameusers() once. Running it twice might pick up a few more renames if some {{renameuser.newusername}}s match up with {{renameuser.oldusername}}s, which isn't a problem in terms of correctness, but just complicates things.

The Struts content has now been renamed to match main JIRA usernames wherever possible. We can proceed with the JIRA project import. Note that once we have finished the import, you may wish to email these renamed users telling them which username to use.

3. Doing the Project Import

3.1 Generate an XML backup

Now that we have our Struts data all sorted, we can generate an XML export via JIRA, and import the projects one at a time into the main JIRA.

To do this, set up two JIRA Standalone instances configured to connect to PostgreSQL. Edit conf/server.xml of each to ensure the ports don't clash, and configure the datasource in both to point to the same merge database:

	  <Resource name="jdbc/JiraDS" auth="Container" type="javax.sql.DataSource"
		username="jira"
		password="....."
		driverClassName="org.postgresql.Driver"
		url="jdbc:postgresql://localhost/jira_merge_3135"
		maxActive="20" />

In the secondary JIRA (in my case, struts), edit atlassian-jira/WEB-INF/classes/entityengine.xml and change schema-name="public" to schema-name="struts".

Before we start up our JIRA copies, let's also change the JIRA index path so that our copy does not clobber the live JIRA's path. In my case:

jira_merge_3135=> set search_path = struts; update propertystring set propertyvalue='/var/local/jira/indexes/struts_renamed/3.13.5' where id=(SELECT id FROM propertyentry WHERE property_key ='jira.path.index');
SET
UPDATE 1
jira_merge_3135=> set search_path = public; update propertystring set propertyvalue='/var/local/jira/indexes/mainjira_renamed/3.13.5' where id=(SELECT id FROM propertyentry WHERE property_key ='jira.path.index');
SET
UPDATE 1

Start up both JIRAs. In the secondary JIRA go to Administration -> Backup data to XML and generate an XML backup somewhere on disk.

3.2 Import the projects

For each source JIRA project, in the main JIRA:

  1. go to Administration -> Project Import
  2. select the XML backup you generated and that JIRA's attachment path
  3. import a project
  4. repeat for each project

4. Copying user data

Now we can see what the Project Importer has and hasn't imported:

  • The Project Importer imports users related to the project (assignees, reporters, commenters, etc), but it won't import users who have been 'read-only' so far, eg. who have just created a saved search or custom dashboard.
  • When the Project Importer does import users, it doesn't import their group memberships or saved searches.
  • Project categories aren't imported.
  • Non-standard properties, in our case storing attachment copyright information, are not preserved.

If we now compare usernames in the two databases, we can see that the project importer has imported some users, in this example taking the in-common total from 1399 to 4759:

jira_merge_3135=> SELECT count(*) FROM public.userbase p, struts.userbase s WHERE p.username=s.username;
 count 
-------
4759
(1 row)

Yet there are still many users that weren't imported, because they weren't directly associated with anything the project importer imported:

jira_merge_3135=> SELECT count(*) FROM struts.userbase su LEFT JOIN public.userbase pu ON su.username=pu.username
WHERE pu.username IS NULL;
 count 
-------
4058
(1 row)

4.1 Picking active users to copy

How many of these users are worth copying over? In most cases the answer is going to be "all of them". In my case, however, the Struts JIRA has been accumulating usernames from public signups for the last 10 years (originally in Bugzilla, then in JIRA), and I don't particularly want to preserve accounts that haven't been used in any way.

To determine the account usage, refer to the userusages view we defined earlier, which lists usages for each username:

jira_merge_3135=> \i userusages.sql
CREATE VIEW
jira_merge_3135=> \d userusages
                   View "public.userusages"
         Column          |          Type          | Modifiers
-------------------------+------------------------+-----------
 id                      | numeric(18,0)          |
 username                | character varying(255) |
 assignees               | bigint                 |
 reporters               | bigint                 |
 comments                | bigint                 |
 changes                 | bigint                 |
 searchrequests          | bigint                 |
 columnlayouts           | bigint                 |
 filtersubscriptions     | bigint                 |
 portalpages             | bigint                 |
 roletypeactors          | bigint                 |
 favouriteportalpages    | bigint                 |
 favouritesearchrequests | bigint                 |
View definition:
...

jira_merge_3135=> SELECT * FROM struts.userusages LIMIT 10;
  id   | username | assignees | reporters | comments | changes | searchrequests | columnlayouts | filtersubscriptions | portalpages | roletypeactors | favouriteportalpages | favouritesearchrequests
-------+----------+-----------+-----------+----------+---------+----------------+---------------+---------------------+-------------+----------------+----------------------+-------------------------
 21253 | dhobbs   |         0 |         1 |        1 |       1 |              0 |             0 |                   0 |           0 |              0 |                    0 |                       0
 21313 | blakeday |         0 |         4 |        0 |       0 |              0 |             0 |                   0 |           1 |              0 |                    1 |                       0
...

We can see that the JIRA Project Importer has imported users that are needed for project and issue contents:

jira_merge_3135=> SELECT count(*) FROM
		(SELECT * FROM struts.activeusers WHERE reporters>0 OR assignees>0 OR comments>0) x
	LEFT JOIN
		public.userbase p
	ON x.username=p.username WHERE p.username IS NULL;
 count
-------
     0
(1 row)

Similarly, we can find show the users that haven't been imported by the Project Importer, that had some sort of user-related entity created (a dashboard, saved search, etc):

jira_merge_3135=> SELECT count(*) from
		(SELECT * FROM struts.activeusers WHERE searchrequests>0 OR columnlayouts>0 OR filtersubscriptions>0 OR portalpages>0 OR roletypeactors>0 OR favouriteportalpages>0 OR favouritesearchrequests>0) x
	LEFT JOIN
		public.userbase p
	ON x.username=p.username WHERE p.username IS NULL;
 count
-------
   125
(1 row)

Of the 4058 non-imported users, only 125 had actually done anything with their JIRA account. These are the accounts we want to import.

Let's take a static snapshot of our view, to preserve a record of what we did after the underlying tables have changed:

jira_merge_3135=> SELECT x.* INTO copied_struts_users FROM
		(SELECT username,assignees,reporters,comments,changes,searchrequests,columnlayouts,filtersubscriptions,portalpages,roletypeactors,favouriteportalpages,favouritesearchrequests FROM struts.activeusers WHERE searchrequests>0 OR columnlayouts>0 OR filtersubscriptions>0 OR portalpages>0 OR roletypeactors>0 OR favouriteportalpages>0 OR favouritesearchrequests>0) x
	LEFT JOIN
		public.userbase p
	ON x.username=p.username WHERE p.username IS NULL ;
INSERT 0 125

4.2. Copying users

In JIRA's data model, a user entity may have created, and is associated with:

  • zero or more saved searches (searchrequest table entries).
  • for each saved search, there may be a particular set of columns to display (in the columnlayout table). The user can also have one "default" column layout for regular search results.
  • zero or more "dashboards", ie. arrangements of portlets shown on the user's front page. These are stored in the portalpages table.
  • bookmarks of 'favourite' saved searches and shared dashboards. These are stored in the favouriteassociations table.
  • group memberships (virtually every user is at least a member of jira-users). These are stored in the userassociations table.

Copying a user potentially involves quite a bit of SQL. This has all been encapsulated in the copy.plpgsql library, and its prerequisite, simpleinserts.plpgsql. Source it with:

jira_merge_3135=> \i simpleinserts.plpgsql
jira_merge_3135=> \i copy.plpgsql

Then to copy a user, simply use the copyuser function:

jira_merge_3135=> select copyuser('husted');
....

To copy the users that we've identified as having not been transferred by the Project Importer:

jira_merge_3135=> select copyuser(username) from struts.copied_struts_users;
....

Finally, although the Project Importer created user accounts for users associated with the imported project, it didn't preserve their search requests, dashboards and so forth. So we run the copyuser function on these users too:

jira_merge_3135=> select copyuser(s.username) from struts.userbase s, public.userbase p where s.username=p.username ;

The copyuser function is idempotent, copying only what isn't present in the destination JIRA, allowing us to run it like this to copy across search requests and dashboards for already-created users.

4.2.1. Rewriting saved search IDs.

Although we have now copied across all user data, one problem remains: the saved searches we copied across are in the form of XML, and this XML contains embedded IDs of JIRA entities.

For example, here is the definition of a Struts saved search:

<?xml version="1.0"?>
<searchrequest name="Webwork1.3 unassigned tasks">
  <parameter class="com.atlassian.jira.issue.search.parameters.lucene.ProjectParameter">
    <projid andQuery="false">
      <value>10030</value>
    </projid>
  </parameter>
  <parameter class="com.atlassian.jira.issue.search.parameters.lucene.StatusParameter">
    <status andQuery="false">
      <value>1</value>
      <value>2</value>
      <value>3</value>
    </status>
  </parameter>
  <parameter class="com.atlassian.jira.issue.search.parameters.lucene.VersionParameter">
    <version andQuery="false">
      <value>21180</value>
      <value>-3</value>
    </version>
  </parameter>
  <sort class="com.atlassian.jira.issue.search.SearchSort">
    <searchSort field="key" order="DESC"/>
  </sort>
</searchrequest>

The ID 10030 identifies a row in the project table of the Struts database for the 'WW' project. However in the main JIRA database, the 'WW' project's ID is 12311041. Similarly, 21180 references a projectversion row that is something else in the main JIRA.

We will need to translate these IDs inside the XML, if our transferred saved searches are to work.

To aid this, the copyusers function (or actually, the copysearchrequest function it calls) adds an XML comment to each imported searchrequest row:

<!-- CONVERTME: struts.searchrequest XXXXX -->

where XXXXX is the ID of the searchrequest in the old JIRA.

We use a Ruby script, convertsearchrequests.rb, to convert the IDs in the searchrequest records. The Ruby code will find any searchrequest containing the CONVERTME marker.

jefft@brutus:~/mergestruts$ ./convertsearchrequests.rb

Now, finally, everything is converted. Start up JIRA, reindex to pick up the database changes, and you should see imported issues. In the user browser you should see all your imported usernames, and under Find Issues -> Manage, you should see any public shared filters defined in the source JIRA.

5. Tidying up

5.1 Establishing HTTPd redirects

Google says there are 690,000 references to https://issues.apache.org/struts/\* on the internet. There would be a lot of unhappy people if all those links stopped working, so we use Apache mod_rewrite to preserve the URLs.

Because the Project Importer preserves issue keys, we can simply redirect most traffic as follows:

RewriteRule ^/struts$ /jira/
RewriteRule ^/struts/(.*) /jira/$1 [R=permanent,L]

Things are more complicated when it comes to saved searches. A saved search URL looks like this:

https://issues.apache.org/struts/secure/IssueNavigator.jspa?mode=hide&requestId=10734

The ID '10734' refers to a searchrequest table row, and won't be valid in the new JIRA.

What we need is a table of mappings from old to new searchrequest IDs. You may have noticed above that convertsearchrequests.rb saves the old searchrequest ID in a comment of the new searchrequest:

<!--SearchRequest converted from id 10734 in struts database-->

This allows us to construct a SQL view, searchrequest_map.sql, which provides a mapping of old to new searchrequest IDs:

jira_merge_3135=> \i searchrequest_map.sql
CREATE VIEW
jira_merge_3135=> select * from public.searchrequest_map limit 5;
 oldid |  newid
-------+----------
 10372 | 12313796
 10375 | 12313797
 10760 | 12313805
 10768 | 12313814
 10750 | 12313815
(5 rows)

From this we create a static file of the mappings for use in Apache:

jefft@brutus:~/mergestruts$ # Note: the character after -F" is a TAB, entered with ctrl-v-tab.
jefft@brutus:~/mergestruts$ psql -F"	" -tAc "SELECT oldid, newid FROM searchrequest_map" > searchrequestmap
jefft@brutus:~/mergestruts$ head searchrequestmap
10090	12313791
10350	12313792
10452	12313789
10372	12313796
10375	12313797
10485	12313799
10530	12313800
10481	12313801
10520	12313802
10574	12313803
jefft@brutus:~/mergestruts$

We can now use the RewriteMap directive in our rewrite rules. Here are the final rewrite rules I use:

# Preserve URL references to https://issues.apache.org/struts/*, with the ability to redirect saved search URLs despite the changed IDs they contain.
RewriteMap struts_searchrequestmap txt:/usr/local/apache2-install/issues.apache.org/conf/jiraredirects/struts_jira_searchrequestmap.txt
# Sample URL to redirect: https://issues.apache.org/struts_renamed/secure/IssueNavigator.jspa?mode=hide&requestId=10734
RewriteCond %{QUERY_STRING} requestId=([0-9]+)
RewriteRule ^/struts/secure/IssueNavigator.jspa /jira/secure/IssueNavigator.jspa?requestId=${struts_searchrequestmap:%1}&oldsrid=%1&reset=false [R=permanent,L]
# Sample URLs to redirect:
# https://issues.apache.org/struts/sr/jira.issueviews:searchrequest-xml/10734/SearchRequest-10734.xml?tempMax=1000
# https://issues.apache.org/struts/sr/jira.issueviews:searchrequest-printable/10734/SearchRequest-10734.html?tempMax=1000
# https://issues.apache.org/struts/sr/jira.issueviews:searchrequest-excel-all-fields/10734/SearchRequest-10734.xls?tempMax=1000
# https://issues.apache.org/struts/sr/jira.issueviews:searchrequest-printable/10734/SearchRequest-10734.html?tempMax=1000
RewriteRule ^/struts/sr/jira.issueviews:searchrequest-(.*)/(.*)/SearchRequest-(.*)\.(.*) /jira/sr/jira.issueviews:searchrequest-$1/${struts_searchrequestmap:$2}/SearchRequest-${struts_searchrequestmap:$3}.$4 [R=permanent,L]
# Anything else, do a straight redirect
RewriteRule ^/struts$ /jira/
RewriteRule ^/struts/(.*) /jira/$1 [R=permanent,L]

6. Notifying users of renamed accounts

Near the beginning of this process, we renamed JIRA users to avoid duplicates. To avoid confusion, it would be a good idea to let these users know that their Struts contents has been imported under their existing main JIRA username.

To this end I wrote a script, createmails.rb, to generate emails for each affected user, and another to do the sending.

7. Conclusion

  • No labels