This page outlines the process to employ the namespace migration of Sqoop from com.cloudera.sqoop to org.apache.sqoop. The specific JIRA issues tracking this work are filed under SQOOP-369.

Note: The information provided here is general information. Any detail specific to a particular aspect of this migration should be noted in the relevant JIRA issue.



Overview

Migration of Sqoop source code from com.cloudera.sqoop to org.apache.sqoop is a prerequisite for releasing Sqoop under Apache Incubator. Considering that there is a lot of third party code that is developed on top of/to work with Sqoop, this migration is particularly risky for backward compatibility and thus requires careful handling. This document outlines the steps that seem reasonable for such migration.

If you are helping out with this migration and feel that the steps outlined here are not valid in some scenarios, please initiate the discussion on the developer mailing list and help update this document.

Migration

General Approach

When migrating a class from its previous namespace to the new, the key requirement to address is that any code written to the old class should still work at binary compatibility level. This also implies that such code should be able to recompile with the migrated code base without any modifications. In order to enable this, the general approach is as follows:

Note that API is not just method signatures but includes all aspects of implementation such as class hierarchies, type compatibility, static and non-static state etc. The following sections outline these steps as they apply to various API scenarios.

Simple Public Class

Consider the class:

package com.cloudera.sqoop.io;

public class NamedFifo {

   public NamedFifo(String pathname) {
     ...
   }

   public void create() {
     ...
   }
}

To migrate such a class, do the following:

Here is the outcome:

// Old class
package com.cloudera.sqoop.io;

/**
 * @deprecated use org.apache.sqoop.io.NamedFifo instead.
 * @see org.apache.sqoop.io.NamedFifo
 */
public class NamedFifo extends org.apache.sqoop.io.NamedFifo {

   public NamedFifo(String pathname) {
     super(pathname);
   }
}

// New class
package org.apache.sqoop.io;

public class NamedFifo {
 public NamedFifo(String pathname) {
     ...
   }

   public void create() {
     ...
   }
}

Simple Public Class with Static Constants

Consider the following class:

package com.cloudera.sqoop.lib;

import java.io.Closeable;

public class LargeObjectLoader implements Closeable {
  public static final long DEFAULT_MAX_LOB_LENGTH = 16 * 1024 * 1024;
  
  @Override
  public void close() throws IOException {
    ...
  }
}

To migrate this, do the following:

// Old class
package com.cloudera.sqoop.lib;

/**
 * @deprecated use org.apache.sqoop.lib.LargeObjectLoader instead.
 * @see org.apache.sqoop.lib.LargeObjectLoader
 */
public class LargeObjectLoader extends org.apache.sqoop.lib.LargeObjectLoader {
  public static final long DEFAULT_MAX_LOB_LENGTH = 
  	org.apache.sqoop.lib.LargeObjectLoader.DEFAULT_MAX_LOB_LENGTH;

  ...
}

// New class
package org.apache.sqoop.lib;

import java.io.Closeable;

public class LargeObjectLoader implements Closeable {
  public static final long DEFAULT_MAX_LOB_LENGTH = 16 * 1024 * 1024;
  
  @Override
  public void close() throws IOException {
    ...
  }
}

Simple Utility Class

A utility class is a simple class that has a private constructor and all methods as statics. Here is an example:

package com.cloudera.sqoop.lib;

import ...;

public final class BigDecimalSerializer {

  private BigDecimalSerializer() { }

  static final BigInteger LONG_MAX_AS_BIGINT = 16 * 1024 * 1024;

  ...

  public static void write(BigDecimal d, DataOutput out) throws IOException {
    ...
  }

To migrate this class, do the following:

Here is how it will look like:

// New class
package org.apache.sqoop.lib;

import ...;

public final class BigDecimalSerializer {

  private BigDecimalSerializer() { }

  public static final BigInteger LONG_MAX_AS_BIGINT = 16 * 1024 * 1024;

  ...

  public static void write(BigDecimal d, DataOutput out) throws IOException {
    ...
  }
}


// Old class
/**
 * @deprecated use org.apache.sqoop.lib.BigDecimalSerializer instead.
 * @see org.apache.sqoop.lib.BigDecimalSerializer
 */
public final class BigDecimalSerializer {

  private BigDecimalSerializer() { }

  static final BigInteger LONG_MAX_AS_BIGINT =
      org.apache.sqoop.lib.BigDecimalSerializer.LONG_MAX_AS_BIGINT;

  public static void write(BigDecimal d, DataOutput out) throws IOException {
    org.apache.sqoop.lib.BigDecimalSerializer.write(d, out);
  }
  ...
}

Concrete Singletons

Concrete singletons are classes with static state that is bound to concrete implementation. Consider the class below:

package com.cloudera.sqoop.io;

import ...

public final class LobReaderCache {

  private Map<Path, LobFile.Reader> readerMap;

  private LobReaderCache() {
    this.readerMap = new TreeMap<Path, LobFile.Reader>();
  }

  private static final LobReaderCache CACHE;
  static {
    CACHE = new LobReaderCache();
  }

  public static LobReaderCache getCache() {
    return CACHE;
  }

  public static Path qualify(Path path, Configuration conf)
      throws IOException {
    ...
  }

  public LobFile.Reader get(Path path, Configuration conf)
      throws IOException {
    ...
  }
}

In such cases you can only partially migrate it the class to the new namespace. Specifically, the non-static part can be migrated using inheritance but the static/singleton aspect will remain in the old class. The steps involved are the following:

The outcome will be something like the following:

// Old class
package com.cloudera.sqoop.io;

import ...

/**
 * @deprecated use org.apache.sqoop.io.LobReaderCache instead.
 * @see org.apache.sqoop.io.LobReaderCache
 */
public final class LobReaderCache extends org.apache.sqoop.io.LobReaderCache {

  public static final Log LOG = org.apache.sqoop.io.LobReaderCache.LOG;

  private static final LobReaderCache CACHE;
  static {
    CACHE = new LobReaderCache();
  }

  public static LobReaderCache getCache() {
    return CACHE;
  }

  public static Path qualify(Path path, Configuration conf)
      throws IOException {
    return org.apache.sqoop.io.LobReaderCache.qualify(path, conf);
  }
  
  // non-static API all migrated to base class
}

// New class
package org.apache.sqoop.io;

import ...

public class LobReaderCache {

  private Map<Path, LobFile.Reader> readerMap;

  protected LobReaderCache() {
    this.readerMap = new TreeMap<Path, LobFile.Reader>();
  }

  public LobFile.Reader get(Path path, Configuration conf)
      throws IOException {
    ...
  }

  public static Path qualify(Path path, Configuration conf)
      throws IOException {
    ...
  }
}

Migrating Class Hierarchies

When migrating class hierarchies, you must create a parallel class-hierarchy in the new namespace using every migrated class. However, the important aspect of this migration is that in order to preserve the type-compatibility, you will have to have the new namespace classes inherit from their old super classes. Consider the following two class hierarchy:

// Class 1
package com.cloudera.sqoop.lib;

import ...

public abstract class LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE>
    implements Closeable, Writable {
  ...
}

// Class 2
package com.cloudera.sqoop.lib;

import ...

public class ClobRef extends LobRef<String, String, Reader> {
  ...
}

To migrate this over to new namespace do the following:

The outcome for the above example will be as follows:

// Old class 1
package com.cloudera.sqoop.lib;

import ...
/**
 * @deprecated use org.apache.sqoop.lib.LobRef instead.
 * @see org.apache.sqoop.lib.LobRef
 */
public abstract class LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE>
    extends org.apache.sqoop.lib.LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE> {
  ...
}

// Old class 2
package com.cloudera.sqoop.lib;

/**
 * @deprecated use org.apache.sqoop.lib.ClobRef instead.
 * @see org.apache.sqoop.lib.ClobRef
 */
public class ClobRef extends org.apache.sqoop.lib.ClobRef {
  ...
}


// New class 1
package org.apache.sqoop.lib;

import ...
public abstract class LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE>
    implements Closeable, Writable {
  ...
}

// New class 2
package org.apache.sqoop.lib;

import ...

public class ClobRef
  extends com.cloudera.sqoop.lib.LobRef<String, String, Reader> {
  ...
}

Doing this will ensure that the type compatibility is preserved for all instances of the old classes with respect to their type hierarchies.