This page outlines the process for migrating the Sqoop namespace from com.cloudera.sqoop to org.apache.sqoop. The specific JIRA issues tracking this work are filed under SQOOP-369.
Note: The information provided here is general. Any detail specific to a particular aspect of this migration should be noted in the relevant JIRA issue.
Migration of Sqoop source code from com.cloudera.sqoop to org.apache.sqoop is a prerequisite for releasing Sqoop under Apache Incubator. Considering that there is a lot of third-party code that is developed on top of, or to work with, Sqoop, this migration is particularly risky for backward compatibility and thus requires careful handling. This document outlines the steps that seem reasonable for such a migration.
If you are helping out with this migration and feel that the steps outlined here are not valid in some scenarios, please initiate the discussion on the developer mailing list and help update this document.
When migrating a class from its previous namespace to the new one, the key requirement is that any code written against the old class should still work at the binary-compatibility level. This also implies that such code should be able to recompile against the migrated code base without any modifications. In order to enable this, the general approach is as follows:
Note that the API is not just method signatures but includes all aspects of the implementation, such as class hierarchies, type compatibility, static and non-static state, etc. The following sections outline these steps as they apply to various API scenarios.
Consider the class:
    package com.cloudera.sqoop.io;

    public class NamedFifo {
      public NamedFifo(String pathname) { ... }
      public void create() { ... }
    }
To migrate such a class, do the following:
Here is the outcome:
    // Old class
    package com.cloudera.sqoop.io;

    /**
     * @deprecated use org.apache.sqoop.io.NamedFifo instead.
     * @see org.apache.sqoop.io.NamedFifo
     */
    public class NamedFifo extends org.apache.sqoop.io.NamedFifo {
      public NamedFifo(String pathname) {
        super(pathname);
      }
    }

    // New class
    package org.apache.sqoop.io;

    public class NamedFifo {
      public NamedFifo(String pathname) { ... }
      public void create() { ... }
    }
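The effect of the simple-class migration can be sketched in a single file. This is only an illustration under stated assumptions: the nested holder classes `ApacheIo` and `ClouderaIo` stand in for the two packages, and `getPathname()`, `legacyDescribe()`, and `describe()` are hypothetical names that do not appear in Sqoop. It shows that an instance created through the old name is accepted both by code typed against the old class and by code typed against the new one.

```java
/** Sketch only: nested holder classes stand in for the two packages. */
class NamedFifoMigrationSketch {

  /** Stand-in for package org.apache.sqoop.io (the new namespace). */
  static class ApacheIo {
    static class NamedFifo {
      private final String pathname;

      public NamedFifo(String pathname) {
        this.pathname = pathname;
      }

      /** Hypothetical accessor, present only to give the sketch observable behavior. */
      public String getPathname() {
        return pathname;
      }
    }
  }

  /** Stand-in for package com.cloudera.sqoop.io (the old namespace). */
  static class ClouderaIo {
    /** @deprecated use ApacheIo.NamedFifo instead. */
    @Deprecated
    static class NamedFifo extends ApacheIo.NamedFifo {
      public NamedFifo(String pathname) {
        super(pathname); // all behavior lives in the new class
      }
    }
  }

  /** Hypothetical third-party code typed against the old class. */
  static String legacyDescribe(ClouderaIo.NamedFifo fifo) {
    return "fifo:" + fifo.getPathname();
  }

  /** Hypothetical code typed against the new class. */
  static String describe(ApacheIo.NamedFifo fifo) {
    return "fifo:" + fifo.getPathname();
  }
}
```

Because the old class is a thin subclass, callers of the old name keep working while new callers depend only on the new name.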
Consider the following class:
    package com.cloudera.sqoop.lib;

    import java.io.Closeable;
    import java.io.IOException;

    public class LargeObjectLoader implements Closeable {
      public static final long DEFAULT_MAX_LOB_LENGTH = 16 * 1024 * 1024;

      @Override
      public void close() throws IOException { ... }
    }
To migrate this, do the following:
    // Old class
    package com.cloudera.sqoop.lib;

    /**
     * @deprecated use org.apache.sqoop.lib.LargeObjectLoader instead.
     * @see org.apache.sqoop.lib.LargeObjectLoader
     */
    public class LargeObjectLoader extends org.apache.sqoop.lib.LargeObjectLoader {
      public static final long DEFAULT_MAX_LOB_LENGTH =
          org.apache.sqoop.lib.LargeObjectLoader.DEFAULT_MAX_LOB_LENGTH;
      ...
    }

    // New class
    package org.apache.sqoop.lib;

    import java.io.Closeable;
    import java.io.IOException;

    public class LargeObjectLoader implements Closeable {
      public static final long DEFAULT_MAX_LOB_LENGTH = 16 * 1024 * 1024;

      @Override
      public void close() throws IOException { ... }
    }
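A minimal single-file sketch of the same pattern follows, assuming nested holder classes (`ApacheLib`, `ClouderaLib`) as stand-ins for the two packages. The `closed` flag, `isClosed()`, `legacyLimit()`, and `legacyUseAndClose()` are hypothetical and exist only so the behavior can be observed. It illustrates that the constant re-declared in the old class carries the same value as the new one, and that the old class still satisfies `Closeable` through inheritance.

```java
import java.io.Closeable;

/** Sketch only: nested holder classes stand in for the two packages. */
class LargeObjectLoaderMigrationSketch {

  /** Stand-in for package org.apache.sqoop.lib (the new namespace). */
  static class ApacheLib {
    static class LargeObjectLoader implements Closeable {
      public static final long DEFAULT_MAX_LOB_LENGTH = 16 * 1024 * 1024;

      private boolean closed; // hypothetical state, for illustration only

      @Override
      public void close() {
        closed = true;
      }

      public boolean isClosed() {
        return closed;
      }
    }
  }

  /** Stand-in for package com.cloudera.sqoop.lib (the old namespace). */
  static class ClouderaLib {
    /** @deprecated use ApacheLib.LargeObjectLoader instead. */
    @Deprecated
    static class LargeObjectLoader extends ApacheLib.LargeObjectLoader {
      // Re-declared so that the old name keeps exposing the constant itself.
      public static final long DEFAULT_MAX_LOB_LENGTH =
          ApacheLib.LargeObjectLoader.DEFAULT_MAX_LOB_LENGTH;
    }
  }

  /** Hypothetical legacy caller that reads the constant through the old name. */
  static long legacyLimit() {
    return ClouderaLib.LargeObjectLoader.DEFAULT_MAX_LOB_LENGTH;
  }

  /** Hypothetical legacy caller that uses the old class in try-with-resources. */
  static boolean legacyUseAndClose() {
    ClouderaLib.LargeObjectLoader loader = new ClouderaLib.LargeObjectLoader();
    try (ClouderaLib.LargeObjectLoader resource = loader) {
      if (resource.isClosed()) {
        throw new IllegalStateException("loader closed too early");
      }
    }
    return loader.isClosed(); // close() was inherited from the new class
  }
}
```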
A utility class is a simple class that has a private constructor and only static methods. Here is an example:
    package com.cloudera.sqoop.lib;

    import ...;

    public final class BigDecimalSerializer {
      private BigDecimalSerializer() { }

      static final BigInteger LONG_MAX_AS_BIGINT =
          BigInteger.valueOf(Long.MAX_VALUE);
      ...
      public static void write(BigDecimal d, DataOutput out) throws IOException {
        ...
      }
    }
To migrate this class, do the following:
Here is how it will look:
    // New class
    package org.apache.sqoop.lib;

    import ...;

    public final class BigDecimalSerializer {
      private BigDecimalSerializer() { }

      public static final BigInteger LONG_MAX_AS_BIGINT =
          BigInteger.valueOf(Long.MAX_VALUE);
      ...
      public static void write(BigDecimal d, DataOutput out) throws IOException {
        ...
      }
    }

    // Old class
    package com.cloudera.sqoop.lib;

    /**
     * @deprecated use org.apache.sqoop.lib.BigDecimalSerializer instead.
     * @see org.apache.sqoop.lib.BigDecimalSerializer
     */
    public final class BigDecimalSerializer {
      private BigDecimalSerializer() { }

      static final BigInteger LONG_MAX_AS_BIGINT =
          org.apache.sqoop.lib.BigDecimalSerializer.LONG_MAX_AS_BIGINT;

      public static void write(BigDecimal d, DataOutput out) throws IOException {
        org.apache.sqoop.lib.BigDecimalSerializer.write(d, out);
      }
      ...
    }
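Since a utility class cannot be subclassed, the old class forwards each static call instead of inheriting. The following single-file sketch illustrates that delegation under stated assumptions: `ApacheSerializer` and `ClouderaSerializer` are stand-in names for the new and old `BigDecimalSerializer`, the `writeUTF` encoding is a placeholder rather than Sqoop's actual wire format, and `encode()` and `sameEncoding()` are hypothetical helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.util.Arrays;

/** Sketch only: two nested classes stand in for the new and old utility class. */
class UtilityMigrationSketch {

  /** Stand-in for org.apache.sqoop.lib.BigDecimalSerializer (holds the real logic). */
  static final class ApacheSerializer {
    private ApacheSerializer() { }

    public static void write(BigDecimal d, DataOutput out) throws IOException {
      out.writeUTF(d.toString()); // placeholder encoding, not Sqoop's format
    }
  }

  /** Stand-in for the deprecated com.cloudera.sqoop.lib.BigDecimalSerializer. */
  @Deprecated
  static final class ClouderaSerializer {
    private ClouderaSerializer() { }

    public static void write(BigDecimal d, DataOutput out) throws IOException {
      ApacheSerializer.write(d, out); // pure delegation, no logic of its own
    }
  }

  /** Hypothetical helper: serializes through either entry point. */
  static byte[] encode(boolean viaOldClass, BigDecimal d) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      if (viaOldClass) {
        ClouderaSerializer.write(d, out);
      } else {
        ApacheSerializer.write(d, out);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Hypothetical check: both entry points must produce identical bytes. */
  static boolean sameEncoding(BigDecimal d) {
    return Arrays.equals(encode(true, d), encode(false, d));
  }
}
```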
Concrete singletons are classes with static state that is bound to a concrete implementation. Consider the class below:
    package com.cloudera.sqoop.io;

    import ...

    public final class LobReaderCache {
      private Map<Path, LobFile.Reader> readerMap;

      private LobReaderCache() {
        this.readerMap = new TreeMap<Path, LobFile.Reader>();
      }

      private static final LobReaderCache CACHE;
      static {
        CACHE = new LobReaderCache();
      }

      public static LobReaderCache getCache() {
        return CACHE;
      }

      public static Path qualify(Path path, Configuration conf)
          throws IOException { ... }

      public LobFile.Reader get(Path path, Configuration conf)
          throws IOException { ... }
    }
In such cases you can only partially migrate the class to the new namespace. Specifically, the non-static part can be migrated using inheritance, but the static/singleton aspect will remain in the old class. The steps involved are the following:
The outcome will be something like the following:
    // Old class
    package com.cloudera.sqoop.io;

    import ...

    /**
     * @deprecated use org.apache.sqoop.io.LobReaderCache instead.
     * @see org.apache.sqoop.io.LobReaderCache
     */
    public final class LobReaderCache extends org.apache.sqoop.io.LobReaderCache {
      public static final Log LOG = org.apache.sqoop.io.LobReaderCache.LOG;

      private static final LobReaderCache CACHE;
      static {
        CACHE = new LobReaderCache();
      }

      public static LobReaderCache getCache() {
        return CACHE;
      }

      public static Path qualify(Path path, Configuration conf)
          throws IOException {
        return org.apache.sqoop.io.LobReaderCache.qualify(path, conf);
      }

      // non-static API all migrated to base class
    }

    // New class
    package org.apache.sqoop.io;

    import ...

    public class LobReaderCache {
      private Map<Path, LobFile.Reader> readerMap;

      protected LobReaderCache() {
        this.readerMap = new TreeMap<Path, LobFile.Reader>();
      }

      public LobFile.Reader get(Path path, Configuration conf)
          throws IOException { ... }

      public static Path qualify(Path path, Configuration conf)
          throws IOException { ... }
    }
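The partial migration can be sketched in one file as follows. This is an illustration, not Sqoop code: `ApacheCache` and `ClouderaCache` stand in for the new and old `LobReaderCache`, and plain `String` values replace `Path`, `Configuration`, and `LobFile.Reader` so the sketch stays self-contained. One reason the singleton stays in the old class is that `getCache()` must keep returning the old type, because a method's return type is part of what already-compiled callers link against.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: String stands in for Path and LobFile.Reader. */
class SingletonMigrationSketch {

  /** Stand-in for org.apache.sqoop.io.LobReaderCache: instance state and logic. */
  static class ApacheCache {
    private final Map<String, String> readerMap = new TreeMap<>();

    protected ApacheCache() { } // protected so the old class can subclass it

    public String get(String path) {
      // Placeholder for opening and caching a reader.
      return readerMap.computeIfAbsent(path, p -> "reader:" + p);
    }

    public static String qualify(String path) {
      // Placeholder for path qualification.
      return path.startsWith("/") ? path : "/" + path;
    }
  }

  /** Stand-in for the deprecated com.cloudera.sqoop.io.LobReaderCache. */
  @Deprecated
  static final class ClouderaCache extends ApacheCache {
    // The singleton stays here so getCache() keeps its old return type.
    private static final ClouderaCache CACHE = new ClouderaCache();

    private ClouderaCache() { }

    public static ClouderaCache getCache() {
      return CACHE;
    }

    public static String qualify(String path) {
      return ApacheCache.qualify(path); // static API delegates to the new class
    }
  }
}
```

Callers that hold the result of `getCache()` can pass it anywhere the new type is expected, since the old class extends the new one.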
When migrating class hierarchies, you must create a parallel class hierarchy in the new namespace that covers every migrated class. However, the important aspect of this migration is that, in order to preserve type compatibility, the new-namespace classes must inherit from their old superclasses. Consider the following two-class hierarchy:
    // Class 1
    package com.cloudera.sqoop.lib;

    import ...

    public abstract class LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE>
        implements Closeable, Writable {
      ...
    }

    // Class 2
    package com.cloudera.sqoop.lib;

    import ...

    public class ClobRef extends LobRef<String, String, Reader> {
      ...
    }
To migrate this over to the new namespace, do the following:
The outcome for the above example will be as follows:
    // Old class 1
    package com.cloudera.sqoop.lib;

    import ...

    /**
     * @deprecated use org.apache.sqoop.lib.LobRef instead.
     * @see org.apache.sqoop.lib.LobRef
     */
    public abstract class LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE>
        extends org.apache.sqoop.lib.LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE> {
      ...
    }

    // Old class 2
    package com.cloudera.sqoop.lib;

    /**
     * @deprecated use org.apache.sqoop.lib.ClobRef instead.
     * @see org.apache.sqoop.lib.ClobRef
     */
    public class ClobRef extends org.apache.sqoop.lib.ClobRef {
      ...
    }

    // New class 1
    package org.apache.sqoop.lib;

    import ...

    public abstract class LobRef<DATATYPE, CONTAINERTYPE, ACCESSORTYPE>
        implements Closeable, Writable {
      ...
    }

    // New class 2
    package org.apache.sqoop.lib;

    import ...

    public class ClobRef
        extends com.cloudera.sqoop.lib.LobRef<String, String, Reader> {
      ...
    }
Doing this ensures that type compatibility is preserved for all instances of the old classes with respect to their type hierarchies.
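The interleaved inheritance chain can be checked with a small single-file sketch. The names are stand-ins (`ApacheLobRef`, `ClouderaLobRef`, `ApacheClobRef`, `ClouderaClobRef`), the three type parameters are reduced to one for brevity, and `getData()`, `fitsAllFourTypes()`, and `legacyRead()` are hypothetical. It illustrates that an old `ClobRef` instance is assignable to all four types, and that a new `ClobRef` is still accepted by code typed against the old `LobRef`.

```java
/** Sketch only: four nested classes stand in for the old and new hierarchies. */
class HierarchyMigrationSketch {

  /** Stand-in for org.apache.sqoop.lib.LobRef (new base class, root of the chain). */
  abstract static class ApacheLobRef<D> {
    public abstract D getData();
  }

  /** Stand-in for the deprecated com.cloudera.sqoop.lib.LobRef. */
  @Deprecated
  abstract static class ClouderaLobRef<D> extends ApacheLobRef<D> { }

  /** Stand-in for org.apache.sqoop.lib.ClobRef: note that it extends the OLD base class. */
  static class ApacheClobRef extends ClouderaLobRef<String> {
    private final String data;

    public ApacheClobRef(String data) {
      this.data = data;
    }

    @Override
    public String getData() {
      return data;
    }
  }

  /** Stand-in for the deprecated com.cloudera.sqoop.lib.ClobRef. */
  @Deprecated
  static class ClouderaClobRef extends ApacheClobRef {
    public ClouderaClobRef(String data) {
      super(data);
    }
  }

  /** Hypothetical check: true only if the object is an instance of all four types. */
  static boolean fitsAllFourTypes(Object ref) {
    return ref instanceof ApacheLobRef<?>
        && ref instanceof ClouderaLobRef<?>
        && ref instanceof ApacheClobRef
        && ref instanceof ClouderaClobRef;
  }

  /** Hypothetical third-party method typed against the old base class. */
  static String legacyRead(ClouderaLobRef<String> ref) {
    return ref.getData();
  }
}
```

Had the new `ClobRef` extended the new `LobRef` directly, the call to `legacyRead` with a new `ClobRef` would not compile, which is the incompatibility the interleaving avoids.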