
Welcome, contributors! We strive to include everyone's contributions. This page provides guidelines on how to contribute effectively to the development of Sqoop.

Note: This guide applies to general contributors. If you are a committer, please read the Guide for Committers as well.


What can be contributed?

There are many ways you can contribute to the project. A few of them are:

Jump in on discussions: Someone may start a thread on the mailing list about a problem that you have dealt with in the past. You can help the project by joining that thread and guiding the user to overcome or work around the problem or limitation.

File Bugs: If you notice a problem and are sure it is a bug, go ahead and file a JIRA. If, however, you are not sure that it is a bug, first confirm it by discussing it on the mailing lists.

Review Code: If you see that a JIRA ticket has a 'Patch Available' status, go ahead and review it. It cannot be stressed enough that you must be kind in your review and explain the rationale for your feedback and suggestions. Also note that not all review feedback is accepted; often the outcome is a compromise between the contributor and the reviewer. If you are happy with the change and do not spot any major issues, +1 it. More information on this is available in the following sections.

Provide Patches: We encourage you to assign the relevant JIRA issue to yourself and supply a patch for it. The patch you provide can be code, documentation, build changes, or any combination of these. More information on this is available in the following sections.

Setting up your development environment

To set up your development environment, you need a Linux system (such as Ubuntu or CentOS) with administrative privileges and an Internet connection. You also need enough disk space to check out and build the code, and to install the databases and other software you may need for testing.

Getting ready to build

Once your Linux system is ready with sufficient disk space and an Internet connection, go ahead and install the following software:

  • Subversion client and/or Git
  • A recent update of JDK 1.6
  • A recent version of make
  • Asciidoc version 8.6 or above
  • Apache Ant 1.7 or above
  • Findbugs version 1.3.9 or above
  • Latest Eclipse IDE (or your IDE/Editor of choice)
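Before building, it can help to confirm that these tools are actually on your PATH. The following is a minimal sketch, not part of the Sqoop build; `missing_tools` is a hypothetical helper name, and the command names you pass to it are assumptions about how each tool is usually installed (Findbugs, for example, is often unpacked into a directory rather than placed on the PATH).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): print each named command that
# cannot be found on the PATH, one per line. Prints nothing if all are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools java javac make asciidoc ant git` lists whatever is still missing (you need only one of `git` or `svn`). To check versions against the list above, `java -version`, `ant -version`, and `asciidoc --version` print the installed versions.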

Building the Sources

To get the source code, check out the Subversion trunk using the following command:

$ svn co https://svn.apache.org/repos/asf/incubator/sqoop/trunk/ sqoop

If you prefer using Git, you can clone the Sqoop repository from the Apache Git mirror with the following command:

$ git clone git://git.apache.org/sqoop.git

Once you have the code, you can build it with the following commands:

$ cd sqoop
$ ant jar-all

You can use the clean target to delete previously built files from the workspace, then run jar-all again to do a fresh build (for example, ant clean jar-all).

To list all the targets available in the build, type the following command:

$ ant -p

If you prefer working in Eclipse, you can generate the necessary project definitions as follows:

$ ant eclipse

Once these definitions are generated, you can import them in Eclipse as an existing project.

Running Tests

Running unit tests

The Sqoop source code contains many unit tests that exercise its functionality. You can run them with the following command:

$ ant test

Create third-party lib directory

Create a directory somewhere convenient on your development system. This directory will hold all the JDBC drivers that the tests use. Once it is created, create (or edit) the build.properties file in the root directory of the Sqoop workspace and set the full path of this directory as the value of the property sqoop.thirdparty.lib.dir. For example:

sqoop.thirdparty.lib.dir=/opt/ws/3rd-party-lib
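The two steps above can be scripted so that rerunning them does not leave duplicate entries. This is a minimal bash sketch; `set_thirdparty_dir` is a hypothetical helper, not part of Sqoop, and it assumes build.properties uses plain `key=value` lines as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): create the third-party lib directory
# and record it as sqoop.thirdparty.lib.dir in <workspace>/build.properties.
# Usage: set_thirdparty_dir <sqoop-workspace-root> <third-party-lib-dir>
set_thirdparty_dir() {
  local workspace=$1 libdir=$2
  local props="$workspace/build.properties"

  mkdir -p "$libdir" || return 1
  touch "$props" || return 1

  # Copy every line except an earlier value of the property, then append the
  # new value. grep exits 1 when it selects no lines; that is harmless here.
  grep -v '^sqoop\.thirdparty\.lib\.dir=' "$props" > "$props.tmp"
  echo "sqoop.thirdparty.lib.dir=$libdir" >> "$props.tmp"
  mv "$props.tmp" "$props"
}
```

For example, `set_thirdparty_dir ~/sqoop /opt/ws/3rd-party-lib` creates the directory and leaves exactly one `sqoop.thirdparty.lib.dir` line in `~/sqoop/build.properties`, preserving any other properties already in the file.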

Setting up and running third-party tests

Third-party tests are end-to-end integration tests that exercise the basic Sqoop functionality against third-party databases. Run these tests to rule out regressions whenever you change the core system. Before you run them, you must set up the following databases:

Setting up MySQL
  • Install MySQL version 5.1.x with the necessary client tools. You can install the server on a different host than your development host if necessary. However, the client tools must be available on your development host, including the JDBC driver and batch utilities such as mysqldump and mysqlimport.
  • Place the JDBC driver in the third-party lib directory that you created earlier.
  • The location of the MySQL server is specified in the build.properties file by the property sqoop.test.mysql.connectstring.host_url. This property defaults to jdbc:mysql://localhost/, which assumes a local installation with the default port. If, however, your MySQL server runs on a different host or port, specify it explicitly as follows:
    sqoop.test.mysql.connectstring.host_url=jdbc:mysql://<mysqlhost>:<port>/
    
  • To run the MySQL third-party tests, configure the database as follows:
    $ mysql -u root -p
    mysql> CREATE DATABASE sqooppasstest;
    mysql> CREATE DATABASE sqooptestdb;
    mysql> use mysql;
    mysql> GRANT ALL PRIVILEGES on sqooppasstest.* TO 'sqooptest'@'localhost' IDENTIFIED BY '12345';
    mysql> GRANT ALL PRIVILEGES ON sqooptestdb.* TO 'yourusername'@'localhost';
    mysql> flush privileges;
    mysql> \q
    
  • Note:
    • If the MySQL server is installed on a different host, replace localhost with the host name of your development (client) machine.
    • Replace yourusername with your actual user name before issuing the command.
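When the MySQL server runs on another host, the substitutions in the note above are easy to get wrong by hand. The following minimal sketch prints the same setup statements with the client host and user name filled in; `mysql_setup_sql` is a hypothetical helper, not part of Sqoop. It keeps the MySQL 5.1 `GRANT ... IDENTIFIED BY` syntax used above, which newer MySQL releases (8.0 and later) no longer accept.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): print the MySQL setup statements
# from the steps above for a given client host and user name.
# Usage: mysql_setup_sql <client-host> <your-user-name>
mysql_setup_sql() {
  local client_host=$1 user=$2
  cat <<EOF
CREATE DATABASE sqooppasstest;
CREATE DATABASE sqooptestdb;
GRANT ALL PRIVILEGES ON sqooppasstest.* TO 'sqooptest'@'${client_host}' IDENTIFIED BY '12345';
GRANT ALL PRIVILEGES ON sqooptestdb.* TO '${user}'@'${client_host}';
FLUSH PRIVILEGES;
EOF
}
```

The output can be piped to the MySQL client, for example `mysql_setup_sql "$(hostname)" "$USER" | mysql -h <mysqlhost> -u root -p`, provided that `hostname` returns the name under which the MySQL server sees your development machine.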
Setting up PostgreSQL
  • Install PostgreSQL 8.3.9 or later along with its client tools. You can install the server on a different host than your development host if necessary. However, the client tools must be available on your development host, including the JDBC driver and the command-line utility psql.
  • Place the JDBC driver in the third-party lib directory that you created earlier.
  • The location of the PostgreSQL server is specified in the build.properties file by the property sqoop.test.postgresql.connectstring.host_url. This property defaults to jdbc:postgresql://localhost/, which assumes a local installation with the default port. If, however, your PostgreSQL server runs on a different host or port, specify it explicitly as follows:
    sqoop.test.postgresql.connectstring.host_url=jdbc:postgresql://<pgsqlhost>:<pgsqlport>/
    
  • To run the PostgreSQL third-party tests, configure the database as follows:
    • Edit the pg_hba.conf file and set up an authentication scheme that allows testing. In a secured test environment, it is easiest to set up full trust-based access by adding the following lines to this file and commenting out any other lines that reference 127.0.0.1 or ::1.
      local  all all trust
      host all all 127.0.0.1/32 trust
      host all all ::1/128      trust
      
    • Also, in the file postgresql.conf, uncomment the line that starts with listen_addresses and set its value to '*' as follows:
      listen_addresses = '*'
      
    • Restart your PostgreSQL server after modifying the configuration files above.
    • Create the necessary user and database for Sqoop testing as follows:
      $ sudo -u postgres psql -U postgres template1
      template1=# CREATE USER sqooptest;
      template1=# CREATE DATABASE sqooptest;
      template1=# \q
      $
      
Setting up Oracle
  • Install Oracle 10.2.x or later and download the corresponding JDBC driver.
  • Place the JDBC driver in the third-party lib directory that you created earlier.
  • The location of the Oracle server is specified in the build.properties file by the property sqoop.test.oracle.connectstring. This property defaults to jdbc:oracle:thin:@//localhost/xe, which assumes a local installation with the default port and the XE service. If, however, your Oracle server runs on a different host or port, specify it explicitly as follows:
    sqoop.test.oracle.connectstring=jdbc:oracle:thin:@//<oraclehost>:<port>/<service_name>
    
  • To run the Oracle third-party tests, configure the database as follows:
    $ sqlplus system/<password>@<sid>
    SQL> CREATE USER SQOOPTEST identified by 12345;
    SQL> GRANT CONNECT, RESOURCE to SQOOPTEST;
    SQL> CREATE USER SQOOPTEST2 identified by ABCDEF;
    SQL> GRANT CONNECT, RESOURCE to SQOOPTEST2;
    SQL> exit
    $
    
  • Note: If you are using Oracle XE and see an error like ORA-12516, TNS:listener could not find available handler with matching protocol stack, you are likely running into connection exhaustion. To work around this, log in to the Oracle server as SYSTEM, run the command below, and restart the server.
    $ sqlplus system/<password>@<sid>
    SQL> ALTER SYSTEM SET processes=200 scope=spfile;
    SQL> exit
    $
    
Running third-party tests

Once you have installed and configured all of the above databases (MySQL, PostgreSQL, and Oracle), you are ready to run the third-party tests. To run them, issue the following command:

$ ant test -Dthirdparty=true
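If the third-party tests fail because a JDBC driver cannot be loaded, the driver jar is probably missing from the third-party lib directory. The following is a minimal pre-flight sketch; `thirdparty_dir` and `missing_drivers` are hypothetical helpers, not part of Sqoop, and the jar file-name patterns in the usage comment are assumptions based on common driver distributions, so adjust them to the jars you actually downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Sqoop).

# Print the last value of sqoop.thirdparty.lib.dir in a build.properties file.
thirdparty_dir() {
  sed -n 's/^sqoop\.thirdparty\.lib\.dir=//p' "$1" | tail -n 1
}

# Print each glob pattern that matches no file in the given directory.
# Usage: missing_drivers <dir> <pattern>...
missing_drivers() {
  local dir=$1 pattern
  shift
  for pattern in "$@"; do
    compgen -G "$dir/$pattern" > /dev/null || echo "$pattern"
  done
}

# Example, from the workspace root (patterns are assumptions):
#   missing_drivers "$(thirdparty_dir build.properties)" \
#     'mysql-connector-java-*.jar' 'postgresql-*.jar' 'ojdbc*.jar'
```

Empty output means every pattern matched at least one jar; any pattern printed points at a driver that still needs to be copied into the directory.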

Setting up and running manual tests

Certain third-party tests are categorized as manual tests because they were introduced later, and adding them to the third-party suite would have required every test environment to install additional databases.

Setting up SQL Server
  • Install SQL Server Express 2008 R2 or above.
  • Download and place the JDBC driver in the third-party lib directory that you created earlier.
  • The location of the SQL Server instance is specified in the build.properties file by the property sqoop.test.sqlserver.connectstring.host_url. This property defaults to jdbc:sqlserver://sqlserverhost:1433, which assumes the server runs on a host named sqlserverhost and listens on port 1433. If your SQL Server instance runs on a different host or port, specify it explicitly as follows:
    sqoop.test.sqlserver.connectstring.host_url=jdbc:sqlserver://<sqlserverhost>:<port>
    
  • To run the SQL Server manual tests, configure the database as follows:
    • Create a database called SQOOPTEST.
    • Create a login with name SQOOPUSER and password PASSWORD.
    • Grant all access for database SQOOPTEST to the login SQOOPUSER.
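The three SQL Server steps above can be expressed in T-SQL. This is a minimal sketch under stated assumptions: `sqlserver_setup_sql` is a hypothetical helper, granting the `db_owner` role is one way (not necessarily the intended one) to give "all access" to the database, and `CHECK_POLICY = OFF` is added because password-complexity rules may otherwise reject the password PASSWORD.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): print T-SQL that creates the
# SQOOPTEST database and the SQOOPUSER login, and makes SQOOPUSER a member
# of db_owner in SQOOPTEST (assumed here to mean "all access").
sqlserver_setup_sql() {
  cat <<'EOF'
CREATE DATABASE SQOOPTEST;
GO
CREATE LOGIN SQOOPUSER WITH PASSWORD = 'PASSWORD', CHECK_POLICY = OFF;
GO
USE SQOOPTEST;
GO
CREATE USER SQOOPUSER FOR LOGIN SQOOPUSER;
EXEC sp_addrolemember 'db_owner', 'SQOOPUSER';
GO
EOF
}
```

You could save the output with `sqlserver_setup_sql > sqoop-sqlserver.sql` and run it as an administrator with `sqlcmd -S <sqlserverhost> -U sa -i sqoop-sqlserver.sql`. SQL Server logins such as SQOOPUSER only work when the server allows SQL Server authentication (mixed mode), which a default Express installation may not enable.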
Setting up DB2 Server
  • Install DB2 Express-C 9.7.4.
  • Download and place the JDBC driver in the third-party lib directory that you created earlier.
  • The location of the DB2 server is specified in the build.properties file by the property sqoop.test.db2.connectstring.host_url. This property defaults to jdbc:db2://db2host:50000, which assumes the server runs on a host named db2host and listens on port 50000. If your DB2 server runs on a different host or port, specify it explicitly as follows:
    sqoop.test.db2.connectstring.host_url=jdbc:db2://<db2host>:<port>
    
  • To run the DB2 manual tests, configure the database as follows:
    • Create a database called SQOOP.
    • Create a login SQOOP with password PASSWORD.
    • Grant all access for database SQOOP to login SQOOP.
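DB2 authenticates users against operating-system accounts by default, so "a login SQOOP" is assumed here to be an OS user named sqoop with password PASSWORD, created beforehand (for example with useradd and passwd). Under that assumption, the database steps can be written as a DB2 command line processor (CLP) script. `db2_setup_clp` is a hypothetical helper, and granting DBADM is one way to give the user full access to the database.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): print a DB2 CLP script that creates
# the SQOOP database and grants DBADM on it to the (OS) user SQOOP.
# Statements end with ';' so the script can be run with "db2 -tvf".
db2_setup_clp() {
  cat <<'EOF'
CREATE DATABASE SQOOP;
CONNECT TO SQOOP;
GRANT DBADM ON DATABASE TO USER SQOOP;
CONNECT RESET;
EOF
}
```

As the DB2 instance owner, you could run `db2_setup_clp > sqoop-db2.clp` followed by `db2 -tvf sqoop-db2.clp`; creating a database requires instance-level authority such as SYSADM.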
Running manual tests

Once you have installed and configured both of the above databases (SQL Server and DB2), you are ready to run the manual tests. To run them, issue the following command:

$ ant test -Dmanual=true

Building documentation

To build Sqoop documentation, run the following command from the workspace root directory:

$ ant docs

This generates the documentation in the build/docs directory. To view it, open build/docs/index.html in a web browser, where you will find links to the user and developer guides. The generated man pages are placed directly under build/docs, with file names of the form <name>.1.gz. You can view a man page without installing it by running the following command from the workspace root:

$ man -l build/docs/sqoop.1.gz

Building tar-ball

To build the tar-ball for distribution, use the following command:

$ ant tar

This produces a tar-ball distribution file named sqoop-<version>.tar.gz under the build directory.

Reviewing Code

Sqoop uses the Apache Review Board for code reviews. For a change to be reviewed, it should be either posted on Review Board or attached to the JIRA. A minor change that affects only a few lines and does not touch the main logic of the affected sources need not be posted on Review Board. However, a change that is large or that affects the core logic of the affected sources should be posted there. Feel free to comment on the JIRA asking the assignee to post the patch to Review Board for review.

Note: Not all patches attached to a JIRA are ready for review. Sometimes patches are attached just to solicit early feedback on the implementation direction. Feel free to look them over and give your feedback in the JIRA as necessary. Patches are considered ready for review either when the patch has been posted on Review Board or when the JIRA status has been changed to 'Patch Available'.

Goals for Code Reviews

Whatever approach a reviewer takes, the review should ensure the following:

  • Bugs/Omissions/Regressions are caught before the change is committed to the source control.
  • The change keeps the quality of the code high, so that the overall system stays sustainable. The implementation should be easily readable, documented where necessary, and should favor simplicity.
  • Changes are evaluated from the perspective of a consumer (the reviewer) as opposed to the developer, which often brings out subtleties in the implementation that otherwise go unnoticed.
  • The change should be backward compatible and should not require extensive work on existing installations in order to be adopted. There are exceptions, such as work done for a major release, but otherwise backward compatibility should be upheld at all times. If you are unsure, raise it as a concern to be clarified during the review.

Code review guidelines

Following are some guidelines on how to do a code review. You may use any other approach as long as the goals stated above are met. That said, the following approach generally works well:

  • Understand the problem being solved: This often requires going through the JIRA comments and/or mailing list threads where the problem has been discussed. Look for key aspects of the problem, such as how it has affected users and what, if any, is the suggested way to solve it. In some cases you may not find enough information about the problem; if so, feel free to ask the developer contributing the change for clarification.
  • Think about how you would solve the problem: There are many ways to solve any coding problem, each with its own merits. Before reviewing the change, think through how you would solve the problem if you were implementing the solution. Note the aspects your solution has to take into account, such as impact on backward compatibility, overall usability of the system, and performance.
  • Evaluate the proposed change in contrast to your solution: Unless the change is obvious, it is likely that the implementation of the change you are reviewing is very different from the solution you would go for. Evaluate this change on the various aspects that you evaluated your solution on in the previous step. See how it measures up and give feedback where you think it could be improved.
  • Look for typical pitfalls: Read through the implementation to see whether it needs documentation where the intention is not clear, whether all boundary conditions are addressed, whether the code is defensive enough, and whether any bad idioms, such as double-checked locking, have crept in. In short, check for things that developers are likely to miss in their own code but that are obvious to someone reading the code for the first time.
  • See if the change is complete: Check whether the change affects the user interface. If it does, the documentation should likely be updated. Does the change have enough test coverage? What about other aspects like license headers and copyright statements? Do checkstyle and findbugs report new warnings? Are there new compiler warnings?
  • Test the change: Testing the change is easy once your development environment is set up. Run as many tests as you want with the patch applied. Manually test any functionality that you think is not fully covered by the associated tests. If you find a problem, report it.

How to give feedback

Once you have collected your comments, concerns, and feedback, send them back to the contributor. In doing so, please be as courteous as possible and ensure the following:

  • Your feedback should be clear and actionable. Giving subjective/vague feedback does not add any value or facilitate a constructive dialog.
  • Where possible, suggest how your concern can be addressed. For example, if your testing revealed that a certain use case is not satisfied, it is fine to state that as is, but it is even better if you can suggest how the developer could address it. Present your suggestion as a possible solution rather than the solution.
  • If you do not understand part of the change, or for some reason were not able to review part of the change, state it explicitly so as to encourage other reviewers to jump in and help.

Once you have provided your feedback, wait for the developer to respond. It is possible that the developer may need further clarification on your feedback, in which case you should promptly provide it where necessary. In general, the dialog between the reviewer and developer should lead to finding a reasonable middle ground where key concerns are satisfied and the goals of the review have been met.

If a change has met all your criteria for review, please +1 the change to indicate that you are happy with it.

Providing Patches

When providing patches, please follow these guidelines:

  • Make sure there is a JIRA: If you are fixing a problem that already has an associated JIRA, go ahead and assign it to yourself. If it is already assigned to someone else, check with the current assignee before reassigning it to yourself. If the current assignee has already worked out part of the fix, offer to take over their change and complete the remaining parts.
  • Attach the patches as you go through development: While small fixes are easily done in a single patch, it is preferable that you attach patches to the JIRA as you go along. This serves as an early feedback mechanism where interested folks can look it over and suggest changes where necessary. It also ensures that if for some reason you are not able to find the time to complete the change, someone else can take up your initial patches and drive them to completion.
  • Submission checklist: Here is a checklist of things you should address before you post your patch for review:
    • Change should be clean, well-formatted, and readable. Please use two-space indents, and spaces instead of tabs.
    • Make sure you have considered how to handle boundary condition cases and have sufficiently defensive code where necessary.
    • Add comments or java-docs where necessary.
    • Make sure that you have run checkstyle and findbugs and eliminated all warnings related to your change.
    • Is your change covered by any test case? If not, add a test case.
    • If your change affects a user interface, make sure you have updated the documentation accordingly.
    • If your change affects the development environment, make sure you update the COMPILING.txt and README files.
  • Test your changes before submitting a review: Before you change the JIRA status to 'Patch Available', please test your changes thoroughly. Make sure that all tests pass and that the functionality you have worked on is covered by existing or new tests.
  • Submitting a patch: To submit a patch, first make sure that you have attached it to the JIRA and changed the status of the JIRA to 'Patch Available'. If the change is non-trivial, then please also post the patch for review on review board. The commands to generate the patch are:
    $ svn diff > /path/to/patch-file.patch
    
    or
    $ git diff --no-prefix > /path/to/patch-file.patch
    
  • Identify a reviewer: When posting on Review Board, identify at least one reviewer. You can pick any of the project committers. Note that identifying a reviewer does not stop others from reviewing your change, so be prepared for others to review it at any time. If you have posted your change for review and no one has reviewed it yet, you can gently remind everyone by dropping a note on the developer mailing list with a link to the review.
  • Work with reviewers to get your change fleshed out: When your change is reviewed, please engage with the reviewer via JIRA or review board to get necessary clarifications and work out other details. The goal is to ensure that the final state of your change is acceptable to the reviewer so that they can +1 it.
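The submission checklist asks for spaces rather than tabs, and a generated patch can be checked for this before it is attached. This is a minimal sketch; `tabs_in_patch` is a hypothetical helper, not a Sqoop, Git, or Subversion tool. It inspects only added lines (those starting with +) and skips the `+++` file headers, which in svn diff output typically contain a tab of their own.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): print "<line>: <text>" for every
# added line in a unified diff that contains a tab character.
# Lines starting with "+++ " are file headers and are skipped.
tabs_in_patch() {
  awk '/^\+\+\+ / { next }
       /^\+/ && /\t/ { printf "%d: %s\n", NR, $0 }' "$1"
}
```

For example, after `git diff --no-prefix > /path/to/patch-file.patch`, running `tabs_in_patch /path/to/patch-file.patch` prints nothing if no added line contains a tab.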