State
[progress record]: Claimed by Xu Jie
Proposed time: 2022/05/06
Discussion time:
Acceptance time:
Complete time:
[issues]:
[email]:
[release]:
[proposer]:
Motivation & Background
To enhance Linkis's cross-cluster copy capability, add a DistCp engine.
Basic concept
- DistCp: a tool for copying data within and between Hadoop clusters. It uses MapReduce for file distribution, error handling and recovery, and report generation. It expands a list of files and directories into input for map tasks, each of which copies a partition of the files in the source list.
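To make the map-task split concrete, here is a minimal Python sketch of balancing a file list across map tasks by total bytes. DistCp's default `uniformsize` strategy balances maps on the total size of the files each map copies; the greedy largest-file-first assignment below is a simplified stand-in for illustration, not Hadoop's actual algorithm, and the function name `split_uniform_size` is hypothetical.

```python
from typing import List, Tuple


def split_uniform_size(files: List[Tuple[str, int]], num_maps: int) -> List[List[str]]:
    """Assign (path, size_in_bytes) pairs to num_maps buckets, balancing total bytes.

    Simplified illustration only; not Hadoop DistCp's real split logic.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be >= 1")
    buckets: List[List[str]] = [[] for _ in range(num_maps)]
    loads = [0] * num_maps  # bytes assigned to each map so far
    # Place the largest files first so they do not pile up in one map.
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        i = loads.index(min(loads))  # least-loaded map (first one on ties)
        buckets[i].append(path)
        loads[i] += size
    return buckets
```

For example, four files of 100, 60, 50 and 10 bytes named `a` to `d` split across two maps come out as `[["a", "d"], ["b", "c"]]`, so each map copies 110 bytes.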
Expected goals
- Add a Linkis DistCp engine that supports all functions of the DistCp tool and additionally provides task status monitoring, task logs, and engine kill;
Implementation plan
- This engine is a Once Job engine; for its implementation, refer to the Linkis Sqoop engine;
- The native DistCp tool works on source and destination paths rather than on specific tables, whereas users more often want to select tables. The tables entered by the user therefore need to be converted into their corresponding paths, which requires introducing metadata management functionality;
- Add a mapping function that converts user-supplied parameters into the parameters DistCp expects;
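The two steps above (resolving tables to paths, then mapping user parameters to DistCp options) could be sketched as follows. This is a minimal illustration under stated assumptions: `StaticMetadata` is a hypothetical dictionary-backed stand-in for the metadata management service, the user-facing keys (`source.db`, `source.table`, `update`, `maps`, `bandwidth.mb`, and so on) are invented for this example, and the target table is assumed to already exist. Only the emitted flags `-update`, `-overwrite`, `-m` and `-bandwidth` are real DistCp options.

```python
from typing import List, Mapping


class StaticMetadata:
    """Hypothetical dict-backed stand-in for a metadata lookup: (db, table) -> storage path."""

    def __init__(self, locations: Mapping[str, str]) -> None:
        self._locations = dict(locations)

    def table_location(self, db: str, table: str) -> str:
        key = f"{db}.{table}"
        if key not in self._locations:
            raise KeyError(f"no storage location known for table {key}")
        return self._locations[key]


def to_distcp_args(params: Mapping[str, str], meta: StaticMetadata) -> List[str]:
    """Map hypothetical table-level user parameters to a `hadoop distcp` argument list."""
    # Step 1: table -> path, via the (hypothetical) metadata lookup.
    src = meta.table_location(params["source.db"], params["source.table"])
    dst = meta.table_location(params["target.db"], params["target.table"])

    update = params.get("update") == "true"
    overwrite = params.get("overwrite") == "true"
    if update and overwrite:
        # DistCp rejects -update and -overwrite together, so fail early.
        raise ValueError("update and overwrite are mutually exclusive")

    # Step 2: user parameters -> real DistCp options.
    args = ["hadoop", "distcp"]
    if update:
        args.append("-update")
    if overwrite:
        args.append("-overwrite")
    if "maps" in params:
        args += ["-m", str(int(params["maps"]))]  # max simultaneous copies (maps)
    if "bandwidth.mb" in params:
        args += ["-bandwidth", str(int(params["bandwidth.mb"]))]  # MB/s per map
    args += [src, dst]
    return args
```

Keeping the metadata lookup behind a small interface means the table-to-path step could later be backed by the real metadata service without touching the option mapping.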
Things to Consider & Note:
- Does compatibility with the original, path-based parameter method need to be preserved?
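One possible answer to the compatibility question is to keep accepting the original path-style parameters alongside the new table-style ones. The sketch below assumes hypothetical keys `source.path`/`target.path` for the original method and `source.db`/`source.table` (plus the `target.*` equivalents) for the new one; `lookup` stands in for the metadata query, and falling back to the Hive database named `default` when no database is given is also an assumption.

```python
from typing import Callable, Mapping, Tuple

# Hypothetical metadata query signature: (db, table) -> storage path.
Lookup = Callable[[str, str], str]


def resolve_endpoints(params: Mapping[str, str], lookup: Lookup) -> Tuple[str, str]:
    """Return (source, target) paths from either path-style or table-style parameters.

    All parameter key names are hypothetical.
    """

    def one(side: str) -> str:
        path = params.get(f"{side}.path")
        table = params.get(f"{side}.table")
        if path and table:
            raise ValueError(f"specify either {side}.path or {side}.table, not both")
        if path:
            return path  # original method: pass the path through unchanged
        if table:
            # New method: resolve the table to its storage path.
            return lookup(params.get(f"{side}.db", "default"), table)
        raise ValueError(f"missing {side}.path or {side}.table")

    return one("source"), one("target")
```

Because each side is resolved independently, a job could even mix the two styles, for example copying a table to a raw backup path.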
Changes
No. | Modification | Detail
---|---|---
1 | Maven modules | |
2 | HTTP interfaces | |
3 | Client interfaces | |
4 | Database table structure | |
5 | Configuration items | |
6 | Error codes | |
7 | Third-party dependencies | |
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
- If we are changing behavior, how will we phase out the older behavior?
- If we require special migration tools, describe them here.
- When will we remove the existing behavior?