...
Page properties |
---|
...
|
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-108-Add-GPU-support-in-Flink-td38286.html
...
|
...
|
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
Introduce the external resource framework for external resource allocation and management. The pattern of configuration options is:
- external-resources. Define the {resourceName} list of enabled external resources, split by delimiter ",".
- external-resource.{resourceName}.amount. Define the amount of external resources in a task executor.
- external-resource.{resourceName}.driver-factory.class. Define the class name of ExternalResourceDriverFactory.
- external-resource.{resourceName}.kubernetes.key. Optional config which defines the configuration key of that external resource in Kubernetes. If you want Flink to request the external resource from Kubernetes (through its Device Plugin mechanism[3]), you need to explicitly set this key. Only valid for Kubernetes mode.
- external-resource.{resourceName}.yarn.key. Optional config which defines the configuration key of that external resource in Yarn. If you want Flink to request the external resource from Yarn, you need to explicitly set this key. Only valid for Yarn mode.
- external-resource.{resourceName}.param.{params}. Each ExternalResourceDriver can define its own specific configs following this pattern.
...
```java
public interface ExternalResourceDriverFactory {
    /**
     * Construct the ExternalResourceDriver from configuration.
     */
    ExternalResourceDriver createExternalResourceDriver(Configuration config);
}

public interface ExternalResourceDriver {
    /**
     * Retrieve the information of the external resources according to the amount.
     */
    Set<? extends ExternalResourceInfo> retrieveResourceInfo(long amount);
}
```
...
```java
public interface RuntimeContext {
    /**
     * Get the specific external resource information, indexed by the resourceName.
     */
    Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);
}
```
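As a rough illustration of how an operator or function might consume this method, the sketch below redeclares minimal local copies of `RuntimeContext` and `ExternalResourceInfo` so that it compiles on its own. The helper names `collectProperty` and `infoOf`, the resource name `gpu`, and the property key `index` are all hypothetical and not defined by this proposal. Returning an empty set for an unknown resource name is also an assumption of the demo.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical usage sketch; not part of Flink. */
final class RuntimeContextUsageSketch {

    // Local stand-ins for the interfaces in this proposal, reduced to what the sketch needs.
    interface ExternalResourceInfo {
        String getProperty(String key);
        Collection<String> getKeys();
    }

    interface RuntimeContext {
        Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);
    }

    /** Hypothetical helper: wrap a property map as an ExternalResourceInfo. */
    static ExternalResourceInfo infoOf(Map<String, String> props) {
        final Map<String, String> copy = new HashMap<>(props);
        return new ExternalResourceInfo() {
            @Override
            public String getProperty(String key) {
                return copy.get(key);
            }

            @Override
            public Collection<String> getKeys() {
                return copy.keySet();
            }
        };
    }

    /** Collect one property from every info registered under resourceName, sorted for stable output. */
    static List<String> collectProperty(RuntimeContext ctx, String resourceName, String key) {
        List<String> values = new ArrayList<>();
        for (ExternalResourceInfo info : ctx.getExternalResourceInfos(resourceName)) {
            String value = info.getProperty(key);
            if (value != null) {
                values.add(value);
            }
        }
        Collections.sort(values);
        return values;
    }

    public static void main(String[] args) {
        // Fake context with two "gpu" entries; the "index" key is an assumption of this demo.
        Set<ExternalResourceInfo> gpus = new HashSet<>();
        gpus.add(infoOf(Collections.singletonMap("index", "0")));
        gpus.add(infoOf(Collections.singletonMap("index", "1")));
        RuntimeContext ctx = name -> "gpu".equals(name) ? gpus : Collections.emptySet();

        System.out.println(collectProperty(ctx, "gpu", "index"));
    }
}
```

A function would typically make this lookup once during initialization and keep the resulting values for the lifetime of the task.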
For GPU resources, we introduce the following configuration options:
...
- For Yarn, the YarnResourceManager adds the external resource to the ContainerRequest.
- For Kubernetes, the KubernetesResourceManager adds the external resource to the pod for the TaskExecutor (leveraging the Device Plugin mechanism[3]).
On the TaskExecutor side, we introduce ExternalResourceDriver, which is responsible for detecting and providing information about external resources. The TaskExecutor does not need to manage a specific external resource by itself; operators and functions get the ExternalResourceInfo from RuntimeContext.
Regarding the configuration, the common config keys are the amount of the external resources and the class name of ExternalResourceDriverFactory. Besides, each driver can define its own configs following the specific pattern. In summary:
- external-resources. Define the {resourceName} list of enabled external resources with delimiter ",". If configured, ResourceManager and TaskExecutor would check if the relevant configs exist for resources in this list. ResourceManager will forward the request to the underlying external resource manager. TaskExecutor will launch the corresponding ExternalResourceDriver.
- external-resource.{resourceName}.amount. Define the amount of external resources in a task executor.
- external-resource.{resourceName}.driver-factory.class. Define the class name of ExternalResourceDriverFactory.
- external-resource.{resourceName}.kubernetes.key. Optional config which defines the configuration key of that external resource in Kubernetes. If you want Flink to request the external resource from Kubernetes, you need to explicitly set this key. Only valid for Kubernetes mode.
- external-resource.{resourceName}.yarn.key. Optional config which defines the configuration key of that external resource in Yarn. If you want Flink to request the external resource from Yarn, you need to explicitly set this key. Only valid for Yarn mode.
- external-resource.{resourceName}.param.{params}. Each ExternalResourceDriver can define its own specific configs following this pattern.
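To make the key pattern concrete, the following sketch resolves the per-resource settings from a flat key/value map. It is only an illustration under stated assumptions: a plain `Map<String, String>` stands in for Flink's configuration object, the list key is assumed to be `external-resources`, the class and method names are hypothetical, and failing with an exception when `amount` or `driver-factory.class` is missing is one possible reading of "check if the relevant configs exist".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Flink. Resolves the config pattern for each listed resource. */
final class ExternalResourceConfigSketch {

    // Assumed name of the list key; the per-resource keys follow the pattern described above.
    static final String LIST_KEY = "external-resources";
    static final String PREFIX = "external-resource.";

    /** Settings resolved for one resource name. */
    static final class ResourceSettings {
        final String name;
        final long amount;
        final String driverFactoryClass;
        final Map<String, String> params;

        ResourceSettings(String name, long amount, String driverFactoryClass, Map<String, String> params) {
            this.name = name;
            this.amount = amount;
            this.driverFactoryClass = driverFactoryClass;
            this.params = params;
        }
    }

    static List<ResourceSettings> resolve(Map<String, String> conf) {
        List<ResourceSettings> result = new ArrayList<>();
        for (String raw : conf.getOrDefault(LIST_KEY, "").split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // no resources enabled, or a stray delimiter
            }
            String base = PREFIX + name + ".";
            String amount = conf.get(base + "amount");
            String factory = conf.get(base + "driver-factory.class");
            if (amount == null || factory == null) {
                // Assumption: a listed resource without these keys is treated as a config error.
                throw new IllegalArgumentException("Incomplete config for external resource " + name);
            }
            // Everything under external-resource.{resourceName}.param. is driver-specific.
            String paramPrefix = base + "param.";
            Map<String, String> params = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : conf.entrySet()) {
                if (entry.getKey().startsWith(paramPrefix)) {
                    params.put(entry.getKey().substring(paramPrefix.length()), entry.getValue());
                }
            }
            result.add(new ResourceSettings(name, Long.parseLong(amount.trim()), factory, params));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(LIST_KEY, "gpu");
        conf.put("external-resource.gpu.amount", "2");
        // The factory class and the parameter name below are made up for this demo.
        conf.put("external-resource.gpu.driver-factory.class", "com.example.GpuDriverFactory");
        conf.put("external-resource.gpu.param.example-option", "value");

        for (ResourceSettings s : resolve(conf)) {
            System.out.println(s.name + " amount=" + s.amount + " factory=" + s.driverFactoryClass + " params=" + s.params);
        }
    }
}
```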
...
```java
public interface ExternalResourceDriverFactory {
    /**
     * Construct the ExternalResourceDriver from configuration.
     */
    ExternalResourceDriver createExternalResourceDriver(Configuration config);
}

public interface ExternalResourceDriver {
    /**
     * Retrieve the information of the external resources according to the amount.
     */
    Set<? extends ExternalResourceInfo> retrieveResourceInfo(long amount);
}

public interface ExternalResourceInfo {
    String getProperty(String key);

    Collection<String> getKeys();
}
```
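A minimal sketch of how a driver and its factory could fit together is given below. To stay self-contained it redeclares the three interfaces locally and uses `Map<String, String>` in place of Flink's `Configuration`. The `IndexListDriver`, `IndexListDriverFactory`, and `IndexedResourceInfo` classes, the `indices` option, and the `index` property key are all hypothetical. Which keys the factory actually receives is not specified here, so the lookup is an assumption as well.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not part of Flink. */
final class ExternalResourceDriverSketch {

    // Local copies of the proposed interfaces; Map<String, String> stands in for Configuration.
    interface ExternalResourceInfo {
        String getProperty(String key);
        Collection<String> getKeys();
    }

    interface ExternalResourceDriver {
        Set<? extends ExternalResourceInfo> retrieveResourceInfo(long amount);
    }

    interface ExternalResourceDriverFactory {
        ExternalResourceDriver createExternalResourceDriver(Map<String, String> config);
    }

    /** Hypothetical info type exposing a single "index" property. */
    static final class IndexedResourceInfo implements ExternalResourceInfo {
        private final String index;

        IndexedResourceInfo(String index) {
            this.index = index;
        }

        @Override
        public String getProperty(String key) {
            return "index".equals(key) ? index : null;
        }

        @Override
        public Collection<String> getKeys() {
            return Collections.singleton("index");
        }
    }

    /** Hypothetical driver: hands out at most `amount` entries from a configured index list. */
    static final class IndexListDriver implements ExternalResourceDriver {
        private final List<String> indices;

        IndexListDriver(List<String> indices) {
            this.indices = indices;
        }

        // The wildcard in the interface lets a driver return a set of its own info type.
        @Override
        public Set<IndexedResourceInfo> retrieveResourceInfo(long amount) {
            Set<IndexedResourceInfo> infos = new LinkedHashSet<>();
            for (String index : indices) {
                if (infos.size() >= amount) {
                    break;
                }
                infos.add(new IndexedResourceInfo(index));
            }
            return infos;
        }
    }

    /** Hypothetical factory reading a comma-separated "indices" option. */
    static final class IndexListDriverFactory implements ExternalResourceDriverFactory {
        @Override
        public ExternalResourceDriver createExternalResourceDriver(Map<String, String> config) {
            List<String> indices = new ArrayList<>();
            for (String raw : config.getOrDefault("indices", "").split(",")) {
                String index = raw.trim();
                if (!index.isEmpty()) {
                    indices.add(index);
                }
            }
            return new IndexListDriver(indices);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("indices", "0,1,2,3");
        ExternalResourceDriver driver = new IndexListDriverFactory().createExternalResourceDriver(config);
        for (ExternalResourceInfo info : driver.retrieveResourceInfo(2)) {
            System.out.println("index=" + info.getProperty("index"));
        }
    }
}
```

The sketch also shows why the return type uses `Set<? extends ExternalResourceInfo>`: a driver can return a set of its own concrete info class without copying it into a `Set<ExternalResourceInfo>`.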
Guarantee the required GPU resources are accessible to task executors
...