Apps using Tez have the ability to determine the number of tasks reading the initial external data for a job (the number of mappers in MapReduce parlance). Here is a short description of how that works.
First, Tez tries to find out the resource availability in the cluster for these tasks. For that, YARN provides a headroom value (and in the future other attributes may be used). Let's say this value is T.
int totalResource = getContext().getTotalAvailableResource().getMemory();
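Next, Tez divides T by the resource required per task to determine how many tasks can run in parallel in a single wave. Let's say this value is W. W is then multiplied by a wave factor (from configuration: tez.grouping.split-waves) to determine the number of tasks to be used. Let's say this value is N.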
int taskResource = getContext().getVertexTaskResource().getMemory();
float waves = conf.getFloat(
    TezSplitGrouper.TEZ_GROUPING_SPLIT_WAVES,
    TezSplitGrouper.TEZ_GROUPING_SPLIT_WAVES_DEFAULT);
int numTasks = (int)((totalResource * waves)/taskResource);
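The arithmetic in the excerpt above can be tried out in isolation. The following is a minimal sketch, not Tez code: the class name, method name, and the cluster numbers (100 GB of headroom, 1.5 GB per task, a wave factor of 1.5) are hypothetical values chosen for illustration.

```java
final class WaveTaskCountSketch {

    /**
     * Same arithmetic as the excerpt: (totalResource * waves) / taskResource,
     * truncated to an int. Resources are in MB in this sketch.
     */
    static int numTasks(int totalResourceMb, int taskResourceMb, float waves) {
        if (taskResourceMb <= 0) {
            throw new IllegalArgumentException("taskResourceMb must be positive");
        }
        return (int) ((totalResourceMb * waves) / taskResourceMb);
    }

    public static void main(String[] args) {
        int t = 102400;      // hypothetical headroom T: 100 GB
        int perTask = 1536;  // hypothetical per-task resource: 1.5 GB

        // W: tasks that fit in a single wave (integer division).
        System.out.println("W = " + (t / perTask));

        // N: tasks after applying a hypothetical wave factor of 1.5.
        System.out.println("N = " + numTasks(t, perTask, 1.5f));
    }
}
```

Note that the expression applies the wave factor before dividing, so N is truncated only once and can differ slightly from a truncated W multiplied by the wave factor.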
Tez then estimates the amount of data each task would read with N tasks. If this value is between tez.grouping.min-size and tez.grouping.max-size, then N is accepted as the number of tasks. If not, N is adjusted to bring the data per task in line with the max or min, depending on which threshold was crossed.
if (lengthPerGroup > maxLengthPerGroup) {
  // splits too big to work. Need to override with max size.
  int newDesiredNumSplits = (int)(totalLength/maxLengthPerGroup) + 1;
  ...
} else if (lengthPerGroup < minLengthPerGroup) {
  // splits too small to work. Need to override with min size.
  int newDesiredNumSplits = (int)(totalLength/minLengthPerGroup) + 1;
  ...
}
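To see how the bounds change the task count, here is a small standalone sketch of the same check. It is an illustration only: the class and method names are hypothetical, it assumes the data per task is computed as totalLength divided by the desired number of splits, and the 50 MB and 1 GB bounds are example values rather than confirmed defaults.

```java
final class SplitSizeBoundsSketch {

    /**
     * Returns the desired split count after applying the min/max
     * data-per-group bounds, following the branches in the excerpt.
     */
    static int adjustNumSplits(long totalLength, int desiredNumSplits,
                               long minLengthPerGroup, long maxLengthPerGroup) {
        if (desiredNumSplits <= 0 || minLengthPerGroup <= 0 || maxLengthPerGroup <= 0) {
            throw new IllegalArgumentException("counts and bounds must be positive");
        }
        // Assumption: data per group is the total length spread evenly.
        long lengthPerGroup = totalLength / desiredNumSplits;
        if (lengthPerGroup > maxLengthPerGroup) {
            // Groups too big: use as many groups as the max size requires.
            return (int) (totalLength / maxLengthPerGroup) + 1;
        } else if (lengthPerGroup < minLengthPerGroup) {
            // Groups too small: use as many groups as the min size allows.
            return (int) (totalLength / minLengthPerGroup) + 1;
        }
        return desiredNumSplits;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        long gib = 1L << 30;
        long total = 10 * gib;  // hypothetical input size: 10 GB
        long min = 50 * mib;    // example lower bound
        long max = gib;         // example upper bound

        for (int n : new int[] {5, 100, 1000}) {
            System.out.println("N = " + n + " -> "
                + adjustNumSplits(total, n, min, max));
        }
    }
}
```

With these example numbers, a request for 1000 tasks would give each task about 10 MB, which is below the lower bound, so the count is reduced. A request for 5 tasks would give each about 2 GB, so the count is raised.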
For experimental purposes, tez.grouping.split-count can be set in configuration to specify the desired number of groups. If this config is specified, the above logic is ignored and Tez tries to group splits into the specified number of groups. This is best effort.
int configNumSplits = conf.getInt(TEZ_GROUPING_SPLIT_COUNT, 0);
if (configNumSplits > 0) {
  // always use config override if specified
  desiredNumSplits = configNumSplits;
  ...
}
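The override rule can be illustrated without a cluster. In the sketch below a plain java.util.Map stands in for the job configuration object, which is an assumption made to keep the example self-contained; the class and method names are hypothetical. Only the property name tez.grouping.split-count comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

final class SplitCountOverrideSketch {

    // Property name as given in the text above.
    static final String SPLIT_COUNT_KEY = "tez.grouping.split-count";

    /**
     * Returns the configured split count when it is set to a positive
     * value, otherwise the count computed from resources and data size.
     */
    static int desiredNumSplits(Map<String, String> conf, int computedNumSplits) {
        int configNumSplits =
            Integer.parseInt(conf.getOrDefault(SPLIT_COUNT_KEY, "0"));
        return configNumSplits > 0 ? configNumSplits : computedNumSplits;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // No override: the computed value is kept.
        System.out.println(desiredNumSplits(conf, 113));

        // Override set: the configured value wins.
        conf.put(SPLIT_COUNT_KEY, "10");
        System.out.println(desiredNumSplits(conf, 113));
    }
}
```

As the text notes, the configured count is a best-effort target, so the number of groups actually produced may differ from it.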
Here is the detailed explanation of the grouping algorithm (TezSplitGrouper.getGroupedSplits).