Status
Motivation
Having to support Python 2 and 3 concurrently causes some maintenance and development burden (which is lessened a bit by the six and backports modules), and significant extra test time on Travis.
Python 2 reaches End of Life on January 1, 2020 and will receive no updates, not even security fixes, after that date.
Django dropped support for Python 2 with their 2.0 release in December 2017, and this proposal has us follow suit. Airflow 2.0 is already a fairly major breaking change, so this could be an opportune time to do this.
Considerations
Many people are still on Python 2.7, and we will need to consider how we announce this change, and how long we give people to migrate their installs.
We have at least one hook that is Python 2 only: AIRFLOW-2697 (specifically the HDFS hook, which uses a Python 2 only module, snakebite).
RHEL may not ship an "officially" packaged version of Python 3 (it is hard, perhaps impossible, to find out unless you are already a Red Hat customer; an RPM of Python 3 is available via EPEL, but that is not an "official" package from Red Hat, Inc.). My answer to this problem is to encourage companies to pay us to continue supporting Airflow on Python 2.7.
12 Comments
Ash Berlin-Taylor
I've been told that Py3 is available in a "Software Collection" on RHEL7 https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/1/html-single/Software_Collections_Guide/#chap-Introducing_Software_Collections
Taylor Edmiston
Thanks for drafting this, Ash. I feel like it covers our discussion thread well.
It's really concerning to me how many people are still on Python 2.7. While there are a small number of edge cases (snakebite etc / http://py3readiness.org/), I think we agree that most people are better off running on Python 3.
I like the idea of encouraging companies that depend on Python 2 to pay for its continued support.
Fokko Driesprong
The community recently started adding types to the Airflow code to help new contributors and to make the code more readable and maintainable: https://github.com/apache/airflow/pull/4926/files
Right now we're limited to setting these types in comments to maintain Python 2.7 compatibility, which is a pity.
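To illustrate the difference, here is a minimal sketch of the same function written both ways. The function names and the `task_ids` parameter are hypothetical and are not taken from the Airflow codebase or the linked PR; the sketch only assumes the standard `typing` module.

```python
from typing import List, Optional


def first_task_id_py2(task_ids, default=None):
    # type: (List[str], Optional[str]) -> Optional[str]
    """Python 2.7 compatible style: the types live in a comment.

    Type checkers read the comment, but the interpreter ignores it.
    """
    return task_ids[0] if task_ids else default


def first_task_id_py3(
    task_ids: List[str], default: Optional[str] = None
) -> Optional[str]:
    """Python 3 only style: the types are part of the signature."""
    return task_ids[0] if task_ids else default
```

Both functions behave identically at runtime. The practical difference is that the annotated version keeps the types next to the parameters they describe and exposes them through `__annotations__`, whereas the comment form is invisible to the interpreter.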
Bas Harenslak
I'll kick off the list of things to fix/improve when dropping Python 2 support. From this list we could create JIRA tickets. Probably missing a lot, feel free to add.
Intermediate:
To be resolved:
Nedko
Snakebite has been a real issue, especially in the latest release (1.10.3), where extending BaseSensorOperator initializes all sensors, including HDFS. I know there is a Python 3 version of snakebite (snakebite-py3 on pip, https://pypi.org/project/snakebite-py3/), but I don't want to have to hack my instance; I would rather let it figure out the correct dependencies.
from airflow.operators.sensors import BaseSensorOperator
from airflow.hooks.hdfs_hook import HDFSHook
from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
File "/usr/local/lib/python3.6/site-packages/snakebite/client.py", line 1473
baseTime = min(time * (1L << retries), cap);
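For readers unfamiliar with the failure, the last line of the traceback above uses the Python 2 long-integer literal `1L`, which Python 3 rejects at parse time, so the module fails on import before any HDFS code runs. The sketch below reproduces this with the standard `ast` module; the helper name `parses_on_this_interpreter` is made up for illustration, and the "fixed" line is only an assumption about what a Python 3 compatible version would look like, not a quote from snakebite-py3.

```python
import ast

# The line from the traceback above (Python 2 long literal "1L").
PY2_LINE = "baseTime = min(time * (1L << retries), cap);"

# Assumed Python 3 compatible form: plain ints are arbitrary precision,
# so the "L" suffix is simply dropped.
PY3_LINE = "baseTime = min(time * (1 << retries), cap);"


def parses_on_this_interpreter(source: str) -> bool:
    """Return True if the running interpreter can parse the source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Python 2 line parses:", parses_on_this_interpreter(PY2_LINE))
    print("Python 3 line parses:", parses_on_this_interpreter(PY3_LINE))
```

Because the error is raised while the file is being parsed, wrapping the HDFS hook usage in a `try`/`except` inside a DAG does not help; the import itself has to be avoided or the dependency replaced.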
Jiajie Zhong
I think we already have a related PR, https://github.com/apache/airflow/pull/3560, and we decided to move to PyArrow.
Nedko
Good point, and that's fine, but there is an issue here. Based on the last comment (and this PR is almost a year old now), the PR has a bunch of conflicts, so it might warrant a rewrite or a heavy refactor. Are we going to keep things broken in the interim? IMHO simply updating the dependency would be an easy patch, though not necessarily the end solution.
Jiajie Zhong
Yes, the PR was submitted a year ago, but the author was still working on it as of 2019-02-28: https://github.com/apache/airflow/pull/3560#issuecomment-461791862
Varun Shah
MySQL-python does not support Python 3, and because Apache Airflow has removed support for Python 2, it is difficult to find a workaround for MySQL-python.
Xiaodong Deng
I don't see MySQL-python in https://github.com/apache/airflow/blob/master/setup.py. Instead we have mysqlclient, which should suffice, right?
Ash Berlin-Taylor
We run our tests against mysql on python3 using https://pypi.org/project/mysqlclient/ "This is a fork of MySQLdb1. (that adds Py3 support)" - you should use that instead.
Kamil Bregula
Ash Berlin-Taylor Fokko Driesprong Kaxil Naik Bas Harenslak
What is the current state of this AIP? If you are planning to do any further work, could I ask you to migrate your ticket to a GitHub Issue? If all the work is done, can I request a status update for this AIP?