Porting Python 2 Code to Python 3

author: Brett Cannon

Abstract

With Python 3 being the future of Python while Python 2 is still in active use, it is good to have your project available for both major releases of Python. This guide is meant to help you figure out how best to support both Python 2 & 3 simultaneously.

If you are looking to port an extension module instead of pure Python code, please see Porting Extension Modules to Python 3.

If you would like to read one core Python developer’s take on why Python 3 came into existence, you can read Nick Coghlan’s Python 3 Q & A.

If you prefer to read a (free) book on porting a project to Python 3, consider reading Porting to Python 3 by Lennart Regebro which should cover much of what is discussed in this HOWTO.

For help with porting, you can email the python-porting mailing list with questions.

The Short Version

  • Decide what’s the oldest version of Python 2 you want to support (if at all)
  • Make sure you have a thorough test suite and use continuous integration testing to make sure you stay compatible with the versions of Python you care about
  • If you have dependencies, check their Python 3 status using caniusepython3 (command-line tool, web app)

With that done, your options are:

  • If you are dropping Python 2 support, use 2to3 to port to Python 3
  • If you are keeping Python 2 support, then start writing Python 2/3-compatible code starting TODAY
    • If you have dependencies that have not been ported, reach out to them to port their project while working to make your code compatible with Python 3 so you’re ready when your dependencies are all ported
    • If all your dependencies have been ported (or you have none), go ahead and port to Python 3
  • If you are creating a new project that wants to have 2/3 compatibility, code in Python 3 and then backport to Python 2

Before You Begin

If your project is on the Cheeseshop/PyPI, make sure it has the proper trove classifiers to signify what versions of Python it currently supports. At minimum you should specify the major version(s), e.g. Programming Language :: Python :: 2 if your project currently only supports Python 2. It is preferable that you be as specific as possible by listing every major/minor version of Python that you support, e.g. if your project supports Python 2.6 and 2.7, then you want the classifiers of:

Programming Language :: Python :: 2
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7

Once your project supports Python 3 you will want to go back and add the appropriate classifiers for Python 3 as well. This is important as setting the Programming Language :: Python :: 3 classifier will lead to your project being listed under the Python 3 Packages section of PyPI.
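
As a rough sketch of where these classifiers end up, the fragment below shows the list as it might appear in a setup.py. The choice of Python 3.3 as the first supported Python 3 release and the project name in the comment are assumptions made for illustration only.

```python
# Hypothetical fragment of a setup.py for a project that supports
# Python 2.6, 2.7 and (once ported) Python 3.3.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2.6",
    "Programming Language :: Python :: 2.7",
    # Added once the Python 3 port has been released:
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.3",
]

# In a real setup.py the list would be handed to setup(), e.g.:
#     setup(name="example-project", classifiers=CLASSIFIERS, ...)
```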

Make sure you have a robust test suite. You need to make sure everything continues to work, just like when you support a new minor/feature release of Python. This means making sure your test suite is thorough and is ported properly between Python 2 & 3 (consider using coverage to measure that you have effective test coverage). You will also most likely want to use something like tox to automate testing between all of your supported versions of Python. You will also want to port your tests first so that you can make sure that you detect breakage during the transition. Tests also tend to be simpler than the code they are testing so it gives you an idea of how easy it can be to port code.

Drop support for older Python versions if possible. Python 2.5 introduced a lot of useful syntax and libraries which have become idiomatic in Python 3. Python 2.6 introduced future statements (such as print_function and unicode_literals) which make compatibility much easier if you are going from Python 2 to 3. Python 2.7 continues the trend in the stdlib. Choose the newest version of Python which you believe can be your minimum support version and work from there.

Target the newest version of Python 3 that you can. Beyond just the usual bugfixes, compatibility has continued to improve between Python 2 and 3 as time has passed. E.g. Python 3.3 added back the u prefix for strings, making source-compatible Python code easier to write.

Writing Source-Compatible Python 2/3 Code

Over the years the Python community has discovered that the easiest way to support both Python 2 and 3 in parallel is to write Python code that works in either version. While this might sound counter-intuitive at first, it actually is not difficult and typically only requires following some select (non-idiomatic) practices and using some key projects to help make bridging between Python 2 and 3 easier.

Projects to Consider

The lowest level library for supporting Python 2 & 3 simultaneously is six. Reading through its documentation will give you an idea of where exactly the Python language changed between versions 2 & 3 and thus what you will want the library to help you continue to support.

To help automate porting your code over to using six, you can use modernize. This project will attempt to rewrite your code to be as modern as possible while using six to smooth out any differences between Python 2 & 3.

If you want to write your compatible code to feel more like Python 3 there is the future project. It tries to provide backports of objects from Python 3 so that you can use them from Python 2-compatible code, e.g. replacing the bytes type from Python 2 with the one from Python 3. It also provides a translation script like modernize (its translation code is actually partially based on it) to help start working with a pre-existing code base. It is also unique in that its translation script will also port Python 3 code backwards as well as Python 2 code forwards.

Tips & Tricks

To help with writing source-compatible code using one of the projects mentioned in Projects to Consider, consider following the below suggestions. Some of them are handled by the suggested projects, so if you do use one of them then read their documentation first to see which suggestions below will be taken care of for you.

Support Python 2.7

As a first step, make sure that your project is compatible with Python 2.7. This is just good to do as Python 2.7 is the last release of Python 2 and thus will be used for a rather long time. It also allows for use of the -3 flag to Python to help discover places in your code where compatibility might be an issue (the -3 flag is in Python 2.6 but Python 2.7 adds more warnings).

Try to Support Python 2.6 and Newer Only

While not possible for all projects, if you can support Python 2.6 and newer only, your life will be much easier. Various future statements, stdlib additions, etc. exist only in Python 2.6 and later which greatly assist in supporting Python 3. But if your project must keep support for Python 2.5 then it is still possible to simultaneously support Python 3.

Below are the benefits you gain if you only have to support Python 2.6 and newer. Some of these options are personal choice while others are strongly recommended (the ones that are more for personal choice are labeled as such). If you continue to support older versions of Python then you at least need to watch out for situations that these solutions fix and handle them appropriately (which is where library help from e.g. six comes in handy).

from __future__ import print_function

It will not only get you used to typing print() as a function instead of a statement, but it will also give you the various benefits the function has over the Python 2 statement (six provides a function if you support Python 2.5 or older).
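
A minimal sketch of those benefits follows. The Collector class is a hypothetical stand-in for any file-like object; the sep, end and file keyword arguments belong to the print() function and have no direct equivalent in the Python 2 statement.

```python
from __future__ import print_function


class Collector(object):
    """Minimal file-like object (hypothetical helper) that records writes."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)


collector = Collector()
# sep, end and file are keyword arguments of the print() function.
print("spam", "eggs", sep=", ", end="!\n", file=collector)
output = "".join(collector.parts)
```

The same call should behave identically under Python 2.6+ and Python 3, which is the point of adopting the future statement early.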

from __future__ import unicode_literals

If you choose to use this future statement then all string literals in Python 2 will be assumed to be Unicode (as is already the case in Python 3). If you choose not to use this future statement then you should mark all of your text strings with a u prefix and only support Python 3.3 or newer. But you are strongly advised to do one or the other (six provides a function in case you don’t want to use the future statement and you want to support Python 3.2 or older).
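
The effect of the future statement can be checked with a short sketch like the one below. The name text_type is an illustrative alias, not something provided by the standard library.

```python
from __future__ import unicode_literals

# unicode on Python 2, str on Python 3 (illustrative alias).
text_type = type(u"")

plain = "spam"   # text on both versions, thanks to the future statement
raw = b"spam"    # explicitly bytes on both versions
```

With the statement in place an unprefixed literal is text everywhere, so only the bytes literals need marking.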

Bytes/string literals

This is a very important one. Prefix Python 2 strings that are meant to contain bytes with a b prefix to very clearly delineate what is and is not a Python 3 text string (six provides a function to use for Python 2.5 compatibility).

This point cannot be stressed enough: make sure you know what all of your string literals in Python 2 are meant to be in Python 3. Any string literal that should be treated as bytes should have the b prefix. Any string literal that should be Unicode/text in Python 2 should either have the u prefix (supported, but ignored, in Python 3.3 and later) or you should have from __future__ import unicode_literals at the top of the file. But the key point is you should know how Python 3 will treat every one of your string literals and you should mark them as appropriate.

There are some differences between byte literals in Python 2 and those in Python 3 thanks to the bytes type just being an alias to str in Python 2. See the Handle Common “Gotchas” section for what to watch out for.
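
As an illustration of marking every literal, the sketch below keeps binary data and text strictly apart. The names PNG_SIGNATURE, LABEL and looks_like_png are invented for this example, and the u prefix assumes Python 2.6+ or Python 3.3+.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # binary data: always a b prefix
LABEL = u"caf\u00e9"                   # text: u prefix (or unicode_literals)


def looks_like_png(data):
    """Compare bytes with bytes; slicing keeps the result a bytes object."""
    return data[:len(PNG_SIGNATURE)] == PNG_SIGNATURE


# Crossing from text to bytes is always an explicit encode step.
encoded_label = LABEL.encode("utf-8")
```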

from __future__ import absolute_import

Discussed in more detail below, but you should use this future statement to prevent yourself from accidentally using implicit relative imports.

Supporting Python 2.5 and Newer Only

If you are supporting Python 2.5 and newer there are still some features of Python that you can utilize.

from __future__ import absolute_import

Implicit relative imports (e.g., importing spam.bacon from within spam.eggs with the statement import bacon) do not work in Python 3. This future statement moves away from that and allows the use of explicit relative imports (e.g., from . import bacon).

In Python 2.5 you must use the __future__ statement to get to use explicit relative imports and prevent implicit ones. In Python 2.6 and 2.7 explicit relative imports are available without the statement, but you still want the __future__ statement to prevent implicit relative imports, which remain the default throughout Python 2. In other words, unless you are only supporting a version earlier than Python 2.5, use this __future__ statement.

Mark all Unicode strings with a u prefix

While Python 2.6 has a __future__ statement to automatically cause Python 2 to treat all string literals as Unicode, Python 2.5 does not have that shortcut. This means you should go through and mark all string literals with a u prefix to turn them explicitly into text strings where appropriate and only support Python 3.3 or newer. Otherwise use a project like six which provides a function to pass all text string literals through.

Capturing the Currently Raised Exception

In Python 2.5 and earlier the syntax to access the current exception is:

try:
  raise Exception()
except Exception, exc:
  # Current exception is 'exc'.
  pass

This syntax changed in Python 3 (and was backported to Python 2.6 and later) to:

try:
  raise Exception()
except Exception as exc:
  # Current exception is 'exc'.
  # In Python 3, 'exc' is restricted to the block; in Python 2.6/2.7 it will "leak".
  pass

Because of this syntax change you must change how you capture the current exception in Python 2.5 and earlier to:

try:
  raise Exception()
except Exception:
  import sys
  exc = sys.exc_info()[1]
  # Current exception is 'exc'.
  pass

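You can get more information about the raised exception from sys.exc_info() than simply the current exception instance, but you most likely don’t need it.

Note

In Python 3 the traceback object is attached to the exception instance through the __traceback__ attribute. If the instance is saved in a local variable that persists outside of the except block, the traceback will create a reference cycle with the current frame and its dictionary of local variables. This will delay reclaiming dead resources until the next cyclic garbage collection pass.

In Python 2, this problem only occurs if you save the traceback itself (e.g. the third element of the tuple returned by sys.exc_info()) in a variable.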
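
One way to avoid the cycle described in the note, sketched here under the assumption that the traceback is only needed briefly, is to delete the local references as soon as you are done with them. The function name describe_failure is made up for this example.

```python
import sys


def describe_failure():
    """Summarize an exception without leaving a frame/traceback cycle behind."""
    try:
        raise ValueError("boom")
    except ValueError:
        exc_type, exc_value, tb = sys.exc_info()
        try:
            return "%s at line %d: %s" % (
                exc_type.__name__, tb.tb_lineno, exc_value)
        finally:
            # Drop the local references so neither the traceback (Python 2)
            # nor the exception's __traceback__ (Python 3) keeps this frame
            # alive until the next garbage collection pass.
            del exc_value, tb
```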

Handle Common “Gotchas”

These are things to watch out for no matter what version of Python 2 you are supporting which are not syntactic considerations.

from __future__ import division

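While the exact same outcome can be had by using the -Qnew argument to Python, using this future statement lifts the requirement that your users use the flag to get the expected behavior of division in Python 3 (e.g., 1/2 == 0.5; 1//2 == 0).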
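
A short sketch of the two operators side by side may help; the helper names mean and pages_needed are invented for illustration.

```python
from __future__ import division


def mean(values):
    # True division: the result is a float even for integer inputs.
    return sum(values) / len(values)


def pages_needed(items, per_page):
    # Floor division is spelled // on both versions; negating twice
    # rounds up instead of down.
    return -(-items // per_page)
```

Writing // wherever integer truncation is intended makes the code behave the same whether or not the future statement is active.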

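Specify when opening a file as binary

Unless you have been working on Windows, there is a chance you have not always bothered to add the b mode when opening a binary file (e.g., rb for binary reading). Under Python 3, binary files and text files are clearly distinct and mutually incompatible; see the io module for details. Therefore, you must decide whether a file will be used for binary access (allowing only bytes data to be read and/or written) or text access (allowing only Unicode data to be read and/or written).

Text files

Text files created using open() under Python 2 return byte strings, while under Python 3 they return Unicode strings. Depending on your porting strategy, this can be an issue.

If you want text files to return Unicode strings in Python 2, you have two possibilities:

  • Under Python 2.6 and higher, use io.open(). Since io.open() is essentially the same function in both Python 2 and Python 3, it will help iron out any issues that might arise.

  • If pre-2.6 compatibility is needed, then you should use codecs.open() instead. This will make sure that you get back Unicode strings in Python 2.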
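
The first option can be sketched as follows, assuming Python 2.6 or newer. The helper write_then_read is hypothetical; it writes text through io.open() and reads the same file back both as text and as bytes.

```python
import io
import os
import tempfile


def write_then_read(text, encoding="utf-8"):
    """Round-trip text through a temporary file in text and binary mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: io.open() accepts and returns Unicode on both versions.
        with io.open(path, "w", encoding=encoding) as handle:
            handle.write(text)
        with io.open(path, "r", encoding=encoding) as handle:
            as_text = handle.read()
        # Binary mode: the b flag is explicit and the result is bytes.
        with io.open(path, "rb") as handle:
            as_bytes = handle.read()
    finally:
        os.remove(path)
    return as_text, as_bytes
```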

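Subclass object

New-style classes have been around since Python 2.2. You need to make sure you are subclassing from object to avoid odd edge cases involving method resolution order, etc. This continues to be totally valid in Python 3 (although unneeded as all classes implicitly inherit from object).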
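
The method resolution order edge case shows up with diamond inheritance. In the sketch below (class names invented for illustration), new-style classes find Right.who() before Base.who(); with old-style Python 2 classes the depth-first lookup would have reached Base first.

```python
class Base(object):
    def who(self):
        return "Base"


class Left(Base):
    pass


class Right(Base):
    def who(self):
        return "Right"


class Child(Left, Right):
    pass


# New-style classes expose their linearized lookup order as __mro__.
mro_names = [cls.__name__ for cls in Child.__mro__]
```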

Deal With the Bytes/String Dichotomy

One of the biggest issues people have when porting code to Python 3 is handling the bytes/string dichotomy. Because Python 2 allowed the str type to hold textual data, people have over the years been rather loose in their delineation of what str instances held text compared to bytes. In Python 3 you cannot be so care-free anymore and need to properly handle the difference. The key to handling this issue is to make sure that every string literal in your Python 2 code is either syntactically or functionally marked as either bytes or text data. After this is done you then need to make sure your APIs are designed to either handle a specific type or made to be properly polymorphic.

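Mark Up Python 2 String Literals

The first thing you must do is designate every single string literal in Python 2 as either textual or bytes data. If you are only supporting Python 2.6 or newer, this can be accomplished by marking bytes literals with a b prefix and then designating textual data with a u prefix or using the unicode_literals future statement.

If your project supports versions of Python predating 2.6, then you should use the six project and its b() function to denote bytes literals. For text literals you can either use six’s u() function or use a u prefix.

Decide what APIs Will Accept

In Python 2 it was very easy to accidentally create an API that accepted both bytes and textual data. But in Python 3, thanks to the more strict handling of disparate types, this loose usage of bytes and text together tends to fail.

Take the dict {b'a': 'bytes', u'a': 'text'} in Python 2.6. It creates the dict {u'a': 'text'} since b'a' == u'a'. But in Python 3 the equivalent dict creates {b'a': 'bytes', 'a': 'text'}, i.e., no lost data. Similar issues can crop up when transitioning Python 2 code to Python 3.

This means you need to choose what an API is going to accept and then stick to that choice consistently in both Python 2 and 3.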
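
One possible way to pin an API down, sketched here as an assumption rather than a prescribed pattern, is to normalize input at the boundary so that everything inside works on text only. The names text_type, ensure_text and make_index are invented for this example.

```python
import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name exists on Python 2 only)


def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes; reject anything else."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected bytes or text, got %r" % type(value))


def make_index(pairs):
    """Build a dict whose keys are always text, on Python 2 and 3 alike."""
    return dict((ensure_text(key), val) for key, val in pairs)
```

Because every key passes through ensure_text(), the resulting dict has the same contents on both major versions, instead of depending on whether b'a' == u'a' happens to hold.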

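Bytes / Unicode Comparison

In Python 3, mixing bytes and Unicode is forbidden in most situations; it will raise a TypeError where Python 2 would have attempted an implicit coercion between types. However, there is one case where it doesn’t and it can be very misleading: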

>>> b"" == ""
False

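This is because an equality comparison is required by the language to always succeed (and return False for incompatible types). However, this also means that code incorrectly ported to Python 3 can display buggy behaviour if such comparisons are silently executed. To detect such situations, Python 3 has a -b flag that will display a warning: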

$ python3 -b
>>> b"" == ""
__main__:1: BytesWarning: Comparison between bytes and string
False

To turn the warning into an exception, use the -bb flag instead:

$ python3 -bb
>>> b"" == ""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string

Indexing bytes objects

Another potentially surprising change is the indexing behaviour of bytes objects in Python 3:

>>> b"xyz"[0]
120

Indeed, Python 3 bytes objects (as well as bytearray objects) are sequences of integers. But code converted from Python 2 will often assume that indexing a bytestring produces another bytestring, not an integer. To reconcile both behaviours, use slicing:

>>> b"xyz"[0:1]
b'x'
>>> n = 1
>>> b"xyz"[n:n+1]
b'y'

The only remaining gotcha is that an out-of-bounds slice returns an empty bytes object instead of raising IndexError:

>>> b"xyz"[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index out of range
>>> b"xyz"[3:4]
b''
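
Putting the slicing advice together, a small sketch of helpers that behave the same on Python 2 and 3 might look like this. The function names split_bytes and first_byte are illustrative.

```python
def split_bytes(data):
    """Return a list of length-1 bytes objects, one per byte in data."""
    return [data[i:i + 1] for i in range(len(data))]


def first_byte(data):
    """Return the first byte as a bytes object of length 1."""
    # Slicing never raises IndexError, so the empty case is checked by hand.
    if not data:
        raise IndexError("empty bytes object")
    return data[0:1]
```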
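
__str__()/__unicode__()

In Python 2, objects can specify both a string and Unicode representation of themselves. In Python 3, though, there is only a string representation. This becomes an issue as people can inadvertently do things in their __str__() methods which have unpredictable results (e.g., infinite recursion if you happen to use the unicode(self).encode('utf8') idiom as the body of your __str__() method).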

You can use a mixin class to work around this. This allows you to only define a __unicode__() method for your class and let the mixin derive __str__() for you (code from http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/):

import sys


class UnicodeMixin(object):

    """Mixin class to handle defining the proper __str__/__unicode__
    methods in Python 2 or 3."""

    if sys.version_info[0] >= 3:  # Python 3
        def __str__(self):
            return self.__unicode__()
    else:  # Python 2
        def __str__(self):
            return self.__unicode__().encode('utf8')


class Spam(UnicodeMixin):

    def __unicode__(self):
        return u'spam-spam-bacon-spam'  # 2to3 will remove the 'u' prefix

Don’t Index on Exceptions

In Python 2, the following worked:

>>> exc = Exception(1, 2, 3)
>>> exc.args[1]
2
>>> exc[1]  # Python 2 only!
2

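But in Python 3, indexing directly on an exception is an error. You need to make sure to only index on the BaseException.args attribute, which is a sequence containing all arguments passed to the __init__() method.

Even better is to use the documented attributes the exception provides.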
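
As one example of a documented attribute, OSError carries an errno attribute. The sketch below assumes Python 2.6 or newer (for the "as" syntax); the helper stat_error_code and the scratch directory are made up for illustration.

```python
import errno
import os
import tempfile


def stat_error_code(path):
    """Return the errno of a failed os.stat() call, or None on success."""
    try:
        os.stat(path)
    except OSError as exc:
        # Use the documented attribute rather than exc[0] or exc.args[0].
        return exc.errno
    return None


# Probe a path that cannot exist: a fresh, empty temporary directory.
scratch = tempfile.mkdtemp()
try:
    code = stat_error_code(os.path.join(scratch, "missing.txt"))
finally:
    os.rmdir(scratch)
```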

Don’t use __getslice__ & Friends

These have been deprecated for a long time, but Python 3 finally drops support for __getslice__(), etc. Move completely over to __getitem__() and friends.
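
A minimal sketch of handling slices in __getitem__() follows. The Window class is a hypothetical wrapper used only to show the pattern.

```python
class Window(object):
    """Hypothetical read-only sequence wrapper used only for illustration."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Slices arrive here as slice objects; no __getslice__ is required.
        if isinstance(index, slice):
            return Window(self._items[index])
        return self._items[index]

    def as_list(self):
        return list(self._items)


window = Window(range(5))
middle = window[1:4]
```

Since the class never defines __getslice__(), both simple and extended slices go through the same code path on Python 2 and 3.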

Updating doctests

Don’t forget to make them Python 2/3 compatible as well. If you wrote a monolithic set of doctests (e.g., a single docstring containing all of your doctests), you should at least consider breaking the doctests up into smaller pieces to make it more manageable to fix. Otherwise it might very well be worth your time and effort to port your tests to unittest.

Update map for imbalanced input sequences

With Python 2, when map was given more than one input sequence it would pad the shorter sequences with None values, returning a sequence as long as the longest input sequence.

With Python 3, if the input sequences to map are of unequal length, map will stop at the termination of the shortest of the sequences. For full compatibility with map from Python 2.x, combine the sequences with itertools.zip_longest() and unpack each resulting tuple with itertools.starmap(), e.g. map(func, *sequences) becomes list(itertools.starmap(func, itertools.zip_longest(*sequences))).
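
A version-neutral sketch of the Python 2 padding behaviour is shown below. The helper name map_longest is invented; the fallback relies on the function being called izip_longest in Python 2's itertools.

```python
import itertools

try:
    zip_longest = itertools.zip_longest    # Python 3
except AttributeError:
    zip_longest = itertools.izip_longest   # Python 2.6 and 2.7


def map_longest(func, *sequences):
    """Emulate Python 2's map(): pad shorter sequences with None."""
    # starmap unpacks each padded tuple into separate arguments for func.
    return list(itertools.starmap(func, zip_longest(*sequences)))


pairs = map_longest(lambda a, b: (a, b), [1, 2, 3], [10])
```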

Eliminate -3 Warnings

When you run your application’s test suite, run it using the -3 flag passed to Python. This will cause various warnings to be raised during execution about things that are semantic changes between Python 2 and 3. Try to eliminate those warnings to make your code even more portable to Python 3.

Alternative Approaches

While supporting Python 2 & 3 simultaneously is typically the preferred choice by people so that they can continue to improve code and have it work for the largest number of users, your life may be easier if you only have to support one major version of Python going forward.

Supporting Only Python 3 Going Forward From Python 2 Code

If you have Python 2 code but going forward only want to improve it as Python 3 code, then you can use 2to3 to translate your Python 2 code to Python 3 code. This is only recommended, though, if your current version of your project is going into maintenance mode and you want all new features to be exclusive to Python 3.

Backporting Python 3 code to Python 2

If you have Python 3 code and have little interest in supporting Python 2 you can use 3to2 to translate from Python 3 code to Python 2 code. This is only recommended if you don’t plan to heavily support Python 2 users. Otherwise write your code for Python 3 and then backport as far back as you want. This is typically easier than going from Python 2 to 3 as you will have worked out any difficulties with e.g. bytes/strings, etc.

Other Resources

The authors of the following blog posts, wiki pages, and books deserve special thanks for making public their tips for porting Python 2 code to Python 3 (and thus helping provide information for this document and its various revisions over the years):

If you feel there is something missing from this document that should be added, please email the python-porting mailing list.