Cool New Features in Python 3.7
Python 3.7 is officially released! This new Python version has been in development since September 2016, and now we all get to enjoy the results of the core developers’ hard work.
What does the new Python version bring? While the documentation gives a good overview of the new features, this article will take a deep dive into some of the biggest pieces of news. These include:
- Easier access to debuggers through a new breakpoint() built-in
- Simple class creation using data classes
- Customized access to module attributes
- Improved support for type hinting
- Higher precision timing functions
More importantly, Python 3.7 is fast.
In the final sections of this article, you’ll read more about this speed, as well as some of the other cool features of Python 3.7. You will also get some advice on upgrading to the new version.
The breakpoint() Built-In
While we might strive to write perfect code, the simple truth is that we never do. Debugging is an important part of programming. Python 3.7 introduces the new built-in function breakpoint(). This does not really add any new functionality to Python, but it makes using debuggers more flexible and intuitive.
Assume that you have the following buggy code in the file bugs.py:

    def divide(e, f):
        return f / e

    a, b = 0, 1
    print(divide(a, b))
Running the code causes a ZeroDivisionError inside the divide() function. Let’s say that you want to interrupt your code and drop into a debugger right at the top of divide(). You can do so by setting a so-called “breakpoint” in your code:

    def divide(e, f):
        # Insert breakpoint here
        return f / e
A breakpoint is a signal inside your code that execution should temporarily stop, so that you can look around at the current state of the program. How do you place the breakpoint? In Python 3.6 and below, you use this somewhat cryptic line:

    def divide(e, f):
        import pdb; pdb.set_trace()
        return f / e

Here, pdb is the Python Debugger from the standard library. In Python 3.7, you can use the new breakpoint() function call as a shortcut instead:

    def divide(e, f):
        breakpoint()
        return f / e
In the background, breakpoint() first imports pdb and then calls pdb.set_trace() for you. The obvious benefits are that breakpoint() is easier to remember and that you only need to type 12 characters instead of 27. However, the real bonus of using breakpoint() is its customizability.
Run your bugs.py script with breakpoint():

    $ python3.7 bugs.py
    > /home/gahjelle/bugs.py(3)divide()
    -> return f / e
    (Pdb)
The script will break when it reaches breakpoint() and drop you into a PDB debugging session. You can type c and hit Enter to continue the script. Refer to Nathan Jennings’ PDB guide if you want to learn more about PDB and debugging.
Now, say that you think you’ve fixed the bug. You would like to run the script again but without stopping in the debugger. You could, of course, comment out the breakpoint() line, but another option is to use the PYTHONBREAKPOINT environment variable. This variable controls the behavior of breakpoint(), and setting PYTHONBREAKPOINT=0 means that any call to breakpoint() is ignored:

    $ PYTHONBREAKPOINT=0 python3.7 bugs.py
    ZeroDivisionError: division by zero
Oops, it seems as if you haven’t fixed the bug after all…
Another option is to use PYTHONBREAKPOINT to specify a debugger other than PDB. For instance, to use PuDB (a visual debugger in the console) you can do:
$ PYTHONBREAKPOINT=pudb.set_trace python3.7 bugs.py
For this to work, you need to have pudb installed (pip install pudb). Python will take care of importing pudb for you though. This way you can also set your default debugger. Simply set the PYTHONBREAKPOINT environment variable to your preferred debugger. See this guide for instructions on how to set an environment variable on your system.
The new breakpoint() function does not only work with debuggers. One convenient option could be to simply start an interactive shell inside your code. For instance, to start an IPython session, you can use the following:

    $ PYTHONBREAKPOINT=IPython.embed python3.7 bugs.py
    IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: print(e / f)
    0.0
You can also create your own function and have breakpoint() call that. The following code prints all variables in the local scope. Add it to a file called bp_utils.py:

    from pprint import pprint
    import sys

    def print_locals():
        caller = sys._getframe(1)  # Caller is 1 frame up.
        pprint(caller.f_locals)
To use this function, set PYTHONBREAKPOINT as before, with the <module>.<function> notation:

    $ PYTHONBREAKPOINT=bp_utils.print_locals python3.7 bugs.py
    {'e': 0, 'f': 1}
    ZeroDivisionError: division by zero
Normally, breakpoint() will be used to call functions and methods that do not need arguments. However, it is possible to pass arguments as well. Change the line breakpoint() in bugs.py to:

    breakpoint(e, f, end="<-END\n")

Note: The default PDB debugger will raise a TypeError at this line because pdb.set_trace() does not take any positional arguments.
Run this code with breakpoint() masquerading as the print() function to see a simple example of the arguments being passed through:

    $ PYTHONBREAKPOINT=print python3.7 bugs.py
    0 1<-END
    ZeroDivisionError: division by zero
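The PYTHONBREAKPOINT variable is consulted by the default hook, sys.breakpointhook(), which breakpoint() calls with whatever arguments it receives. As a minimal sketch, the hook can also be swapped from inside a program. The names recording_hook and calls below are hypothetical and only serve the illustration:

```python
import sys

calls = []  # hypothetical log of every breakpoint() invocation


def recording_hook(*args, **kwargs):
    """Hypothetical hook: record the arguments instead of starting a debugger."""
    calls.append((args, kwargs))


def divide(e, f):
    breakpoint(e, f, note="before division")  # forwarded to sys.breakpointhook
    return f / e


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    result = divide(2, 1)
finally:
    sys.breakpointhook = original_hook  # always restore the previous hook

print(result)  # 0.5
print(calls)   # [((2, 1), {'note': 'before division'})]
```

Replacing the hook this way bypasses PYTHONBREAKPOINT entirely, so it is probably best kept for tests or tooling rather than everyday debugging.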
Data Classes
The new dataclasses module makes it more convenient to write your own classes, as special methods like .__init__(), .__repr__(), and .__eq__() are added automatically. Using the @dataclass decorator, you can write something like:

    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Country:
        name: str
        population: int
        area: float = field(repr=False, compare=False)
        coastline: float = 0

        def beach_per_person(self):
            """Meters of coastline per person"""
            return (self.coastline * 1000) / self.population
These nine lines of code stand in for quite a bit of boilerplate code and best practices. Think about what it would take to implement Country as a regular class: the .__init__() method, a repr, six different comparison methods as well as the .beach_per_person() method. The following implementation of Country is roughly equivalent to the data class:

    class Country:

        def __init__(self, name, population, area, coastline=0):
            self.name = name
            self.population = population
            self.area = area
            self.coastline = coastline

        def __repr__(self):
            return (
                f"Country(name={self.name!r}, population={self.population!r},"
                f" coastline={self.coastline!r})"
            )

        def __eq__(self, other):
            if other.__class__ is self.__class__:
                return (
                    (self.name, self.population, self.coastline)
                    == (other.name, other.population, other.coastline)
                )
            return NotImplemented

        def __ne__(self, other):
            if other.__class__ is self.__class__:
                return (
                    (self.name, self.population, self.coastline)
                    != (other.name, other.population, other.coastline)
                )
            return NotImplemented

        def __lt__(self, other):
            if other.__class__ is self.__class__:
                return ((self.name, self.population, self.coastline) < (
                    other.name, other.population, other.coastline
                ))
            return NotImplemented

        def __le__(self, other):
            if other.__class__ is self.__class__:
                return ((self.name, self.population, self.coastline) <= (
                    other.name, other.population, other.coastline
                ))
            return NotImplemented

        def __gt__(self, other):
            if other.__class__ is self.__class__:
                return ((self.name, self.population, self.coastline) > (
                    other.name, other.population, other.coastline
                ))
            return NotImplemented

        def __ge__(self, other):
            if other.__class__ is self.__class__:
                return ((self.name, self.population, self.coastline) >= (
                    other.name, other.population, other.coastline
                ))
            return NotImplemented

        def beach_per_person(self):
            """Meters of coastline per person"""
            return (self.coastline * 1000) / self.population

After creation, a data class is a normal class. You can, for instance, inherit from a data class in the normal way. The main purpose of data classes is to make it quick and easy to write robust classes, in particular small classes that mainly store data.
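To illustrate the point about inheritance, here is a small sketch. The Position and Capital classes are hypothetical and not part of the original example. The subclass is decorated as well, so its own field is appended to the inherited ones:

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical base data class."""
    name: str
    lat: float
    lon: float


@dataclass
class Capital(Position):
    """Hypothetical subclass: adds one field after the inherited ones."""
    country: str


oslo = Capital("Oslo", 59.9, 10.8, "Norway")
print(oslo)
print(isinstance(oslo, Position))
```

Note that fields without defaults cannot follow inherited fields that have defaults, which is why the base class in this sketch declares no default values.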
You can use the Country data class like any other class:

    >>> norway = Country("Norway", 5320045, 323802, 58133)
    >>> norway
    Country(name='Norway', population=5320045, coastline=58133)

    >>> norway.area
    323802

    >>> usa = Country("United States", 326625791, 9833517, 19924)
    >>> nepal = Country("Nepal", 29384297, 147181)
    >>> nepal
    Country(name='Nepal', population=29384297, coastline=0)

    >>> usa.beach_per_person()
    0.06099946957342386

    >>> norway.beach_per_person()
    10.927163210085629
Note that all the fields .name, .population, .area, and .coastline are used when initializing the class (although .coastline is optional, as is shown in the example of landlocked Nepal). The Country class has a reasonable repr, while defining methods works the same as for regular classes.
By default, data classes can be compared for equality. Since we specified order=True in the @dataclass decorator, the Country class can also be sorted:

    >>> norway == norway
    True

    >>> nepal == usa
    False

    >>> sorted((norway, usa, nepal))
    [Country(name='Nepal', population=29384297, coastline=0),
     Country(name='Norway', population=5320045, coastline=58133),
     Country(name='United States', population=326625791, coastline=19924)]
The sorting happens on the field values, first .name then .population, and so on. However, if you use field(), you can customize which fields will be used in the comparison. In the example, the .area field was left out of the repr and the comparisons.
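The effect of field(repr=False, compare=False) can be checked directly. The sketch below uses a trimmed-down variant of the Country class (without .coastline and the method) and a made-up second area value, purely for illustration:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Country:
    name: str
    population: int
    # Excluded from both the repr and all comparisons
    area: float = field(repr=False, compare=False)


real = Country("Norway", 5320045, 323802)
made_up = Country("Norway", 5320045, 0)  # same name and population, different area

print(real == made_up)  # True: .area is ignored when comparing
print(real < made_up)   # False: the compared fields are identical
print(real)             # the repr does not mention area
```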
Note: The country data are from the CIA World Factbook with population numbers estimated for July 2017.
Before you all go book your next beach holidays in Norway, here is what the Factbook says about the Norwegian climate: “temperate along coast, modified by North Atlantic Current; colder interior with increased precipitation and colder summers; rainy year-round on west coast.”
Data classes do some of the same things as namedtuple. Yet, they draw their biggest inspiration from the attrs project. See our full guide to data classes for more examples and further information, as well as PEP 557 for the official description.
Customization of Module Attributes
Attributes are everywhere in Python! While class attributes are probably the most famous, attributes can actually be put on essentially anything, including functions and modules. Several of Python’s basic features are implemented as attributes: most of the introspection functionality, docstrings, and namespaces. Functions inside a module are made available as module attributes.
Attributes are most often retrieved using the dot notation: thing.attribute. However, you can also get attributes that are named at runtime using getattr():

    import random

    random_attr = random.choice(("gammavariate", "lognormvariate", "normalvariate"))
    random_func = getattr(random, random_attr)

    print(f"A {random_attr} random value: {random_func(1, 1)}")
Running this code will produce something like:
A gammavariate random value: 2.8017715125270618
For classes, accessing thing.attr will first look for attr defined on thing. If it is not found, then the special method thing.__getattr__("attr") is called. (This is a simplification. See this article for more details.) The .__getattr__() method can be used to customize access to attributes on objects.
Until Python 3.7, the same customization was not easily available for module attributes. However, PEP 562 introduces __getattr__() on modules, together with a corresponding __dir__() function. The __dir__() special function allows customization of the result of calling dir() on a module.
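As a minimal sketch of these two hooks, the following builds a throwaway module in memory and attaches __getattr__() and __dir__() to it. The module name legacy and the functions old_name and new_name are hypothetical. In real code, the two functions would simply be defined at the top level of the module's source file:

```python
import types
import warnings

legacy = types.ModuleType("legacy")  # hypothetical module, created in memory


def new_name():
    return "result"


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name == "old_name":
        warnings.warn(
            "old_name is deprecated; use new_name",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    raise AttributeError(f"module 'legacy' has no attribute {name!r}")


legacy.new_name = new_name
legacy.__getattr__ = _module_getattr
legacy.__dir__ = lambda: ["new_name", "old_name"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = legacy.old_name()  # falls through to _module_getattr()

print(value)                # result
print(caught[0].category)   # <class 'DeprecationWarning'>
print(dir(legacy))          # ['new_name', 'old_name']
```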
The PEP itself gives a few examples of how these functions can be used, including adding deprecation warnings to functions and lazy loading of heavy submodules. Below, we will build a simple plugin system that allows functions to be added to a module dynamically. This example takes advantage of Python packages. See this article if you need a refresher on packages.
Create a new directory, plugins, and add the following code to a file, plugins/__init__.py:

    from importlib import import_module
    from importlib import resources

    PLUGINS = dict()

    def register_plugin(func):
        """Decorator to register plug-ins"""
        name = func.__name__
        PLUGINS[name] = func
        return func

    def __getattr__(name):
        """Return a named plugin"""
        try:
            return PLUGINS[name]
        except KeyError:
            _import_plugins()
            if name in PLUGINS:
                return PLUGINS[name]
            else:
                raise AttributeError(
                    f"module {__name__!r} has no attribute {name!r}"
                ) from None

    def __dir__():
        """List available plug-ins"""
        _import_plugins()
        return list(PLUGINS.keys())

    def _import_plugins():
        """Import all resources to register plug-ins"""
        for name in resources.contents(__name__):
            if name.endswith(".py"):
                import_module(f"{__name__}.{name[:-3]}")

Before we look at what this code does, add two more files inside the plugins directory. First, let’s see plugins/plugin_1.py:

    from . import register_plugin

    @register_plugin
    def hello_1():
        print("Hello from Plugin 1")

Next, add similar code in the file plugins/plugin_2.py:

    from . import register_plugin

    @register_plugin
    def hello_2():
        print("Hello from Plugin 2")

    @register_plugin
    def goodbye():
        print("Plugin 2 says goodbye")

These plugins can now be used as follows:

    >>> import plugins
    >>> plugins.hello_1()
    Hello from Plugin 1

    >>> dir(plugins)
    ['goodbye', 'hello_1', 'hello_2']

    >>> plugins.goodbye()
    Plugin 2 says goodbye

This may not all seem that revolutionary (and it probably isn’t), but let’s look at what actually happened here. Normally, to be able to call plugins.hello_1(), the hello_1() function must be defined in a plugins module or explicitly imported inside __init__.py in a plugins package. Here, it is neither!
Instead, hello_1() is defined in an arbitrary file inside the plugins package, and hello_1() becomes a part of the plugins package by registering itself using the @register_plugin decorator.
The difference is subtle. Instead of the package dictating which functions are available, the individual functions register themselves as part of the package. This gives you a simple structure where you can add functions independently of the rest of the code without having to keep a centralized list of which functions are available.
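Stripped of the package machinery, the self-registration idea can be sketched in a single file. The function names hello and goodbye are placeholders chosen for this illustration:

```python
PLUGINS = {}


def register_plugin(func):
    """Store the function under its own name and return it unchanged."""
    PLUGINS[func.__name__] = func
    return func


@register_plugin
def hello():
    return "Hello from a self-registered function"


@register_plugin
def goodbye():
    return "Goodbye"


# No central list was edited: each definition added itself to PLUGINS.
print(sorted(PLUGINS))      # ['goodbye', 'hello']
print(PLUGINS["hello"]())   # Hello from a self-registered function
```

Because the decorator returns the function unchanged, hello() can still be called directly as well as looked up by name.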
Let us do a quick review of what __getattr__() does inside the plugins/__init__.py code. When you ask for plugins.hello_1(), Python first looks for a hello_1() function inside the plugins/__init__.py file. As no such function exists, Python calls __getattr__("hello_1") instead. Remember the source code of the __getattr__() function:

    def __getattr__(name):
        """Return a named plugin"""
        try:
            return PLUGINS[name]        # 1) Try to return plugin
        except KeyError:
            _import_plugins()           # 2) Import all plugins
            if name in PLUGINS:
                return PLUGINS[name]    # 3) Try to return plugin again
            else:
                raise AttributeError(   # 4) Raise error
                    f"module {__name__!r} has no attribute {name!r}"
                ) from None

__getattr__() contains the following steps. The numbers in the following list correspond to the numbered comments in the code:
1. First, the function optimistically tries to return the named plugin from the PLUGINS dictionary. This will succeed if a plugin named name exists and has already been imported.
2. If the named plugin is not found in the PLUGINS dictionary, we make sure all plugins are imported.
3. Return the named plugin if it has become available after the import.
4. If the plugin is not in the PLUGINS dictionary after importing all plugins, we raise an AttributeError saying that name is not an attribute (plugin) on the current module.
How is the PLUGINS dictionary populated though? The _import_plugins() function imports all Python files inside the plugins package, but does not seem to touch PLUGINS:

    def _import_plugins():
        """Import all resources to register plug-ins"""
        for name in resources.contents(__name__):
            if name.endswith(".py"):
                import_module(f"{__name__}.{name[:-3]}")

Don’t forget that each plugin function is decorated by the @register_plugin decorator. This decorator is called when the plugins are imported and is the one actually populating the PLUGINS dictionary. You can see this if you manually import one of the plugin files:

    >>> import plugins
    >>> plugins.PLUGINS
    {}

    >>> import plugins.plugin_1
    >>> plugins.PLUGINS
    {'hello_1': <function hello_1 at 0x7f29d4341598>}

Continuing the example, note that calling dir() on the module also imports the remaining plugins:

    >>> dir(plugins)
    ['goodbye', 'hello_1', 'hello_2']

    >>> plugins.PLUGINS
    {'hello_1': <function hello_1 at 0x7f29d4341598>,
     'hello_2': <function hello_2 at 0x7f29d4341620>,
     'goodbye': <function goodbye at 0x7f29d43416a8>}

dir() usually lists all available attributes on an object. Normally, using dir() on a module results in something like this:

    >>> import plugins
    >>> dir(plugins)
    ['PLUGINS', '__builtins__', '__cached__', '__doc__',
     '__file__', '__getattr__', '__loader__', '__name__',
     '__package__', '__path__', '__spec__', '_import_plugins',
     'import_module', 'register_plugin', 'resources']

While this might be useful information, we are more interested in exposing the available plugins. In Python 3.7, you can customize the result of calling dir() on a module by adding a __dir__() special function. For plugins/__init__.py, this function first makes sure all plugins have been imported and then lists their names:

    def __dir__():
        """List available plug-ins"""
        _import_plugins()
        return list(PLUGINS.keys())

Before leaving this example, please note that we also used another cool new feature of Python 3.7. To import all modules inside the plugins directory, we used the new importlib.resources module. This module gives access to files and resources inside modules and packages without the need for __file__ hacks (which do not always work) or pkg_resources (which is slow). Other features of importlib.resources will be highlighted later.
Typing Enhancements
Type hinting and annotations have been in constant development throughout the Python 3 series of releases. Python’s typing system is now quite stable. Still, Python 3.7 brings some enhancements to the table: better performance, core support, and forward references.
Python does not do any type checking at runtime (unless you are explicitly using packages like enforce). Therefore, adding type hints to your code should not affect its performance.
Unfortunately, this is not completely true as most type hints need the typing module. The typing module is one of the slowest modules in the standard library. PEP 560 adds some core support for typing in Python 3.7, which significantly speeds up the typing module. You generally do not need to know the details. Simply lean back and enjoy the increased performance.
While Python’s type system is reasonably expressive, one issue that causes some pain is forward references. Type hints, or more generally annotations, are evaluated while the module is imported. Therefore, all names must already be defined before they are used. The following is not possible:

    class Tree:
        def __init__(self, left: Tree, right: Tree) -> None:
            self.left = left
            self.right = right

Running the code raises a NameError because the class Tree is not yet (completely) defined in the definition of the .__init__() method:

    Traceback (most recent call last):
      File "tree.py", line 1, in <module>
        class Tree:
      File "tree.py", line 2, in Tree
        def __init__(self, left: Tree, right: Tree) -> None:
    NameError: name 'Tree' is not defined

To overcome this, you would have needed to write "Tree" as a string literal instead:

    class Tree:
        def __init__(self, left: "Tree", right: "Tree") -> None:
            self.left = left
            self.right = right

See PEP 484 for the original discussion.
In a future Python 4.0, such so-called forward references will be allowed. This will be handled by not evaluating annotations until that is explicitly asked for. PEP 563 describes the details of this proposal. In Python 3.7, forward references are already available as a __future__ import. You can now write the following:

    from __future__ import annotations

    class Tree:
        def __init__(self, left: Tree, right: Tree) -> None:
            self.left = left
            self.right = right

Note that in addition to avoiding the somewhat clumsy "Tree" syntax, the postponed evaluation of annotations will also speed up your code, since type hints are not executed. Forward references are already supported by mypy.
By far, the most common use of annotations is type hinting. Still, you have full access to the annotations at runtime and can use them as you see fit. If you are handling annotations directly, you need to deal with the possible forward references explicitly.
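As a rough sketch of what dealing with forward references explicitly can look like, the following uses string annotations (which behave like postponed ones) and resolves them with typing.get_type_hints(). The variable names raw and hints are only for illustration:

```python
import typing


class Tree:
    def __init__(self, left: "Tree", right: "Tree") -> None:
        self.left = left
        self.right = right


# The raw annotations still contain the unevaluated strings
raw = Tree.__init__.__annotations__
print(raw["left"])  # Tree (a str, not the class)

# get_type_hints() evaluates the strings in the function's global namespace
hints = typing.get_type_hints(Tree.__init__)
print(hints["left"] is Tree)  # True
```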
Let us create some admittedly silly examples that show when annotations are evaluated. First we do it old-style, so annotations are evaluated at import time. Let anno.py contain the following code:

    def greet(name: print("Now!")):
        print(f"Hello {name}")

Note that the annotation of name is print(). This is only to see exactly when the annotation is evaluated. Import the new module:

    >>> import anno
    Now!

    >>> anno.greet.__annotations__
    {'name': None}

    >>> anno.greet("Alice")
    Hello Alice

As you can see, the annotation was evaluated at import time. Note that name ends up annotated with None because that is the return value of print().
Add the __future__ import to enable postponed evaluation of annotations:

    from __future__ import annotations

    def greet(name: print("Now!")):
        print(f"Hello {name}")

Importing this updated code will not evaluate the annotation:

    >>> import anno

    >>> anno.greet.__annotations__
    {'name': "print('Now!')"}

    >>> anno.greet("Marty")
    Hello Marty

Note that Now! is never printed and the annotation is kept as a string literal in the __annotations__ dictionary. In order to evaluate the annotation, use typing.get_type_hints() or eval():

    >>> import typing
    >>> typing.get_type_hints(anno.greet)
    Now!
    {'name': <class 'NoneType'>}

    >>> eval(anno.greet.__annotations__["name"])
    Now!

    >>> anno.greet.__annotations__
    {'name': "print('Now!')"}

Observe that the __annotations__ dictionary is never updated, so you need to evaluate the annotation every time you use it.
Timing Precision
In Python 3.7, the time module gains some new functions as described in PEP 564. In particular, the following six functions are added:
- clock_gettime_ns(): Returns the time of a specified clock
- clock_settime_ns(): Sets the time of a specified clock
- monotonic_ns(): Returns the time of a relative clock that cannot go backwards (for instance due to daylight savings)
- perf_counter_ns(): Returns the value of a performance counter, a clock specifically designed to measure short intervals
- process_time_ns(): Returns the sum of the system and user CPU time of the current process (not including sleep time)
- time_ns(): Returns the number of nanoseconds since January 1st 1970
In a sense, there is no new functionality added. Each function is similar to an already existing function without the _ns suffix. The difference is that the new functions return a number of nanoseconds as an int instead of a number of seconds as a float.
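A short sketch of the difference in return types, using perf_counter_ns() to time an arbitrary piece of work. The summation is just a stand-in workload, and the measured values will vary from run to run:

```python
import time

# Old and new variants of the same clock return different types
seconds = time.time()         # float, seconds since the epoch
nanoseconds = time.time_ns()  # int, nanoseconds since the epoch
print(type(seconds).__name__, type(nanoseconds).__name__)  # float int

# Timing a short interval with integer nanoseconds
start = time.perf_counter_ns()
total = sum(range(100_000))   # stand-in workload
elapsed_ns = time.perf_counter_ns() - start

print(f"Summing took {elapsed_ns} ns ({elapsed_ns / 1e6:.3f} ms)")
```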
For most applications, the difference between these new nanosecond functions and their old counterparts will not be appreciable. However, the new functions are easier to reason about because they rely on int instead of float. Floating point numbers are by nature inaccurate:

    >>> 0.1 + 0.1 + 0.1
    0.30000000000000004

    >>> 0.1 + 0.1 + 0.1 == 0.3
    False

This is not an issue with Python but rather a consequence of computers needing to represent infinite decimal numbers using a finite number of bits.
A Python float follows the IEEE 754 standard and uses 53 significant bits. The result is that any time greater than about 104 days (2⁵³ or approximately 9 quadrillion nanoseconds) cannot be expressed as a float with nanosecond precision. In contrast, a Python int is unlimited, so an integer number of nanoseconds will always have nanosecond precision independent of the time value.
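The 104-day limit can be checked with a few lines of arithmetic. This is only an illustration of the 53-bit limit, with variable names chosen for the sketch:

```python
limit_ns = 2**53  # largest range in which every integer fits exactly in a float

# How many days is that, counted in nanoseconds?
days = limit_ns / 1e9 / 86_400
print(f"{days:.2f} days")  # 104.25 days

# One nanosecond past the limit is lost when stored as a float...
ns = limit_ns + 1
print(int(float(ns)) == ns)  # False

# ...while an int keeps it
print(ns - limit_ns)  # 1
```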
As an example, time.time() returns the number of seconds since January 1st 1970. This number is already quite big, so the precision of this number is at the microsecond level. This function is the one showing the biggest improvement in its _ns version. The resolution of time.time_ns() is about 3 times better than for time.time().
What is a nanosecond by the way? Technically, it is one billionth of a second, or 1e-9 second if you prefer scientific notation. These are just numbers though and do not really provide any intuition. For a better visual aid, see Grace Hopper’s wonderful demonstration of the nanosecond.
As an aside, if you need to work with datetimes with nanosecond precision, the datetime standard library will not cut it. It explicitly only handles microseconds:

    >>> from datetime import datetime, timedelta
    >>> datetime(2018, 6, 27) + timedelta(seconds=1e-6)
    datetime.datetime(2018, 6, 27, 0, 0, 0, 1)

    >>> datetime(2018, 6, 27) + timedelta(seconds=1e-9)
    datetime.datetime(2018, 6, 27, 0, 0)

Instead, you can use the astropy project. Its astropy.time package represents datetimes using two float objects, which guarantees “sub-nanosecond precision over times spanning the age of the universe.”
>>> from astropy.time import Time, TimeDelta >>> Time("2018-06-27") <Time object: scale='utc' format='iso' val