Nuitka Release 0.6.1
This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler, “download now”.
This release comes after a relatively long time, and contains important new optimization work, and even more bug fixes.
Bug Fixes
Fix, the option --[no]follow-import-to=package_name was supposed to not follow into the given package, but the check was executed too broadly, so that e.g. package_name2 was also affected. Fixed in 0.6.0.1 already.
Fix, wasn’t detecting multiple recursions into the same package in module mode, when attempting to compile a whole sub-package. Fixed in 0.6.0.1 already.
Fix, constant values are used as C boolean values still for some of the cases. Fixed in 0.6.0.1 already.
Fix, referencing a function cannot raise an exception, but that was not annotated. Fixed in 0.6.0.2 already.
macOS: Use standard include of C bool type instead of rolling our own, which was not compatible with newest Clang. Fixed in 0.6.0.3 already.
Python3: Fix, the bytes built-in type actually does have a __float__ slot. Fixed in 0.6.0.4 already.
Python3.7: Types that are also sequences still need to call the method __class_getitem__ for consideration. Fixed in 0.6.0.4 already.
Python3.7: Error exits from program exit could get lost on Windows due to __spec__ handling not preserving errors. Fixed in 0.6.0.4 already.
Windows: Negative exit codes from Nuitka, e.g. due to a triggered assertion in debug mode were not working. Fixed in 0.6.0.4 already.
Fix, conditional expressions using the and operator were mis-optimized when their value is not used, so that the right hand side was no longer executed. Fixed in 0.6.0.4 already.
Python3.6: Fix, generators, coroutines, and asyncgen were not properly supporting annotations for local variables. Fixed in 0.6.0.5 already.
Python3.7: Fix, class declarations had memory leaks that were untestable before 3.7.1 fixed reference count issues in CPython. Fixed in 0.6.0.6 already.
Python3.7: Fix, asyncgen expressions can be created in normal functions without an immediate awaiting of the iterator. This new feature was not correctly supported.
Fix, star imports on the module level should disable built-in name optimization except for the most critical ones, otherwise e.g. names like all or pow can become wrong. Previous workarounds for pow were not good enough.
Fix, the scons for Python3 failed to properly report build errors due to a regression of the Scons version used for it. This would mask build errors on Windows.
Python3.4: Fix, packages didn’t indicate that they are packages in their __spec__ value, causing issues with importlib_resources module.
Python3.4: The __spec__ values of compiled modules didn’t have compatible origin and has_location values preventing importlib_resources module from working to load data files.
Fix, packages created from .pth files were also considered when checking for sub-packages of a module.
Standalone: Handle cases of conflicting DLLs better. On Windows pick the newest file version if different, and otherwise just report and pick randomly because we cannot really decide which ought to be loaded.
Standalone: Warn about collisions of DLLs on non-Windows only as this can happen with wheels apparently.
Standalone: For Windows Python extension modules .pyd files, remove the SxS configuration for cases where it causes problems, not needed.
Fix: The exec statement on file handles was not using the proper filename when compiling, therefore breaking e.g. inspect.getsource on functions defined there.
Standalone: Added support for OpenGL platform plugins to be included automatically.
Standalone: Added missing implicit dependency for zmq module.
Python3.7: Fix, using the -X utf8 flag on the calling interpreter, aka --python-flag=utf8_mode was not preserved in the compiled binary in all cases.
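The first fix in the list above, where a check for package_name also caught package_name2, is a common prefix matching pitfall. The following is a minimal sketch of that kind of mistake and one way to avoid it. The function names are hypothetical and this is not Nuitka's actual code, only an illustration of the idea.

```python
def is_in_package_naive(module_name, package_name):
    # Too broad: "package_name2" also starts with "package_name".
    return module_name.startswith(package_name)


def is_in_package(module_name, package_name):
    # Match the package itself, or names below it separated by a dot.
    return module_name == package_name or module_name.startswith(
        package_name + "."
    )


if __name__ == "__main__":
    for name in ("package_name", "package_name.sub", "package_name2"):
        print(name, is_in_package_naive(name, "package_name"),
              is_in_package(name, "package_name"))
```

Comparing against the package name plus a trailing dot keeps sibling packages with a shared prefix out of the match.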
Optimization
Enabled C target type void which will catch creating unused stuff more immediately and give better code for expression only statements.
Enabled in-place optimization for module variables, avoiding write back to the module dict for unchanged values, accelerating these operations.
Compile time memory savings for the yield node of Python2, no need to track if it is in an exception handler, not relevant there.
Using the single child node for the yield nodes gives memory savings at compile time for these, while also making them operate faster.
More kinds of in-place operations are now optimized, e.g. int += int and the bytes ones were specialized to perform real in-place extension where possible.
Loop variables no longer lose type information, but instead collect the set of possible type shapes allowing optimization for them.
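The "where possible" in the item about in-place operations on bytes matters because the value is immutable from the point of view of the program. The sketch below only shows the observable Python behavior that any real in-place extension has to preserve. It does not show how Nuitka implements it, and the function names are made up for illustration.

```python
def extend_shared():
    data = b"abc"
    alias = data        # a second reference to the same bytes object
    data += b"def"      # must not change what alias refers to
    return data, alias


def extend_unshared():
    data = b"abc" * 2   # no other name refers to this object
    data += b"def"      # a compiler may extend the buffer in place here
    return data


if __name__ == "__main__":
    print(extend_shared())
    print(extend_unshared())
```

In the first function the old value stays visible through the second name, so extending the existing buffer would be wrong there. In the second function nothing else can observe the old value, which is the kind of case where reusing the buffer is safe.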
Organisational
Corrected download link for Arch AUR link of develop package.
Added repository for Ubuntu Cosmic (18.10) for download.
Added repository for Fedora 29 for download.
Describe the exact format used for clang-format in the Developer Manual.
Added description how to use CondaCC on Windows to the User Manual.
Cleanups
The operations used for async for, async with, and await were all doing a look-up of an awaitable, and then executing the yield from that awaitable as one thing. Now this is split into two parts, with a new ExpressionYieldFromAwaitable as a dedicated node.
The yield node types, now 3, share a base class and common computation for now, enhancing the one for awaitable, which was not fully annotating everything that can happen.
In code generation avoid statement blocks that are not needed, because there are no local C variables declared, and properly indent them.
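The two parts named in the first cleanup item can be spelled out in plain Python. This is a rough sketch of the language semantics, not of Nuitka's node tree. The class Ready and the helpers run and two_steps are hypothetical names, and the look-up is simplified to a direct call of __await__ on the type.

```python
class Ready:
    """Hypothetical awaitable that suspends once, then gives a value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        yield "suspended"   # one suspension point seen by the driver
        return self.value


async def use_await(obj):
    # The await expression does both steps at once.
    return await obj


def two_steps(obj):
    iterator = type(obj).__await__(obj)   # step 1: look up the awaitable
    result = yield from iterator          # step 2: yield from it
    return result


def run(task):
    """Drive a coroutine or generator by hand and collect what it yields."""
    yielded = []
    try:
        while True:
            yielded.append(task.send(None))
    except StopIteration as stop:
        return yielded, stop.value


if __name__ == "__main__":
    print(run(use_await(Ready(42))))
    print(run(two_steps(Ready(42))))
```

Both calls give the same sequence of suspensions and the same final value, which is why the look-up and the yield from can be treated as separate operations.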
Tests
Fixups for the manual Valgrind runner and the UI changes.
Test runner detects lock issue of clcache on Windows and considers it a permission problem that causes a retry.
Summary
This release addresses even more corner cases that were not working correctly, so the out of the box experience should be even better now.
The push towards C level performance for integer operation was held up
by the realization that loop SSA was not yet there really, and that it
had to be implemented, which of course now makes a huge difference for
the cases where e.g. bool are being used. There is no C type for
int used yet, which limits the impact of optimization to only taking
shortcuts for the supported types. These are useful and faster of
course, but only building blocks for what is to come.
Most of the effort went into specialized helpers that e.g. add a
float and an int value in a dedicated fashion, as well as
comparison operations, so we can fully operate some minimal examples
with specialized code. This is too limited still, and must be applied to
ever more operations.
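The idea of a specialized helper can be sketched in Python, although the real helpers live in the generated C code. The sketch assumes a dispatch on exact operand types with a generic fallback. The names add_float_int and binary_add are invented for this example and do not exist in Nuitka.

```python
import operator


def add_float_int(left, right):
    # Dedicated path: both operand types are already known here.
    return left + float(right)


def binary_add(left, right):
    # Take the shortcut only for the exact supported type pair.
    if type(left) is float and type(right) is int:
        return add_float_int(left, right)
    # Everything else goes through the generic operation.
    return operator.add(left, right)


if __name__ == "__main__":
    print(binary_add(1.5, 2))
    print(binary_add("a", "b"))
```

Each supported type pair needs its own helper, which fits the remark above that this has to be applied to ever more operations.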
What’s more is that the benchmarking situation has not improved. Work will be needed in this domain to make improvements more demonstrable. It may well end up being the focus for the next release to improve Nuitka speedcenter to give more fine grained insights across minor changes of Nuitka and graphs with more history.