Nuitka Release 1.3
This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler, “download now”.
This release contains a large amount of performance work, that should specifically be useful on Windows, but also generally. A bit of scalability work has been applied, and as usual many bug fixes and small improvements, many of which have been in hotfixes.
Bug Fixes
macOS: Framework build of PySide6 were not properly supporting the use of WebEngine. This requires including frameworks and resources in new ways, and actually some duplication of files, making the bundle big, but this seems to be unavoidable to keep the signature intact.
Standalone: Added workaround for
dotenv. Do not insist on compiled package directories that may not be there in case of no data files. Fixed in 1.2.1 already.Python3.8+: Fix, the
ctypes.CDLLnode attributes thewinmodeargument to Python2, which is wrong, it was actually added with 3.8. Fixed in 1.2.1 already.Windows: Attempt to detect corrupt object file in MSVC linking. These might be produced by
cl.execrashes orclcachebugs. When these are reported by the linker, it now suggests to use the--clean-cache=ccachewhich will remove it, otherwise there would be no way to cure it. Added in 1.2.1 already.Standalone: Added data files for
foliumpackage. Added in 1.2.1 already.Standalone: Added data files for
brancapackage. Added in 1.2.1 already.Fix, some forms
trythat had exitingfinallybranches were tracing values only assigned in thetryblock incorrectly. Fixed in 1.2.2 already.Alpine: Fix, Also include
libstdc++for Alpine to not use the system one which is required by its other binaries, much like we already do for Anaconda. Fixed in 1.2.2 already.Standalone: Added support for latest
pytorch. One of our workarounds no longer applies. Fixed in 1.2.2 already.Standalone: Added support for webcam on Windows with
opencv-python. Fixed in 1.2.3 already.Standalone: Added support for
pytorch_lightning, it was not finding metadata forrichpackage. Fixed in 1.2.4 already.For the release we found that
pytorch_lightningmay not findrichinstalled. Need to guardversion()checks in our package configuration.Standalone: Added data files for
dashpackage. Fixed in 1.2.4 already.Windows: Retry replace
clcacheentry after a delay, this works around Virus scanners giving access denied while they are checking the file. Naturally you ought to disable those for your build space, but new users often don’t have this. Fixed in 1.2.4 already.Standalone: Added support for
scipy1.9.2 changes. Fixed in 1.2.4 already.Catch corrupt object file outputs from
gccas well and suggest to clean cache as well. This has been observed to happen at least on Windows and should help resolve theccachesituation there.Windows: In case
clcachefails to acquire the global lock, simply ignore that. This happens sporadically and barely is a real locking issue, since that would require two compilations at the same time and for that it largely works.Compatibility: Classes should have the
f_localsset to the actual mapping used in their frame. This makes Nuitka usable with themultidispatchpackage which tries to find methods there while the class is building.Anaconda: Fix, newer Anaconda versions have TCL and Tk in new places, breaking the
tk-interautomatic detection. This was fixed in 1.2.6 already.Windows 7: Fix, onefile was not working anymore, a new API usage was not done in a compatible fashion. Fixed in 1.2.6 already.
Standalone: Added data files for
larkpackage. Fixed in 1.2.6 already.Fix,
pkgutil.iter_moduleswithout arguments was given wrong compiled package names. Fixed in 1.2.6 already.Standalone: Added support for newer
clrDLLs changes. Fixed in 1.2.7 already.Standalone: Added workarounds for
tensorflow.compatnamespace not being available. Fixed in 1.2.7 already.Standalone: Added support for
tkextrafont. Fixed in 1.2.7 already.Python3: Fix, locals dict test code testing if a variable was present in a mapping could leak references. Fixed in 1.2.7 already.
Standalone: Added support for
timmpackage. Fixed in 1.2.7 already.Plugins: Add
tlsto list of sensible plugins. This enables at leastpyqt6plugin to do networking with SSL encryption.Standalone: Added implicit dependencies of
sklearn.cluster.FreeBSD: Fix,
fcopyfileis no longer available on newest OS version, and include files forsendfilehave changed.MSYS2: Add back support for MSYS Posix variant. Now onefile works there too.
Fix, when picking up data files from command line and plugins, different exclusions were applied. This has been unified to get better coverage for avoiding to include DLLs and the like as data files. DLLs are not data files and must be dealt with differently after all.
New Features
UI: Added new option for cache disabling
--disable-cachethat acceptsalland cache names likeccache,bytecodeand on Windows,dll-dependencieswith selective values.Note
The
clcacheis implied inccachefor simplicity.UI: With the same values as
--disable-cacheNuitka may now also be called with--clean-cachein a compilation or without a filename argument, and then it will erase those caches current data before making a compilation.macOS: Added
--macos-app-modeoption for application bundles that should run in the background (background) or are only a UI element (ui-element).Plugins: In the Nuitka package configuration files, the
whenallows now to check if a plugin is active. This allowed us to limit console warnings to only packages whose plugin was activated.Plugins: Can now mark a plugin as a GUI toolkit responsible with the consequence that other toolkit detector plugins are all disabled, so when using
tk-interno longer will you be asked aboutPySide6plugin, as that is not what you are using apparently.Plugins: Generalized the GUI toolkit detection to include
tk-interas well, so it will now point out thatwxand the Qt bindings should be removed for best results, if they are included in the compilation.Plugins: Added ability to provide data files for macOS
Resourcesfolder of application bundles.macOS: Fix, Qt WebEngine was not working for framework using Python builds, like the ones from PyPI. This adds support for both PySide2 and PySide6 to distribute those as well.
MSYS2: When asking a CPython installation to compress from the POSIX Python, it crashed on the main filename being not the same.
Scons: Fix, need to preserve environment attached modes when switching to winlibs gcc on Windows. This was observed with MSYS2, but might have effects in other cases too.
Optimization
Python3.10+: When creating dictionaries, lists, and tuples, we use the newly exposed dictionary free list. This can speedup code that repeatedly allocates and releases dictionaries by a lot.
Python3.6+: Added fast path to dictionary copy. Compact dictionaries have their keys and values copied directly. This is inspired by a Python3.10 change, but it is applicable to older Python as well, and so we did.
Python3.9+: Faster compiled object creation, esp. on Python platforms that use a DLLs for libpython, which is a given on Windows. This makes up for core changes that went unnoticed so far and should regain relative speedups to standard Python.
Python3.10+: Faster float operations, we use the newly exposed float free list. This can speed up all kinds of float operations that are not doable in-place by a lot.
Python3.8+: On Windows, faster object tracking is now available, this previously had to go through a DLL call, that is now removed in this way as it was for non-Windows only so far.
Python3.7+: On non-Windows, faster object tracking is now used, this was regressed when adding support for this version, becoming equally bad as all of Windows at the time. However, we now managed to restore it.
Optimization: Faster deep copy of mutable tuples and list constants, these were already faster, but e.g. went up from 137% gain factor to 201% on Python3.10 as a result. We now use guided a deep copy, which then has the information, what types it is going to copy, removing the need to check through a dictionary.
Optimization: Also have own allocator function for fixed size objects. This accelerates allocation of compiled cells, dictionaries, some iterators, and lists objects.
More efficient code for object initialization, avoiding one DLL calls to set up our compiled objects.
Have our own
PyObject_Sizevariant, that will be slightly faster and avoids DLL usage forlenand size hints, e.g. in container creations.Avoid using non-optimal
mallocrelated macros and functions of Python, and instead of the fasted form generally. This avoids Python DLL calls that on Windows can be particularly slow.Scalability: Generated child mixins are now used for the generated package metadata hard import nodes calls, and for all instances of single child tuple containers. These are more efficient for creation and traversal of the tree, directly improving the Python compile time.
Scalability: Slightly more efficient compile time constant property detections. For
frozensetthere was not need to check for hashable values, and some branches could be replaced with e.g. defining our ownEllipsisTypefor use in short paths.Windows: When using MSVC and LTO, the linking stage was done with only one thread, we now use the proper options to use all cores. This is controlled by
--jobsmuch like C compilation already is. For large programs this will give big savings in overall execution time. Added in 1.2.7 already.Anti-Bloat: Remove the use of
pytestfordashpackage compilation.Anti-Bloat: Remove the use of IPython for
dotenv,pyvista,python_utils, andtrimeshpackage compilation.Anti-Bloat: Remove IPython usage in
rdkitimproving compile time for standalone by a lot. Fixed in 1.2.7 already.Anti-Bloat: Avoid
kerastesting framework when using that package.
Organisational
Plugins: The
numpyplugin functionality was moved to Nuitka package configuration, and as a result, the plugin is now deprecated and devoid of functionality. On non-Windows, this removes unused duplications of thenumpy.coreDLLs.User Manual: Added information about macOS entitlements and Windows console. These features are supported very well by Nuitka, but needed documentation.
UI: Remove alternative options from
--helpoutput. These are there often only for historic reasons, e.g. when an option was renamed. They should not bother users reading them.Plugins: Expose the mnemonics option to plugin warnings function, and use it for
pyside2andpyqt5plugins.Quality: Detect trailing/leading spaces in Nuitka package configuration
descriptionvalues during their automatic check.UI: Detect the CPython official flavor on Windows by comparing to registry paths and consider real prefixes, when being used in virtualenv more often, e.g. when checking for CPython on Apple.
UI: Enhanced
--versionoutput to include the C compiler selection. It is doing that respecting your other options, e.g.--clang, etc. so it will be helpful in debugging setup issues.UI: Some error messages were using
%rrather than'%s'to output file paths, but that escaped backslashes on Windows, making them look worse, so we changed away from this.UI: Document more clearly what
--output-diractually controls.macOS: Added options hint that the
Foundationmodule requires bundle mode to be usable.UI: Allow using both
--follow-importsand--nofollow-importson command line rather than erroring out. Simply use what was given last, this allows overriding what was given in project options tests should the need arise.Reports: Include plugin reasons for pre and post load modules provided. This solves a TODO and makes it easier to debug plugins.
UI: Handle
--include-package-databefore compilation, removing the ability to use pattern. This makes it easier to recognize mistakes without a long compilation and plugins can know them this way too.GitHub: Migration workflows to using newer actions for Python and checkout. Also use newer Ubuntu LTS for Linux test runner.
UI: Catch user error of running Nuitka with the
pythonwbinary on Windows.UI: Make it clear that MSYS2 defaults to
--mingw64mode. It had been like this, but the--helpoutput didn’t say so.GitHub: Updated contribution guidelines for better readability.
GitHub: Use organisation URLs everywhere, some were still pointing to the personal rather than the organisation one. While these are redirected, it is not good to have them like this.
Mastodon: Added link to https://fosstodon.org/@kayhayen to the PyPI package and User Manual.
Cleanups
Nodes for hard import calls of package meta data now have their base classes fully automatically created, replacing what was previously manual code. This aims at making them more consistent and easier to add.
When adding the new Scons file for C compiler version output, more values that are needed for both onefile and backend compilation were moved to centralized code, simplifying these somewhat again.
Remove unused
main_moduletag. It cannot happen that a module name matches, and still thinks of itself as__main__during compilation, so that idea was unnecessary.Generate the dictionary copy variants from template code rather than having manual duplications. For
dict.copy(), for deep copy (needed e.g. when there are escaping mutable keyword argument constant values in say a function call), and for**kwvalue preparation in the called function (checking argument types), we have had diverged copies, that are now unified in a single Jinja2 template for optimization.Plugins: Also allow providing generators for providing extra DLLs much like we already do for data files.
Naming of basic tests now makes sure to use a
Testsuffix, so in Visual Code selector they are more distinct from Nuitka code modules.Rather than populating empty dictionaries, helper code now uses factory functions to create them, passing keys and values, and allowing values to be optional, removing noisy
ifbranches at call side.Removed remaining
PyDevannotations, we don’t need those anymore for a long time already.Cleanup, avoid lists objects for ctypes defined functions and their
arglist, actually tuples are sufficient and naturally better to use.Spelling cleanups were resumed, as an ongoing action.
Tests
Added construct test that demonstrates the mutable constant argument passing for lists to see the performance gains in this area too.
Made construct runner
--diffoutput usable for interactive usage.Repaired Nuitka Speedcenter, but it’s not yet too usable for general consumption. More work will be needed there, esp. to make comparisons more accessible for the general public.
Summary
The major achievement of this release was the removal of the long lived
numpy plug-in, replacing it with package based configuration, that
is even more optimal and works perfectly across all platforms on both
important package installation flavors.
This release has a lot of consolidation efforts, but also as a result of 3.11 investigations, addresses a lot of issues, that have crept in over time with Python3 releases since 3.7, each time, something had not been noticed. There will more need for investigation of relative performance losses, but this should address the most crucial ones, and also takes advantage of optimization that had become with 3.10 already.
There is also some initial results from cleanups with the composite node tree structure, and how it is managed. Generated “child(ren) having” mixins, allow for faster traversal of the node tree.
Some technical things also improved in Scons. Using multiple cores in
LTO with MSVC with help this a lot, although for big compilations
--lto=no probably has to be recommended still.
More anti-bloat work on more packages rounds up the work.
For macOS specifically, the WebEngine support is cruical to some users,
and the new --macos-app-mode with more GUI friendly default resolves
long standing problems in this area.
And for MSYS2 and FreeBSD, support has been re-activated, so now 4 OSes work extremely well (others too likely), and on those, most Python flavors work well.
The performance and scalability improvements are going to be crucial. It’s a pity that 3.11 is not yet supported, but we will be getting there.