GNU Tools Cauldron 2025
I will discuss recent progress on AutoFDO. This feature was originally contributed by Google in 2014 and allows the use of profiles generated by low-overhead profiling (perf) to guide optimisation. I will discuss the work needed to make AutoFDO cooperate with link-time optimisation and the changes needed to modernise the infrastructure for current GCC.
GPU threads operate in SIMT/SIMD (Single Instruction Multiple Thread / Single Instruction Multiple Data) mode: They are composed of "lanes" that execute the same instruction together in lock-step manner, but operate on different data. To show the execution state to the user, a debugger would need to be aware of lanes, so that program objects (e.g. local variables, function arguments, displayed expressions, etc.) are evaluated not only in a thread and call frame context, but also the lane context. In GNU Tools Cauldron 2024, a BoF session was organized jointly by AMD and Intel, who have downstream debuggers that implement lane support. Since then the developer teams of the two vendors compiled a common document that includes their suggested extensions to GDB commands and the user interface to introduce unified lane support to GDB. This session presents their consensus and opens it up for discussion.
This talk will go through some of the in-progress and planned AArch64 performance work for GCC 16 and GCC 17, giving the community and partners a heads-up on what to expect from Arm.
We have made progress building the Linux kernel with LTO by solving issues with top-level assembly.
This is an overview of how you can build the Linux kernel with LTO today and what the remaining issues are.
Critical performance regions of applications are often improved by offloading them onto specialized accelerators. This process requires selecting an adequate region of the program, taking into account data dependencies, communication overheads, and the nature of the computations done in that region. Due to the complexity of this problem and the high variability of downstream vendor tools, an automated approach invites the use of a source-to-source compiler as the first stage of the compilation pipeline, preserving the source code's readability and retargetability. To this end, we propose using the Clava C/C++ source-to-source compiler to take in any C or C++ application, find and optimize adequate regions for offloading, and output those regions as separate translation units. By leveraging Clang's Abstract Syntax Tree (AST), Clava allows a developer to write highly composable extensions that perform analysis and transformations over that AST, using modern scripting languages such as JavaScript and TypeScript. We demonstrate an entire source-to-source compilation flow for accelerating a C/C++ application, including extensions ranging from source code transformations such as function outlining and struct flattening; generation of a task graph representation of any C/C++ application; and selecting and extracting code regions for different types of accelerators, including the automatic generation of the communication layer using different APIs.
Author: Steven Rostedt (Kernel maintainer)
Author: Paul E. McKenney (Kernel maintainer)
Author: Alexei Starovoitov (Kernel maintainer)
Author: Jose E. Marchesi (GNU toolchain)
The Linux kernel, which is by far one of the biggest, most complex and
most important programs around, is (still) mainly built using the GNU
Toolchain.
There is an intimate relationship between toolchain and kernel.
Compiling a huge, complex and specialized program such as the kernel
often implies facing challenging or unusual requirements on the
toolchain side. This includes security related requirements. Also, some
of the toolchain components interface directly with the kernel. In the
case of glibc, it even provides the main visible interface from the
kernel to userland programs. The support for BPF is also mainly Linux
kernel specific.
This relationship benefits both projects. For example, an actively
maintained toolchain can quickly include kernel specific
enhancements. And vice versa, the toolchain benefits from the associated
relevance that makes corporations support its development. It is
certainly not unusual for a feature introduced primarily for kernel
usage to also be very useful to other programs. Examples of this are the
support for patchable function entries, "asm goto", fentry, and several
security related features.
In order to improve this relationship, a Toolchains Track has been
organized for some years now at the Linux Plumbers Conference. The aim
of the track is to fix particular toolchain (both GNU and LLVM) issues
which are of interest to the kernel and, ideally, find and agree on
solutions right away, during the track, making the best use of the
opportunity to discuss the issues live with kernel developers and
maintainers. The LPC toolchains track is proving very useful, although
it is not always easy to bring toolchain hackers there, given it is a
kernel-specific conference.
We propose to have a Toolchain and Linux Kernel BoF during Cauldron this
year, with the participation of at least one Linux kernel
maintainer. The goals of the BoF are (a) to discuss particular
requirements, desired features and on-going developments that are
relevant to the kernel and (b) to gather kernel-related
questions/input/feedback from the toolchain developers so we can bring
the issues to the LPC Toolchains Track, which will be held later in the
year after Cauldron.
It's very common that CPUs with fancy extensions and features overwhelmingly run code that is compiled for a baseline arch without any of the features enabled. Function multi-versioning is a compiler framework in GCC for generating multiple versions of a function and dispatching the correct version according to the host system at load time. This talk will discuss recent developments, and the dreams for the future.
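For a flavor of what this looks like for users today, here is a minimal sketch using GCC's existing target_clones attribute (an x86 example; the exact mechanisms discussed in the talk may differ):

    /* GCC emits one clone of the function per listed target plus an
       IFUNC resolver that selects the best clone at load time, based
       on what the host CPU actually supports.  */
    __attribute__ ((target_clones ("default", "avx2", "avx512f")))
    int dot (const int *a, const int *b, int n)
    {
      int sum = 0;
      for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
      return sum;
    }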
In this talk I will discuss recent improvements to GLIBC malloc, how it compares with other popular allocators, and what we could do to make malloc better - not only faster, but also safer, more maintainable and use less memory.
A comparison of the current performance difference between GCC and LLVM, using FFmpeg as the benchmark.
Tested on a RISC-V development board; includes vectorization performance improvement results.
In this short talk, I look at how data mining of repository activity and mailing lists can give insight into the health of a community project. The talk offers no prescriptions; its purpose is to share techniques that may be useful to the community.
In this BoF we will be discussing topics related to the BPF target in the GNU Toolchain.
This contribution explores possible improvements in GCC code generation for RISC-V. We collected dynamic instruction counts from selected SPEC CPU 2017 benchmarks and compared the results with AArch64. Findings reveal that prominent compiler weaknesses include missing instruction patterns, extra move instructions, unused load offsets, and functionally dead code. Additionally, vectorising library functions, like memset and mathematical operations, is crucial for maximising RISC-V efficiency.
This presentation is aimed at wannabe contributors to GCC's vectorizer. It should give an elaborate overview of the innards of the vectorizer, from user up to target interaction. After a thorough overview of the parts of the vectorizer, we follow examples from loop and basic-block vectorization through the vectorizer's code base, highlighting differences and commonalities.
GCC has participated in the Google Summer of Code (GSoC) program since 2006 (with a one-year gap), but for a long time we have not shared experiences, best practices and ideas for improvements in any organized form among the mentors and GSoC "org-admins." The idea of this BoF is to do exactly that.
GCC machine descriptions are straightforward to implement if someone has been involved in GCC development for a while. GCC Internals is a comprehensive resource, but it assumes/requires prior knowledge. As a relative newcomer to the project (RISC-V backend) I've struggled with MD patterns (and still do). The LISPy syntax doesn't initially help either. This is my attempt to collect my learning of the last few years. If you don't know what a "Bridge" pattern is, or are confused between define_split and define_insn_and_split, this is the talk for you!
Patch review bandwidth has been identified as a bottleneck for GCC development many times over the years. We have taken steps to address it, such as appointing people in reviewer roles. But we can do more to reduce the friction for contributors to try patch review.
I will present some motivation to start reviewing patches and address common perceived barriers to doing so.
One takeaway from advocating for upstream patch review among colleagues is that there is no good set of guidelines to refer to when reviewing patches. In this talk I propose a set of baseline guidelines for patch review for the GCC project that can act as a starting point for budding reviewers. These can include technical design conventions that we want to maintain, common mistakes to look out for, testing and benchmarking considerations, commit message reviews, deployment concerns, and more high-level, social etiquette and procedural considerations.
I am interested in feedback on these guidelines and ideas on how and where we may want to advertise them for newcomers.
This presentation covers the design of a new thread-local storage (TLS) allocator for the GNU C Library (glibc). The goal of this project is to unify the allocation algorithm for initial-exec TLS, global-dynamic TLS, and POSIX thread-specific data (as created by pthread_key_create). It unifies ideas that have been circulated within the glibc community for many years.
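For reference, a minimal sketch of the three flavors of thread-local data involved (using the standard GCC tls_model attribute and the POSIX API):

    #include <pthread.h>

    /* ELF TLS, with explicit access models.  */
    static __thread int counter __attribute__ ((tls_model ("initial-exec")));
    extern __thread int state __attribute__ ((tls_model ("global-dynamic")));

    /* POSIX thread-specific data, allocated at runtime.  */
    static pthread_key_t key;

    static void
    init (void)
    {
      pthread_key_create (&key, free);
    }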
This session showcases the glibc heap dumper, a tool to obtain information about active memory allocations and malloc
heap layout from coredumps.
This presentation covers the improvements of the Libabigail framework
that have occurred since the end of 2024, across the 2.7 and 2.8
releases.
As many of those improvements were about improving the signal-to-noise
ratio of ABI change reports, the talk presents the internals of the
middle-end and how it relates to categorizing ABI changes in such a
way that the back-ends that generate ABI change reports can be better
equipped to lower the rate of false positives.
Besides the deep dive into the middle-end internals, the talk walks
through the user-facing improvements of the tools written using the
framework in the 2.7 and 2.8 releases.
The talk ends with some considerations about future perspectives of
improvements that still need to be addressed.
The proposal is for a BoF on GCC and AI, with the goal of opening up discussion on:
- Current and emerging use cases where GCC intersects with AI/ML workloads, including compiler optimisations for AI kernels.
- Challenges in supporting AI accelerators and heterogeneous compute through GCC.
- Open questions for the community: to what extent should GCC evolve in this area, and where should external tooling take the lead?
The idea is to bring together community members interested in this intersection of GCC and AI, share experiences, and identify where GCC could play a meaningful role.
The glibc math library aims to provide the functions and macros required by the C standard math.h and related headers (fenv.h, float.h, and complex.h). glibc supports multiple architectures and floating-point types, and also provides vectorized routines for some architectures. The math library is actively being extended to support newer extensions and features.
In this talk, I will present the recent optimization of multiple parts of the math library, explaining how we achieved it by utilizing new algorithms adapted to new hardware, as well as by leveraging external projects with improved implementations. I will also present recent work that supports the newer C23 functions and outline possible future work on optimizations.
I will also discuss future extensions and what to focus on. Should we aim to provide implementations with better precision, and what is the expected tradeoff between performance and precision? Should we extend the support for newer types, for instance float16_t? What about decimal floating-point support? And should we add a platform-neutral skeleton vector math library to be used by other architectures?
This is a talk about all things "unload", my removal of the old reload code.
Similar to register asm but still distinct, hard register constraints are another way to force operands of inline asm into specific machine registers. In this talk we will take a brief look at hard register constraints and compare them to register asm. We will look at how to use them, how they might help write more robust code, their current (implementation) limitations, and practical experiments, and we will have an outlook on future improvements.
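As a hedged illustration of the difference (register names are s390-style and purely illustrative; the {regname} constraint spelling follows the new hard register constraint feature):

    /* Classic register asm: the register binding lives on the local
       variables, away from the asm statement that depends on it.  */
    long
    foo (long x)
    {
      register long in asm ("r2") = x;
      register long out asm ("r3");
      asm ("lgr %0,%1" : "=d" (out) : "d" (in));
      return out;
    }

    /* Hard register constraint: the binding is part of the operand's
       constraint, keeping the asm statement self-contained.  */
    long
    bar (long x)
    {
      long out;
      asm ("lgr %0,%1" : "={r3}" (out) : "{r2}" (x));
      return out;
    }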
As the memory and performance capabilities of computers have increased, virtualization has become the preferred approach to exploiting all those resources.
To use such a system efficiently, it is necessary to maximize the number of concurrent applications.
This makes it extremely important to fully optimize any running application not only for time (CPU) but also for space (memory).
Moreover, high-reliability, large-data multi-threaded applications require not only an efficient allocation strategy but also a reliable and fast concurrency mechanism.
The solution for parallelism in glibc malloc (arenas) is based on minimizing concurrency through isolation, splitting the memory space across threads using virtual memory as an abstraction layer.
This solution defers memory management to the kernel rather than taking on the responsibility of explicitly performing minimal virtual allocation.
Glibc malloc has been identified by projects such as MySQL and JVM as using more memory than required, and this has justified the adoption of other more multi-threaded friendly allocators, such as jemalloc and tcmalloc.
This talk will present a MySQL performance and data size analysis using both glibc malloc and other competitive allocators, and show how recent improvements in the glibc malloc implementation greatly reduce virtual memory consumption in real-world MySQL usage, making it unnecessary to resort to other specialized allocators. Further possible improvements will also be discussed, which can make glibc the best allocator for high-availability, large-data multi-threaded applications.
An update on the current status of OpenMP, OpenACC and offloading in GCC, including what has been achieved in the last year. A few highlights are given and, additionally, an outlook for the coming years on the tasks that are planned or that should be done in the near term.
Tasked with identifying safer optimization flags and GCC releases to use,
we've developed techniques to extract information from the PR database
and from the testsuite. We've found that the growing PR density over
time is not intuitively correlated with quality, but that running
"future" torture testsuites with extra optimization flags on "past"
releases can provide useful insights about relevant points of
instability.
Slides at https://www.lx.oliva.nom.br/slides/timetravel.en.pdf
How can users add new instructions without knowledge of GCC internals?
Integrating custom instructions into a RISC-V processor typically requires deep familiarity with GCC internals, particularly its RTL and backend architecture. This talk presents APEX, an approach for defining custom RISC-V instructions in GCC directly from C using pragmas, or from assembly source code. Rather than modifying the compiler internals directly, users can define new operations using a simple "#pragma" and a function declaration, which are then parsed by the front end and transformed into GCC’s internal RTL (RTX) representation. This approach eliminates the need for manual backend modifications, making custom instruction support more accessible to users.
We will explore the APEX pipeline in detail, from parsing the APEX input C code to instruction emission and encoding in Binutils, and understand how APEX instructions are handled by the assembler and by the disassembler/debugger.
This presentation targets compiler engineers, toolchain maintainers and hardware architects interested in extending RISC-V with domain-specific instructions while working within the GNU ecosystem. APEX reduces the need to dig into GCC internals, allowing contributors to prototype, experiment, and upstream new ideas with less effort.
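To make the idea concrete, here is a purely hypothetical sketch of the kind of input APEX accepts, as described above (the actual pragma spelling and parameters belong to the implementation presented in the talk and may differ):

    /* Hypothetical syntax: declare a custom instruction through a pragma
       plus a matching C function declaration; the front end lowers calls
       to it into the new machine instruction.  */
    #pragma apex instruction ("dotp", encoding = "custom-0") /* illustrative */
    int dotp (int a, int b);

    int
    f (int x, int y)
    {
      return dotp (x, y); /* emitted as a single dotp instruction */
    }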
Since Bunsen was first presented at Cauldron 2019, work and workload have exploded. Bunsen on sourceware is processing the test results from hundreds of daily builds of a dozen of our favorite projects. It features CLI and web front-ends for clever searches, plus downloadable archives so you can play along at home. Recently, it has learned to connect project upstream git repos and commit histories to build/test histories. Putting all that info together, it can call on an AI to analyze causal factors of regressions. Let me show you how the tool may already be useful to you.
The GNU C Library is used as the C library in the GNU system and many other systems with the Linux kernel. The library is primarily designed to be a portable and high-performance C library. It aims to follow all relevant standards, including ISO C17 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.
BoF discussing inter-procedural optimization, link-time optimization and profile feedback in GCC
Algol 68 was designed by Working Group 2.1 of the International Federation for Information Processing (IFIP) during the late 1960s and early 1970s, led by Adriaan van Wijngaarden. The goal of the working group was to provide a programming language suitable for communicating algorithms, for executing them efficiently on a variety of different computers, and for aiding in teaching them to students. The resulting language was in principle expected to be an evolved version of Algol 60, with known shortcomings addressed, and generally improved. However, what was initially supposed to be an improved version of Algol 60 turned out to be something very different: an extremely powerful programming language, more modern and more expressive than most programming languages today, whose design exercised almost to the limit the newly invented notion of orthogonality in programming languages. Algol 68 is not, like Algol 60, an important but old-fashioned programming language superseded in almost every aspect by its successors and relevant nowadays only as a historical curiosity. Despite many people claiming otherwise, Algol 68 has no successors. The GNU Algol 68 Working Group is a group of hackers whose purpose is to bring Algol 68 back to the first line of programming where it belongs, to provide modern implementations of the language well integrated into today's operating systems and computers (like the GCC Algol 68 front-end), to produce documentation to help people learn this fascinating language, and to explore extensions and evolve the language with the rigor, respect and seriousness that it deserves and demands.
In January 2025 a first work-in-progress patch series implementing an Algol 68 front-end for GCC was sent to gcc-patches. Since then, the development has continued at a steady pace and by now most of the language has been implemented. In this talk we will introduce the front-end and the world domination plan associated with it, highlight and discuss some interesting aspects of the implementation (Algol 68 is a notoriously difficult language to implement) and make a case for the inclusion of the front-end in the main GCC tree.
We will also briefly look at some of the tangent projects like the Algol 68 support in the autotools and the a68 Emacs mode, as time allows.
References:
- Front-end development homepage: https://gcc.gnu.org/wiki/Algol68FrontEnd
- Git repository: https://forge.sourceware.org/gcc/gcc-a68
- Algol 68 homepage: https://algol68-lang.org
RISC-V's rapid growth to more than 100 extensions and 1000 instructions creates maintenance challenges across the ecosystem. Tools like Binutils, QEMU, and the Linux kernel each maintain separate definitions for standard and custom instructions and extensions, leading to fragmentation and repetitive maintenance burden.
The RISC-V Unified Database (UDB) is a machine-readable source of truth for instructions and CSRs, containing ~90% of RISC-V instructions. We built a framework that continuously validates UDB against Binutils data and ensures both stay in sync. Moreover, we created a generator that converts UDB data into Binutils and QEMU definitions, reducing effort for developers porting new or custom extensions.
This talk will demonstrate UDB's toolchain verification, cross-validation results, and how developers can leverage UDB to port new RISC-V extensions into the GNU toolchain.
Hear our experiences on mentoring new contributors, what common hurdles they face, and how we try to address them. Then, let’s discuss how to reduce the barrier to entry.
I'll be talking about developments in GCC 16:
- those affecting GCC's diagnostic subsystem, and
- those affecting the static analyzer (-fanalyzer)
Profile-guided optimizations are not new, but they are also not that popular. Why? Maybe because your build workflow can't easily include profile gathering, let alone every time. What if we could share profile data with each other, so you could take advantage of public crowdsourcing? What if you could easily contribute back your workload profiles? What if we could integrate this into distro build systems? Let's try with profiledb: a bit of glue between git, profilers, and linkers.
The talk will provide an overview of the different stack tracing methods (un-)available on s390 for user space. It will cover: why stack tracing using the frame pointer is virtually impossible on s390 and why the compiler option -fno-omit-frame-pointer is best avoided on s390; the limitations of stack tracing using the s390-specific alternative of the back chain; and why SFrame stack trace information is expected to considerably improve stack tracing of user space on s390. Finally, it will provide an overview of the current state of SFrame support on s390 64-bit (s390x): the s390-specific SFrame stack trace format extensions, s390 support for generating SFrame stack trace information in Binutils 2.45, work-in-progress s390 support for SFrame in glibc backtrace, and work-in-progress s390 support for SFrame in the Linux kernel and perf to sample stack traces of user space.
This session will discuss the performance benchmark results of the new wide set implementation in gm2. It will also report on the approach taken to implement this data type and how this technique will be used to implement M2R10 and ISO generics.
Discussion of topics related to parallel computing and accelerator offloading in GCC. In particular, related to OpenMP and OpenACC and to offloading to AMD and Nvidia GPUs. But also other topics like additional offloading targets or base-language parallelization features of C, C++, Fortran, or other languages are welcome. Planned topics include the completion of OpenMP 5.x and the addition of more 6.x features, OpenACC extensions, improving performance, but also support for a GPU kernel language (programming at the abstraction level of CUDA/HIP – as proposed for the next OpenMP version).
Introduction to RISC-V auto-vectorization: basic building blocks, supported features, concepts, idiosyncrasies/quirks and more. Overview of what has been done, what's currently cooking and what's planned for the future.
Topics include RISC-V vector modes and patterns, else operands, vector-vector and vector-scalar variants, vsetvl placement, etc.
This talk provides an overview of recent developments in the SFrame stack tracing format over the past year. We discuss some of the enhancements to the SFrame stack trace format that are currently being looked at. Some of these desirable features in the planned SFrame V3 version help make the format more future-proof and more amenable to overall smoother adoption in the wider GNU/Linux community.
An opportunity for the GDB community to meet to discuss all things related to the GNU Debugger project.
Q&A panel discussion of development processes in GNU Toolchain projects – GCC, Glibc, GDB, Binutils, etc. – and how they affect our developer community.
We will discuss 4 topics:
- Onboarding new developers
- Growth, roles, and reputation. How to become a maintainer?
- Governance, and how decisions are made
- Infrastructure and tools for developers
An opportunity for a GNU Toolchain community conversation with the members of the Steering Committees of the GNU Toolchain projects (GCC, GLIBC, Binutils, GDB).
The RTL SSA framework is a relatively new component of GCC that enables SSA-based analysis on RTL. In this talk, I will present a dead code elimination (DCE) implementation built on top of this framework, intended to replace the existing UD-chain based DCE.
GNAT already offered mechanisms to handle C++ exceptions, but they were
limited to exact class type matches. This presentation covers the
extensions to Ada syntax, runtime and library to enable Ada subprograms
to catch and handle C++ exception hierarchies, and to reference the
raised C++ (sub-)object.
Slides at https://www.lx.oliva.nom.br/slides/adacxxcept.en.pdf
As the FSF's licensing and compliance manager, I will address licensing-related questions from the maintainers and developers of the toolchain projects. This will be an informal, interactive exchange about the topics collected through RFCs before GNU Cauldron, but you are welcome to ask a question during the session as well. It will also be an opportunity to learn about the recent work of the FSF's Licensing and Compliance Lab. We plan to cover topics such as LLM-generated contributions to GNU, following notice and attribution requirements, and GPL compliance in different technical setups.
After short updates on vectorizer work from contributors, this is the chance to discuss larger work going forward.
Profile-Guided Optimization (PGO) is a powerful technique for achieving performance gains, yet its instrumentation-based implementation imposes a high overhead that limits its adoption in production environments. AutoFDO solves this by using low-overhead, hardware-based sampling to gather profile data, making it an ideal approach for continuous, real-world optimization.
This presentation details the work required to enable AutoFDO for AArch64 within the GNU toolchain, and gives a state of the world for AutoFDO. We will cover the key architectural features that make efficient, sampling-based profiling possible. We cover the hardware prerequisites and the implications of using the various hardware units available on the AArch64 architecture (namely SPE and BRBE).
We will discuss the specific technical challenges encountered, including the crucial task of accurately annotating the pre-optimization IR representation using a profile gathered from an already optimized binary. We will detail the fixes implemented in the AutoFDO tools and GCC to handle the discrepancies that arise from this process. This ensures that the profile data correctly drives optimizations like inlining and block reordering, even with significant divergence between the profiled binary and the pre-optimization IR.
When compiling OpenMP constructs, the bodies of OpenMP regions are outlined into separate functions, which are later called indirectly by libgomp built-ins. This outlining process disables interprocedural optimizations for the kernels. In this short talk, we outline a mechanism to partially restore interprocedural optimization capabilities for the kernels, starting with constant propagation, and we discuss its applications beyond OpenMP.
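For instance, in a minimal region such as the one below, GCC outlines the loop body into an artificial function (named along the lines of compute._omp_fn.0) that is reached only through the libgomp runtime call GOMP_parallel, which is what hides the kernel from interprocedural analysis:

    void
    compute (int *a, int n)
    {
      /* The body of this region is outlined into compute._omp_fn.0 and
         invoked indirectly via a function pointer passed to libgomp.  */
      #pragma omp parallel for
      for (int i = 0; i < n; i++)
        a[i] *= 2;
    }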
In this talk, I will give a quick overview of some of the current existing RISC-V testing infrastructure, focusing on our pre/post commit CI and automated fuzzing system. I will briefly show how these tools have helped identify regressions early and provide faster feedback to developers.
Everyone wants to improve the code quality of GCC, yet many small patches, suggested improvements, and larger refactoring projects remain unaddressed, in some cases without any updates in over 20 years! It becomes very discouraging to attempt to develop these patches when there are no set guidelines for what is acceptable. To that end, we will take a brief look at past efforts to identify the pain points of developing, reviewing, and finally approving these patches. From there we look at what we can do to reduce friction for developers and maintainers, with a focus on quantifying impacts on GCC's compile duration, run-time performance and debuggability.
This year the Rust front end completed multiple major milestones: the name resolution rework is now complete, the desugaring pass has brought support for a lot of new features, and we were lucky enough to get two amazing GSoC students who greatly improved the capabilities of the front end.
Being able to compile more Rust led to some unexpected discoveries and opened the way for previously unhandled complex edge cases to be fixed.
Our next step involves iterating towards compiling Rust-for-Linux, which we will begin experimenting with in September.
This talk will cover what has recently changed in the Rust front end and what will be done this year, as well as a few surprises we had along the way. The talk will conclude with an update on the upstream synchronization process and the communication with the wider GCC community.
A proof-of-concept demonstration of a language-independent metadata database to support source-code interrogation and navigation. Such a database could support little-understood source-code analysis and functionality that is not available today at any price.
I propose that GCC be extended to produce such a database, probably from the gimple tree, as an affordance to tool developers.
The session starts with a brief (15-20 min) presentation on
- the motivations and requirements people have communicated for using a forge
- what forgejo provides and what's missing
- an overview of how existing workflows can gradually migrate to the forge
The majority of time will be used to gather feedback from the community.
A chance to discuss upcoming and future work in the GNU toolchain for Arm platforms.
In my work on smtgcc to formalize the semantics of GIMPLE, I have found several cases where optimization passes perform invalid transformations, as well as cases where the GIMPLE semantics do not allow optimization passes to express what they need (see PR120980 for an example). In this talk, I will present the current status of the formalization and discuss the issues I have found during the process.
Continuing from my talk on quantifying abstractions by objective costs, we also need to evaluate the more subjective costs and benefits of this work, as well as where that work should be directed. Ideally this will form the basis of a design document that developers can refer to for guidance, further reducing friction between developers and maintainers by making expectations clearer.
We will talk about and evaluate:
- acceptability and like/dislike of different abstractions and practices
- what is most important to refactor/rewrite/modernize, and the risks of doing so
- refresh our state on goals, ImprovementProjects, and rearch plans (some of this is very old)
- figure out what changes we really want
- and very importantly, what we do NOT want our code base to become
To help stimulate discussion, I will prepare examples of code with potential refactors (some intentionally bad).
The annual opportunity to review and discuss anything about the RISC-V backend.
The BPF verifier has trouble when verifying loops. This talk will cover:
- the historical evolution of loop handling by the verifier;
- problems with the current state of things (too crude widening, no bounds for induction variables);
- the DFA-based liveness analysis that landed last week;
- further steps adding DFA-based value range analysis to the verifier (a very hand-waving part).
We are interested in the feedback of the GNU toolchain community, especially when it comes to range analysis.
The Memory Tagging Extension (MTE) is a feature of the ARM v8.5 architecture that introduces several hardware capabilities:
- Each aligned 16-byte region of application memory can be assigned a 4-bit memory tag.
- Every pointer can include a 4-bit address tag in its most significant byte.
- An exception is triggered if the address tag differs from the memory tag.
- A set of special instructions is provided for efficient tag manipulation.
MTE aims to tackle two major issues: buffer overflows and use-after-free errors. It is the responsibility of tools such as compilers, libraries, and assemblers to emit MTE instructions that instrument the code to prevent these errors.
This talk outlines the current support (and work in progress) for MTE instructions in the latest GNU tools, including GCC, binutils, and glibc.
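For a flavor of the user-visible side, here is a hedged sketch using the ACLE MTE intrinsics (assuming -march=armv8.5-a+memtag and memory mapped with PROT_MTE on Linux):

    #include <arm_acle.h>

    /* Tag one 16-byte granule: derive a pointer carrying a random 4-bit
       address tag, then store that tag as the granule's memory tag.
       Subsequent accesses through a pointer with a different tag fault.  */
    void *
    tag_granule (void *p)
    {
      void *tagged = __arm_mte_create_random_tag (p, 0);
      __arm_mte_set_tag (tagged);
      return tagged;
    }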