GNU Tools Cauldron 2025

08:00
08:00
45min
Breakfast - Served at FEUP and included.
Main auditorium (400)
08:45
08:45
15min
Introduction and Welcome - Main auditorium
Main auditorium (400)
09:00
09:00
60min
AutoFDO - recent improvements
Jan Hubička

I will discuss recent progress on AutoFDO. This feature was originally contributed by Google in 2014 and allows the use of profiles generated by low-overhead profiling (perf) to guide optimisation. I will discuss the work needed to make AutoFDO cooperate with link-time optimisation and the changes needed to modernise the infrastructure for the current GCC.

Auditorium B032 (80)
09:00
60min
Lane support in GDB for debugging GPUs
Baris Aktemur, Lancelot Six

GPU threads operate in SIMT/SIMD (Single Instruction Multiple Thread / Single Instruction Multiple Data) mode: they are composed of "lanes" that execute the same instruction together in lock-step, but operate on different data. To show the execution state to the user, a debugger needs to be aware of lanes, so that program objects (e.g. local variables, function arguments, displayed expressions) are evaluated not only in a thread and call frame context, but also in a lane context. At GNU Tools Cauldron 2024, a BoF session was organized jointly by AMD and Intel, who have downstream debuggers that implement lane support. Since then, the developer teams of the two vendors have compiled a common document that includes their suggested extensions to GDB commands and the user interface to introduce unified lane support to GDB. This session presents their consensus and opens it up for discussion.
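
To make the idea of a lane context concrete, here is a minimal sketch of a thread whose lanes share one program counter but hold different values for the same variable. It is only a mental model: the `SimtThread` type and the `active_lanes` and `evaluate` helpers are hypothetical names for illustration, and they do not reflect GDB's internals or the extensions proposed by the vendors.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimtThread:
    """One GPU thread: its lanes run the same instruction on different data."""
    pc: int                            # one program counter shared by all lanes
    active: List[bool]                 # lanes taking part in the current instruction
    lane_locals: List[Dict[str, int]]  # each lane has its own copy of the locals


def active_lanes(thread: SimtThread) -> List[int]:
    """Return the indices of the lanes that are currently active."""
    return [lane for lane, on in enumerate(thread.active) if on]


def evaluate(thread: SimtThread, lane: int, name: str) -> int:
    """Look up a variable in the context of one lane of one thread."""
    return thread.lane_locals[lane][name]


# Four lanes at the same pc; lane 2 is masked off for this instruction.
thread = SimtThread(
    pc=0x40,
    active=[True, True, False, True],
    lane_locals=[{"i": 10}, {"i": 11}, {"i": 12}, {"i": 13}],
)
per_lane = {lane: evaluate(thread, lane, "i") for lane in active_lanes(thread)}
```

The same variable name yields a different value per lane, which is why a thread and frame alone do not identify what the user is asking about.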

Main auditorium (400)
10:00
10:00
30min
Coffee
Main auditorium (400)
10:30
10:30
60min
AArch64 performance work
Tamar Christina

This talk will go through some of the in-progress and planned AArch64 performance work for GCC 16 and GCC 17, giving the community and partners a heads-up on what to expect from Arm.

Main auditorium (400)
10:30
15min
Building Linux kernel with LTO
Michal Jireš

We have made progress building the Linux kernel with LTO by solving issues with top-level assembly.
This is an overview of how you can build the Linux kernel with LTO today and which issues remain.

I-105 (30)
10:30
60min
Source-to-Source Compilation for Hardware/Software Codesign
Tiago Santos

Critical performance regions of applications are often improved by offloading them onto specialized accelerators. This process requires selecting an adequate region of the program, subject to any potential data dependencies, communication overheads, and the nature of the computations done in that region. Due to the complexity of this problem and the high variability of downstream vendor tools, an automated approach invites the use of a source-to-source compiler as the first stage of the compilation pipeline, preserving the source code's readability and retargetability. To this end, we propose using the Clava C/C++ source-to-source compiler to take in any C or C++ application, find and optimize adequate regions for offloading, and output those regions as separate translation units. By leveraging Clang's Abstract Syntax Tree (AST), Clava allows a developer to write highly composable extensions that perform analysis and transformations over that AST, using modern scripting languages such as JavaScript and TypeScript. We demonstrate an entire source-to-source compilation flow for accelerating a C/C++ application, including extensions ranging from source code transformations such as function outlining and struct flattening; generation of a task graph representation of any C/C++ application; and selecting and extracting code regions for different types of accelerators, including the automatic generation of the communication layer using different APIs.

Auditorium B032 (80)
10:45
10:45
90min
Toolchain and Linux kernel
Jose Marchesi, Paul McKenney, Alexei Starovoitov, Steven Rostedt

Author: Steven Rostedt (Kernel maintainer)
Author: Paul E. McKenney (Kernel maintainer)
Author: Alexei Starovoitov (Kernel maintainer)
Author: Jose E. Marchesi (GNU toolchain)

The Linux kernel, which is by far one of the biggest, most complex and
most important programs around, is (still) mainly built using the GNU
Toolchain.

There is an intimate relationship between toolchain and kernel.

Compiling a huge, complex and specialized program such as the kernel
often implies facing challenging or unusual requirements on the
toolchain side. This includes security related requirements. Also, some
of the toolchain components interface directly with the kernel. In the
case of glibc, it even provides the main visible interface from the
kernel to userland programs. The support for BPF is also mainly Linux
kernel specific.

This relationship benefits both projects. For example, an actively
maintained toolchain can quickly include kernel specific
enhancements. And vice versa, the toolchain benefits from the associated
relevance that makes corporations support its development. It is
certainly not unusual for a feature introduced primarily for kernel
usage to also be very useful to other programs. Examples of this are the
support for patchable function entries, "asm goto", fentry, and several
security related features.

In order to improve this relationship a Toolchains Track has been
organized for some years now at the Linux Plumbers Conference. The aim
of the track is to fix particular toolchain (both GNU and LLVM) issues
which are of interest to the kernel and, ideally, find and agree on
solutions right away, during the track, making the best use of the
opportunity to discuss the issues live with kernel developers and
maintainers. The LPC toolchains track is proving very useful, although
it is not always easy to bring toolchain hackers there, given it is a
kernel specific conference.

We propose to have a Toolchain and Linux Kernel BoF during Cauldron this
year, with the participation of at least one Linux kernel
maintainer. The goals of the BoF are (a) to discuss particular
requirements, desired features and on-going developments that are
relevant to the kernel and (b) to gather kernel related
questions/input/feedback from the toolchain developers so we can bring
the issues to the LPC Toolchains Track, which will be held later in the
year after Cauldron.

I-105 (30)
11:30
11:30
15min
Function multi-versioning developments, and goals for the future
Alfie Richards

It is very common for CPUs with fancy extensions and features to overwhelmingly run code that is compiled for a baseline architecture without any of those features enabled. Function multi-versioning is a GCC compiler framework for generating multiple versions of a function and dispatching to the correct version for the host system at load time. This talk will discuss recent developments and the dreams for the future.
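
As a rough mental model of load-time dispatch, the sketch below picks one version of a function once, based on the features the host reports, and later calls go straight to the chosen version. This is not how GCC implements function multi-versioning: the feature name `"vector"`, the `VERSIONS` table and the `resolve` helper are hypothetical and exist only for illustration.

```python
from typing import Callable, FrozenSet, List, Tuple

# Each entry pairs the CPU features a version requires with its implementation.
Version = Tuple[FrozenSet[str], Callable[[List[int]], int]]


def sum_baseline(values: List[int]) -> int:
    """Baseline version: needs no optional features."""
    total = 0
    for value in values:
        total += value
    return total


def sum_wide(values: List[int]) -> int:
    """Stand-in for a version built with an optional extension enabled."""
    return sum(values)


# Ordered from most to least specialised; the baseline comes last.
VERSIONS: List[Version] = [
    (frozenset({"vector"}), sum_wide),
    (frozenset(), sum_baseline),
]


def resolve(host_features: FrozenSet[str]) -> Callable[[List[int]], int]:
    """Return the first version whose requirements the host satisfies."""
    for required, implementation in VERSIONS:
        if required <= host_features:
            return implementation
    raise RuntimeError("no baseline version available")


# Resolution happens once; callers then use the selected function directly.
vector_sum = resolve(frozenset({"vector"}))
plain_sum = resolve(frozenset())
```

Both resolved functions compute the same result; only the implementation behind the name differs from host to host.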

Auditorium B032 (80)
11:30
60min
malloc: past, present and future
Wilco Dijkstra

In this talk I will discuss recent improvements to GLIBC malloc, how it compares with other popular allocators, and what we could do to make malloc better: not only faster, but also safer, more maintainable and more memory-efficient.

Main auditorium (400)
11:45
11:45
15min
Using FFmpeg as Benchmark to verify performance of GCC and LLVM
Jiawei Chen

A comparison of current GCC and LLVM performance using FFmpeg as a benchmark.
The tests were run on a RISC-V development board, and the results include vectorization performance improvements.

Auditorium B032 (80)
12:00
12:00
15min
Measuring the health of the GCC community
Jeremy Bennett

In this short talk, I look at how data mining of repository activity and mailing lists can give insight into the health of a community project. The talk offers no prescriptions; its purpose is to share techniques that may be useful to the community.
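
One simple example of mining repository activity is counting how many distinct authors committed in each year. The sketch below is an assumption about what such a technique could look like, not something taken from the talk. It expects lines of the form `YYYY-MM-DD|Author Name`, which `git log --pretty=format:'%ad|%an' --date=short` can produce; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def active_authors_per_year(log_lines: Iterable[str]) -> Dict[int, int]:
    """Count distinct commit authors per year from 'date|author' lines."""
    authors: Dict[int, Set[str]] = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        date, author = line.split("|", 1)
        authors[int(date[:4])].add(author)  # the year is the first four digits
    return {year: len(names) for year, names in sorted(authors.items())}


# Made-up sample input standing in for real `git log` output.
sample = """\
2023-03-01|Alice
2023-07-15|Bob
2023-09-30|Alice
2024-01-10|Alice
"""
counts = active_authors_per_year(sample.splitlines())
```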

Auditorium B032 (80)
12:30
12:30
60min
Lunch - Served at FEUP and included
Main auditorium (400)
13:30
13:30
60min
BPF BoF
Jose Marchesi

In this BoF we will be discussing topics related to the BPF target in the GNU Toolchain.

I-105 (30)
13:30
60min
Comparative Analysis of GCC Codegen for AArch64 and RISC-V
Paul-Antoine Arras

This contribution explores possible improvements in GCC code generation for RISC-V. We collected dynamic instruction counts from selected SPEC CPU 2017 benchmarks and compared the results with AArch64. Findings reveal that prominent compiler weaknesses include missing instruction patterns, extra move instructions, unused load offsets, and functionally dead code. Additionally, vectorising library functions, like memset and mathematical operations, is crucial for maximising RISC-V efficiency.

Auditorium B032 (80)
13:30
90min
Vectorizer for Beginners
Richard Biener

This presentation is aimed at wannabe contributors to GCC's vectorizer. It should give an elaborate overview of the innards of the vectorizer, from user up to target interaction. After a thorough overview of the parts of the vectorizer, we follow examples from loop and basic-block vectorization through the vectorizer's code base, highlighting differences and commonalities.

Main auditorium (400)
14:30
14:30
60min
GCC Google Summer of Code BoF
Martin Jambor

GCC has participated in the Google Summer of Code (GSoC) program since 2006 (with a one-year gap), but for a long time we have not shared experiences, best practices and ideas for improvements in some organized form among the mentors and GSoC "org-admins." The idea of this BoF is to do exactly that.

Auditorium B032 (80)
15:30
15:30
30min
Coffee
Main auditorium (400)
16:00
16:00
60min
GCC Machine Descriptions for the Confused
Vineet Gupta

GCC machine descriptions are straightforward to implement if you have been involved in GCC development for a while. GCC Internals is a comprehensive resource, but it assumes prior knowledge. As a relative newcomer to the project (RISC-V backend), I've struggled with MD patterns (and still do). The LISPy syntax doesn't initially help either. This is my attempt to collect what I have learned over the last few years. If you don't know what a "Bridge" pattern is, or are confused about the difference between define_split and define_insn_and_split, this is the talk for you!

Main auditorium (400)
16:00
60min
Introduction to upstream patch review in GCC
Kyrill Tkachov

Patch review bandwidth has been identified as a bottleneck for GCC development many times over the years. We have taken steps to address it, such as appointing people in reviewer roles. But we can do more to reduce the friction for contributors who want to try patch review.
I will present some motivation to start reviewing patches and address common perceived barriers to doing so.
One takeaway from advocating for upstream patch review among colleagues is that there is no good set of guidelines to refer to when reviewing patches. In this talk I propose a set of baseline guidelines for patch review for the GCC project that can act as a starting point for budding reviewers. These can include technical design conventions that we want to maintain, common mistakes to look out for, testing and benchmarking considerations, commit message reviews, deployment concerns, and more high-level, social etiquette and procedural considerations.
I am interested in feedback on these guidelines and ideas on how and where we may want to advertise them for newcomers.

I-105 (30)
16:00
30min
New TLS allocators for glibc
Florian Weimer

This presentation covers a design of a new thread-local storage (TLS) allocator for the GNU C Library (glibc). The goal of this project is to unify the allocation algorithm for initial-exec TLS, global-dynamic TLS, and POSIX thread-specific data (as created by pthread_key_create). It unifies ideas that have been circulated within the glibc community for many years.

Auditorium B032 (80)
16:30
16:30
15min
A heap dumper for glibc
Florian Weimer

This session showcases the glibc heap dumper, a tool to obtain information about active memory allocations and malloc heap layout from coredumps.

Auditorium B032 (80)
08:00
08:00
60min
Breakfast - Served at FEUP and included.
Main auditorium (400)
09:00
09:00
60min
ABI change analysis in Libabigail 2.8
Dodji Seketeli

This presentation covers the improvements made to the Libabigail
framework since the end of 2024, across the 2.7 and 2.8
releases.

As many of those improvements were about improving the signal to noise
ratio of ABI change reports, the talk presents the internals of the
middle-end and how it relates to categorizing ABI changes in such a
way that the back-ends that generate ABI change reports can be better
equipped to lower the rate of false positives.

Besides the deep dive into the middle-end internals, the talk walks
through the user-facing improvements of the tools written using the
framework in the 2.7 and 2.8 releases.

The talk ends with some considerations about future perspectives of
improvements that still need to be addressed.

Auditorium B032 (80)
09:00
60min
BoF on GCC and AI
Pietra Ferreira, Jeremy Bennett

The proposal is for a BoF on GCC and AI, with the goal of opening up discussion on:

  • Current and emerging use cases where GCC intersects with AI/ML workloads, including compiler optimisations for AI kernels.
  • Challenges in supporting AI accelerators and heterogeneous compute through GCC.
  • Open questions for the community: to what extent should GCC evolve in this area, and where should external tooling take the lead?

The idea is to bring together community members interested in this intersection of GCC and AI, share experiences, and identify where GCC could play a meaningful role.

Auditorium B001 (170)
09:00
60min
Latest glibc math improvements and the future
Adhemerval Zanella Netto

The glibc math library aims to provide the required functions and macros from the C standard math.h and related headers (fenv.h, float.h, and complex.h). glibc supports multiple architectures and floating-point types, and also provides vectorized routines for some architectures. The math library is actively being extended to support newer extensions and features.

In this talk, I will demonstrate the recent optimization of multiple parts of the math libraries, explaining how we achieve it by utilizing new algorithms adapted for new hardware, as well as by leveraging external projects with improved implementations. I will also present recent work that supports the newer C23 functions and outline possible future work for optimizations.

I will also discuss, and ask for input on, future extensions and what to focus on. Should we aim to provide implementations with better precision, and what is the expected tradeoff between performance and precision? Should we extend the support for newer types, for instance, float16_t? And what about decimal floating point support? Should we add a platform-neutral skeleton vector math library to be used by other architectures?

Auditorium B002 (170)
09:00
60min
unload
Segher

This is a talk about all things "unload", my removal of the old reload code.

Auditorium B003 (170)
10:00
10:00
30min
Coffee
Main auditorium (400)
10:30
10:30
30min
Hard Register Constraints
Stefan Schulze Frielinghaus

Similar to register asm but still distinct, hard register constraints are another way to force operands of inline asm into specific machine registers. In this talk we will have a brief look at hard register constraints and compare them to register asm. We will look at how to use them, how they might help write more robust code, their current (implementation) limitations, and practical experiments, and we will look ahead to possible improvements.

Auditorium B003 (170)
10:30
60min
Improving glibc malloc for high reliability large data multi-threaded applications
Cupertino Miranda

As the memory and performance capabilities of computers have increased, virtualization has been the preferred approach to exploit all those resources.
To use such a system efficiently, the number of concurrent applications must be maximized.
This makes it extremely important to fully optimize any running application not only for time (CPU) but also for space (memory).
Moreover, high reliability and large data multi-threaded applications require not only an efficient allocation strategy but also a reliable and fast concurrency mechanism.

The solution for parallelism in glibc malloc (arenas) is based on minimizing concurrency through isolation, splitting the memory space across different threads using virtual memory as an abstraction layer.
This solution defers memory management to the kernel rather than taking explicit responsibility for keeping virtual allocations to a minimum.
Glibc malloc has been identified by projects such as MySQL and the JVM as using more memory than required, and this has justified the adoption of other allocators that are friendlier to multi-threaded workloads, such as jemalloc and tcmalloc.

This talk will present MySQL performance and data size analysis using both glibc malloc and other competing allocators, and show how recent improvements in the glibc malloc implementation greatly reduce virtual memory consumption in real-world MySQL usage, making it unnecessary to resort to other specialized allocators. Further possible improvements will also be discussed, which can make glibc the best allocator for high availability large data multi-threaded applications.

Auditorium B002 (170)
10:30
60min
Parallel Computing, Offloading, OpenMP and OpenACC
Tobias Burnus, Thomas Schwinge

An update on the current status of OpenMP, OpenACC and offloading in GCC, including what has been achieved in the last year. A few highlights are presented, together with an outlook for the coming years covering the tasks that are planned or that should be done in the near term.

Auditorium B001 (170)
10:30
30min
Time-traveling through the GCC PR database and testsuite
Alexandre Oliva

Tasked with identifying safer optimization flags and GCC releases to use,
we've developed techniques to extract information from the PR database
and from the testsuite. We've found that the growing PR density over
time is not intuitively correlated with quality, but that running
"future" torture testsuites with extra optimization flags on "past"
releases can provide useful insights about relevant points of
instability.

Slides at https://www.lx.oliva.nom.br/slides/timetravel.en.pdf

Auditorium B032 (80)
11:00
11:00
60min
Simplifying Custom Instruction Integration in GCC for RISC-V processors
Luis Silva

How can users add new instructions without knowledge of GCC internals?

Integrating custom instructions into a RISC-V processor typically requires deep familiarity with GCC internals, particularly its RTL and backend architecture. This talk presents APEX, an approach for defining custom RISC-V instructions in GCC directly from C using pragmas, or assembly source code. Rather than modifying the compiler internals directly, users can define new operations using a simple "#pragma" and a function declaration, which are then parsed by the front end and transformed into GCC’s internal RTL (RTX) representation. This approach eliminates the need for manual backend modifications, making custom instruction support more accessible to users.

We will explore the APEX pipeline in detail, from parsing APEX input C code to instruction emission and encoding in Binutils, and look at how APEX instructions are handled by the assembler and the disassembler/debugger.

This presentation targets compiler engineers, toolchain maintainers and hardware architects interested in extending RISC-V with domain-specific instructions while working within the GNU ecosystem. APEX reduces the need to dig into GCC internals, allowing contributors to prototype, experiment, and upstream new ideas with less effort.

Auditorium B032 (80)
11:00
30min
bunsen: testsuite result analysis depot, with a sprinkling of AI
Frank Ch. Eigler, Martin Cermak

Since bunsen was first presented at Cauldron 2019, work and workload have exploded. Bunsen on sourceware is processing the test results from hundreds of daily builds of a dozen of our favorite projects. It features CLI and web front-ends for clever searches, plus downloadable archives so you can play along at home. Recently, it has learned to connect project upstream git repos and commit histories to build/test histories. Putting all that info together, it can call on an AI to analyze causal factors of regressions. Let me show you how the tool may already be useful to you.

Auditorium B003 (170)
11:30
11:30
60min
GNU C Library BoF
Carlos O'Donell

The GNU C Library is used as the C library in the GNU systems and many other systems with the Linux kernel. The library is primarily designed to be a portable and high performance C library. It aims to follow all relevant standards including ISO C17 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.

Auditorium B002 (170)
11:30
60min
IPA, LTO and profile feedback BoF
Jan Hubička

BoF discussing inter-procedural optimization, link-time optimization and profile feedback in GCC

Auditorium B001 (170)
11:30
60min
ga68: the GNU Algol 68 compiler
Jose Marchesi

Algol 68 was designed by the Working Group 2.1 of the International Federation for Information Processing (IFIP) during the late 1960s and early 1970s, led by Adriaan van Wijngaarden. The goal of the working group was to provide a programming language suitable to communicate algorithms, to execute them efficiently on a variety of different computers, and to aid in teaching them to students. The resulting language was in principle expected to be an evolved version of Algol 60, with known shortcomings addressed, and generally improved. However, what was initially supposed to be an improved version of Algol 60 turned out to be something very different: an extremely powerful programming language, more modern and more expressive than most programming languages today, whose design exercised almost to the limit the newly invented notion of orthogonality in programming languages. Algol 68 is not like Algol 60, an important but old-fashioned programming language superseded in almost every aspect by its successors, only relevant nowadays as a historical curiosity. Despite many people claiming otherwise, Algol 68 has no successors.

The GNU Algol 68 Working Group is a group of hackers whose purpose is to bring Algol 68 back to the front line of programming where it belongs, to provide modern implementations of the language well integrated in today's operating systems and computers (like the GCC Algol 68 front-end), to produce documentation to help people learn this fascinating language, and to explore extensions and evolve the language with the rigor, respect and seriousness that it deserves and demands.

In January 2025 a first work-in-progress patch series implementing an Algol 68 front-end for GCC got sent to gcc-patches. Since then, the development has continued at a steady pace and by now most of the language has been implemented. In this talk we will introduce the front-end and the world domination plan associated with it, will highlight and discuss some interesting aspects of the implementation (Algol 68 is a notoriously difficult to implement language) and will make a case for the inclusion of the front-end in the main GCC tree.

We will also briefly look at some of the tangential projects like the Algol 68 support in the autotools and the a68 Emacs mode, as time allows.

References:
- Front-end development homepage: https://gcc.gnu.org/wiki/Algol68FrontEnd
- Git repository: https://forge.sourceware.org/gcc/gcc-a68
- Algol 68 homepage: https://algol68-lang.org

Auditorium B003 (170)
12:00
12:00
20min
RISC-V Unified Database: Automating Extension Integration Across Binutils, QEMU, and Beyond
Afonso Oliveira

RISC-V's rapid growth to more than 100 extensions and 1000 instructions creates maintenance challenges across the ecosystem. Tools like Binutils, QEMU, and the Linux kernel each maintain separate definitions for standard and custom instructions and extensions, leading to fragmentation and repetitive maintenance burden.

The RISC-V Unified Database (UDB) is a machine-readable source of truth for instructions and CSRs, containing ~90% of RISC-V instructions. We built a framework that continuously validates UDB against Binutils data and ensures both stay in sync. Moreover, we created a generator that converts UDB data into Binutils and QEMU definitions, reducing effort for developers porting new or custom extensions.

This talk will demonstrate UDB's toolchain verification, cross-validation results, and how developers can leverage UDB to port new RISC-V extensions into the GNU toolchain.

Auditorium B032 (80)
12:30
12:30
60min
Lunch - Served at FEUP and included.
Main auditorium (400)
13:30
13:30
60min
Uncomplicating new contributions
Guinevere, Arjun Shankar

Hear our experiences on mentoring new contributors, what common hurdles they face, and how we try to address them. Then, let’s discuss how to reduce the barrier to entry.

Auditorium B001 (170)
13:30
60min
What's new with diagnostics in GCC 16
David Malcolm

I'll be talking about developments in GCC 16:

  • those affecting GCC's diagnostic subsystem, and
  • those affecting the static analyzer (-fanalyzer)

Auditorium B003 (170)
13:30
30min
profiledb: optimize your distro/builds with crowdsourced profile corpus
Frank Ch. Eigler

Profile-guided optimizations are not new, but also not that popular. Why? Maybe because your build workflow can't easily include profile gathering, let alone every time you build. What if we could share profile data with each other, so you could take advantage of public crowdsourcing? What if you could easily contribute back your workload profiles? What if we can integrate this into distro build systems? Let's try with profiledb: a bit of glue between git, profilers, and linkers.

Auditorium B032 (80)
13:30
60min
s390: Stack tracing using Frame Pointer, Back Chain, and SFrame
Jens Remus

The talk will provide an overview of the different stack tracing methods (un-)available on s390 for user space. It will cover why stack tracing using the frame pointer is virtually impossible on s390 and why the compiler option -fno-omit-frame-pointer is better avoided there, the limitations of stack tracing using the s390-specific alternative of the back chain, and why SFrame stack trace information is expected to considerably improve stack tracing of user space on s390. Finally, it will provide an overview of the current state of SFrame support on s390 64-bit (s390x): the s390-specific SFrame stack trace format extensions, s390 support for generating SFrame stack trace information in Binutils 2.45, work-in-progress s390 support for SFrame in Glibc backtrace, and work-in-progress s390 support for SFrame in the Linux kernel and perf to sample stack traces of user space.

Auditorium B002 (170)
14:00
14:00
30min
Modula-2: New wide set implementation, performance results and direction of travel
Gaius Mulley

This session will discuss the performance benchmark results of the new wide set implementation in gm2. It will also report on the approach taken to implement this data type and how this technique will be used to implement M2R10 and ISO generics.

Auditorium B032 (80)
14:30
14:30
60min
BoF on Parallel Computing, Offloading, OpenMP and OpenACC
Tobias Burnus, Thomas Schwinge, Jakub Jelinek

Discussion of topics related to parallel computing and accelerator offloading in GCC, in particular OpenMP, OpenACC, and offloading to AMD and Nvidia GPUs. Other topics, such as additional offloading targets or base-language parallelization features of C, C++, Fortran, or other languages, are also welcome. Planned topics include completion of OpenMP 5.x and addition of more 6.x features, OpenACC extensions, improving performance, and support for a GPU kernel language (programming at the abstraction level of CUDA/HIP, as proposed for the next OpenMP version).

Auditorium B001 (170)
14:30
60min
RISC-V Auto-Vectorization 101
Robin Dapp

Introduction to RISC-V auto-vectorization. Basic building blocks, supported features, concepts, idiosyncrasies/quirks and more. Overview of what has been done, what's currently cooking and what's planned for the future.
Topics include RISC-V vector modes and patterns, else operands, vector-vector and vector-scalar variants, vsetvl placement, etc.

Auditorium B032 (80)
14:30
60min
SFrame for effective userspace stack tracing
Indu Bhagat

This talk provides an overview of recent developments in the SFrame stack tracing format over the past year. We discuss some of the enhancements to the SFrame stack trace format that are currently being looked at. Some of these desirable features in the planned SFrame V3 version help make the format more future-proof and more amenable to overall smoother adoption in the wider GNU/Linux community.

Auditorium B002 (170)
14:30
60min
The GDB BoF
Pedro Alves

An opportunity for the GDB community to meet to discuss all things related to the GNU Debugger project.

Auditorium B003 (170)
15:30
15:30
30min
Coffee
Main auditorium (400)
16:00
16:00
60min
Processes and Barriers
Maxim Kuvyrkov, Carlos O'Donell

Q&A panel discussion of development processes in GNU Toolchain projects – GCC, Glibc, GDB, Binutils, etc. – and how they affect our developer community.
We will discuss 4 topics:
- Onboarding new developers
- Growth, roles, and reputation. How to become a maintainer?
- Governance, and how decisions are made
- Infrastructure and tools for developers

Auditorium B001 (170)
17:30
17:30
60min
Reminder to attendees to travel to reception!
Main auditorium (400)
18:30
18:30
270min
Evening reception dinner and cellar tours at Barão Fladgate (Rua do Choupelo 250, 4400-088 Vila Nova de Gaia)
Main auditorium (400)
08:00
08:00
60min
Breakfast - Served at FEUP and included.
Main auditorium (400)
09:00
09:00
60min
Steering Committee Q&A
David Edelsohn

An opportunity for a GNU Toolchain community conversation with the members of the Steering Committees of the GNU Toolchain projects (GCC, GLIBC, Binutils, GDB).

Auditorium B001 (170)
10:00
10:00
30min
Coffee
Main auditorium (400)
10:30
10:30
30min
Developing a dead code elimination pass with RTL SSA
Ondřej Machota

The RTL SSA framework is a relatively new component of GCC that enables SSA-based analysis on RTL. In this talk, I will present a dead code elimination (DCE) implementation built on top of this framework, intended to replace the existing UD-chain based DCE.
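
For readers new to the topic, the following is a generic textbook mark-and-sweep dead code elimination over a toy SSA program. It is not the implementation presented in the talk and does not use the RTL SSA API; the `Insn` type and `eliminate_dead_code` are hypothetical names. The point it illustrates is that, because each SSA name has exactly one definition, liveness can be propagated from side-effecting instructions back to the definitions they depend on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Insn:
    dest: Optional[str]        # SSA name defined here, or None
    uses: Tuple[str, ...]      # SSA names read by this instruction
    side_effect: bool = False  # e.g. a store, a call or a return


def eliminate_dead_code(insns: List[Insn]) -> List[Insn]:
    """Keep only instructions that a side effect transitively depends on."""
    # In SSA form every name has exactly one defining instruction.
    def_of: Dict[str, int] = {
        insn.dest: index
        for index, insn in enumerate(insns)
        if insn.dest is not None
    }
    live: Set[int] = set()
    worklist = [i for i, insn in enumerate(insns) if insn.side_effect]
    while worklist:
        index = worklist.pop()
        if index in live:
            continue
        live.add(index)
        for name in insns[index].uses:
            if name in def_of:  # names defined elsewhere (e.g. inputs) are skipped
                worklist.append(def_of[name])
    return [insn for index, insn in enumerate(insns) if index in live]


program = [
    Insn("a1", ()),                         # a1 = input
    Insn("b1", ("a1",)),                    # b1 = a1 + 1
    Insn("c1", ("a1",)),                    # c1 = a1 * 2  (result never needed)
    Insn("d1", ("c1",)),                    # d1 = c1 - 3  (result never needed)
    Insn(None, ("b1",), side_effect=True),  # store b1
]
optimized = eliminate_dead_code(program)
```

Note that `c1` has a use, but it is still removed because its only user, `d1`, is itself dead.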

Auditorium B002 (170)
10:30
30min
Handling C++ Exception Hierarchies in Ada
Alexandre Oliva

GNAT already offered mechanisms to handle C++ exceptions, but they were
limited to exact class type matches. This presentation covers the
extensions to Ada syntax, runtime and library to enable Ada subprograms
to catch and handle C++ exception hierarchies, and to reference the
raised C++ (sub-)object.

Slides at https://www.lx.oliva.nom.br/slides/adacxxcept.en.pdf

Auditorium B001 (170)
10:30
60min
Licensing Birds of a Feather
Krzysztof Siewicz

As the FSF's licensing and compliance manager, I will address licensing-related questions from the maintainers and developers of the toolchain projects. This will be an informal, interactive exchange about the topics collected through RFCs before GNU Cauldron, but you are welcome to ask a question during the session as well. It will also be an opportunity to learn about the recent work of the FSF's Licensing and Compliance Lab. We plan to cover topics such as LLM-generated contributions to GNU, following notice and attribution requirements, and GPL compliance in different technical setups.

Auditorium B032 (80)
10:30
60min
Vectorizer BOF
Richard Biener, Tamar Christina

After short updates on vectorizer work from contributors, this is the chance to discuss larger work going forward.

Auditorium B003 (170)
11:00
11:00
15min
Bringing AutoFDO to AArch64: Low-Overhead, Profile-Guided Optimization for AArch64
Kugan Vivekanandarajah

Profile-Guided Optimization (PGO) is a powerful technique for achieving performance gains, yet its instrumentation-based implementation imposes a high overhead that limits its adoption in production environments. AutoFDO solves this by using low-overhead, hardware-based sampling to gather profile data, making it an ideal approach for continuous, real-world optimization.
This presentation details the work required to enable AutoFDO for AArch64 within the GNU toolchain and gives an overview of the current state of AutoFDO. We will cover the key architectural features that make efficient, sampling-based profiling possible, as well as the hardware prerequisites and implications of using the various hardware units available on the AArch64 architecture (namely SPE and BRBE).
We will discuss the specific technical challenges encountered, including the crucial task of accurately annotating the pre-optimization IR representation using a profile gathered from an already optimized binary. We will detail the fixes implemented in the AutoFDO tools and GCC to handle the discrepancies that arise from this process. This ensures that the profile data correctly drives optimizations like inlining and block reordering, even with significant

Auditorium B002 (170)
11:00
30min
Interprocedural optimization of OpenMP kernels
Josef Melcr

When compiling OpenMP constructs, the bodies of OpenMP regions are outlined into separate functions, which are later called indirectly by libgomp built-ins. This outlining process disables interprocedural optimizations for the kernels. In this short talk, we outline a mechanism to partially restore interprocedural optimization capabilities for the kernels, starting with constant propagation, and we discuss its applications beyond OpenMP.

Auditorium B001 (170)
11:15
11:15
15min
CI and Fuzzing for RISC-V
Edwin Lu

In this talk, I will give a quick overview of some of the existing RISC-V testing infrastructure, focusing on our pre- and post-commit CI and automated fuzzing system. I will briefly show how these tools have helped identify regressions early and provide faster feedback to developers.

Auditorium B002 (170)
11:30
11:30
30min
Quantifying Abstraction Costs in GCC
Alex (Waffl3x)

Everyone wants to improve the code quality of GCC, yet many small patches, suggested improvements, and larger refactoring projects remain unaddressed, in some cases without any updates in over 20 years! It becomes very discouraging to attempt to develop these patches when there are no set guidelines for what is acceptable. To that end, we will take a brief look at past efforts to identify the pain points of developing, reviewing, and finally approving these patches. From there, we look at what we can do to reduce friction for developers and maintainers, with a focus on quantifying impacts on GCC's compile duration, run-time performance, and debuggability.

Auditorium B001 (170)
11:30
60min
Rust front end post libcore
Pierre-Emmanuel Patry

This year the Rust front end completed multiple major milestones: the name resolution rework is now complete, the desugaring pass has brought support for a lot of new features, and we were lucky enough to get two amazing GSoC students who greatly improved the capabilities of the front end.

Being able to compile more Rust led to some unexpected discoveries and opened the way for previously unhandled complex edge cases to be fixed.

Our next step involves iterating towards compiling Rust-for-Linux, which we will begin experimenting with in September.

This talk will cover what has recently changed in the Rust front end and what will be done this year, as well as a few surprises we had along the way. The talk will conclude with an update on the upstream synchronization process and the communication with the wider GCC community.

Auditorium B032 (80)
11:30
60min
Source Code Analysis and Navigation: the metadatabase
James K. Lowden

A proof-of-concept demonstration of a language-independent metadata database to support source-code interrogation and navigation. Such a database could support little-understood source-code analysis and functionality that is not available today at any price.

I propose that GCC be extended to produce such a database, probably from the gimple tree, as an affordance to tool developers.

Auditorium B003 (170)
11:30
60min
Sourceware Forge: contribution workflows with Forgejo
Claudio Bantaloukas, Mark J. Wielaard

The session starts with a brief (15-20 min) presentation on:
- the motivations and requirements people have communicated for using a forge
- what Forgejo provides and what's missing
- an overview of how existing workflows can gradually migrate to the forge

Most of the time will be used to gather feedback from the community.

Auditorium B002 (170)
12:30
12:30
60min
Lunch - Served at FEUP and included.
Main auditorium (400)
13:30
13:30
60min
Arm/AArch64 BoF
Tamar Christina, Alex Coplan, Wilco Dijkstra

A chance to discuss upcoming and future work in the GNU toolchain for Arm platforms.

Auditorium B003 (170)
13:30
60min
Formalizing the semantics of GIMPLE
Krister Walfridsson

In my work on smtgcc to formalize the semantics of GIMPLE, I have found several cases where optimization passes perform invalid transformations, as well as cases where the GIMPLE semantics do not allow optimization passes to express what they need (see PR120980 for an example). In this talk, I will present the current status of the formalization and discuss the issues I have found during the process.

Auditorium B032 (80)
13:30
60min
GCC BOF: Reviewing refactoring goals and acceptable abstractions
Alex (Waffl3x)

As a continuation of my talk on quantifying abstractions by objective costs, we also need to evaluate the more subjective costs and benefits of this work, as well as where that work should be directed. Ideally this will form the basis of a design document that developers can refer to for guidance, further reducing friction between developers and maintainers by making expectations clearer.
We will talk about and evaluate:
- acceptability and like/dislike of different abstractions and practices
- what is most important to refactor/rewrite/modernize, and the risks of doing so
- refresh our state on goals, ImprovementProjects, and rearch plans (some of this is very old)
- figure out what changes we really want
- and very importantly, what we do NOT want our code base to become

To help stimulate discussion, I will prepare examples of code with potential refactors (some intentionally bad).

Auditorium B001 (170)
13:30
60min
RISC-V BoF
Jeremy Bennett

The annual opportunity to review and discuss anything about the RISC-V backend.

Auditorium B002 (170)
14:30
14:30
60min
Moving BPF verifier towards classic data flow analysis techniques
Eduard Zingerman

The BPF verifier has trouble verifying loops. This talk will cover:

  • the historical evolution of loop handling in the verifier;
  • problems with the current state of things (too crude widening,
    no bounds for induction variables);
  • the DFA-based liveness analysis that landed last week;
  • further steps towards adding DFA-based value range analysis to
    the verifier (a very hand-waving part).

We are interested in the feedback of the GNU toolchain community, especially when it comes to range analysis.
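
As background, classic liveness analysis is a backward data flow problem solved by iterating to a fixed point. The sketch below shows the textbook formulation on a tiny loop; it is not the verifier's implementation, and the block names, register names, and the `liveness` helper are hypothetical.

```python
def liveness(blocks, succs):
    """Textbook backward liveness analysis.

    blocks: block name -> (use, defs), where 'use' holds registers read
            before being written in the block and 'defs' holds registers
            written in the block.
    succs:  block name -> list of successor block names.
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:  # iterate until nothing changes (fixed point)
        changed = False
        for name, (use, defs) in blocks.items():
            # live_out is the union of live_in over all successors.
            out = set()
            for succ in succs[name]:
                out |= live_in[succ]
            # live_in = use U (live_out - defs)
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out

# A counting loop: r1 is the counter, r2 the accumulator, r3 is written
# once and never read.
blocks = {
    "entry": (set(), {"r1", "r2", "r3"}),    # r1 = n; r2 = 0; r3 = 42
    "head": ({"r1"}, set()),                 # if r1 == 0 goto exit
    "body": ({"r1", "r2"}, {"r1", "r2"}),    # r2 += r1; r1 -= 1
    "exit": ({"r2"}, set()),                 # return r2
}
succs = {"entry": ["head"], "head": ["body", "exit"],
         "body": ["head"], "exit": []}
live_in, live_out = liveness(blocks, succs)
```

In this example `r3` is live nowhere, so two program states that differ only in `r3` can be treated as equivalent at the loop head.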

Auditorium B032 (80)
14:30
30min
Notes about MTE implementation
Claudiu Zissulescu

The Memory Tagging Extension (MTE) is a feature of the Armv8.5-A architecture that introduces several hardware capabilities:
- Each aligned 16-byte region of application memory can be assigned a 4-bit memory tag.
- Every pointer can include a 4-bit address tag in its most significant byte.
- An exception is triggered if the address tag differs from the memory tag.
- A set of special instructions is provided for efficient tag manipulation.

MTE aims to tackle two major issues: buffer overflows and use-after-free errors. It is the responsibility of tools such as compilers, libraries, and assemblers to emit MTE instructions that instrument the code to prevent these errors.
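
The tag check described above can be modelled in software. The following is a minimal simulation under stated assumptions, not real MTE code: the `TaggedMemory` class, its methods, and the choice of bits 56-59 for the pointer tag are illustrative, and real hardware performs the check on every access without any such helper.

```python
GRANULE = 16                     # bytes covered by one memory tag
TAG_SHIFT = 56                   # assumed tag position in a 64-bit pointer
TAG_MASK = 0xF                   # tags are 4 bits wide
ADDR_MASK = (1 << TAG_SHIFT) - 1

class TagCheckFault(Exception):
    """Raised when the pointer tag and the memory tag differ."""

class TaggedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # one tag per 16-byte granule

    def set_tag(self, addr, length, tag):
        """Tag every granule in [addr, addr + length); return a tagged pointer."""
        for g in range(addr // GRANULE, (addr + length - 1) // GRANULE + 1):
            self.tags[g] = tag & TAG_MASK
        return addr | ((tag & TAG_MASK) << TAG_SHIFT)

    def load(self, ptr):
        """Load one byte, checking the pointer tag against the memory tag."""
        addr = ptr & ADDR_MASK
        ptr_tag = (ptr >> TAG_SHIFT) & TAG_MASK
        if ptr_tag != self.tags[addr // GRANULE]:
            raise TagCheckFault(hex(ptr))
        return self.data[addr]

def faults(mem, ptr):
    """Return True if loading through ptr triggers a tag check fault."""
    try:
        mem.load(ptr)
        return False
    except TagCheckFault:
        return True

mem = TaggedMemory(64)
p = mem.set_tag(0, 16, 0x3)       # "allocate" 16 bytes with tag 3
q = mem.set_tag(16, 16, 0x5)      # neighbouring allocation with tag 5

in_bounds_ok = not faults(mem, p + 15)   # last byte of p's buffer
overflow_caught = faults(mem, p + 16)    # one past the end: tag 3 vs 5
mem.set_tag(0, 16, 0x9)                  # "free" p by retagging its granule
uaf_caught = faults(mem, p)              # stale pointer: tag 3 vs 9
```

The overflow is caught only because the neighbouring granule happens to carry a different tag; with 4-bit tags, detection is probabilistic unless adjacent allocations are deliberately given distinct tags.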

This talk outlines the current support (and work in progress) for MTE instructions in the latest GNU tools, including GCC, binutils, and glibc.

Auditorium B003 (170)
15:30
15:30
30min
Coffee
Main auditorium (400)